Sets of Models and Prices of Uncertainty
Lars Peter Hansen Thomas J. Sargent∗
March 8, 2015
Abstract
A representative consumer expresses distrust of a baseline probability model by using
a convex set of martingales as likelihood ratios to represent other probability models.
The consumer constructs the set to include martingales that represent particular
parametric alternatives to the baseline model as well as others representing only
vaguely specified models statistically close to the baseline model. The representative
consumer’s max-min expected utility over that set gives rise to equilibrium prices
of model uncertainty expressed as worst-case distortions to the drifts in his baseline
model. We calibrate a quantitative example to aggregate US consumption data.
Key words: Risk, uncertainty, uncertainty prices, Chernoff entropy, robustness, shock
price elasticities, affine stochastic discount factor
∗We thank Scott Lee, Botao Wu, and especially Lloyd Han and Paul Ho for carrying out the computations.
1 Introduction
Specifying a set of probability distributions is an essential part of applying the Gilboa and
Schmeidler (1989) max-min expected utility model. This paper proposes a new way to
imagine that a decision maker forms that set and provides an application to asset pricing.
When a representative investor describes risks with a set of probability models, uncertainty
premia augment prices of exposures to those risks. We describe how our method for
specifying that set affects prices of model uncertainty.
Our experiences as applied econometricians attract us to robust control theory. We
always regard our own quantitative models as approximations to better models that we had
not formulated. This is also the attitude of the robust decision maker modeled in Hansen
and Sargent (2001) and Hansen et al. (2006). The decision maker has a single baseline
probability model with a finite number of parameters. He wants to evaluate outcomes
under alternative models that are statistically difficult to distinguish from his baseline
model. He expresses distrust of his baseline model by surrounding it with an uncountable
number of alternative models, many of which have uncountable numbers of parameters. He
represents these alternative models by multiplying the baseline probabilities with likelihood
ratios whose entropies relative to the baseline model are less than a bound that expresses
the idea that alternative models are statistically close to the baseline model.
The decision theory presented in this paper retains the starting point of a single baseline
model but differs from Hansen et al. (2006) in how it forms the set surrounding the baseline
model. A new object appears: a quadratic function of a Markov state that defines alternative
parametric models to be included within a set of models surrounding the baseline
model. The decision maker wants valuations that are robust to these models in addition
to other “vaguely specified” models expressed as before by multiplying the baseline model
by likelihood ratios. The quadratic function can be specified to include alternatives to the
baseline model including ones with fixed parameters, time varying parameters, and other
less structured forms of model uncertainty.
For asset pricing, a key object that emerges from the analysis in Hansen and Sargent
(2010) is a vector of worst-case drift distortions to the baseline model. The negative of
the drift distortion vector equals the vector of market prices of model uncertainty that
compensate the representative investor for bearing model uncertainty. The effects that our
new object – the quadratic function indexing particular alternative models – has on market
prices of uncertainty are all intermediated through this drift distortion. We show how
the quadratic function can produce drift distortions that imply stochastic discount factors
resembling ones attained by earlier authors under different assumptions about the sources
of risks. For example, models that posit that a representative consumer’s consumption
process has innovations with stochastic volatility introduce new risk exposures in the form
of the shocks to volatilities. Their presence induces time variation in equilibrium compensations
for exposures to shocks that include both the stochastic volatility shock and
the “original” shocks whose volatilities now move. By way of contrast, we introduce no
stochastic volatility and no new risks. Instead, we amplify the prices of exposures to the
“original” shocks. We induce fluctuations in those prices by modeling how the representative
consumer struggles to confront his doubts about the baseline model. We extend these
insights to the analysis of uncertainty prices over alternative investment horizons.
Section 2 describes a representative consumer’s baseline probability model and martingale
perturbations to it. Section 3 describes two convex sets of martingales that perturb
the baseline model. Section 4 uses one of these sets to form a robust planning problem that
generates a worst-case model that we use to calibrate key parameters measuring the size
of a convex set of models. Section 5 constructs a recursive representation of a competitive
equilibrium. Then it links the worst-case model that emerges from the robust planning problem
to competitive equilibrium compensations that the representative consumer earns for bearing
model uncertainty. This section also describes a term structure of these market prices
of uncertainty. By borrowing from Hansen and Sargent (2010), section 6 describes a quantitative
version of a baseline model as well as a class of models that particularly concern the
robust consumer and the robust planner. Section 7 uses the quantitative model to compare
the set of models that concern both our robust planner and our representative consumer
with two other sets featured in Anderson et al. (2003) and Hansen and Sargent (2010),
one based on Chernoff entropy, the other on relative entropy. Section 8 offers concluding
remarks. Six appendices provide technical details.
2 The model
2.1 Mathematical framework
A representative consumer cares about a stochastic process $Y \doteq \{Y_t : t \ge 0\}$ described
by the baseline model1
$$
\begin{aligned}
d \log Y_t &= (.01)\,(\mu + X_t)\,dt + (.01)\,\alpha \cdot dW_t \\
dX_t &= \phi\,dt - \kappa X_t\,dt + \sigma \cdot dW_t,
\end{aligned} \tag{1}
$$
where W is a multivariate Brownian motion, X is a scalar process initialized at a random
variable X0 governed by probability distribution Q, and $\mu + X_t + \frac{.01}{2}|\alpha|^2$ is the date t growth
rate of Y expressed as a percent. The sextuple (µ, κ, φ, α, σ, Q) characterizes the baseline
model.2
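To make the baseline dynamics concrete, here is a minimal simulation sketch of model (1) using an Euler-Maruyama discretization. The function name, the step size, and the parameter values in the usage line are illustrative assumptions and are not the paper's calibration.

```python
import math
import random

def simulate_baseline(mu, phi, kappa, alpha, sigma, x0, T=10.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of (t, log Y_t, X_t) under baseline model (1).

    alpha and sigma are lists with one entry per Brownian shock.
    All numerical inputs are hypothetical, chosen by the caller.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    log_y, x = 0.0, x0
    path = [(0.0, log_y, x)]
    for i in range(1, n_steps + 1):
        # Independent Brownian increments, one per shock.
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in alpha]
        alpha_dw = sum(a * w for a, w in zip(alpha, dw))
        sigma_dw = sum(s * w for s, w in zip(sigma, dw))
        # d log Y = .01 (mu + X) dt + .01 alpha . dW  (uses X before its update)
        log_y += 0.01 * (mu + x) * dt + 0.01 * alpha_dw
        # dX = phi dt - kappa X dt + sigma . dW
        x += (phi - kappa * x) * dt + sigma_dw
        path.append((i * dt, log_y, x))
    return path

# Hypothetical parameter values, for illustration only.
example_path = simulate_baseline(mu=0.5, phi=0.0, kappa=0.2,
                                 alpha=[0.5, 0.0], sigma=[0.0, 0.1], x0=0.0)
```

Switching off the shocks (alpha and sigma equal to zero vectors) reduces X to a deterministic path that decays toward φ/κ, which is a convenient sanity check on the discretization.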
Because he doesn’t trust the baseline model, the consumer also cares about Y under
probability models obtained by multiplying probabilities associated with the baseline model
(1) by likelihood ratios. We represent a likelihood ratio by a stochastic process $Z^h$ that is
a positive martingale with respect to the baseline model and that satisfies3
$$
dZ^h_t = Z^h_t\, h_t \cdot dW_t, \tag{2}
$$
or
$$
d \log Z^h_t = h_t \cdot dW_t - \frac{1}{2}|h_t|^2\,dt, \tag{3}
$$
where h is adapted to the filtration F = {Ft : t ≥ 0} associated with the Brownian motion
W and satisfies
$$
\int_0^t |h_u|^2\,du < \infty \tag{4}
$$
with probability one. Imposing the initial condition $Z^h_0 = 1$, we express the solution of
stochastic differential equation (2) as:
$$
Z^h_t = \exp\left( \int_0^t h_u \cdot dW_u - \frac{1}{2}\int_0^t |h_u|^2\,du \right). \tag{5}
$$
1 We let X denote the stochastic process, Xt the process at date t, and x a realized value of the state.
2 In earlier papers, we sometimes referred to what we now call the baseline model as the decision maker’s approximating model or benchmark model.
3 James (1992), Chen and Epstein (2002), and Hansen et al. (2006) used this representation.
When we want to allow Z0 to be drawn from an unknown probability distribution, we take
a Borel measurable function g > 0 satisfying Eg(X0) = 1 and consider martingales equal
to
$$
g(X_0)\,Z^h_t.
$$
Let G denote the collection of all such functions g(x). A pair (g, h) represents a perturbation
of the baseline model (1).
Definition 2.1. Z denotes the set of all martingales $g(X_0)Z^h$ constructed via representation
(5) with some process h adapted to F = {Ft : t ≥ 0} and satisfying (4) and g ∈ G.
We use a martingale g(X0)Zh to construct an alternative probability distribution as
follows. Starting from the probability distribution associated with the baseline model (1),
we use h to represent another probability distribution conditioned on F0. To do this, think
of taking any Ft-measurable random variable Yt and multiplying it by $Z^h_t$ before computing
expectations conditioned on X0. Associated with h are probabilities defined implicitly by
$$
E^h[B_t \mid \mathcal{F}_0] = E[Z^h_t B_t \mid \mathcal{F}_0]
$$
for any t ≥ 0 and any bounded Ft-measurable random variable Bt. Similarly, we write
$$
E^g B_0 = E[g(X_0) B_0]
$$
for any bounded random variable B0 in the date zero information set F0.
Here the positive random variable $Z^h_t$ acts as a “Radon-Nikodym derivative” for the
date t conditional expectation operator $E^h[\,\cdot \mid X_0]$. The martingale property of the process
$Z^h$ ensures that the conditional expectations operators for different t’s are compatible in
the sense that they satisfy a Law of Iterated Expectations. The random variable g(X0)
acts as a “Radon-Nikodym derivative” for the date zero unconditional distribution vis a
vis a baseline probability distribution Q over the date zero state vector X0.
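The Radon-Nikodym role of the martingale can be checked numerically in a special case. The sketch below assumes a scalar Brownian motion and a constant drift distortion h (both simplifying assumptions made here for illustration), so that representation (5) reduces to exp(h W_t − h² t / 2). It estimates E[Z_t], which should be close to one, and E[Z_t W_t], which is the mean of W_t under the h model and should be close to h t. The function name and sample size are hypothetical.

```python
import math
import random

def likelihood_ratio_moments(h, t, n_draws=200_000, seed=1):
    """Monte Carlo estimates of E[Z_t] and E[Z_t W_t] for constant scalar h.

    Under the baseline model W_t ~ N(0, t) and Z_t = exp(h W_t - 0.5 h^2 t).
    """
    rng = random.Random(seed)
    sum_z = 0.0
    sum_zw = 0.0
    for _ in range(n_draws):
        w = rng.gauss(0.0, math.sqrt(t))
        z = math.exp(h * w - 0.5 * h * h * t)
        sum_z += z
        sum_zw += z * w
    return sum_z / n_draws, sum_zw / n_draws

mean_z, mean_zw = likelihood_ratio_moments(h=0.3, t=4.0)
# mean_z should be near 1 (martingale property);
# mean_zw should be near h * t = 1.2 (W acquires drift h under the h model).
```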
While under the baseline model W is a standard Brownian motion, under the alternative
h model distribution this process has increments
$$
dW_t = h_t\,dt + dW^h_t, \tag{6}
$$
where $W^h$ is a standard Brownian motion. While (3) expresses the evolution of $\log Z^h$ in
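terms of increment dW, the evolution in terms of $dW^h$ is:
$$
d \log Z^h_t = h_t \cdot dW^h_t + \frac{1}{2}|h_t|^2\,dt. \tag{7}
$$
In light of (6), we can write model (1) as:
$$
\begin{aligned}
d \log Y_t &= (.01)\,(\mu + X_t)\,dt + (.01)\,\alpha \cdot h_t\,dt + (.01)\,\alpha \cdot dW^h_t \\
dX_t &= \phi\,dt - \kappa X_t\,dt + \sigma \cdot h_t\,dt + \sigma \cdot dW^h_t,
\end{aligned}
$$
which implies that Y has a (local) growth rate $\mu + X_t + \alpha \cdot h_t + \frac{.01}{2}|\alpha|^2$ under the h model.4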
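The growth-rate statement follows from Itô's lemma. As a short check that uses only the h-model dynamics for log Y stated above:

```latex
\begin{aligned}
\frac{dY_t}{Y_t}
  &= d\log Y_t + \frac{1}{2}(.01)^2 |\alpha|^2\,dt \\
  &= (.01)\Big[\mu + X_t + \alpha\cdot h_t + \frac{.01}{2}|\alpha|^2\Big]dt
     + (.01)\,\alpha\cdot dW^h_t .
\end{aligned}
```

Multiplying the drift by 100 to express it as a percent gives $\mu + X_t + \alpha\cdot h_t + \frac{.01}{2}|\alpha|^2$, and setting $h_t = 0$ recovers the baseline growth rate.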
2.2 Quantifying Probability Distortions
Discounted relative entropy quantifies how a (g, h) pair distorts baseline model probabilities.
We construct discounted relative entropy in two steps. First, we condition on X0 = x and
focus solely on h; second, we focus on misspecifications of Q.
i) Our first step is to compute
$$
\begin{aligned}
\Delta(Z^h; x) &= \delta \int_0^\infty \exp(-\delta t)\, E\left( Z^h_t \log Z^h_t \,\middle|\, X_0 = x \right) dt \\
&= \frac{1}{2} \int_0^\infty \exp(-\delta t)\, E\left( Z^h_t |h_t|^2 \,\middle|\, X_0 = x \right) dt
\end{aligned} \tag{8}
$$
where the second equality follows from an application of integration by parts. We
write ∆ as a function of Zh instead of h because ∆ is convex in Zh. The discounted
entropy concept (8) quantifies how h distorts baseline probabilities.5
ii) Our second step applies when we don’t condition on X0. Suppose that Q is the
stationary probability distribution for X under the baseline model and that g is the
density used to alter Q. We average over the initial state via
$$
\Delta(g; Z^h) \equiv \int \Delta(Z^h; x)\, g(x)\, Q(dx) + \int g(x) \log g(x)\, Q(dx) \tag{9}
$$
which includes a relative entropy penalty for the initial density g.
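The integration-by-parts step behind the second line of (8) can be spelled out as follows. The shorthand $m(t)$ is introduced here only for this sketch, and the argument assumes that the stochastic integral below is a true martingale and that $e^{-\delta t} m(t) \to 0$ as $t \to \infty$.

```latex
\begin{aligned}
d\big(Z^h_t \log Z^h_t\big)
  &= \big(1 + \log Z^h_t\big) Z^h_t\, h_t\cdot dW_t + \tfrac{1}{2} Z^h_t |h_t|^2\,dt, \\
m(t) &\doteq E\big(Z^h_t \log Z^h_t \mid X_0 = x\big)
   = \tfrac{1}{2}\int_0^t E\big(Z^h_u |h_u|^2 \mid X_0 = x\big)\,du, \\
\delta\int_0^\infty e^{-\delta t} m(t)\,dt
  &= \big[-e^{-\delta t} m(t)\big]_0^\infty + \int_0^\infty e^{-\delta t} m'(t)\,dt
   = \tfrac{1}{2}\int_0^\infty e^{-\delta t}
     E\big(Z^h_t |h_t|^2 \mid X_0 = x\big)\,dt .
\end{aligned}
```

The first line is Itô's lemma applied to $z \log z$ using (2); the boundary term vanishes because $m(0) = 0$ and by the growth assumption stated above.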
4 The growth rate includes a multiplication by 100 that offsets one of the .01’s.
5 Hansen et al. (2006) used the representation of discounted relative entropy that appears on the right side of the first line of (8).
3 Convex sets of models
Two convex sets that surround the baseline model are designed to include parametric
probability models that a decision maker cares about. One set can readily be used for
robust control problems, but the other cannot. Nevertheless, the second set is useful
because it generalizes Chernoff (1952) entropy to a Markov environment and thereby has
an explicit statistical interpretation. We are interested in how these two convex sets are
related.
3.1 Alternative parametric models
The following parametric model nests baseline model (1) within a bigger class:
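$$
\begin{aligned}
d \log Y_t &= .01\,(\tilde\mu + X_t)\,dt + .01\,\alpha \cdot dW^h_t \\
dX_t &= \tilde\phi\,dt - \tilde\kappa X_t\,dt + \sigma \cdot dW^h_t,
\end{aligned} \tag{10}
$$
where $W^h$ is a Brownian motion and (6) continues to describe the relationship between the
processes W and $W^h$. Here (µ, φ, κ) are parameters of the baseline model (1), $(\tilde\mu, \tilde\phi, \tilde\kappa)$ are
parameters of model (10), and (α, σ) are parameters common to both models. We want to
use drift distortions h for W to represent models in a parametric class defined by (10). We
can express model (10) in terms of our section 2.1 structure by setting
$$
h_t = \eta(X_t) \equiv \eta_0 + \eta_1 X_t
$$
and using (1), (6), and (10) to deduce the following restrictions on η0 and η1:
$$
\begin{bmatrix} \alpha' \\ \sigma' \end{bmatrix} \eta_0 =
\begin{bmatrix} \tilde\mu - \mu \\ \tilde\phi - \phi \end{bmatrix},
\qquad
\begin{bmatrix} \alpha' \\ \sigma' \end{bmatrix} \eta_1 =
\begin{bmatrix} 0 \\ \kappa - \tilde\kappa \end{bmatrix}. \tag{11}
$$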
By maintaining restrictions (11), we can find pairs (η0, η1) that represent members of a
class of models having the parametric form (10). Among other possibilities, we can set
Q to assign all of the probability to an initial state or we can set it equal to the stationary
distribution implied by the baseline model, in which case we can construct g so that
g(x)dQ(x) is the stationary distribution under the alternative model implied by (η0, η1).
We consider restrictions on the martingales Zh pertinent for modeling probabilities
conditioned on X0.
Definition 3.1. Z+ is the set of martingales g(X0)Zh constructed by (i) selecting a pair
(η0, η1) that satisfies (11), (ii) pinning down an associated ht = η0+η1Xt, (iii) constructing
an implied martingale Zh via (2), and (iv) selecting g ∈ G.
We can further restrict a family of models by using a nonnegative quadratic function of x
$$
\xi(x) \doteq \xi_0 + 2\xi_1 x + \xi_2 x^2 \ge 0 \tag{12}
$$
to express a collection of alternatives to a baseline model. For example, to induce ξ to
capture a prespecified $\tilde\kappa$, form
$$
\xi(x) \doteq \xi_0 + 2\xi_1 x + \xi_2 x^2 = \frac{1}{|\sigma|^2}\,(\tilde\kappa - \kappa)^2 x^2.
$$
This choice of ξ makes both the prespecified $\tilde\kappa$ and its mirror image $2\kappa - \tilde\kappa$ around the
baseline κ alternative parameter configurations
that are among the models to be included in a convex set of Z’s. More generally, we can
select $(\tilde\mu, \tilde\phi, \tilde\kappa)$ and compute η0 and η1 by solving the counterpart to (11). Then
$$
\xi(x) = |\eta_0 + \eta_1 x|^2.
$$
We again pick up additional parametric models by casting our restrictions in terms of the
quadratic function ξ.
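The mapping from an alternative parametric model to the drift distortion and then to the quadratic function can be sketched in a few lines. The sketch assumes exactly two Brownian shocks, so that the matrix stacking α′ and σ′ in (11) is square and invertible; with more shocks (11) would not pin down (η0, η1) uniquely. All function names and the numbers in the usage lines are hypothetical.

```python
def solve_2x2(m, b):
    """Solve the 2x2 linear system m @ x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-14:
        raise ValueError("alpha and sigma must be linearly independent")
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]

def drift_distortion(alpha, sigma, base, alt):
    """Return (eta0, eta1) satisfying restrictions (11).

    base = (mu, phi, kappa) are baseline parameters;
    alt  = the corresponding parameters of the alternative model (10).
    """
    mu, phi, kappa = base
    mu_alt, phi_alt, kappa_alt = alt
    m = [alpha, sigma]  # rows are alpha' and sigma'
    eta0 = solve_2x2(m, [mu_alt - mu, phi_alt - phi])
    eta1 = solve_2x2(m, [0.0, kappa - kappa_alt])
    return eta0, eta1

def xi_coefficients(eta0, eta1):
    """Coefficients (xi0, xi1, xi2) of |eta0 + eta1 x|^2 = xi0 + 2 xi1 x + xi2 x^2."""
    xi0 = sum(a * a for a in eta0)
    xi1 = sum(a * b for a, b in zip(eta0, eta1))
    xi2 = sum(b * b for b in eta1)
    return xi0, xi1, xi2

# Hypothetical numbers: lower mean growth and more persistence than the baseline.
eta0, eta1 = drift_distortion(alpha=[0.5, 0.0], sigma=[0.0, 0.2],
                              base=(0.5, 0.0, 0.2), alt=(0.4, 0.0, 0.1))
xi0, xi1, xi2 = xi_coefficients(eta0, eta1)
```

In this example the x² coefficient equals the squared gap between the two κ values divided by |σ|², matching the special case displayed in the text.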
Definition 3.2.
$$
\mathcal{Z}^o = \left\{ Z^h \in \mathcal{Z}^+ : |h_t|^2 \le \xi(X_t) \right\}
$$
for all t with probability one.
Next we construct a larger set of martingales that contains Zo but allows departures
from the parametric structure (10).
Definition 3.3.
$$
\widehat{\mathcal{Z}} = \left\{ Z^h \in \mathcal{Z} : |h_t|^2 \le \xi(X_t) \right\} \tag{13}
$$
for all t with probability one.
Zh’s in Zo represent models with time-invariant parameters; Zh’s in the larger set of Definition 3.3 can represent
models whose parameters vary over time.6
3.2 Appropriate restrictions on the h process
We want to construct a set of models by using a measure of statistical discrepancy from
a baseline model. For that purpose, we shall replace the instant-by-instant inequality in
the definition (13) of Z with restrictions cast in terms of probability weighted averages
of |ht|2. These restrictions allow for the trade-offs among intertemporal dimensions of a model
that are comprehended by statistical model discrimination criteria.7
6 This is accomplished by requiring Zh to belong to Z rather than Z+.
7 In constructing Zo and the set in Definition 3.3, we impose the instant-by-instant constraint on ht described in (13). But some models that don’t satisfy such an instant-by-instant constraint are equally difficult to distinguish statistically from the baseline model. The ambiguity averse decision maker of Chen and Epstein (2002) considers a set of models characterized by martingales that are generated by h processes that satisfy instant-by-instant constraints on h. Anderson et al. (1998) explored consequences of this type of constraint without the state dependence.
3.3 Z∗
We construct our first convex set of martingales $Z^h$ by starting with a drift distortion $\bar h$
that represents a particular alternative parametric model created along lines described in
section 3.1. We use the following functional of a process $Z^h$:
$$
\begin{aligned}
\epsilon\left(Z^h; |\bar h|^2, x\right)
&= \delta \int_0^\infty \exp(-\delta u)\, E\left[ Z^h_u \log Z^h_u \,\middle|\, X_0 = x \right] du \\
&\quad - \frac{1}{2}\int_0^\infty \exp(-\delta u)\, E\left[ Z^h_u |\bar h_u|^2 \,\middle|\, X_0 = x \right] du \\
&= \frac{1}{2} \int_0^\infty \exp(-\delta u)\, E^h\left[ \left( |h_u|^2 - |\bar h_u|^2 \right) \,\middle|\, X_0 = x \right] du.
\end{aligned} \tag{14}
$$
Three important features of ε are:
i) ε is convex in Zh;
ii) ε can readily be computed under the h model;
iii) ε depends on $\bar h$ only through the scalar process $|\bar h|^2$.
Let
$$
|\bar h|^2 = \xi(x)
$$
and introduce a positive number ρ.
Definition 3.4.
$$
\mathcal{Z}^* = \left\{ g(X_0) Z^h \in \mathcal{Z} : \int \epsilon\left[ Z^h; \xi(X), x \right] g(x)\, Q(dx) + \int g(x) \log g(x)\, Q(dx) - \rho \le 0 \right\}. \tag{15}
$$
Z∗ includes martingales in Z that are associated with the parametric probability models
of Section 3.1. In light of feature i) of ε, the set Z∗ is convex in g(X0)Zh and necessarily
contains $Z^{\bar h}$ and $Z^h = 1$. Feature ii) makes it tractable to use Z∗ to pose a recursive robust
decision problem. Feature iii) provides a convenient way to use ξ to include parametric
models like those discussed in Section 3.1 within Z∗.
In section 5, we pose a robust decision problem in which Z∗ serves as a family of positive
martingales. Evidently, both the baseline model (1) and the alternative models captured
by the quadratic function ξ(X) play important roles in shaping the set Z∗. These models
also shape our next convex set, which considers the entropy of a Z ∈ Z∗ relative to the
baseline model (1).
3.4 Z
While the drift distortions h help shape Z∗, they don’t influence a set of martingales Z
that emerges from studying how Brownian motions disguise probability distortions of a
baseline model, making them difficult to distinguish statistically. To construct Z we use
Chernoff (1952) entropy, which differs from discounted relative entropy. Although Chernoff
entropy’s connection to a specific statistical decision problem makes it attractive, it has the
disadvantage that it is less tractable than relative entropy for the types of robust decision
problems that interest us.
In the spirit of Anderson et al. (2003), we use Chernoff (1952) entropy to measure
a distortion Z to a baseline model. Think of a pairwise model selection problem that
statistically compares the baseline model (1) with a model generated by the martingale
Zh. The logarithm of the martingale evolves according to
$$
d \log Z^h_t = -\frac{1}{2}|h_t|^2\,dt + h_t \cdot dW_t.
$$
Consider a statistical model selection rule based on a data history of length t that takes the
form $\log Z^h_t \ge \tau$, where $Z^h_t$ is the likelihood ratio associated with the alternative model
for a sample size t. To construct a bound on the probability that this model selection rule
incorrectly chooses the alternative model when the baseline model governs the data, we use
an argument from large deviations theory that starts from the inequality
$$
\mathbf{1}_{\{\log Z^h_t \ge \tau\}} = \mathbf{1}_{\{-r\tau + r \log Z^h_t \ge 0\}} = \mathbf{1}_{\{\exp(-r\tau)(Z^h_t)^r \ge 1\}} \le \exp(-r\tau)\,(Z^h_t)^r,
$$
which holds for 0 ≤ r ≤ 1. The expectation of the term on the left side equals the
probability of mistakenly selecting the alternative model when the data are a sample of
size t generated by the baseline model. We bound this mistake probability for large t by
following Donsker and Varadhan (1976) and Newman and Stuck (1979) and studying
$$
\limsup_{t\to\infty} \frac{1}{t} \log E\left[ \exp(-r\tau) \left(Z^h_t\right)^r \right] = \limsup_{t\to\infty} \frac{1}{t} \log E\left[ \left(Z^h_t\right)^r \right]
$$
for alternative choices of r. The threshold τ does not affect this limit. Furthermore, the
limit is often independent of the initial state X0 = x. To get the best bound, we compute
$$
\inf_{0 \le r \le 1} \limsup_{t\to\infty} \frac{1}{t} \log E\left[ \left(Z^h_t\right)^r \right],
$$
a limit that is typically negative because mistake probabilities decay with sample size. A
measure of Chernoff entropy is then
$$
\chi(Z^h, x) = - \inf_{0 \le r \le 1} \limsup_{t\to\infty} \frac{1}{t} \log E\left[ \left(Z^h_t\right)^r \right]. \tag{16}
$$
Appendix E describes how to compute Chernoff entropy.
To help interpret χ(Zh, x), consider the following argument. If the actual decay rate
of mistake probabilities were constant, then mistake probabilities for two sample sizes
Ti, i = 1, 2, would be
$$
\text{mistake probability}_i = .5 \exp(-T_i \psi^*)
$$
for ψ∗ = χ(Zh, x). We define a ‘half-life’ as an increase in the sample size T2 − T1 > 0 that
multiplies the mistake probability by a factor of .5:
$$
.5 = \frac{\text{mistake probability}_2}{\text{mistake probability}_1} = \frac{\exp(-T_2 \psi^*)}{\exp(-T_1 \psi^*)}.
$$
So the half-life is approximately
$$
T_2 - T_1 = \frac{-\log .5}{\psi^*}. \tag{17}
$$
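As an illustration of (16) and (17), consider the special case of a constant drift distortion h (an assumption made here for tractability, corresponding to η1 = 0). In that case E[(Z_t)^r] = exp(−r(1−r)|h|² t / 2) under the baseline model, so the limit in (16) is available in closed form, the infimum is attained at r = 1/2, and the Chernoff entropy equals |h|²/8. The sketch below recovers this by a grid search over r and then applies (17); function names and the value of h are hypothetical.

```python
import math

def chernoff_entropy_constant_h(h, n_grid=1001):
    """Chernoff entropy (16) for a constant drift distortion vector h.

    For constant h, (1/t) log E[(Z_t)^r] = -0.5 * r * (1 - r) * |h|^2 exactly,
    so the lim sup is trivial and only the infimum over r remains.
    """
    h2 = sum(c * c for c in h)
    rates = []
    for i in range(n_grid):
        r = i / (n_grid - 1)  # grid on [0, 1]
        rates.append(-0.5 * r * (1.0 - r) * h2)
    return -min(rates)

def half_life(chi):
    """Half-life (17): the sample-size increase that halves the mistake bound."""
    return -math.log(0.5) / chi

chi_example = chernoff_entropy_constant_h([0.1, 0.2])  # hypothetical h
half_life_example = half_life(chi_example)
```

The half-life is measured in the same time units as t in the model. For state-dependent h the closed form used here no longer applies and the computation described in Appendix E is needed.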
The preceding back-of-the-envelope calculation justifies the detection error bound computed
by Anderson et al. (2003). The bound on the decay rate should be interpreted
cautiously because, while it is constant, the actual decay rate is not. Furthermore, the pairwise
comparison oversimplifies the challenge truly facing a robust decision maker, which is
statistically to discriminate among multiple models.
We could conduct a symmetrical calculation that reverses the roles of the two models,
so that the h model with martingale Zh becomes the model on which we condition. It is
straightforward to show that the limiting rate remains the same. Thus, when we select a
model by comparing a log likelihood ratio to a constant threshold, the two types of mistakes
share the same asymptotic decay rate.
Our second convex set is a ball formed using Chernoff entropy (16).
Definition 3.5.
$$
\widetilde{\mathcal{Z}} = \left\{ Z^h \in \mathcal{Z} : \chi(Z^h; x) \le \bar\chi \right\}. \tag{18}
$$
The radius $\bar\chi$ of the ball can be adjusted to attain a specified half-life.
4 Calibrating θ and γ
In subsections 4.1 and 4.2, we formulate a robust planning problem for an economy with
a representative consumer having an instantaneous utility function that is logarithmic in
consumption. Associated with the worst-case probability from the robust planning problem
is a greatest lower bound of expected discounted utility over the family Z∗ of alternative
probability distributions. In subsections 4.3, 4.4, and 4.5, we represent the worst-case
probability as a drift distortion to the multivariate Brownian motion in the baseline model
(1). We then use that drift distortion to guide the calibration of the parameters θ and γ
that pin down the size of the set Z∗. In section 5, we show how that same worst-case drift
distortion appears in a recursive representation of competitive equilibrium prices for an
economy with a representative investor. We deduce uncertainty prices and connect them
to the worst-case drift distortion from our robust planning problem.
4.1 Robust planner’s problem
Consider a consumption process C∗ = Y . Guess a value function υ(x, θ, γ) + log Y . We
use a family of martingales in the set Z∗ to represent alternative probabilities. We take the
function ξ to be
$$
\xi(x) = \gamma\, \hat\xi(x),
$$
where for the moment γ is an arbitrary parameter and $\hat\xi$ is pre-specified. Eventually, we will
allow ξ to be specified a priori up to a scale determined by a scalar γ that we’ll calibrate
by imposing a model detection probability half-life defined in terms of Chernoff entropy.
Let θ be a multiplier on the constraint:
$$
\int \epsilon\left[ Z^h; \xi(X), x \right] g(x)\, Q(dx) + \int g(x) \log g(x)\, Q(dx) - \rho \le 0 \tag{19}
$$
In subsection 4.2, for a given (θ, γ) we construct a recursive representation of a worst-case
drift distortion.
4.2 Recursive representation of worst-case drift distortion
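Given (γ, θ), we compute h by solving the HJB equation
$$
\begin{aligned}
0 = \min_h\; & -\delta \upsilon(x, \theta, \gamma) + (.01)(\mu + x) + \upsilon'(x, \theta, \gamma)\,(\phi - \kappa x) + \frac{1}{2}|\sigma|^2 \upsilon''(x, \theta, \gamma) \\
& + (.01)\,\alpha \cdot h + \upsilon'(x, \theta, \gamma)\,\sigma \cdot h + \frac{\theta}{2}|h|^2 - \frac{\theta\gamma}{2}\,\hat\xi(x),
\end{aligned} \tag{20}
$$
where $\hat\xi$ denotes the pre-specified quadratic function of subsection 4.1 that γ scales.
The solution υ of HJB equation (20) is quadratic in x:
$$
\upsilon(x, \theta, \gamma) = -\frac{1}{2}\left[ \upsilon_2(\theta, \gamma)\,x^2 + 2\upsilon_1(\theta, \gamma)\,x + \upsilon_0(\theta, \gamma) \right],
$$
which implies that the minimizing h is linear in x:
$$
h^*(x, \theta, \gamma) = -\frac{1}{\theta}\left[ .01\,\alpha - \sigma\,\upsilon_2(\theta, \gamma)\,x - \sigma\,\upsilon_1(\theta, \gamma) \right]. \tag{21}
$$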
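The step from the HJB equation to (21) is a first-order condition. Collecting the terms of (20) that involve h and differentiating gives the following short derivation, which uses only the quadratic form of υ stated above:

```latex
\begin{aligned}
0 &= (.01)\,\alpha + \upsilon'(x,\theta,\gamma)\,\sigma + \theta\,h^*
  \quad\Longrightarrow\quad
  h^* = -\frac{1}{\theta}\Big[(.01)\,\alpha + \sigma\,\upsilon'(x,\theta,\gamma)\Big], \\
\upsilon'(x,\theta,\gamma) &= -\big[\upsilon_2(\theta,\gamma)\,x + \upsilon_1(\theta,\gamma)\big]
  \quad\Longrightarrow\quad
  h^* = -\frac{1}{\theta}\Big[(.01)\,\alpha - \sigma\,\upsilon_2(\theta,\gamma)\,x
        - \sigma\,\upsilon_1(\theta,\gamma)\Big].
\end{aligned}
```

Because the objective is strictly convex in h whenever θ > 0, this first-order condition identifies the minimizer.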
4.3 Determining θ
To set θ, we must decide how to weight the initial state. Previous research by Petersen et al.
(2000) and Hansen et al. (2006) set Q to be a mass point over a single value of x and thereby
effectively conditioned on the initial state. Here we suggest an alternative approach.
Under the baseline model, X has a stationary distribution Q having density q with
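mean φ/κ and variance $|\sigma|^2/(2\kappa)$. Consider any g ∈ G. Introduce a nonnegative parameter
ρ that we temporarily take as a fixed number. To make a conservative adjustment of the
probability measure over the initial state, compute g given (θ, γ) by solving
$$
\min_{g \in G} \int \upsilon(x, \theta, \gamma)\, g(x)\, Q(dx) + \theta \left[ \int g(x) \log g(x)\, Q(dx) - \rho \right]. \tag{22}
$$
The minimizing g is an exponentially tilted density
$$
g(x, \theta, \gamma) \propto \exp\left[ -\frac{1}{\theta}\,\upsilon(x, \theta, \gamma) \right] \tag{23}
$$
that makes the distorted initial distribution g(x)Q(dx) normal with precision
$$
\omega(\theta, \gamma) = \frac{2\kappa}{|\sigma|^2} - \frac{1}{\theta}\,\upsilon_2(\theta, \gamma)
$$
and mean
$$
\nu(\theta, \gamma) = \frac{1}{\omega}\left[ \frac{2\phi}{|\sigma|^2} + \frac{1}{\theta}\,\upsilon_1(\theta, \gamma) \right].
$$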
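The precision and mean can be verified by completing the square. The sketch below multiplies the tilting factor (23) by the stationary normal density q of the baseline model (mean φ/κ, variance |σ|²/(2κ)), drops terms that do not depend on x, and assumes ω > 0 so that the result is a proper density:

```latex
\begin{aligned}
g(x)\,q(x)
 &\propto \exp\Big[\tfrac{1}{2\theta}\big(\upsilon_2 x^2 + 2\upsilon_1 x\big)\Big]
          \exp\Big[-\tfrac{\kappa}{|\sigma|^2}\big(x - \tfrac{\phi}{\kappa}\big)^2\Big] \\
 &\propto \exp\Big[-\tfrac{1}{2}\Big(\tfrac{2\kappa}{|\sigma|^2} - \tfrac{\upsilon_2}{\theta}\Big)x^2
          + \Big(\tfrac{2\phi}{|\sigma|^2} + \tfrac{\upsilon_1}{\theta}\Big)x\Big] \\
 &= \exp\Big[-\tfrac{\omega}{2}x^2 + \omega\nu\,x\Big]
  \;\propto\; \exp\Big[-\tfrac{\omega}{2}(x - \nu)^2\Big].
\end{aligned}
```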
To compute θ, we substitute g into (22) to obtain the maximand in the following
problem:
$$
\max_{\theta > 0}\; -\theta \log \left[ \int \exp\left( -\frac{1}{\theta}\,\upsilon(x, \theta, \gamma) \right) Q(dx) \right] - \theta\rho.
$$
Let θ(ρ, γ) denote the maximizing θ, which depends on the value of ρ. If we set ρ and γ
defining the set of alternative models, we know θ and through representation (21) have a
complete characterization of the worst-case model. In subsection 4.5 we describe how
to use a half-life for a model detection statistic to determine γ. This will tell us how to
scale γ for a given value of ρ. We could specify ρ a priori. But in the next subsection, we
suggest an alternative approach.
4.4 Specifying ρ
Instead of setting ρ a priori, we set it to solve:
$$
\int g[x, \theta(\rho, \gamma), \gamma] \log g[x, \theta(\rho, \gamma), \gamma]\, Q(dx) = \rho. \tag{24}
$$
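Let ρ∗ denote the resulting ρ and let θ∗(γ) = θ(ρ∗, γ). Recall the relative entropy concept
$\Delta(g; Z^h)$ from equation (9)
$$
\Delta(g; Z^h) \equiv \int g(x)\, \Delta(Z^h; x)\, Q(dx) + \int g(x) \log g(x)\, Q(dx),
$$
where
$$
\Delta(Z^h; x) = \frac{1}{2} \int_0^\infty \exp(-\delta u)\, E^h\left[ |h_u|^2 \,\middle|\, X_0 = x \right] du.
$$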
The relative entropy measure includes an adjustment for distorting the initial distribution.
By imposing (24), we have set ρ∗ exactly to offset the term $\int g(x) \log g(x)\, Q(dx)$ when
evaluating the constraint (19) at the minimizing choice of g and Z.
4.5 Setting γ via Chernoff entropy
We refine a suggestion of Anderson et al. (1998). Compute h∗ [x, θ∗(γ), γ] and evaluate the
associated Chernoff entropy. Then adjust γ to match a target half-life.8 A larger value of
γ should lead to a smaller half-life. Call the resulting value γ∗ and let
$$
g^*(x) = g\left[x, \theta^*(\gamma^*), \gamma^*\right].
$$
8 We expect but haven’t proved that the half-life is monotone in γ.
5 Robust Portfolio Choice and Pricing
In this section, we describe equilibrium prices that reconcile a representative consumer to
bearing risks presented by the environment described by baseline model (1) in light of his
concerns about model misspecification as expressed with the set Z∗. We construct equilibrium
prices by appropriately extracting shadow prices from the robust planning problem of
subsection 4.1. We decompose risk prices into separate equilibrium compensations for bearing
risk and for bearing model uncertainty. We also describe an equilibrium term structure
of compensations for bearing model uncertainty. We begin by posing the representative
consumer’s portfolio choice problem.
5.1 Robust representative consumer portfolio problem
A representative consumer faces a continuous time Merton portfolio problem in which
individual wealth K evolves as
dKt = −Ctdt+ (.01)Ktι(Xt)dt+KtAt · dWt + (.01)Ktπ(Xt) · Atdt, (25)
where At = a is a vector of chosen risk exposures, ι(x) is the instantaneous risk free rate
expressed as a percent, and π(x) is the vector of risk prices evaluated at state Xt = x.
Initial wealth is K0. The investor has discounted logarithmic preferences but distrusts his
probability model.
Key inputs to a representative consumer’s robust portfolio problem are the baseline
model (1), the wealth evolution equation (25), the vector of risk prices π(x), and the
quadratic function ξ in (12) that defines the alternative explicit models that concern the
representative consumer. As in the robust planner’s problem analyzed in section 4.1, let θ
be a penalty parameter or Lagrange multiplier on the constraint (19). For the recursive
competitive equilibrium, we take (θ, γ) as given. We described how we calibrate these
parameters in section 4.
Under the guess that the value function takes the form ψ(x, θ, γ) + log k + log δ, the
HJB equation for the robust portfolio allocation problem is
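$$
\begin{aligned}
0 = \max_{a,c}\, \min_h\; & -\delta\psi(x, \theta, \gamma) - \delta \log k - \delta \log \delta + \delta \log c - \frac{c}{k} + (.01)\,\iota(x) \\
& + (.01)\,\pi(x) \cdot a + a \cdot h - \frac{|a|^2}{2} + (\phi - \kappa x)\,\psi'(x, \theta, \gamma) + h \cdot \sigma\,\psi'(x, \theta, \gamma) \\
& + \frac{|\sigma|^2}{2}\,\psi''(x, \theta, \gamma) + \theta \left[ \frac{|h|^2}{2} - \frac{\gamma}{2}\left( \xi_2 x^2 + 2\xi_1 x + \xi_0 \right) \right].
\end{aligned} \tag{26}
$$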
First-order conditions for consumption are
$$
\frac{\delta}{c^*} = \frac{1}{k},
$$
which implies that c∗ = δk, an implication that flows partly from the representative consumer’s
unitary elasticity of intertemporal substitution. First-order conditions for a and h
are
(.01)π(x) + h∗(x, θ, γ)− a∗(x, θ, γ) = 0 (27a)
a∗(x, θ, γ) + θh∗(x, θ, γ) + ψ′(x, θ, γ)σ = 0. (27b)
Here we appeal to arguments like those in Hansen and Sargent (2008, ch. 7) to justify
stacking first-order conditions and not worrying about “who goes first” in the two-person
zero-sum game.9
5.2 Competitive equilibrium prices
We show here that the drift distortion h∗ that emerges from the robust planner’s problem of
subsection 4.1 determines prices that a competitive equilibrium awards for bearing model
uncertainty. To compute a vector π(x) of competitive equilibrium risk prices, we
find a robust planner’s marginal valuation of exposure to the W shocks. We decompose
that price vector into separate compensations for bearing risk and for accepting model
uncertainty.
Noting from the robust planner’s problem that the shock exposure vectors for logK
and log Y must coincide implies
a∗ = (.01)α.
Thus, from (27a), π = π∗, where
π∗(x) = α− 100h∗(x, θ). (28)
Similarly, in the problem for a representative consumer within a competitive equilibrium,
the drifts for logK and log Y must coincide:
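$$
-\delta + (.01)\,\iota(x) + (.01)\left[ (.01)\,\alpha - h^*(x) \right] \cdot \alpha - \frac{.0001}{2}\,\alpha \cdot \alpha = (.01)(\mu + x),
$$
so that ι = ι∗, where
$$
\iota^*(x) = 100\,\delta + (\mu + x) + \alpha \cdot h^*(x, \theta) - \frac{.01}{2}\,\alpha \cdot \alpha. \tag{29}
$$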
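Formulas (28) and (29) are simple enough to evaluate directly once the worst-case drift distortion at a given state is known. The sketch below takes h∗(x) as an input vector rather than computing it from the HJB equation; the function names and all numbers are hypothetical.

```python
def risk_prices(alpha, h_star):
    """Equilibrium price vector (28): pi*(x) = alpha - 100 h*(x)."""
    return [a - 100.0 * h for a, h in zip(alpha, h_star)]

def risk_free_rate(delta, mu, x, alpha, h_star):
    """Equilibrium risk-free rate (29), expressed as a percent."""
    alpha_dot_h = sum(a * h for a, h in zip(alpha, h_star))
    alpha_dot_alpha = sum(a * a for a in alpha)
    return 100.0 * delta + (mu + x) + alpha_dot_h - 0.005 * alpha_dot_alpha

# Hypothetical inputs, for illustration only.
alpha_ex = [0.5, 0.3]
h_ex = [-0.02, -0.01]   # worst-case drift distortion evaluated at the state x_ex
delta_ex, mu_ex, x_ex = 0.002, 0.4, 0.1

pi_ex = risk_prices(alpha_ex, h_ex)
iota_ex = risk_free_rate(delta_ex, mu_ex, x_ex, alpha_ex, h_ex)

# Consistency check: the drift of log K implied by these prices should equal
# the baseline drift of log Y, as in the equation preceding (29).
drift_log_k = (-delta_ex + 0.01 * iota_ex
               + 0.01 * sum((0.01 * a - h) * a for a, h in zip(alpha_ex, h_ex))
               - 0.00005 * sum(a * a for a in alpha_ex))
```

A negative entry of h∗ raises the corresponding price above its risk component α, which is the sense in which model uncertainty amplifies compensations for exposure to the original shocks.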
9 If we were to use a timing protocol that allows the maximizing player to take account of the impact of its decisions on the minimizing agent, we would obtain the same equilibrium decision rules as those described in the text.
We can use these formulas for equilibrium prices to construct a solution to the HJB equation
of a representative consumer in a competitive equilibrium by letting ψ = υ and a∗ = (.01)α.
5.3 Reinterpreting the worst-case portfolio problem
As described by Hansen et al. (2006), there is an ordinary (i.e., non-robust) portfolio
selection problem that confronts a representative consumer who has a completely trusted
model for the exogenous state dynamics and whose decision rule matches the decision rule
that attains the value function ψ associated with the HJB equation (26) for the robust
representative consumer portfolio problem. This consumer’s completely trusted model
of course differs from the baseline model (1). Hansen et al. (2006) call this an ex post
problem because it comes from exchanging orders of maximization and minimization in
the two-person zero-sum game that gives rise to the robust portfolio choice rule. This ex
post portfolio problem is a special case of a Merton problem with a state evolution that is
distorted relative to the baseline model. The distorted evolution imputes to the process W
a drift h∗ so that
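$$
\begin{aligned}
dW_t &= h^*(X_t)\,dt + dW^*_t \\
&= (\eta^*_0 + \eta^*_1 X_t)\,dt + dW^*_t,
\end{aligned}
$$
where $W^*$ is a multivariate standard Brownian motion under the h∗ probability distribution.
Thus,
$$
\begin{aligned}
d \log Y_t &= .01\,(\mu + X_t)\,dt + (.01)\,\alpha \cdot h^*(X_t)\,dt + (.01)\,\alpha \cdot dW^*_t \\
dX_t &= \phi\,dt - \kappa X_t\,dt + \sigma \cdot h^*(X_t)\,dt + \sigma \cdot dW^*_t.
\end{aligned}
$$
A value function ψ + log k satisfies the HJB equation
$$
\begin{aligned}
0 = \max_{a,c}\; & -\delta\psi(x) - \delta \log k + \delta \log c - \frac{c}{k} + (.01)\,\iota(x) + (.01)\,\pi(x) \cdot a + a \cdot (\eta^*_0 + \eta^*_1 x) - \frac{|a|^2}{2} \\
& + \left[ \phi - \kappa x + \sigma \cdot (\eta^*_0 + \eta^*_1 x) \right] \psi'(x) + \frac{|\sigma|^2}{2}\,\psi''(x).
\end{aligned}
$$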
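First-order conditions are
$$
\frac{\delta}{c^*} - \frac{1}{k} = 0
$$
$$
(.01)\,\pi(x) + \eta^*_0 + \eta^*_1 x - a^* = 0,
$$
which lead to decision rules c∗ = δk and
$$
a^* = (.01)\,\pi(x) + \eta^*_0 + \eta^*_1 x.
$$
Because the exposure and drift for logK and log Y should coincide in equilibrium, it follows
that
$$
-\delta + (.01)\,\iota(x) + .01\,\alpha \cdot h^*(x) + .0001\,\pi \cdot \alpha - \frac{.0001}{2}\,\alpha \cdot \alpha = (.01)\left[ \mu + x + \alpha \cdot h^*(x) \right].
$$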
Thus, the ordinary decision rules that solve the ex post portfolio problem imply the same
equilibrium prices as the robust portfolio problem, so that π = π∗ and ι = ι∗, as given by
(28) and (29), respectively.
5.4 Term structure of uncertainty prices
We now study how competitive equilibrium uncertainty prices vary over an investment horizon
by computing a pricing counterpart to an impulse response function. Our continuous-time
formulation means that the pertinent shock that occurs during the “next instant” is
an incremental change that will have incremental effects on prices across all future time
periods. An asset exposed to these shocks earns compensations that depend on the horizon.
Shock-price elasticities are state dependent because they vary with the growth state. In
this section, we compute the elasticities and produce what we regard as a dynamic value
decomposition. We present a quantitative example in section 7.3.¹⁰
5.4.1 Local uncertainty prices
The equilibrium stochastic discount factor process for our robust representative consumer
economy is
$$d\log S_t = -\delta\,dt - .01\,(\mu + X_t)\,dt - .01\,\alpha\cdot dW_t + h_t^*\cdot dW_t - \frac{1}{2}|h_t^*|^2\,dt. \tag{30}$$
The stochastic discount factor has a linear local mean and a quadratic local variance.
The exponential-quadratic formulation has been used extensively in empirical asset pricing
applications. Duffie and Kan (1994) described term structures of interest rates implied by
models with affine stochastic discount factors. Ang and Piazzesi (2003) estimated a term
structure model with an affine stochastic discount factor process driven by macroeconomic
variables.
¹⁰ Hansen (2011) presents an overview of such shock elasticities.
The entries of the vector, π∗(Xt) given by (28), which equal minus the local exposures to
the Brownian shocks, are usually interpreted as local “risk prices,” but we shall reinterpret
them. Motivated by the decomposition
$$\text{minus stochastic discount factor exposure} = \underbrace{.01\,\alpha}_{\text{risk price}}\;\underbrace{-\,h_t^*}_{\text{uncertainty price}},$$
we prefer to think of .01α as risk prices induced by the curvature of log utility and −h∗t
as “uncertainty” prices induced by a representative investor’s doubts about the baseline
model. Here h∗t = η∗0 + η∗1Xt, as described in equation (21). When η∗1 = 0, h∗t is constant;
but when η∗1 differs from zero, the uncertainty prices −h∗t = −h∗(Xt) are time varying and
depend linearly on the growth state Xt. When the dependence of h∗ on x is positive, these
uncertainty prices are higher in bad times than in good times. Countercyclical uncertainty
prices emerge endogenously from a baseline model that excludes stochastic volatility in the
underlying consumption risk as an exogenous input. Stochastic volatility models introduce new risks to be priced while
also inducing fluctuations in the prices of the “original” risks. The mechanism in this
paper simultaneously enhances and induces fluctuations in the uncertainty prices, but it
introduces no new sources of risk. Instead, it focuses on the impact of uncertainty about
the implications of those risks.
Following Borovicka et al. (2011), we assign horizon-dependent uncertainty prices to risk
exposures. To represent shock price elasticities, we study the dependence of logarithms of
expected returns on an investment horizon. The logarithm of the expected return from a
consumption payoff at date t is
$$\log E\left(\frac{C_t}{C_0}\,\Big|\,X_0 = x\right) - \log E\left[S_t\left(\frac{C_t}{C_0}\right)\Big|\,X_0 = x\right]. \tag{31}$$
The first term captures the expected payoff and the second the cost of the payoff. A shock
in the next instant affects the consumption and the stochastic discount factor processes.
In continuous time, this leads formally to what is called a Malliavin derivative. There is
one such derivative for each Brownian increment. The date zero shock influences both the
expected asset payoff at date t (aggregate consumption in this case) and the cost of an asset
with this payoff. Its impact on the logarithm of the expected return is the price elasticity
and its impact on the logarithm of the expected payoff is the exposure elasticity.
Consider initially the expected payoff term on the left side of (31). Let D0Ct denote
the derivative vector of Ct with respect to dW0. The familiar formula for a derivative of a
logarithm applies so that
D0Ct = CtD0 logCt.
A contribution to the elasticity vector for horizon t is
$$\frac{E\left[\frac{C_t}{C_0}\,D_0\log C_t \mid X_0 = x\right]}{E\left[\frac{C_t}{C_0} \mid X_0 = x\right]}.$$
There is a distinct elasticity for each shock. Since logC has linear dynamics, D0 logCt can
be shown to be the same as the vector of impulse responses of logCt to shocks at date zero,
which does not depend on the Markov state. We call this an exposure elasticity.
Consider next the cost term on the right-hand side of (31). For the product M = SC, a
calculation analogous to the preceding one confirms that the contribution of the cost term
to the elasticity is
$$\frac{E\left[\frac{M_t}{M_0}\,D_0\log M_t \mid X_0 = x\right]}{E\left[\frac{M_t}{M_0} \mid X_0 = x\right]}.$$
The dynamic evolution of the stochastic discount factor is not linear in the state variable,
and as a result
$$D_0\log M_t = D_0\log S_t + D_0\log C_t$$
is no longer a deterministic function of time.
The shock price elasticity combines these calculations:
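$$\frac{E\left[\frac{C_t}{C_0}\,D_0\log C_t \mid X_0 = x\right]}{E\left[\frac{C_t}{C_0} \mid X_0 = x\right]} - \frac{E\left[\frac{M_t}{M_0}\,D_0\log M_t \mid X_0 = x\right]}{E\left[\frac{M_t}{M_0} \mid X_0 = x\right]} = -\frac{E\left[\frac{M_t}{M_0}\,D_0\log S_t \mid X_0 = x\right]}{E\left[\frac{M_t}{M_0} \mid X_0 = x\right]}. \tag{32}$$
This is a valuation analogue of the impulse response function routinely estimated by
empirical macroeconomists. As we vary t ≥ 0, we trace out a trajectory of uncertainty price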
elasticities for each shock. We give nearly analytical formulas for these in Appendix F.
6 A quantitative example
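For a laboratory, we use our baseline model (1) evaluated at the following maximum
likelihood estimates computed by Hansen and Sargent (2010):¹¹
$$\mu = .465,\qquad \alpha = \begin{bmatrix} .468 \\ 0 \end{bmatrix},\qquad \phi = 0,\qquad \kappa = .185,\qquad \sigma = \begin{bmatrix} 0 \\ .149 \end{bmatrix} \tag{33}$$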
We consider Chernoff entropy balls associated with half-lives of 40 quarters, 80 quarters,
and 120 quarters. For comparison, we include a model with no concern for robustness, which
is equivalent to a half-life equal to infinity. We consider two specifications of ξ:
$$\xi(x) = x^2$$
$$\xi(x) = 1.$$
Tables 1 and 2 report worst-case models that emerge from the robust planner’s HJB
equation (20) for the specifications ξ(x) = x² and ξ(x) = 1, respectively. The worst-case
models endow X with a negative mean given by φ/κ. This in turn shifts the implied growth
in consumption. Since the worst-case model also includes a change in µ, we report the
composite outcome µ + φ/κ. As we reduce the half-life, the worst-case model makes the
constant parameter adjustment smaller. When ξ = x², we also increase the persistence in
the growth-rate process. Notice that while the minimizing agent could choose to increase
persistence even more, he instead chooses to allocate some of the entropy distortion to the
constant terms.
¹¹ The estimates are for Y being consumption of nondurables and services for aggregate U.S. data over the period 1948II to 2009IV.
Consistent with findings of Anderson et al. (2003) and Hansen and Sargent (2010), when
ξ(x) = 1, there is no change in the persistence parameter κ. The worst-case model targets
the constant terms in the consumption evolution and the state evolution. The worst-case
analysis reduces to a determination of how much to distort the respective constant terms
only.
| Half-Life | µ | φ | κ | µ + (φ/κ) |
|---|---|---|---|---|
| ∞ | 0.4650 | 0 | 0.1850 | 0.4650 |
| 120 | 0.4270 | -0.0255 | 0.1491 | 0.2562 |
| 80 | 0.4239 | -0.0301 | 0.1361 | 0.2024 |
| 40 | 0.4228 | -0.0391 | 0.1073 | 0.0579 |

Table 1: Worst-case parameter values for ξ(x) = x² associated with HJB equation (20).
| Half-Life | µ | φ | κ | µ + (φ/κ) |
|---|---|---|---|---|
| ∞ | 0.4650 | 0 | 0.1850 | 0.4650 |
| 120 | 0.4140 | -0.0276 | 0.1850 | 0.2648 |
| 80 | 0.4026 | -0.0338 | 0.1850 | 0.2198 |
| 40 | 0.3768 | -0.0478 | 0.1850 | 0.1182 |

Table 2: Worst-case parameter values for ξ(x) = 1 associated with HJB equation (20).
Corresponding to each of our two specifications of ξ(x), figures 1 and 2 plot the expected
consumption growth over horizon t. The expected growth rate at t is:
$$\begin{bmatrix} 1 & 0 \end{bmatrix}\exp\left(\begin{bmatrix} 0 & 1 \\ 0 & -\kappa \end{bmatrix} t\right)\begin{bmatrix} \mu \\ \phi \end{bmatrix} = \mu + \frac{\phi}{\kappa} - \frac{\phi}{\kappa}\exp(-\kappa t).$$
Integrating this growth rate over an interval [0, t] gives a worst-case trend for log
consumption:
$$\left(\mu + \frac{\phi}{\kappa}\right)t + \frac{\phi}{\kappa^2}\exp(-\kappa t) - \frac{\phi}{\kappa^2} \tag{34}$$
Notice that the initial trend growth rate is µ and that the eventual growth rate is
µ + φ/κ. In this calculation, we impose the distorted model starting at date zero and consider
its implications going forward. The shift in the constant term for the evolution of X has
no immediate impact on the growth of logC. Its eventual impact is determined in part by
the persistence parameter κ. We applied formula (34) to alternative models including the
worst-case models to compute the long-run drifts reported in Figures 1 and 2.
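As a quick numerical check of the growth-rate formula and the trend in (34), the following minimal Python sketch evaluates both at the half-life-80 worst-case parameters from Table 1. The function and variable names are illustrative and do not come from the paper; only the two closed-form expressions above are used.

```python
import math


def expected_growth_rate(mu, phi, kappa, t):
    """Expected growth rate at horizon t: mu + phi/kappa - (phi/kappa) exp(-kappa t)."""
    return mu + phi / kappa - (phi / kappa) * math.exp(-kappa * t)


def worst_case_trend(mu, phi, kappa, t):
    """Formula (34): the expected growth rate integrated over [0, t]."""
    return ((mu + phi / kappa) * t
            + (phi / kappa**2) * math.exp(-kappa * t)
            - phi / kappa**2)


# Worst-case parameters for xi(x) = x^2 and a half-life of 80 (Table 1).
MU_WC, PHI_WC, KAPPA_WC = 0.4239, -0.0301, 0.1361

initial_rate = expected_growth_rate(MU_WC, PHI_WC, KAPPA_WC, 0.0)  # equals mu
eventual_rate = MU_WC + PHI_WC / KAPPA_WC                           # mu + phi/kappa

if __name__ == "__main__":
    for horizon in (0, 10, 40, 160):
        rate = expected_growth_rate(MU_WC, PHI_WC, KAPPA_WC, horizon)
        trend = worst_case_trend(MU_WC, PHI_WC, KAPPA_WC, horizon)
        print(horizon, round(rate, 4), round(trend, 4))
```

The eventual rate computed this way agrees with the last column of Table 1 up to the rounding of the reported parameters.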
Figure 1: Long-run drift of logCt for the three target half-lives when ξ = x².
Figure 2: Long-run drift of logCt for the three target half-lives when ξ = 1.
Next we consider the distributional impacts. The new information about logCt− logC0
(scaled by 100) is:
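$$\begin{aligned}
\int_0^t\!\int_0^u \exp(-\kappa v)\,\sigma\cdot dB_{u-v}\,du + \int_0^t \alpha\cdot dB_u
&= \int_0^t\!\int_0^u \exp[-\kappa(u-r)]\,\sigma\cdot dB_r\,du + \int_0^t \alpha\cdot dB_u \\
&= \int_0^t\!\int_r^t \exp[-\kappa(u-r)]\,du\;\sigma\cdot dB_r + \int_0^t \alpha\cdot dB_u \\
&= \frac{1}{\kappa}\int_0^t \exp(\kappa r)\left[\exp(-\kappa r) - \exp(-\kappa t)\right]\sigma\cdot dB_r + \int_0^t \alpha\cdot dB_u \\
&= \frac{1}{\kappa}\int_0^t \left[1 - \exp[-\kappa(t-r)]\right]\sigma\cdot dB_r + \int_0^t \alpha\cdot dB_u.
\end{aligned}$$
The variance is .0001 times the following object
$$\begin{aligned}
&\frac{1}{\kappa^2}\int_0^t \left[1 - 2\exp[-\kappa(t-r)] + \exp[-2\kappa(t-r)]\right]|\sigma|^2\,dr + |\alpha|^2 t \\
&\quad = \frac{1}{\kappa^2}|\sigma|^2 t + |\alpha|^2 t - \frac{2}{\kappa^3}\left[1 - \exp(-\kappa t)\right]|\sigma|^2 + \frac{1}{2\kappa^3}\left[1 - \exp(-2\kappa t)\right]|\sigma|^2,
\end{aligned}$$
where we have used the fact that α · σ = 0. Using this calculation, figures 3 and 4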
display the interdecile ranges of the distribution for consumption growth over alternative
horizons. These figures depict deciles for both the baseline model and the worst-case models
associated with a half-life of 80. The region between the deciles illustrates a component
of risk in the consumption distribution. The variation across the baseline and worst-case
models reflects a broader notion of uncertainty driven by skepticism about the baseline
model. The upper decile of the worst-case model overlaps the lower decile of the baseline
model in both figures. The interdecile range is somewhat larger when ξ is quadratic in x
than when ξ is constant.
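The variance formula can be checked numerically and turned into an interdecile band. The sketch below is only illustrative: it uses the parameter values in (33), cross-checks the closed form against a midpoint-rule integral, and assumes (consistent with the linear Gaussian dynamics) that the .1 and .9 deciles are the mean plus or minus a normal quantile times the standard deviation. All function names are hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = (0.468, 0.0)   # consumption shock exposures from (33)
SIGMA = (0.0, 0.149)   # growth-state shock exposures from (33)
KAPPA = 0.185          # baseline persistence parameter from (33)


def scaled_variance(t, kappa=KAPPA):
    """Closed-form variance of 100 * (log C_t - log C_0), using alpha . sigma = 0."""
    s2 = sum(s * s for s in SIGMA)
    a2 = sum(a * a for a in ALPHA)
    return (s2 * t / kappa**2 + a2 * t
            - 2.0 * s2 * (1.0 - math.exp(-kappa * t)) / kappa**3
            + s2 * (1.0 - math.exp(-2.0 * kappa * t)) / (2.0 * kappa**3))


def variance_by_quadrature(t, kappa=KAPPA, steps=20000):
    """Midpoint-rule integral of the squared shock loadings, as a cross-check."""
    s2 = sum(s * s for s in SIGMA)
    a2 = sum(a * a for a in ALPHA)
    dr = t / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        loading = (1.0 - math.exp(-kappa * (t - r))) / kappa
        total += (loading**2 * s2 + a2) * dr
    return total


def interdecile_band(mean, t, kappa=KAPPA):
    """(.1, .9) quantiles of a normal law with the given mean and the variance above."""
    z = NormalDist().inv_cdf(0.9)
    sd = math.sqrt(scaled_variance(t, kappa))
    return mean - z * sd, mean + z * sd
```

Passing a smaller κ, such as the worst-case value 0.1361 from Table 1, widens the band, which is the sense in which the interdecile range is larger when ξ is quadratic in x.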
Figure 3: Expected values of logCt scaled by 100 for the baseline model and for a half-life
of 80 when ξ = x². The shaded black and red areas show the .1 and .9 interdecile ranges
under the baseline model and the worst-case model for a half-life of 80. The black line is
the mean growth for the baseline model, and the red circle line is the mean growth for the
worst-case model.
Figure 4: Expected values of logCt scaled by 100 for the baseline model and for a half-life
of 80 when ξ = 1. The shaded black and red areas show the .1 and .9 interdecile ranges
under the baseline and the worst-case model for a half-life of 80. The black line is the
mean growth for the baseline model, and the red circle line is the mean growth for the
worst-case model.
7 Comparing sets of models
We have used the set of models
$$Z^* = \left\{ Z^h : \int \epsilon\left(Z^h; \xi, x\right) g^*(x)\,Q(dx) \le 0 \right\}$$
to describe our robust representative consumer’s concerns about misspecification of his
baseline model (1). In this section, we compare Z∗ to a corresponding Chernoff entropy ball
Z and also to another set called an entropy ball that we now describe.
7.1 Entropy ball
Anderson et al. (2003) and Hansen and Sargent (2010) focused primarily on entropy balls.
Here we are interested in constructing a new set Z that we define as the smallest entropy
ball that contains Z∗. An entropy ball is a family of Zh’s that satisfy¹²
$$(Z^h) = \int \left[ \int_0^\infty \exp(-\delta t)\,E\left( Z_t^h \log Z_t^h \mid X_0 = x \right) dt \right] g^*(x)\,Q(dx) \le \frac{1}{2\delta}\,\xi \tag{35}$$
for some constant ξ > 0. By constructing an entropy ball that contains Z∗, we compute
how large relative entropy can be for martingales in the set Z∗.
To determine this magnitude, we take the quadratic function ξ(x) and pose a maximum
problem that starts from the observation that a martingale Zh that is biggest in terms of
its relative entropy satisfies the constraint in definition 3.4 at equality:
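$$\int \epsilon\left[Z^h; \xi(X_t), x\right] g^*(x)\,Q(dx) = 0.$$
This leads us to maximize the discounted objective
$$\int \left( E\left[ \int_0^\infty \exp(-\delta t)\,Z_t^h\,\xi(X_t)\,dt \;\Big|\; X_0 = x \right] \right) g^*(x)\,Q(dx)$$
subject to the constraint:
$$\int \epsilon\left(Z^h; \xi, x\right) g^*(x)\,Q(dx) \le 0.$$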
Associated with this optimization problem is the HJB equation:
$$\begin{aligned}
0 = \max_h\; & -\delta\upsilon(x,\vartheta) + \frac{1}{2}\xi(x) - \upsilon'(x,\vartheta)\kappa x + \frac{1}{2}|\sigma|^2\upsilon''(x,\vartheta) \\
& + \upsilon'(x,\vartheta)\,\sigma\cdot h - \frac{\vartheta}{2}|h|^2 + \frac{\vartheta}{2}\xi(x),
\end{aligned} \tag{36}$$
where we can regard ϑ as a Kuhn-Tucker multiplier on a relative entropy constraint.
Because the maximizing h takes the form
$$h = \eta_0(\vartheta) + \eta_1(\vartheta)X,$$
it is one of the models in the parametric class from section 3.1. To compute ξ in constraint
(35), we solve
$$\xi = 2\delta \min_{\vartheta \ge 0} \int \upsilon(x,\vartheta)\,g^*(x)\,Q(dx).$$
¹² We use the term ball loosely because typically a ball in mathematics is defined using a metric. Although relative entropy quantifies statistical discrepancy, it is not a metric because it depends on which of two models is taken as the baseline model.
Let
$$Z \doteq \left\{ Z^h : \int \left(Z^h; x\right) g^*(x)\,Q(dx) \le \frac{1}{2\delta}\,\xi \right\}.$$
By construction Z∗ ⊂ Z.
7.2 Comparing sets
We compare intersections of Zo with each of the three sets: Z∗, the Chernoff ball Z, and
the entropy ball Z. While it is tractable to use Z∗ and the entropy ball Z to formulate
robust decision problems, these sets are not directly linked to statistical discrimination
problems. The Chernoff ball Z is closely linked to “statistical discrimination,” but for
forming robust decision problems it is not as tractable as the other two. It would be
comforting if Z∗ closely approximated the Chernoff ball Z, at least in regions near the
worst-case model that emerges from the robust planner’s HJB equation (20).
We compute and report the projection (Z ∩ Zo) of the Chernoff ball on Zo for three
half-lives in figure 5. We represent these projections using the three parameter values that
characterize Zo. For comparison, we also report (Z∗ ∩ Zo). The sets are distinct, but
the big differences occur for larger values of κ at which the Chernoff ball contains points
not included in Z∗. Such large values of κ turn out not to be the ones that the robust
planner most fears. Overall, the regions are closer for longer specifications of the half-life
of Chernoff entropy.
Figure 5: Projections of Sets I and II onto three-parameter axis. From Top to Bottom:
Target half-lives 120, 80, and 40, respectively. Left: ξ = x². Right: ξ = 1. (Z∗ ∩ Zo) shown
in blue mesh. (Z ∩ Zo) shown in yellow. The solution to the robust planner’s problem is
shown as the red point.

| Half-Life | µ | φ | κ | µ + (φ/κ) |
|---|---|---|---|---|
| ∞ | 0.4650 | 0 | 0.1850 | 0.4650 |
| 69.78 | 0.3982 | -0.0362 | 0.1850 | 0.2024 |
| 42.19 | 0.3791 | -0.0466 | 0.1850 | 0.1273 |
| 16.65 | 0.3282 | -0.0742 | 0.1850 | -0.0726 |

Table 3: Worst-case parameter values for ξ(x) = x² under the set (Z ∩ Zo).
In figure 6, we compare the entropy ball (Z ∩ Zo) projection to both (Z∗ ∩ Zo) and
the Chernoff ball projection (Z ∩ Zo) when ξ(x) = x². The left side of this plot shows how
large an entropy ball would have to be to contain the set used in the robust planning
problem affiliated with HJB equation (20). The right side compares the Chernoff ball to this
entropy ball. As is evident from this figure, the resulting entropy ball is much larger. When
we solve the robust planner’s problem with this constructed ball, the implied half-lives, as
reported in Table 3, decline from 120, 80 and 40 to 70, 42 and 17, respectively. The constant
terms for both the consumption equation (the first equation of (10)) and the consumption
growth equation (the second equation of (10)) are reduced while the autoregressive parameter
is not altered in comparison to Table 1.
Figure 6: Comparing entropy balls to sets I and II when ξ = x². From Top to Bottom:
half-lives 120, 80, and 40, respectively. (Z∗ ∩ Zo) shown in blue, the entropy ball (Z ∩ Zo)
shown in black mesh, and the Chernoff ball (Z ∩ Zo) shown in yellow. The solution to the
robust planner’s problem is the red point.
7.3 Shock price elasticities
Figure 7: Shock-price elasticity to a shock to X for the three target half-lives when ξ = x².
From Top to Bottom: the half-lives 120, 80, and 40, respectively. The shaded regions show
interquartile ranges of the shock-price elasticities under the stationary distribution for X.
The uncertainty price elasticities depend on the initial state x. In figure 7, we display
the shock elasticities evaluated at the median and the two quartiles of the stationary dis-
tribution for X . We shade in interquartile ranges. Figure 7 shows shock price elasticity
trajectories for the growth rate shock. They are nearly constant across horizons. Increasing
the concern for robustness, as reflected by the sizes of the associated Chernoff half lives,
makes the elasticities larger and increases their variation across horizons.
8 Concluding remarks
We have applied our proposal for constructing a set of models surrounding a decision
maker’s baseline probability model to an asset pricing model in which a representative
consumer’s responses to model uncertainty make so-called prices of risk be countercyclical.
We say so-called because they are actually compensations for model uncertainty, not risk.
And their countercyclical components are entirely due to fears of model misspecification.
We have produced an affine model (30) of the log stochastic discount factor whose so-called
risk prices reflect a robust planner’s worst-case drift distortions h∗t . We describe how these
drift distortions should be interpreted as prices of model uncertainty. The dependency of
these uncertainty prices h∗t on the growth state x, and thus whether they are procyclical or
countercyclical, is shaped partly by a function ξ(x) that describes some particular models
that serve as alternatives to a baseline model. In this way, the theory of countercyclical
risk premia in this paper is all about how our robust consumer responds to the presence
of the particular alternative models among a huge set of more vaguely specified alternative
models that concern our representative consumer. We have demonstrated that this is a
simple way of generating countercyclical risk premia.
It is worthwhile comparing this paper’s way of inducing countercyclical risk premia with
three other macro/finance models that also get them. Campbell and Cochrane (1999) pro-
ceed in the standard rational expectations single-known-probability-model tradition and so
exclude any fears of model misspecification from the mind of their representative consumer.
They construct a history-dependent utility function in which the history of consumption
expresses an externality. This history dependence makes the consumer’s local risk aversion
depend in a countercyclical way on the economy’s growth state. Ang and Piazzesi (2003)
use an affine stochastic discount factor in a no-arbitrage statistical model and explore links
between the term structure of interest rates and other macroeconomic variables. Their
approach allows movements in risk prices to be consistent with historical evidence without
specifying a general equilibrium model. A third approach introduces stochastic volatil-
ity into the macroeconomy by positing that the volatilities of shocks driving consumption
growth are themselves stochastic processes. A stochastic volatility model induces time
variation in risk prices via exogenous movements in the conditional volatilities impinging
on macroeconomic variables.
What drives countercyclical risk prices in Hansen and Sargent (2010) is a particular
kind of robust model averaging occurring inside the head of the representative consumer.
The consumer carries along two difficult-to-distinguish models of consumption growth, one
asserting i.i.d. log consumption growth, the other asserting that the growth in log consump-
tion is a process with a slowly moving conditional mean.¹³ The consumer uses observations
on consumption growth to update a Bayesian prior over these two models, starting from
an initial prior probability of .5. The prior wanders over the post WWII sample for US
data, but ends up about where it started. Each period, the Hansen and Sargent representa-
tive consumer expresses his specification distrust by exponentially twisting a posterior over
the two baseline models in a pessimistic direction. That leads the consumer to interpret
good news as temporary and bad news as persistent, causing him to put countercyclical
uncertainty components into equilibrium “risk” prices.
In this paper, we propose a different way to induce variation in risk prices. We abstract
from learning and instead consider alternative models with parameters whose future
variations are not discernible from the past. These time-varying parameter models differ
from the decision maker’s baseline model, a fixed parameter model whose parameters can
be well estimated from historical data. We ensure that among the class of alternative
models are ones that allow for parameters persistently to deviate from those of the baseline
model in statistically subtle and time-varying ways. In addition to this class of alternative
models, the decision maker also includes other statistical specifications in the set of models
that concern him. The robust planner’s worst-case model responds to these forms of model
ambiguity partly by having more persistence than in the baseline model. Our approach gains
tractability because the worst-case model turns out to be a time-invariant model in which
projections for long-term growth are more cautious and stochastic growth is more persistent
than in the baseline model. Worst-case shock distributions are shifted in an adverse
fashion and with additional persistence that gives rise to enduring effects on uncertainty
prices. Adverse shifts in the shock distribution that drive up the absolute magnitudes of
uncertainty prices were also present in some of our earlier work (for example, see
Hansen et al. (1999) and Anderson et al. (2003)). In this paper, we induce state dependence
in uncertainty prices in a different way, namely, by specifying the set of alternative models
to capture concerns about the baseline model’s specification of persistence in consumption
growth.
¹³ Bansal and Yaron (2004) and Hansen and Sargent (2010) both start from the observation that two such models are difficult to distinguish empirically, but they draw different conclusions from that observation. Bansal and Yaron use the observation to justify a representative consumer who with complete confidence embraces one of the models (the long-run risk model with persistent log consumption growth), while Hansen and Sargent use the observation to justify a representative consumer who initially puts prior probability .5 on both models and who continues to carry along both models when evaluating prospective outcomes.
Models of robustness and ambiguity aversion bring new parameters. In this paper,
we extend our earlier work in Anderson et al. (2003) on restricting these parameters by
exploiting connections between models of statistical model discrimination and our way of
formulating robustness. We build on mathematical formulations of Newman and Stuck
(1979), Petersen et al. (2000), and Hansen et al. (2006). We pose an ex ante robustness
problem that pins down a robustness penalty parameter θ by linking it to an asymptotic
measure of statistical discrimination between models. This asymptotic measure allows us to
quantify a half-life for reducing the mistakes in selecting between competing models based
on historical evidence. A large statistical discrimination rate implies a short half-life for
reducing discrimination mistake probabilities. Anderson et al. (2003) and Hansen (2007)
had studied the connection between conditional discrimination rates and uncertainty prices
that clear security markets. By following Newman and Stuck (1979) and studying asymp-
totic rates, we link statistical discrimination half-lives to calibrated equilibrium uncertainty
prices.
A Z∗ Reconsidered
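Let $\tilde{h}$ be a vector process adapted to the filtration F = {Ft : t ≥ 0}. Replace h with $h - \tilde{h}$
in formula (5) to construct a martingale $Z^{h-\tilde{h}}$ that determines what we shall call an $h - \tilde{h}$
model.
Let Bt be a bounded Ft-measurable random variable. Define
$$E^h(B_t \mid X_0) \equiv E\left[Z_t^h B_t \mid X_0\right]$$
$$E^{h-\tilde{h}}(B_t \mid X_0) \equiv E\left[Z_t^{h-\tilde{h}} B_t \mid X_0\right].$$
Here $E^h$ denotes an expectation under the h model and $E^{h-\tilde{h}}$ denotes an expectation under
the $h - \tilde{h}$ model.
By using $Z_t^h / Z_t^{h-\tilde{h}}$ as a Radon-Nikodym derivative at time t, we can represent the h model
in terms of the $h - \tilde{h}$ model:
$$\begin{aligned}
E^h(B_t \mid X_0) &= E\left[Z_t^h B_t \mid X_0\right] \\
&= E\left[\left(\frac{Z_t^h}{Z_t^{h-\tilde{h}}}\right) Z_t^{h-\tilde{h}} B_t \;\Big|\; X_0\right] \\
&= E^{h-\tilde{h}}\left[\left(\frac{Z_t^h}{Z_t^{h-\tilde{h}}}\right) B_t \;\Big|\; X_0\right].
\end{aligned}$$
Recall that under the h probability distribution, $W^h$ is a multivariate standard Brownian
motion where from (6), $dW_t = h_t\,dt + dW_t^h$. Thus,
$$\begin{aligned}
d\log Z_t^{h-\tilde{h}} &= (h_t - \tilde{h}_t)\cdot dW_t - \frac{1}{2}|h_t - \tilde{h}_t|^2\,dt \\
&= (h_t - \tilde{h}_t)\cdot dW_t^h - \frac{1}{2}|h_t - \tilde{h}_t|^2\,dt + (h_t - \tilde{h}_t)\cdot h_t\,dt \\
&= (h_t - \tilde{h}_t)\cdot dW_t^h + \frac{1}{2}|h_t|^2\,dt - \frac{1}{2}|\tilde{h}_t|^2\,dt.
\end{aligned} \tag{37}$$
Conditioned on date zero information, the discounted relative entropy of the h model
with respect to the $h - \tilde{h}$ model is:
$$\delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h\left(\log Z_t^h - \log Z_t^{h-\tilde{h}}\right) \mid X_0 = x\right] dt = \frac{1}{2}E^h\left[\int_0^\infty \exp(-\delta t)\,|\tilde{h}_t|^2\,dt \;\Big|\; \mathcal{F}_0\right] \tag{38}$$
where we have used integration by parts and the evolution in (37). We are interested in the
discrepancy between: (i) the relative entropy (8) of h with respect to the baseline model,
and (ii) the relative entropy (38) of the h model with respect to the $h - \tilde{h}$ model:
$$\begin{aligned}
\epsilon\left(Z^h; |\tilde{h}|^2, x\right) &= \delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h\left(\log Z_t^h\right) \mid X_0 = x\right] dt \\
&\quad - \delta\int_0^\infty \exp(-\delta t)\,E\left[Z_t^h\left(\log Z_t^h - \log Z_t^{h-\tilde{h}}\right) \mid X_0 = x\right] dt \\
&= \frac{1}{2}\int_0^\infty \exp(-\delta t)\,E^h\left[\left(|h_t|^2 - |\tilde{h}_t|^2\right) \mid X_0 = x\right] dt.
\end{aligned} \tag{39}$$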
B Construction of entropy ball
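For a given multiplier, write the value function that solves HJB equation (36) in the form:
$$\upsilon(x,\theta) = \frac{1}{2}\left[\upsilon_2(\theta)x^2 + 2\upsilon_1(\theta)x + \upsilon_0(\theta)\right]$$
which gives us
$$h^*(x,\theta) = \frac{1}{\theta}\left[\sigma\upsilon_2(\theta)x + \sigma\upsilon_1(\theta)\right].$$
We can solve for υ2, υ1, and υ0 by comparing the coefficients for x², x and the constant
terms, respectively. Solving first for υ2:
$$\upsilon_2(\theta) = \theta\,\frac{\delta + 2\kappa - \sqrt{(\delta + 2\kappa)^2 - 4|\sigma|^2\xi_2\frac{(\theta+1)}{\theta}}}{2|\sigma|^2}.$$
The coefficient υ1 satisfies
$$-\delta\upsilon_1(\theta) + (\theta+1)\xi_1 - \upsilon_1(\theta)\kappa + \frac{1}{\theta}|\sigma|^2\upsilon_2(\theta)\upsilon_1(\theta) = 0.$$
Thus
$$\upsilon_1(\theta) = 2(\theta+1)\,\frac{\xi_1}{\delta + \sqrt{(\delta + 2\kappa)^2 - 4|\sigma|^2\xi_2\frac{(\theta+1)}{\theta}}}.$$
Finally,
$$-\frac{\delta}{2}\upsilon_0(\theta) + \frac{(\theta+1)}{2}\xi_0 + \frac{1}{2}\upsilon_2(\theta)|\sigma|^2 + \frac{1}{2\theta}\upsilon_1(\theta)^2|\sigma|^2 = 0.$$
Thus
$$\upsilon_0(\theta) = \frac{1}{\delta}\left[(\theta+1)\xi_0 + \upsilon_2(\theta)|\sigma|^2 + \frac{1}{\theta}\upsilon_1(\theta)^2|\sigma|^2\right].$$
To build the associated entropy ball, we construct
$$\xi = \delta\min_{\theta\ge 0}\left[\upsilon_2(\theta)x^2 + 2\upsilon_1(\theta)x + \upsilon_0(\theta)\right]$$
for X0 = x.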
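The coefficient formulas in this appendix can be checked by substituting them back into the three matching conditions. The following Python sketch does that for one illustrative parameter point; the values of δ, θ, ξ1 and ξ0 are hypothetical choices made only for the check (this section does not report δ), while κ and |σ|² come from (33). Function names are likewise illustrative.

```python
import math


def entropy_ball_coefficients(theta, delta, kappa, sigma_sq, xi2, xi1, xi0):
    """Return (v2, v1, v0) from the closed forms for the quadratic value function."""
    disc = (delta + 2.0 * kappa) ** 2 - 4.0 * sigma_sq * xi2 * (theta + 1.0) / theta
    if disc < 0.0:
        raise ValueError("no real solution for v2 at this multiplier")
    root = math.sqrt(disc)
    v2 = theta * (delta + 2.0 * kappa - root) / (2.0 * sigma_sq)
    v1 = 2.0 * (theta + 1.0) * xi1 / (delta + root)
    v0 = ((theta + 1.0) * xi0 + v2 * sigma_sq + v1 ** 2 * sigma_sq / theta) / delta
    return v2, v1, v0


def matching_residuals(v, theta, delta, kappa, sigma_sq, xi2, xi1, xi0):
    """Residuals of the x^2, x and constant conditions; all should be near zero."""
    v2, v1, v0 = v
    r2 = (-delta * v2 / 2.0 + (theta + 1.0) * xi2 / 2.0
          - kappa * v2 + sigma_sq * v2 ** 2 / (2.0 * theta))
    r1 = (-delta * v1 + (theta + 1.0) * xi1
          - kappa * v1 + sigma_sq * v2 * v1 / theta)
    r0 = (-delta * v0 / 2.0 + (theta + 1.0) * xi0 / 2.0
          + v2 * sigma_sq / 2.0 + v1 ** 2 * sigma_sq / (2.0 * theta))
    return r2, r1, r0


# Hypothetical inputs: delta, theta, xi1, xi0 are illustrative, not from the paper.
PARAMS = dict(theta=5.0, delta=0.002, kappa=0.185, sigma_sq=0.149 ** 2,
              xi2=1.0, xi1=0.1, xi0=0.05)
coefficients = entropy_ball_coefficients(**PARAMS)
residuals = matching_residuals(coefficients, **PARAMS)
```

The x² residual is the quadratic obtained by substituting the maximizing h back into (36); choosing the smaller root reproduces the expression for υ2 above.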
C Asymptotic error rates
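Consider an alternative model:
$$d\log Y_t = (.01)\left(\tilde{\mu} + X_t\right)dt + (.01)\,\alpha\cdot dW_t$$
$$dX_t = \tilde{\phi}\,dt - \tilde{\kappa}X_t\,dt + \sigma\cdot dW_t.$$
i) Input values of $\tilde{\kappa}$, $\tilde{\mu}$, and $\tilde{\phi}$. Construct the implied h(x) = η1x + η0 by solving
$$\begin{bmatrix} \tilde{\mu} - \mu \\ \tilde{\phi} - \phi \end{bmatrix} = \begin{bmatrix} \alpha' \\ \sigma' \end{bmatrix}\eta_0, \qquad \begin{bmatrix} 0 \\ \kappa - \tilde{\kappa} \end{bmatrix} = \begin{bmatrix} \alpha' \\ \sigma' \end{bmatrix}\eta_1$$
for η1 and η0.
ii) For a given r, construct ζ0, ζ1, ζ2, $\hat{\kappa}$, ψ, $\hat{\phi}$, and $\hat{\mu}$ from:
$$(-r + r^2)|h(x)|^2 = (-r + r^2)|\eta_0 + \eta_1 x|^2 = -\left(\zeta_0 + 2\zeta_1 x + \zeta_2 x^2\right)$$
$$\hat{\kappa} = (1-r)\kappa + r\tilde{\kappa}$$
$$\psi = (1-r) + r\psi$$
$$\hat{\phi} = r\tilde{\phi}$$
$$\hat{\mu} = (1-r)\mu + r\tilde{\mu}$$
iii) Solve
$$-\psi = -\frac{1}{2}\left(\zeta_0 + 2\zeta_1 x + \zeta_2 x^2\right) + (\hat{\phi} - x\hat{\kappa})(\log e)'(x) + \frac{(\log e)''(x)}{2}|\sigma|^2 + \frac{[(\log e)'(x)]^2}{2}|\sigma|^2$$
where $\log e(x) = \lambda_1 x + \frac{1}{2}\lambda_2 x^2$. Thus,
$$\lambda_2 = \frac{\hat{\kappa} - \sqrt{(\hat{\kappa})^2 + \zeta_2|\sigma|^2}}{|\sigma|^2}.$$
Given λ2, λ1 solves
$$-\zeta_1 - \hat{\kappa}\lambda_1 + \lambda_1\lambda_2|\sigma|^2 + \hat{\phi}\lambda_2 = 0$$
or
$$\lambda_1 = \frac{\zeta_1 - \hat{\phi}\lambda_2}{\lambda_2|\sigma|^2 - \hat{\kappa}} = -\frac{\zeta_1 - \hat{\phi}\lambda_2}{\sqrt{(\hat{\kappa})^2 + \zeta_2|\sigma|^2}}.$$
Finally,
$$\psi = \frac{1}{2}\zeta_0 - \frac{1}{2}|\sigma|^2\lambda_2 - \frac{1}{2}|\sigma|^2(\lambda_1)^2 - \hat{\phi}\lambda_1.$$
iv) Repeat for alternative r’s and maximize ψ as a function of r.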
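Steps i) through iv) can be sketched in a few lines of Python. The sketch below is a hedged illustration, not the authors' code: it takes the baseline values from (33), treats the inputs as the parameters of the alternative model, maximizes ψ over a simple grid in r, and assumes that a half-life is obtained as log 2 divided by the Chernoff entropy, a convention this appendix does not restate. All function names are hypothetical.

```python
import math

ALPHA = (0.468, 0.0)               # baseline exposures from (33)
SIGMA = (0.0, 0.149)
MU, PHI, KAPPA = 0.465, 0.0, 0.185  # baseline drift parameters from (33)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def implied_eta(mu_alt, phi_alt, kappa_alt):
    """Step i): solve the two 2x2 systems for eta0 and eta1 by Cramer's rule."""
    det = ALPHA[0] * SIGMA[1] - ALPHA[1] * SIGMA[0]

    def solve(b0, b1):
        return ((b0 * SIGMA[1] - ALPHA[1] * b1) / det,
                (ALPHA[0] * b1 - SIGMA[0] * b0) / det)

    return solve(mu_alt - MU, phi_alt - PHI), solve(0.0, KAPPA - kappa_alt)


def psi(r, eta0, eta1, phi_alt, kappa_alt):
    """Steps ii) and iii): the eigenvalue psi(r) from the quadratic log-eigenfunction."""
    scale = r - r * r
    zeta0, zeta1, zeta2 = scale * dot(eta0, eta0), scale * dot(eta0, eta1), scale * dot(eta1, eta1)
    kappa_r = (1.0 - r) * KAPPA + r * kappa_alt
    phi_r = r * phi_alt            # the baseline has phi = 0
    s2 = dot(SIGMA, SIGMA)
    root = math.sqrt(kappa_r ** 2 + zeta2 * s2)
    lam2 = (kappa_r - root) / s2
    lam1 = -(zeta1 - phi_r * lam2) / root
    return 0.5 * zeta0 - 0.5 * s2 * lam2 - 0.5 * s2 * lam1 ** 2 - phi_r * lam1


def chernoff_entropy(mu_alt, phi_alt, kappa_alt, grid=1000):
    """Step iv): maximize psi over an evenly spaced grid of r in [0, 1]."""
    eta0, eta1 = implied_eta(mu_alt, phi_alt, kappa_alt)
    return max(psi(i / grid, eta0, eta1, phi_alt, kappa_alt) for i in range(grid + 1))


# Worst-case row for xi(x) = x^2 and a target half-life of 80 (Table 1).
half_life_table1 = math.log(2.0) / chernoff_entropy(0.4239, -0.0301, 0.1361)
```

Under the stated half-life assumption, feeding in the half-life-80 rows of Tables 1 and 2 returns values close to 80, up to the rounding of the reported parameters. A grid search is used for transparency; a one-dimensional optimizer would do the same job.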
D Robust value function
We solve for υ and θ∗.
D.1 Solving for υ
Consider
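$$\begin{aligned}
0 = \min_h\; & -\delta\upsilon(x,\theta) + (.01)(\mu + x) + \upsilon'(x,\theta)(\phi - \kappa x) + \frac{1}{2}|\sigma|^2\upsilon''(x,\theta) \\
& + (.01)\,\alpha\cdot h + \upsilon'(x,\theta)\,\sigma\cdot h + \frac{\theta}{2}|h|^2 - \frac{\theta}{2}\left(\xi_2 x^2 + 2\xi_1 x + \xi_0\right).
\end{aligned}$$
Recall that the value function is quadratic
$$\upsilon(x,\theta) = -\frac{1}{2}\left[\upsilon_2(\theta)x^2 + 2\upsilon_1(\theta)x + \upsilon_0(\theta)\right]$$
which implies
$$h(x,\theta) = -\frac{1}{\theta}\left[.01\alpha - \sigma\upsilon_2(\theta)x - \sigma\upsilon_1(\theta)\right].$$
We can solve for υ2, υ1, and υ0 by matching the coefficients for x², x and the constant
terms, respectively. Solving first for υ2:
$$\upsilon_2(\theta) = \theta\,\frac{\delta + 2\kappa - \sqrt{(\delta + 2\kappa)^2 - 4|\sigma|^2\xi_2}}{2|\sigma|^2} = \theta\,\bar{\upsilon}_2,$$
where the last equation essentially defines $\bar{\upsilon}_2$.
$$\upsilon_1(\theta) = 2\,\frac{-.01 - .01\,(\alpha\cdot\sigma)\,\bar{\upsilon}_2 + \theta\phi\bar{\upsilon}_2 + \theta\xi_1}{\delta + \sqrt{(\delta + 2\kappa)^2 - 4|\sigma|^2\xi_2}}$$
$$\upsilon_0(\theta) = \frac{1}{\delta}\left[-.02\mu + \phi\upsilon_1(\theta) + \theta|\sigma|^2\bar{\upsilon}_2 + \frac{1}{\theta}\left|.01\alpha - \sigma\upsilon_1(\theta)\right|^2 + \theta\xi_0\right]$$
For convenience, we write
$$\upsilon_1(\theta) = \upsilon_{1,1}\theta + \upsilon_{1,0}$$
$$\upsilon_0(\theta) = \upsilon_{0,1}\theta + \upsilon_{0,0} + \upsilon_{0,-1}\theta^{-1}.$$
D.2 Solving for θ∗
We want to solve
$$\max_\theta\; -\frac{1}{2}\left(\frac{|\sigma|^2}{2\kappa\theta - \upsilon_2(\theta)|\sigma|^2}\right)\upsilon_1(\theta)^2 + \frac{\theta}{2}\left[\log\left(2\kappa\theta - \upsilon_2(\theta)|\sigma|^2\right) - \log(2\theta\kappa)\right] - \frac{1}{2}\upsilon_0(\theta) - \theta\epsilon.$$
θ∗ satisfies
$$\theta^* = \sqrt{\frac{|\sigma|^2\upsilon_{1,0}^2 + \left(2\kappa - \bar{\upsilon}_2|\sigma|^2\right)\upsilon_{0,-1}}{|\sigma|^2\upsilon_{1,1}^2 + \left(2\kappa - \bar{\upsilon}_2|\sigma|^2\right)\left[-\log\left(2\kappa - \bar{\upsilon}_2|\sigma|^2\right) + \log(2\kappa) + \upsilon_{0,1} + 2\epsilon\right]}}$$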
E Operationalizing Chernoff entropy
Here is how to compute Chernoff entropies for parametric models of the form (10). Because
the h’s associated with them take the form
$$h_t = \eta(X_t),$$
these alternative models are Markovian. This allows us to compute Chernoff entropy by
using an eigenvalue approach of Donsker and Varadhan (1976) and Newman and Stuck
(1979). We start by computing the drift of $(Z_t^h)^r f(X_t)$ for 0 ≤ r ≤ 1 at t = 0:
$$[Gf](x) \doteq \frac{(-r + r^2)}{2}|\eta(x)|^2 f(x) + r f'(x)\,\sigma\cdot\eta(x) - f'(x)\kappa x + \frac{f''(x)}{2}|\sigma|^2,$$
where [Gf](x) is the drift given that X0 = x. Next we solve the eigenvalue problem
$$[G(r)]e(x,r) = -\psi(r)e(x,r),$$
whose eigenfunction e(x, r) is the exponential of a quadratic function of x. We compute
Chernoff entropy numerically by solving:
$$\chi(Z^h, x) = \max_{r\in[0,1]}\psi(r).$$
To deduce a corresponding equation for log e, notice that
$$(\log e)'(x) = \frac{e'(x)}{e(x)}$$
and
$$(\log e)''(x) = \frac{e''(x)}{e(x)} - \left[\frac{e'(x)}{e(x)}\right]^2.$$
For a positive f
$$\frac{[Gf](x)}{f(x)} \doteq \frac{(-r + r^2)}{2}|h(x)|^2 + r(\log f)'(x)\,\sigma\cdot h(x) - (\log f)'(x)\kappa x + \frac{(\log f)''(x)}{2}|\sigma|^2 + \frac{[(\log f)'(x)]^2}{2}|\sigma|^2. \tag{40}$$
Using formula (40), define
$$[G(\log f)](x) = \frac{[Gf](x)}{f(x)}.$$
Then we can solve
$$[G(\log e)](x) = -\psi$$
for log e to compute the positive eigenfunction e.
These calculations allow us numerically to compute the largest and smallest Chernoff
entropies attained by members of the set Zo.
F Computing shock elasticities
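We compute shock price elasticities in four steps:
i) Stochastic impulse response for log S. We solve the recursion:
$$\begin{aligned}
d\log S_t^1 &= -.01X_t^1\,dt + h_t^1\cdot dW_t - h_t^*\cdot h_t^1\,dt \\
dX_t^1 &= -\kappa X_t^1\,dt \\
dX_t &= -\kappa X_t\,dt + \sigma\cdot dW_t \\
h_t^* &= \eta_0^* + \eta_1^* X_t \\
h_t^1 &= \eta_1^* X_t^1
\end{aligned}$$
where $X_0^1 = \sigma\cdot u$ and $\log S_0^1 = -.01\,\alpha\cdot u + h^*(x)\cdot u$. The quadratic terms in the
evolution equation of log S make $\log S_t^1$ stochastic.
ii) Deterministic impulse response for log C. We solve the recursion:
$$\begin{aligned}
d\log C_t^1 &= .01X_t^1\,dt \\
dX_t^1 &= -\kappa X_t^1\,dt,
\end{aligned}$$
where $\log C_0^1 = (.01)\,\alpha\cdot u$ and $X_0^1 = \sigma\cdot u$. Thus,
$$\log C_t^1 = \frac{.01}{\kappa}\left[1 - \exp(-\kappa t)\right]\sigma\cdot u + (.01)\,\alpha\cdot u.$$
iii) Compute
$$\frac{E\left(M_t\log S_t^1 \mid X_0 = x\right)}{E\left(M_t \mid X_0 = x\right)}$$
where $M_t = S_t\left(\frac{C_t}{C_0}\right)$. Note that
$$d\log M_t = -\delta\,dt + h_t^*\cdot dW_t - \frac{1}{2}|h_t^*|^2\,dt.$$
Let dWt have drift h∗t and compute expectations conditioned on X0 = x recursively:
$$\begin{aligned}
d\log S_t^1 &= -.01X_t^1\,dt + h_t^1\cdot\left(h_t^*\,dt + dW_t\right) - h_t^*\cdot h_t^1\,dt \\
&= -.01X_t^1\,dt + h_t^1\cdot dW_t \\
dX_t^1 &= -\kappa X_t^1\,dt \\
h_t^* &= \eta_0^* + \eta_1^* X_t \\
h_t^1 &= \eta_1^* X_t^1,
\end{aligned}$$
where $X_0^1 = \sigma\cdot u$ and $\log S_0^1 = -(.01)\,\alpha\cdot u + h^*(x)\cdot u$. Thus,
$$\frac{E\left(M_t\log S_t^1 \mid X_0 = x\right)}{E\left(M_t \mid X_0 = x\right)} = -\frac{.01}{\kappa}\left[1 - \exp(-\kappa t)\right]\sigma\cdot u - .01\,\alpha\cdot u + h^*(x)\cdot u.$$
iv) Construct elasticities:
(a) Shock-exposure elasticity for consumption:
$$\frac{E\left(\frac{C_t}{C_0}\log C_t^1 \mid X_0 = x\right)}{E\left(\frac{C_t}{C_0} \mid X_0 = x\right)} = \frac{.01}{\kappa}\left[1 - \exp(-\kappa t)\right]\sigma\cdot u + .01\,\alpha\cdot u,$$
which is also the continuous time impulse response for log C.
(b) Shock-price elasticity
$$\begin{aligned}
\frac{E\left(\frac{C_t}{C_0}\log C_t^1 \mid X_0 = x\right)}{E\left(\frac{C_t}{C_0} \mid X_0 = x\right)} - \frac{E\left[M_t\left(\log S_t^1 + \log C_t^1\right) \mid X_0 = x\right]}{E\left(M_t \mid X_0 = x\right)} &= -\frac{E\left(M_t\log S_t^1 \mid X_0 = x\right)}{E\left(M_t \mid X_0 = x\right)} \\
&= \frac{.01}{\kappa}\left[1 - \exp(-\kappa t)\right]\sigma\cdot u + .01\,\alpha\cdot u - h^*(x)\cdot u,
\end{aligned}$$
which implements formula (32).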
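The closed forms in step iv) are simple enough to evaluate directly. The Python sketch below does so using the baseline values in (33). The worst-case coefficients η∗0 and η∗1 are not reported in this appendix, so the numbers used here are illustrative placeholders (roughly what step i) of Appendix C implies for the half-life-80 row of Table 1); the function names are also hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = (0.468, 0.0)   # baseline exposures from (33)
SIGMA = (0.0, 0.149)
KAPPA = 0.185

# Illustrative placeholders for the worst-case drift h*(x) = eta0 + eta1 * x.
ETA0_STAR = (-0.0878, -0.2020)
ETA1_STAR = (0.0, 0.3282)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exposure_elasticity(t, u):
    """Step iv)(a): impulse response of log C at horizon t to the shock direction u."""
    return (0.01 / KAPPA) * (1.0 - math.exp(-KAPPA * t)) * dot(SIGMA, u) + 0.01 * dot(ALPHA, u)


def price_elasticity(t, u, x, eta0=ETA0_STAR, eta1=ETA1_STAR):
    """Step iv)(b): exposure elasticity minus h*(x) . u."""
    h_star = tuple(e0 + e1 * x for e0, e1 in zip(eta0, eta1))
    return exposure_elasticity(t, u) - dot(h_star, u)


# Median and quartiles of the baseline stationary distribution of X,
# which is normal with mean 0 and variance |sigma|^2 / (2 kappa).
stationary_sd = math.sqrt(dot(SIGMA, SIGMA) / (2.0 * KAPPA))
quartile_states = [NormalDist(0.0, stationary_sd).inv_cdf(q) for q in (0.25, 0.5, 0.75)]

growth_shock = (0.0, 1.0)
elasticities = [[price_elasticity(t, growth_shock, x) for t in (0, 20, 80)]
                for x in quartile_states]
```

Because the horizon-dependent term is bounded by .01 σ·u/κ, the price elasticities for the growth-rate shock vary little with t and are dominated by the state-dependent term −h∗(x)·u.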
References
Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent. 1998. Risk and Robustness
in Equilibrium. Available on webpages.
———. 2003. A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk,
and Model Detection. Journal of the European Economic Association 1 (1):68–123.
Ang, Andrew and Monika Piazzesi. 2003. A No-Arbitrage Vector Autoregression of the
Term Structure Dynamics with Macroeconomic and Latent Variables. Journal of Mon-
etary Economics 50:745–787.
Bansal, Ravi and Amir Yaron. 2004. Risks for the Long Run: A Potential Resolution of
Asset Pricing Puzzles. Journal of Finance 59 (4):1481–1509.
Borovicka, Jaroslav, Lars Peter Hansen, Mark Hendricks, and Jose A. Scheinkman. 2011.
Risk-Price Dynamics. Journal of Financial Econometrics 9 (1):3–65.
Campbell, John Y. and John Cochrane. 1999. Force of Habit: A Consumption-Based Expla-
nation of Aggregate Stock Market Behavior. Journal of Political Economy 107 (2):205–
251.
Chen, Zengjing and Larry Epstein. 2002. Ambiguity, Risk, and Asset Returns in Continuous
Time. Econometrica 70:1403–1443.
Chernoff, Herman. 1952. A Measure of Asymptotic Efficiency for Tests of a Hypothesis
Based on the Sum of Observations. Annals of Mathematical Statistics 23 (4):493–507.
Donsker, Monroe E. and S. R. Srinivasa Varadhan. 1976. On the Principal Eigenvalue
of Second-Order Elliptic Differential Equations. Communications in Pure and Applied
Mathematics 29:595–621.
Duffie, Darrell and Rui Kan. 1994. Multi-Factor Term Structure Models. Philosophical
Transactions: Physical Sciences and Engineering 347 (1684):577–586.
Gilboa, Itzhak and David Schmeidler. 1989. Maxmin expected utility with non-unique
prior. Journal of Mathematical Economics 18 (2):141–153.
Hansen, Lars Peter. 2007. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk.
American Economic Review 97 (2):1–30.
———. 2011. Dynamic Valuation Decomposition within Stochastic Economies. Economet-
rica 80 (3):911–967. Fisher-Schultz Lecture at the European Meetings of the Econometric
Society.
Hansen, Lars Peter and Thomas Sargent. 2010. Fragile beliefs and the price of uncertainty.
Quantitative Economics 1 (1):129–162.
Hansen, Lars Peter and Thomas J. Sargent. 2001. Robust Control and Model Uncertainty.
American Economic Review 91 (2):60–66.
———. 2008. Robustness. Princeton, New Jersey: Princeton University Press.
Hansen, Lars Peter, Thomas J. Sargent, and Thomas D. Tallarini, Jr. 1999. Robust
Permanent Income and Pricing. The Review of Economic Studies 66 (4):873–907.
Hansen, Lars Peter, Thomas J. Sargent, Gauhar A. Turmuhambetova, and Noah Williams.
2006. Robust Control and Model Misspecification. Journal of Economic Theory
128 (1):45–90.
James, Matthew R. 1992. Asymptotic analysis of nonlinear stochastic risk-sensitive control
and differential games. Mathematics of Control, Signals and Systems 5 (4):401–417.
Newman, C. M. and B. W. Stuck. 1979. Chernoff Bounds for Discriminating between Two
Markov Processes. Stochastics 2 (1-4):139–153.
Petersen, I.R., M.R. James, and P. Dupuis. 2000. Minimax optimal control of stochastic
uncertain systems with relative entropy constraints. Automatic Control, IEEE Transac-
tions on 45 (3):398–412.