sets of models and prices of uncertainty

Sets of Models and Prices of Uncertainty

Lars Peter Hansen Thomas J. Sargent∗

March 8, 2015

Abstract

A representative consumer expresses distrust of a baseline probability model by using

a convex set of martingales as likelihood ratios to represent other probability models.

The consumer constructs the set to include martingales that represent particular

parametric alternatives to the baseline model as well as others representing only

vaguely specified models statistically close to the baseline model. The representative

consumer’s max-min expected utility over that set gives rise to equilibrium prices

of model uncertainty expressed as worst-case distortions to the drifts in his baseline

model. We calibrate a quantitative example to aggregate US consumption data.

Key words: Risk, uncertainty, uncertainty prices, Chernoff entropy, robustness, shock

price elasticities, affine stochastic discount factor

∗We thank Scott Lee, Botao Wu, and especially Lloyd Han and Paul Ho for carrying out the computa-tions.

1 Introduction

Specifying a set of probability distributions is an essential part of applying the Gilboa and

Schmeidler (1989) max-min expected utility model. This paper proposes a new way to

imagine that a decision maker forms that set and provides an application to asset pricing.

When a representative investor describes risks with a set of probability models, uncer-

tainty premia augment prices of exposures to those risks. We describe how our method for

specifying that set affects prices of model uncertainty.

Our experiences as applied econometricians attract us to robust control theory. We

always regard our own quantitative models as approximations to better models that we had

not formulated. This is also the attitude of the robust decision maker modeled in Hansen

and Sargent (2001) and Hansen et al. (2006). The decision maker has a single baseline

probability model with a finite number of parameters. He wants to evaluate outcomes

under alternative models that are statistically difficult to distinguish from his baseline

model. He expresses distrust of his baseline model by surrounding it with an uncountable

number of alternative models, many of which have uncountable numbers of parameters. He

represents these alternative models by multiplying the baseline probabilities with likelihood

ratios whose entropies relative to the baseline model are less than a bound that expresses

the idea that alternative models are statistically close to the baseline model.

The decision theory presented in this paper retains the starting point of a single baseline

model but differs from Hansen et al. (2006) in how it forms the set surrounding the baseline

model. A new object appears: a quadratic function of a Markov state that defines alter-

native parametric models to be included within a set of models surrounding the baseline

model. The decision maker wants valuations that are robust to these models in addition

to other “vaguely specified” models expressed as before by multiplying the baseline model

by likelihood ratios. The quadratic function can be specified to include alternatives to the

baseline model including ones with fixed parameters, time varying parameters, and other

less structured forms of model uncertainty.

For asset pricing, a key object that emerges from the analysis in Hansen and Sargent

(2010) is a vector of worst-case drift distortions to the baseline model. The negative of

the drift distortion vector equals the vector of market prices of model uncertainty that

compensate the representative investor for bearing model uncertainty. The effects that our

new object – the quadratic function indexing particular alternative models – has on mar-

ket prices of uncertainty are all intermediated through this drift distortion. We show how

1

the quadratic function can produce drift distortions that imply stochastic discount factors

resembling ones attained by earlier authors under different assumptions about the sources

of risks. For example, models that posit that a representative consumer’s consumption

process has innovations with stochastic volatility introduce new risk exposures in the form

of the shocks to volatilities. Their presence induces time variation in equilibrium compen-

sations for exposures to shocks that include both the stochastic volatility shock as well as

the “original” shocks whose volatilities now move. By way of contrast, we introduce no

stochastic volatility and no new risks. Instead, we amplify the prices of exposures to the

“original” shocks. We induce fluctuations in those prices by modeling how the representa-

tive consumer struggles to confront his doubts about the baseline model. We extend these

insights to the analysis of uncertainty prices over alternative investment horizons.

Section 2 describes a representative consumer’s baseline probability model and martin-

gale perturbations to it. Section 3 describes two convex sets of martingales that perturb

the baseline model. Section 4 uses one of these sets to form a robust planning problem that

generates a worst-case model that we use to calibrate key parameters measuring the size

of a convex set of models. Section 5 constructs a recursive representation of a competitive

equilibrium. Then it links the worst-case model that emerges the robust planning problem

to competitive equilibrium compensations that the representative consumer earns for bear-

ing model uncertainty. This section also describes a term structure of these market prices

of uncertainty. By borrowing from Hansen and Sargent (2010), section 6 describes a quan-

titative version of a baseline model as well as a class of models that particularly concern the

robust consumer and the robust planner. Section 7 uses the quantitative model to compare

the set of models that concern both our robust planner and our representative consumer

with two other sets featured in Anderson et al. (2003) and Hansen and Sargent (2010),

one based on Chernoff entropy, the other on relative entropy. Section 8 offers concluding

remarks. Six appendices provide technical details.

2

2 The model

2.1 Mathematical framework

A representative consumer cares about a stochastic process for Y.= {Yt : t ≥ ∞} described

by the baseline model1

d log Yt = (.01) (µ+Xt) dt+ (.01)α · dWt

dXt = φdt− κXtdt+ σ · dWt, (1)

where W is a multivariate Brownian motion, X is a scalar process initialized at a random

variable X0 governed by probability distribution Q, and µ+Xt+.012|α|2 is the date t growth

rate of Y expressed as a percent. The quintet (µ, κ, φ, α, σ, Q) characterizes the baseline

model.2

Because he doesn’t trust the baseline model, the consumer also cares about Y under

probability models obtained by multiplying probabilities associated with the baseline model

(1) by likelihood ratios. We represent a likelihood ratio by a stochastic process Zh that is

a positive martingale with respect to the baseline model and that satisfies3

dZht = Zh

t ht · dWt, (2)

or

d logZht = ht · dWt −

1

2|ht|

2dt, (3)

where h is adapted to the filtration F = {Ft : t ≥ 0} associated with the Brownian motion

W and satisfies ∫ t

0

|hu|2du <∞ (4)

with probability one. Imposing the initial condition Zh0 = 1, we express the solution of

stochastic differential equation (2) as:

Zht = exp

(∫ t

0

ht · dWt −1

2

∫ t

0

|hu|2du

). (5)

1We let X denote the stochastic process, Xt the process at date t, and x a realized value of the state.2In earlier papers, we sometimes referred to what we now call the baseline model as the decision maker’s

approximating model or benchmark model.3James (1992), Chen and Epstein (2002), and Hansen et al. (2006) used this representation.

3

When we want to allow Z0 to be drawn from an unknown probability distribution, we take

a Borel measurable function g > 0 satisfying Eg(X0) = 1 and consider martingales equal

to

g(X0)Zht .

Let G denote the collection of all such functions g(x). A pair (g, h) represents a perturbation

of the baseline model (1).

Definition 2.1. Z denotes the set of all martingales g(X0)Zh constructed via representa-

tion (5) with some process h adapted to F = {Ft : t ≥ 0} and satisfying (4) and g ∈ G.

We use a martingale g(X0)Zh to construct an alternative probability distribution as

follows. Starting from the probability distribution associated with the baseline model (1),

we use h to represent another probability distribution conditioned on F0. To do this, think

of taking any Ft-measurable random variable Yt and multiplying it by Zht before computing

expectations conditioned on X0. Associated with h are probabilities defined implicitly by

Eh [Bt|F0] = E[Zh

t Bt|F0

]

for any t ≥ 0 and any bounded Ft-measurable random variable Bt. Similarly, we write

EgB0 = E [g(X0)B0]

for any bounded random variable B0 in the date zero information set F0.

Here the positive random variable Zht acts as a “Radon-Nikodym derivative” for the

date t conditional expectation operator Eh [·|X0]. The martingale property of the process

Zh ensures that the conditional expectations operators for different t’s are compatible in

the sense that they satisfy a Law of Iterated Expectations. The random variable g(X0)

acts as a “Radon-Nikodym derivative” for the date zero unconditional distribution vis a

vis a baseline probability distribution Q over the date zero state vector X0.

While under the baseline modelW is a standard Brownian motion, under the alternative

h model distribution this process has increments

dWt = htdt+ dW ht , (6)

where W h is a standard Brownian motion. While (3) expresses the evolution of logZh in

4

terms of increment dW , the evolution in terms of dW h is:

d logZht = ht · dW

ht +

1

2|ht|

2dt. (7)

In light of (7), we can write model (1) as:

d log Yt = (.01) (µ+Xt) dt + (.01)α · ht + (.01)α · dW ht

dXt = φdt− κXtdt+ σ · ht + σ · dW ht ,

which implies that Y has a (local) growth rate µ+Xt+α ·ht+.012|α|2 under the h model.4

2.2 Quantifying Probability Distortions

Discounted relative entropy quantifies how a (g, h) pair distorts baseline model probabilities.

We construct discounted relative entropy in two steps. First, we condition on X0 = 0 and

focus solely on h; second, we focus on misspecifications of Q.

i) Our first step is to compute

(Zh; x) =δ

∫ ∞

0

exp(−δt)E(Zh

t logZht

∣∣∣X0 = x)dt

=

(1

2

)∫ ∞

0

exp(−δt)E(Zh

t |ht|2∣∣∣X0 = x

)dt (8)

where the second equality follows from an application of integration by parts. We

write as a function of Zh instead of h because is convex in Zh. The discounted

entropy concept (8) quantifies how h distorts baseline probabilities.5

ii) Our second step applies when we don’t condition on X0. Suppose that Q is the

stationary probability distribution for X under the baseline model and that g is the

density used to alter Q. We average over the initial state via

(g;Zh) ≡

∫(Zh; x)g(x)Q(dx) +

∫g(x) log g(x)dQ (9)

which includes a relative entropy penalty for the initial density g.

4The growth rate includes a multiplication by 100 that offsets one of the .01’s.5Hansen et al. (2006) used the representation of discounted relative entropy that appears on the right

side of the first line of (8).

5

3 Convex sets of models

Two convex sets that surround the baseline model are designed to include parametric

probability models that a decision maker cares about. One set can readily be used for

robust control problems, but the other cannot. Nevertheless, the second set is useful

because it generalizes Chernoff (1952) entropy to a Markov environment and thereby has

an explicit statistical interpretation. We are interested in how these two convex sets are

related.

3.1 Alternative parametric models

The following parametric model nests baseline model (1) within a bigger class:

d logCt = .01 (µ+Xt) dt+ .01α · dW ht

dXt = φdt− κXtdt+ σ · dW ht , (10)

where W h is a Brownian motion and (6) continues to describe the relationship between the

processes W and W h. Here (µ, φ, κ) are parameters of the baseline model (1), (µ, φ, κ) are

parameters of model (10), and (α, σ) are parameters common to both models. We want to

use drift distortions h for W to represent models in a parametric class defined by (10). We

can express model (10) in terms of our section 2.1 structure by setting

ht = η(Xt) ≡ η0 + η1Xt

and using (1), (6), and (10) to deduce the following restrictions on η0 and η1:

[α′

σ′

]η0 =

[µ− µ

φ− φ

]

[α′

σ′

]η1 =

[0

κ− κ

]. (11)

By maintaining restrictions (11), we can find pairs (η0, η1) that represent members of a

class of models having the parametric form (10). Among other possibilities, we can set

Q to assign all of the probability to an initial state or we can set it equal to the station-

ary distribution implied by the baseline model, in which case we can construct g so that

g(x)dQ(x) is the stationary distribution under the alternative model implied by (η0, η1).

6

We consider restrictions on the martingales Zh pertinent for modeling probabilities

conditioned on X0.

Definition 3.1. Z+ is the set of martingales g(X0)Zh constructed by (i) selecting a pair

(η0, η1) that satisfies (11), (ii) pinning down an associated ht = η0+η1Xt, (iiii) constructing

an implied martingale Zh via (2), and (iv) selecting g ∈ G.

We can further restrict a family of models by using a nonnegative quadratic function of x

ξ(x).= ξ0 + 2ξ1x+ ξ2x

2 ≥ 0 (12)

to express a collection of alternatives to a baseline model. For example, to induce ξ to

capture a prespecified κ, form

ξ(x).= ξ0 + 2ξ1x+ ξ2x

2 =1

|σ|2(κ− κ)2 x2.

This choice of ξ makes both κ = κ and κ = 2κ− κ be alternative parameter configurations

that are among the models to be included in a convex set of Z’s. More generally, we can

select (µ, φ, κ) and compute η1 and η1 by solving the counterpart to (11). Then

ξ(x) = |η0 + η1x|2.

We again pick up additional parametric models by casting our restrictions in terms of the

quadratic function ξ.

Definition 3.2.

Zo ={Zh ∈ Z+ : |ht|

2 ≤ ξ(Xt)}

for all t with probability one.

Next we construct a larger set of martingales that contains Zo but allows departures

from the parametric structure (10).

Definition 3.3.

Z ={Zh ∈ Z : |ht|

2 ≤ ξ(Xt)}

(13)

for all t with probability one.

7

Zh’s in Zo represent models with time-invariant parameters; Zh’s in Z can represent

models whose parameters vary over time.6

3.2 Appropriate restrictions on the h process

We want to construct a set of models by using a measure of statistical discrepancy from

a baseline model. For that purpose, we shall replace the instant-by-instant inequality in

the definition (13) of Z with restrictions cast in terms of probability weighted averages

of |ht|2. These restrictions allow the trade-offs among intertemporal dimensions of model

comprehended by statistical model discrimination criteria.7

3.3 Z∗

We construct our first convex set of martingales Zh by starting with a drift distortion h

that represents a particular alternative parametric model created along lines described in

section 3.1. We use the following functional of a process Zh:

ǫ(Zh; |h|2, x

)=δ

∫ ∞

0

exp(−δu)E[Zh

u

(logZh

u

)du|X0 = x

]

−

∫ ∞

0

exp(−δu)E[Zh

u | hu | du|X0 = x]

=1

2

∫ ∞

0

exp(−δu)Eh[(

|hu|2 − |hu|

2)|X0 = x

]. (14)

Three important features of ǫ are:

i) ǫ is convex in Zh;

ii) ǫ can readily be computed under the h model;

iii) ǫ depends on h only through the scalar process |h|2.

6This is accomplished by requiring Zh to belong to Z rather than Z+.7In constructing Zo and Z, we impose the instant-by-instant constraint on ht described in (13). But

some models that don’t satisfy such an instant-by-instant constraint are equally difficult to distinguishstatistically from the baseline model. The ambiguity averse decision maker of Chen and Epstein (2002)considers a set of models characterized by martingales that are generated by h processes that satisfyinstant-by-instant constraints on h. Anderson et al. (1998) explored consequences of this type of constraintwithout the state dependence.

8

Let

|h|2 = ξ(x)

and introduce a positive number ρ.

Definition 3.4.

Z∗ =

{g(X0)Z

h ∈ Z :

∫ǫ[Zh; ξ(X), x

]g(x)Q(dx) +

∫g(x) log g(x)Q(x)− ρ ≤ 0

}.

(15)

Z∗ includes martingales in Z that are associated with the parametric probability models

of Section 3.1. In light of feature i) ǫ, the set Z∗ is is convex in g(X0)Zh and necessarily

contains Z h and Zh = 1. Feature ii) makes it tractable to use Z∗ to pose a recursive robust

decision problem. Feature iii) provides a convenient way to use {ξ} to include parametric

models like those discussed in Section 3.1 within Z∗.

In section 5, we pose a robust decision problem in which Z∗ serves as a family of positive

martingales. Evidently, both the baseline model (1) and the alternative models captured

by the quadratic function ξ(X) play important roles in shaping the set Z∗. These models

also shape our next convex set, which considers the entropy of a Z ∈ Z∗ relative to the

baseline model (1).

3.4 Z

While the drift distortions h help shape Z∗, they don’t influence a set of martingales Z

that emerges from studying how Brownian motions disguise probability distortions of a

baseline model, making them difficult to distinguish statistically. To construct Z we use

Chernoff (1952) entropy, which differs from discounted relative entropy. Although Chernoff

entropy’s connection to a specific statistical decision problem makes it attractive, it has the

disadvantage that it is less tractable than relative entropy for the types of robust decision

problems that interest us.

In the spirit of Anderson et al. (2003), we use Chernoff (1952) entropy to measure

a distortion Z to a baseline model. Think of a pairwise model selection problem that

statistically compares the baseline model (1) with a model generated by the martingale

Zh. The logarithm of the martingale evolves according to

d logZht = −

1

2|ht|

2dt+ ht · dWt.

9

Consider a statistical model selection rule based on a data history of length t that takes the

form logZht ≥ log τ , where Zh

t is the likelihood ratio associated with the alternative model

for a sample size t. To construct a bound on the probability that this model selection rule

incorrectly chooses the alternative model when the baseline model governs the data, we use

an argument from large deviations theory that starts from the inequality

1{logZht≥τ} = 1{−rτ+r logZh

t≥0} = 1{exp(−rτ)(Zh

t)r≥1} ≤ exp(−rτ)(Zh

t )r,

which holds for 0 ≤ r ≤ 1. The expectation of the term on the left side equals the

probability of mistakenly selecting the alternative model when the data are a sample of

size t generated by the baseline model. We bound this mistake probability for large t by

following Donsker and Varadhan (1976) and Newman and Stuck (1979) and studying

lim supt→∞

1

tlogE

[exp(−rτ)

(Zh

t

)r]= lim sup

t→∞

1

tlogE

[(Zh

t

)r]

for alternative choices of r. The threshold τ does not affect this limit. Furthermore, the

limit is often independent of the initial state X0 = x. To get the best bound, we compute

inf0≤r≤1

lim supt→∞

1

tlogE

[(Zh

t

)r],

a limit that is typically negative because mistake probabilities decay with sample size. A

measure of Chernoff entropy is then

χ(Zh, x) = − inf0≤r≤1

lim supt→∞

1

tlogE

[(Zh

t

)r]. (16)

Appendix E describes how to compute Chernoff entropy.

To help interpret χ(Zh, x), consider the following argument. If the actual decay rate

of mistake probabilities were constant, then mistake probabilities for two sample sizes

Ti, i = 1, 2, would be

mistake probabilityi = .5 exp(−Tiψ∗)

for ψ∗ = χ(Zh, x). We define a ‘half-life’ as an increase in the sample size T2 − T1 > 0 that

multiplies the mistake probability by a factor of .5:

.5 =mistake probability2mistake probability1

=exp(−T2ψ

∗)

exp(−T1ψ∗).

10

So the half-life is approximately

T2 − T1 =− log .5

ψ∗. (17)

The preceding back-of-the-envelope calculation justifies the detection error bound com-

puted by Anderson et al. (2003). The bound on the decay rate should be interpreted

cautiously because, while it is constant, the actual decay rate is not. Furthermore, the pair-

wise comparison oversimplifies the challenge truly facing a robust decision maker, which is

statistically to discriminate among multiple models.

We could conduct a symmetrical calculation that reverses the roles of the two models,

so that the h model with martingale Zh becomes the model on which we condition. It is

straightforward to show that the limiting rate remains the same. Thus, when we select a

model by comparing a log likelihood ratio to a constant threshold, the two types of mistakes

share the same asymptotic decay rate.

Our second convex set is a ball formed using Chernoff entropy (16).

Definition 3.5.

Z ={Zh ∈ Z : χ(Zh; x) ≤ χ

}. (18)

The radius χ of the ball can be adjusted to attain a specified half-life.

4 Calibrating θ and γ

In subsections 4.1 and 4.2, we formulate a robust planning problem for an economy with

a representative consumer having an instantaneous utility function that is logarithmic in

consumption. Associated with the worst-case probability from the robust planning problem

is a greatest lower bound of expected discounted utility over the family Z∗ of alternative

probability distributions. In subsections 4.3, 4.4, and 4.5, we represent the worst-case

probability as a drift distortion to the multivariate Brownian motion in the baseline model

(1). We then use that drift distortion to guide the calibration of the parameters θ and γ

that pin down the size of the set Z∗. In section 5, we show how that same worst-case drift

distortion appears in a recursive representation of competitive equilibrium prices for an

economy with a representative investor. We deduce uncertainty prices and connect them

to the worst-case drift distortion from our robust planning problem.

11

4.1 Robust planner’s problem

Consider a consumption process C∗ = Y . Guess a value function υ(x, θ, γ) + log Y . We

use a family martingales in the set Z∗ to represent alternative probabilities. We take the

function ξ to be

ξ(x) = γξ(x),

where for the moment γ is an arbitrary parameter and ξ is pre-specified. Eventually, we will

allow ξ to be specified a priori up to a scale determined by a scalar γ that we’ll calibrate

by imposing a model detection probability half-life defined in terms of Chernoff entropy.

Let θ be a multiplier on the constraint:

∫ǫ[Zh; ξ(X), x

]g(x)Q(dx) +

∫g(x) log g(x)Q(x)− ρ ≤ 0 (19)

In subsection 4.2, for a given (θ, γ) we construct a recursive representation of a worst-case

drift distortion.

4.2 Recursive representation of worst-case drift distortion

Given (γ, θ), we compute h by solving HJB equation

0 = minh

− δυ(x, θ, γ) + (.01)(µ+ x)− υ′(x, θ)κx+1

2|σ|2υ′′(x, θ, γ)

+ (.01)α · h + υ′(x, θ, γ)σ · h+θ

2|h|2 −

θγ

2ξ(x). (20)

The solution υ of HJB equation (20) is quadratic in x:

υ (x, θ, γ) = −1

2

[υ2(θ, γ)x

2 + 2υ1(θ, γ)x+ υ0(θ, γ)],

which implies that the minimizing h is linear in x:

h∗(x, θ, γ) = −1

θ[.01α− συ2(θ, γ)x− συ1(θ, γ)] . (21)

4.3 Determining θ

To set θ, we must decide how to weight the initial state. Previous research by Petersen et al.

(2000) and Hansen et al. (2006) set Q to be a mass point over single value of x and thereby

12

effectively conditioned on the initial state. Here we suggest an alternative approach.

Under the baseline model, X has a stationary distribution Q having density q with

mean φ/κ and variance |σ|2/(2κ). Consider any g ∈ G. Introduce a nonnegative parameter

ρ that we temporarily take as a fixed number. To make a conservative adjustment of the

probability measure over the initial state, compute g given (θ, γ) by solving

ming∈G

∫υ (x, θ, γ) g(x)Q(dx) + θ

[∫log g(x)g(x)Q(dx)− ρ

]. (22)

The minimizing g is an exponentially tilted density

g(x, θ, γ) ∝ exp

[−1

θυ (x, θ, γ)

](23)

that is evidently normal with precision

ω(θ, γ) =2κ

|σ|2−

1

θυ2(θ, γ)

and mean

ν(θ, γ) =1

ω

[2φ

|σ|2+

1

θυ1(θ, γ)

].

To compute θ, we substitute g into (22) to obtain the maximand in the following

problem:

maxθ>0

−θ log

[∫exp

(−1

θυ(x, θ, γ)

)Q(dx)

]− θρ.

Let θ(ρ, γ) denote the maximizing θ, which depends on the value of ρ. If we set ρ and γ

defining the set of alternative models, we know θ and through representation (21) have a

complete characterization of the worst-case model. In the next subsection we describe how

to use a half-life for a model detection statistic to determine γ. This will tell us how to

scale γ for a given value of ρ. We could specify ρ a priori. But in the next subsection, we

suggest an alternative approach.

13

4.4 Specifying ρ

Instead of setting ρ a priori, we set it to solve:

∫g[x, θ(ρ, γ), γ] log g[x, θ(ρ, γ), γ]Q(dx) = ρ. (24)

Let ρ∗ denote the resulting ρ and let θ∗(γ) = θ(ρ∗, γ). Recall the relative entropy concept

(g;Zh) from equation (9)

(g;Zh) ≡

∫g(x)(Zh; x)Q(dx) +

∫g(x) log g(x)Q(dx),

where

(Zh; x) =1

2

∫ ∞

0

exp(−δu)Eh[|hu|

2|X0 = x].

The relative entropy measure includes an adjustment for distorting the initial distribution.

By imposing (24), we have set ρ∗ exactly to offset the term∫g(x) log g(x)Q(dx) when

evaluating the constraint (19) at the minimizing choice of g and Z.

4.5 Setting γ via Chernoff entropy

We refine a suggestion of Anderson et al. (1998). Compute h∗ [x, θ∗(γ), γ] and evaluate the

associated Chernoff entropy. Then adjust γ to match a target half-life.8 A larger value of

γ should lead to a smaller half-life. Call the resulting γ, γ∗ and let

g∗(x) = g [x, θ∗(γ∗), γ∗]

5 Robust Portfolio Choice and Pricing

In this section, we describe equilibrium prices that reconcile a representative consumer to

bearing risks presented by the environment described by baseline model (1) in light of his

concerns about model misspecification as expressed with the set Z∗. We construct equilib-

rium prices by appropriately extracting shadow prices from the robust planning problem of

subsection 4.1. We decompose risk prices into separate equilibrium compensations for bear-

ing risk and for bearing model uncertainty. We also describe an equilibrium term structure

8We expect but haven’t proved that the half-life is monotone in γ.

14

of compensations for bearing model uncertainty. We begin by posing the representative

consumer’s portfolio choice problem.

5.1 Robust representative consumer portfolio problem

A representative consumer faces a continuous time Merton portfolio problem in which

individual wealth K evolves as

dKt = −Ctdt+ (.01)Ktι(Xt)dt+KtAt · dWt + (.01)Ktπ(Xt) · Atdt, (25)

where At = a is a vector of chosen risk exposures, ι(x) is the instantaneous risk free rate

expressed as a percent, and π(x) is the vector of risk prices evaluated at state Xt = x.

Initial wealth is K0. The investor has discounted logarithmic preferences but distrusts his

probability model.

Key inputs to a representative consumer’s robust portfolio problem are the baseline

model (1), the wealth evolution equation (25), the vector of risk prices π(x), and the

quadratic function ξ in (12) that defines the alternative explicit models that concern the

representative consumer. As in the robust planners problem analyzed in section 4.1, let θ

be a penalty parameter or Lagrange multiplier on the constraint (19). For the recursive

competitive equilibrium, we take (θ, γ) as given. We described how we calibrate these

parameters in section 4.

Under the guess that the value function takes the form ψ(x, θ, γ) + log k + log δ, the

HJB equation for the robust portfolio allocation problem is

0 = maxa,c

minh

−δψ(x, θ, γ)− δ log k − δ log δ + δ log c−c

k+ (.01)ι(x)

+ (.01)π(x) · a+ a · h−|a|2

2+(φ− κx

)ψ′(x, θ, γ) + h · σψ′(x, θ, γ)

+|σ|2

2ψ′′(x, θ, γ) + θ

[|h|2

2−γ

2

(ξ2x

2 + 2ξ1x+ ξ0

)]. (26)

First-order conditions for consumption are

δ

c∗=

1

k,

which implies that c∗ = δk, an implication that flows partly from the representative con-

sumer’s unitary elasticity of intertemporal substitution. First-order conditions for a and h

15

are

(.01)π(x) + h∗(x, θ, γ)− a∗(x, θ, γ) = 0 (27a)

a∗(x, θ, γ) + θh∗(x, θ, γ) + ψ′(x, θ, γ)σ = 0. (27b)

Here we appeal to arguments like those in Hansen and Sargent (2008, ch. 7) to justify

stacking first-order conditions and not worrying about “who goes first” in the two-person

zero-sum game.9

5.2 Competitive equilibrium prices

We show here that the drift distortion h∗ that emerges from the robust planner’s problem of

subsection 4.1 determines prices that a competitive equilibrium awards for bearing model

uncertainty. To compute a vector π(x) of competitive equilibrium vector of risk prices, we

find a robust planner’s marginal valuation of exposure to the W shocks. We decompose

that price vector into separate compensations for bearing risk and for accepting model

uncertainty.

Noting from the robust planner’s problem that the shock exposure vectors for logK

and log Y must coincide implies

a∗ = (.01)α.

Thus, from (27a), π = π∗, where

π∗(x) = α− 100h∗(x, θ). (28)

Similarly, in the problem for a representative consumer within a competitive equilibrium,

the drifts for logK and log Y must coincide:

−δ + (.01)ι(x) + (.01)[(.01)α− h∗(x)] · α−.0001

2α · α = (.01)(µ+ x),

so that ι = ι∗, where

ι∗(x) = 100δ + (µ+ x) + α · h∗(x, θ)−.01

2α · α. (29)

9If we were to use a timing protocol that allows the maximizing player to take account of the impactof its decisions on the minimizing agent, we would obtain the same equilibrium decision rules as thosedescribed in the text.

16

We can use these formulas for equilibrium prices to construct a solution to the HJB equation

of a representative consumer in a competitive equilibrium by letting υ = φ and g∗ = (.01)α.

5.3 Reinterpreting the worst-case portfolio problem

As described by Hansen et al. (2006), there is an ordinary (i.e., non-robust) portfolio

selection problem that confronts a representative consumer who has a completely trusted

model for the exogenous state dynamics and whose decision rule matches the decision rule

that attains the value function ψ associated with the HJB equation (26) for the robust

representative consumer portfolio problem. This consumer’s completely trusted model

of course differs from the baseline model (1). Hansen et al. (2006) call this an ex post

problem because it comes from exchanging orders of maximization and minimization in

the two-person zero-sum game that gives rise to the robust portfolio choice rule. This ex

post portfolio problem is a special case of a Merton problem with a state evolution that is

distorted relative to the baseline model. The distorted evolution imputes to the process W

a drift h∗ so that

dWt = h∗(Xt)dt+ dW ∗t

= (η∗0 + η∗1Xt) dt+ dW ∗t ,

whereW ∗ is a multivariate standard Brownian motion under the h∗ probability distribution.

Thus,

d logYt = .01(µ+Xt)dt+ (.01)h∗(Xt)dt+ (.01)α · dW ∗t

dXt = −κXtdt+ σ · h∗(Xt)dt+ σ · dW ∗t .

A value function ψ + log a satisfies the HJB equation

0 = maxa,c

− δψ(x)− δ log k + δ log c−c

k+ (.01)ι(x) + (.01)π(x) · a+ a · (η0 + η∗1x)−

|a|2

2

[−κx+ η∗0 + η∗1x] ψ′(x) +

|σ|2

2ψ′′(x).

First-order conditions are

δ

c∗−

1

k= 0

17

(.01)π(x) + η∗0 + η∗1x− a∗ = 0,

which lead to decision rules c∗ = δk and

a∗ = (.01)π(x) + η∗0 + η∗1x.

Because the exposure and drift for logK and log Y should coincide in equilibrium, it follows

that

−δ + (.01)ι(x) + .01α · η∗(x) + .0001α · α−.0001

2π · α = (.01) [µ+ x+ α · h∗(x)] .

Thus, the ordinary decision rules that solve the ex post portfolio problem imply the same

equilibrium prices as the robust portfolio problem, so that π = π∗ and ι = ι∗, as given by

(28) and (29), respectively.

5.4 Term structure of uncertainty prices

We now study how competitive equilibrium uncertainty prices vary over an investment hori-

zon by computing a pricing counterpart to an impulse response function. Our continuous-

time formulation means that the pertinent shock that occurs during the “next instant” is

an incremental change that will have incremental effects on prices across all future time

periods. An asset exposed to these shocks earns compensations that depend on the horizon.

Shock-price elasticities are state dependent because they vary with the growth state. In

this section, we compute the elasticities and produce what we regard as a dynamic value

decomposition. We present a quantitative example in section 7.3.10

5.4.1 Local uncertainty prices

The equilibrium stochastic discount factor process for our robust representative consumer

economy is

d logSt = −δdt− .01 (µ+Xt) dt− .01α · dWt + h∗t · dWt −1

2|h∗t |

2dt. (30)

The stochastic discount factor has a linear local mean and a quadratic local variance.

The exponential-quadratic formulation has been used extensively in empirical asset pricing

10Hansen (2011) presents an overview of such shock elasticities.

18

applications. Duffie and Kan (1994) described term structures of interest rates implied by

models with affine stochastic discount factors. Ang and Piazzesi (2003) estimated a term

structure model with an affine stochastic discount factor process driven by macroeconomic

variables.

The entries of the vector, π∗(Xt) given by (28), which equal minus the local exposures to

the Brownian shocks, are usually interpreted as local “risk prices,” but we shall reinterpret

them. Motivated by the decomposition

minus stochastic discount factor exposure = .01α −h∗t ,

risk price uncertainty price

we prefer to think of .01α as risk prices induced by the curvature of log utility and −h∗t

as “uncertainty” prices induced by a representative investor’s doubts about the baseline

model. Here h∗t = η∗0 + η∗1x, as described in equation (21). When η∗1 = 0, h∗t is constant;

but when η∗1 differs from zero, the uncertainty prices −h∗t = −h∗(Xt) are time varying and

depend linearly on the growth state Xt. When the dependence of h∗ on x is positive, these

uncertainty prices are higher in bad times than in good times. Countercyclical uncertainty

prices emerge endogenously from a baseline model that excludes stochastic volatility in the

underlying consumption risk as an exogenous input. Such fluctuations emerge endogenously

from a baseline model that excludes stochastic volatility in the underlying consumption risk

as an exogenous input. Stochastic volatility models introduce new risks to be priced while

also inducing fluctuations in the prices of the “original” risks. The mechanism in this

paper simultaneously enhances and induces fluctuations in the uncertainty prices, but it

introduces no new sources of risk. Instead, it focuses on the impact of uncertainty about

the implications of those risks.

Following Borovicka et al. (2011), we assign horizon-dependent uncertainty prices to risk

exposures. To represent shock price elasticities, we study the dependence of logarithms of

expected returns on an investment horizon. The logarithm of the expected return from a

consumption payoff at date t is

logE

(Ct

C0

∣∣∣∣∣X0 = x

)− logE

[St

(Ct

C0

) ∣∣∣∣∣X0 = x

]. (31)

The first term captures the expected payoff and the second the cost of the payoff. A shock

in the next instance affects the consumption and the stochastic discount factor processes.

19

In continuous time, this leads formally to what is called a Malliavin derivative. There is

one such derivative for each Brownian increment. The date zero shock influences both the

expected asset payoff at date t (aggregate consumption in this case) and the cost of an asset

with this payoff. Its impact on the logarithm of the expected return is the price elasticity

and its impact on the logarithm of the expected payoff is the exposure elasticity.

Consider initially the expected payoff term on the left side of (31). Let D0Ct denote

the derivative vector of Ct with respect to dW0. The familiar formula for a derivative of a

logarithm applies so that

D0Ct = CtD0 logCt.

A contribution to the elasticity vector for horizon t is

E[Ct

C0

D0 logCt|X0 = x]

E[Ct

C0

|X0 = x] .

There is a distinct elasticity for each shock. Since logC has linear dynamics, D0 logCt can

be shown to be the same as the vector of impulse responses of logCt to shocks at date zero,

which does not depend on the Markov state. We call this an exposure elasticity.

Consider next the cost term on the right-hand side of (31). For the productM = SC, a

calculation analogous to the preceding one confirms that the contribution of the cost term

to the elasticity is

E[Mt

M0D0 logMt|X0 = x

]

E[Mt

M0|X0 = x

] .

The dynamic evolution of the stochastic discount factor is not linear in the state variable,

and as a result

D0 logMt = D0 log St +D0 logCt

is no longer a deterministic function of time.

The shock price elasticity combines these calculations:

E[Ct

C0

D0 logCt|X0 = x]

E[Ct

C0|X0 = x

] −E[Mt

M0

D0 logMt|X0 = x]

E[Mt

M0|X0 = x

] = −E[Mt

M0

D0 logSt|X0 = x]

E[Mt

M0|X0 = x

] . (32)

This is a valuation analogue of the impulse response function routinely estimated by em-

20

pirical macroeconomists. As we vary t ≥ 0, we trace out a trajectory of uncertainty price

elasticities for each shock. We give nearly analytical formulas for these in Appendix F.

6 A quantitative example

For a laboratory, we use our baseline model (1) evaluated at the following maximum like-

lihood estimates computed by Hansen and Sargent (2010):11

µ = .465,

α =

[.468

0

]

φ = 0

κ = .185

σ =

[0

.149

](33)

We consider Chernoff entropy balls associated with half-lives of 40 quarters, 80 quarters,

and 120 quarters. For comparison, we include a model with no concern for robustness, which

is equivalent to a half-life equal to infinity. We consider two specifications of ξ:

ξ(x) = x2

ξ(x) = 1.

Tables 1 and 2 report worst case models that emerge from the robust planner’s HJB

equation (20) for the specifications ξ(x) = x2 and ξ(x) = 1, respectively. The worst-case

models endow X with a negative mean given by φ

κ. This in turn shifts the implied growth

in consumption. Since the worst case model also includes a change in µ, we report the

composite outcome µ + φ

κ. As we reduce the half-life, the worst-case model makes the

constant parameter adjustment smaller. When ξ = x2, we also increase the persistence in

the growth-rate process. Notice that while the minimizing agent could choose to reduce

persistence even more, he instead chooses to allocate some of the entropy distortion to the

11The estimates are for Y being consumption of nondurables and services for aggregate U.S. data overthe period 1948II to 2009 IV.

21

constant terms.

Consistent with findings of Anderson et al. (2003) and Hansen and Sargent (2010), when

ξ(x) = 1, there is no change in the persistence parameter κ. The worst-case model targets

the constant terms in the consumption evolution and the state evolution. The worst-case

analysis reduces to a determination of how much to distort the respective constant terms

only.

Half-Life µ φ κ µ+ (φ/κ)

∞ 0.4650 0 0.1850 0.4650

120 0.4270 -0.0255 0.1491 0.2562

80 0.4239 -0.0301 0.1361 0.2024

40 0.4228 -0.0391 0.1073 0.0579

Table 1: Worst-case parameter values for ξ(x) = x2 associated with HJB equation (20).


∞ 0.4650 0 0.1850 0.4650

120 0.4140 -0.0276 0.1850 0.2648

80 0.4026 -0.0338 0.1850 0.2198

40 0.3768 -0.0478 0.1850 0.1182

Table 2: Worst-case parameter values for ξ(x) = 1 associated with HJB equation (20).

Corresponding to each of our two specifications of ξ(x), figures 1 and 2 plot the expected

consumption growth over horizon t. The expected growth rate at t is:

[1 0

]exp

([0 1

0 −κ

]t

)[µ

φ

]= µ+

φ

κ−φ

κexp(−κt).

Integrating this growth rate over an interval [0, t] gives a worst-case trend for log consump-

tion: (µ+

φ

κ

)t+

φ

κ2exp(−κt)−

φ

κ2(34)

Notice that the initial growth trend growth rate is µ and that the eventual growth rate is

µ+ φ

κ. In this calculation, we impose the distorted model starting at date zero and consider

22

its implications going forward. The shift in the constant term for the evolution of X has

no immediate impact on the growth of logC. Its eventual impact is determined in part by

the persistence parameter κ. We applied formula (34) to alternative models including the

worst-case models to compute the long-run drifts reported in Figures 1 and 2.

Figure 1: Long-run drift of logCt for the three target half-lives when ξ = x2.

23

Figure 2: Long-run drift of logCt for the three target half-lives when ξ = 1.

Next we consider the distributional impacts. The new information about logCt− logC0

(scaled by 100) is:

∫ t

0

∫ u

0

exp(−κv)σ · dBu−vdu+

∫ t

0

α · dBu =

∫ t

0

∫ u

0

exp[−κ(u − r)]σ · dBrdu+

∫ t

0

α · dBu

=

∫ t

0

∫ t

r

exp[−κ(u− r)]duσ · dBr +

∫ t

0

α · dBu

=1

κ

∫ t

0

exp(κr) [exp(−κr)− exp(−κt)] σ · dBr

+

∫ t

0

α · dBu

=1

κ

∫ t

0

[1− exp[−κ(t− r)]]σ · dBr +

∫ t

0

α · dBu

The variance is .0001 times the following object

1

κ2

∫ t

0

[1− 2 exp[−κ(t− r)] + exp[−2κ(t− r)]|σ|2dr + |α|2t

=1

κ2|σ|2t+ |α|2t−

2

κ3[1− exp(−tκ)]|σ|2 +

1

2κ3[1− exp(−2κt)]|σ|2,

24

where we have used the fact that α · σ = 0. Using this calculation, figures 3 and 4

display the interdecile ranges of the distribution for consumption growth over alternative

horizons. These figures depict deciles for both the baseline model and the worst-case models

associated with a half-life of 80. The region between the deciles illustrates a component

of risk in the consumption distribution. The variation across the baseline and worst-case

models reflects a broader notion of uncertainty driven by skepticism about the baseline

model. The upper decile of the worst-case model overlaps the lower decile of the baseline

model in both figures. The interdecile range is somewhat larger when ξ is quadratic in x

than when ξ is constant.

Figure 3: Expected values of logCt scaled by 100 for the baseline model and for a half-life

of 80 when ξ = x2. The shaded black and red areas show the .1 and .9 interdecile ranges

under the baseline model and the worst-case model for a half-life of 80. The black line is

the mean growth for the baseline model, and the red circle line is the mean growth for the

worst-case model.

25

Figure 4: Expected values of logCt scaled by 100 for the baseline model and for a half-life

of 80 when ξ = 1. The shaded black and red areas show the .1 and .9 interdecile ranges

under the the baseline and the worst-case model for a half-life of 80. The black line is the

mean growth for the baseline model, and the red circle line is the mean growth for the

worst-case model.

7 Comparing sets of models

We have used the set of models

Z∗ =

{Zh :

∫ǫ(Zh; ξ, x

)g∗(x)Q(dx) ≤ 0

}

to describe our robust representative consumer’s concerns about misspecification of his

baseline model 1. In this section, we compare Z∗ to a corresponding Chernoff entropy ball

Z and also to another set called an entropy ball that we now describe.

26

7.1 Entropy ball

Anderson et al. (2003) and Hansen and Sargent (2010) focused primarily on entropy balls.

Here we are interested in constructing a new set Z that we define as the smallest entropy

ball that contains Z∗. An entropy ball is a family of Zhs that satisfy12

(Zh) =

∫ [∫ ∞

0

exp(−δt)E(Zh

t logZht |X0 = x

)]g∗(x)Q(dx) ≤

1

2δξ (35)

for some constant ξ > 0. By constructing an entropy ball that contains Z∗, we compute

how large relative entropy can be for martingales in the set Z∗.

To determine this magnitude, we take the quadratic function ξ(x) and pose a maximum

problem that starts from the observation that a martingale Zh that is biggest in terms of

its relative entropy satisfies the constraint in definition 3.4 at equality:

∫ǫ[Zh; ξ(Xt), x

]q∗(x)Q(dx) = 0.

This leads us to maximize the discounted objective

∫ (E

[∫ ∞

0

exp(−δt)Zht ξ(Xt)dt|X0 = x

])g∗(x)Q(dx)

subject to the constraint:

∫ǫ(Zh; ξ, x

)g∗(x)Q(dx) ≤ 0.

Associated with this optimization problem is the HJB equation:

0 = maxh

− δυ(x, ϑ) +1

2ξ(x)− υ′(x, ϑ)κx+

1

2|σ|2υ′′(x, ϑ)

+ υ′(x, ϑ)σ · h−ϑ

2|h|2 +

ϑ

2ξ(x), (36)

where we can regard ϑ as a Kuhn-Tucker multiplier on a relative entropy constraint. Be-

12We use the term ball loosely because typically a ball in mathematics is defined using a metric. Althoughrelative entropy quantifies statistical discrepancy, it is not a metric because it depends on which of twomodels is taken as the baseline model.

27

cause the maximizing h takes the form

h = η0(ϑ) + η1(ϑ)X,

it is one of the models in the parametric class from section 3.1. To compute ξ in constraint

(35), we solve

ξ = 2δminϑ≥0

∫υ(x, ϑ)g∗(x)Q(dx).

Let

Z.=

{Zh :

∫(Zh; x

)g∗(x)Q(dx) ≤

1

2δξ

}

By construction Z∗ ⊂ Z.

7.2 Comparing sets

We compare intersections of Zo with each of the three sets Z∗, Z, and Z. While it is

tractable to use the sets Z∗ and Z to formulate robust decision problems, these sets are

not directly linked to statistical discrimination problems. The set Z is closely linked to

“statistical discrimination,” but for forming robust decision problems it is not as tractable

as the other two. It would be comforting if Z∗ were closely to approximate Z, at least in

regions near the worst-case model that emerges from the robust planner’s HJB equation

(20).

We compute and report the projection(Z ∩ Zo

)of the Chernoff ball on Zo for three

half-lives in figure 6. We represent these projections using the three parameter values that

characterize Zo. For comparison, we also report (Z∗ ∩ Zo). The sets are distinct, but

the big differences occur for larger values of κ at which the Chernoff ball contains points

not included in Z∗. Such large values of κ turn out not the be the ones that the robust

planner most fears. Overall, the regions are closer for longer specifications of the half-life

of Chernoff entropy.

28

Figure 5: Projections of Sets I and II onto three-parameter axis. From Top to Bottom:

Target half-lives 120, 80, and 40, respectively. Left: ξ = x2. Right: ξ = 1. (Z∗∩Zo) shown

in blue mesh. (Z ∩ Zo) shown in yellow. The solution to the robust planner’s problem is

shown as the red point.

29


∞ 0.4650 0 0.1850 0.465069.78 0.3982 -0.0362 0.1850 0.202442.19 0.3791 -0.0466 0.1850 0.127316.65 0.3282 -0.0742 0.1850 -0.0726

Table 3: Worst-case Parameter Values for ξ(x) = x2 under the set (Z ∩ Zo).

In figure 6, we compare the entropy ball (Z ∩ Zo) projection to both (Z∗ ∩ Zo) and

(Z ∩ Zo) when ξ(x) = x2. The left side of this plot shows how large an entropy ball

would have to be to contain the set used in the robust planning problem affiliated with

HJB equation (20). The right side compares the Chernoff ball to this entropy ball. As is

evident from this figure, the resulting entropy ball is much larger. When we solve the robust

planner’s problem with this constructed ball, we reduce the implied half lives, as reported

in Table 7.2, decline from 120, 80 and 40 to 70, 42 and 17, respectively. The constant terms

for both the consumption equation (the first equation of (10)) and the consumption growth

equation (the second equation of (10)) are reduced while the autoregressive parameter is

not altered in comparison to Table 6.

30

Figure 6: Comparing entropy balls to sets I and II when ξ = x2. From Top to bottom:

half-lives 120, 80, and 40, respectively. (Z∗ ∩ Zo) shown in blue, (Z ∩ Zo) shown in black

mesh. (Z ∩ Zo) shown in yellow. The solution to the robust planner’s problem is the red

point.

31

7.3 Shock price elasticities

Figure 7: Shock-price elasticity to a shock to X for the three target half-lives when ξ = x2.

From Top to Bottom: the half-lives 120, 80, and 40, respectively. The shaded regions show

interquartile ranges of the shock-price elasticities under the stationary distribution for X .

32

The uncertainty price elasticities depend on the initial state x. In figure 7, we display

the shock elasticities evaluated at the median and the two quartiles of the stationary dis-

tribution for X . We shade in interquartile ranges. Figure 7 shows shock price elasticity

trajectories for the growth rate shock. They are nearly constant across horizons. Increasing

the concern for robustness, as reflected by the sizes of the associated Chernoff half lives,

makes the elasticities larger and increases their variation across horizons.

8 Concluding remarks

We have applied our proposal for constructing a set of models surrounding a decision

maker’s baseline probability model to an asset pricing model in which a representative

consumer’s responses to model uncertainty make so-called prices of risk be countercyclical.

We say so-called because they are actually compensations for model uncertainty, not risk.

And their countercyclical components are entirely due to fears of model misspecification.

We have produced an affine model (30) of the log stochastic discount factor whose so-called

risk prices reflect a robust planner’s worst-case drift distortions h∗t . We describe how these

drift distortions should be interpreted as prices of model uncertainty. The dependency of

these uncertainty prices h∗t on the growth state x, and thus whether they are pro cyclical or

countercyclical, is shaped partly by a function ξ(x) that describes some particular models

that serve as alternatives to a baseline model. In this way, the theory of countercyclical

risk premia in this paper is all about how our robust consumer responds to the presence

of the particular alternative models among a huge set of more vaguely specified alternative

models that concern our representative consumer. We have demonstrated that this is a

simple way of generating countercyclical risk premia.

It is worthwhile comparing this paper’s way of inducing countercyclical risk premia with

three other macro/finance models that also get them. Campbell and Cochrane (1999) pro-

ceed in the standard rational expectations single-known-probability-model tradition and so

exclude any fears of model misspecification from the mind of their representative consumer.

They construct a history-dependent utility function in which the history of consumption

expresses an externality. This history dependence makes the consumer’s local risk aversion

depend in a countercyclical way on the economy’s growth state. Ang and Piazzesi (2003)

use an affine stochastic discount factor in a no-arbitrage statistical model and explore links

between the term structure of interest rates and other macroeconomic variables. Their

approach allows movements in risk prices to be consistent with historical evidence without

33

specifying a general equilibrium model. A third approach introduces stochastic volatil-

ity into the macroeconomy by positing that the volatilities of shocks driving consumption

growth are themselves stochastic processes. A stochastic volatility model induces time

variation in risk prices via exogenous movements in the conditional volatilities impinging

on macroeconomic variables.

What drives countercyclical risk prices in Hansen and Sargent (2010) is a particular

kind of robust model averaging occurring inside the head of the representative consumer.

The consumer carries along two difficult-to-distinguish models of consumption growth, one

asserting i.i.d. log consumption growth, the other asserting that the growth in log consump-

tion is a process with a slowly moving conditional mean.13 The consumer uses observations

on consumption growth to update a Bayesian prior over these two models, starting from

an initial prior probability of .5. The prior wanders over the post WWII sample for US

data, but ends up about where it started. Each period, the Hansen and Sargent representa-

tive consumer expresses his specification distrust by exponentially twisting a posterior over

the two baseline models in a pessimistic direction. That leads the consumer to interpret

good news as temporary and bad news as persistent, causing him to put countercyclical

uncertainty components into equilibrium “risk” prices.

In this paper, we propose a different way to induce variation in risk prices. We abstract

from learning and instead consider alternatives models with parameters whose future vari-

ations are not discernible from from the past. These time-varying parameter models differ

from the decision maker’s baseline model, a fixed parameter model whose parameters can

be well estimated from historical data. We ensure that among the class of alternative mod-

els are ones that allow for parameters persistently to deviate from those of the baseline

model in statistically subtle and time-varying ways. In addition to this class of alternative

models, the decision maker also includes other statistical specifications in the set of models

that concern him. The robust planner’s worst-case model responds to these forms of model

ambiguity partly by having more persistence than in a baseline models. Our approach gains

tractability because the worst-case model turns out to be a time-invariant model in which

projections for long-term growth are more cautious and stochastic growth is more persis-

13Bansal and Yaron (2004) and Hansen and Sargent (2010) both start from the observation that two suchmodels are difficult to distinguish empirically, but they draw different conclusions from that observation.Bansal and Yaron use the observation to justify a representative consumer who with complete confidenceembraces one of the models (the long-run risk model with persistent log consumption growth), while Hansenand Sargent use the observation to justify a representative consumer who initially puts prior probability.5 on both models and who continues to carry along both models when evaluating prospective outcomes.

34

tent than in the baseline model. Worst-case shock distributions are shifted in an adverse

fashion and with additional persistence that gives rise to enduring effects on uncertainty

prices. Adverse shifts in the shock distribution that drive up the absolute magnitudes of

uncertainty prices larger were also present in some of our earlier work (for example, see

Hansen et al. (1999) and Anderson et al. (2003)). In this paper, we induce state dependence

in uncertainty prices in a different way, namely, by specifying the set of alternative models

to capture concerns about the baseline model’s specification of persistence in consumption

growth.

Models of robustness and ambiguity aversion bring new parameters. In this paper,

we extend our earlier work on Anderson et al. (2003) on restricting these parameters by

exploiting connections between models of statistical model discrimination and our way of

formulating robustness. We build on mathematical formulations of Newman and Stuck

(1979), Petersen et al. (2000), and Hansen et al. (2006). We pose an ex ante robustness

problem that pins down a robustness penalty parameter θ by linking it to an asymptotic

measure of statistical discrimination between models. This asymptotic measure allows us to

quantify a half-life for reducing the mistakes in selecting between competing models based

on historical evidence. A large statistical discrimination rate implies a short half-life for

reducing discrimination mistake probabilities. Anderson et al. (2003) and Hansen (2007)

had studied the connection between conditional discrimination rates and uncertainty prices

that clear security markets. By following Newman and Stuck (1979) and studying asymp-

totic rates, we link statistical discrimination half-lives to calibrated equilibrium uncertainty

prices.

35

A Z∗ Reconsidered

Let h be a vector process adapted to the filtration F = {Ft : t ≥ 0}. Replace h with h− h

in formula (5) to construct a martingale Zh−h that determines what we shall call an h− h

model.

Let Bt be a bounded Ft-measurable random variable. Define

Eh(Bt|X0) ≡ E[Zh

t Bt|X0

]

Eh−h(Bt|X0) ≡ E[Zh−h

t Bt|X0

].

Here Eh denotes an expectation under the h model and Eh−h denotes an expectation under

the h− h model.

By usingZht

Zh−h

t

as a Radon-Nykodym’ derivative at time t, we can represent the h model

in terms of the h− h model:

Eh(Bt|X0) = E[Zh

t Bt|X0

]

= E

[(Zh

t

Zh−ht

)Zh−h

t Bt|X0

]

= Eh−h

[(Zh

t

Zh−ht

)Bt|X0

].

Recall that under the h probability distribution, W h is a multivariate standard Brownian

motion where from (6), dWt = htdt+ dW ht . Thus,

d logZh−ht = (ht − ht) · dWt −

1

2|ht − ht|

2dt

= (ht − ht) · dWht −

1

2|ht − ht|

2dt+ (ht − ht) · ht

= (ht − ht) · dWht +

1

2|ht|

2dt−1

2|ht|

2dt. (37)

Conditioned on date zero information, the discounted relative entropy of the h model

with respect to the h− h model is:

δ

∫ ∞

0

exp(−δt)E[Zh

t

(logZh

t − logZh−ht

)|X0 = x

]dt =

1

2Eh

[∫ ∞

0

exp(−δt)|ht|2dt | F0

]

(38)

36

where we have used integration by parts and the evolution in (37). We are interested in the

discrepancy between: (i) the relative entropy (8) of h with respect to the baseline model,

and (ii) the relative entropy (38) of the h model with respect to the h− h model:

ǫ(Zh; |h|2, x

)=δ

∫ ∞

0

exp(−δt)E[Zh

t

(logZh

t

)|X0 = x

]dt

− δ

∫ ∞

0

exp(−δt)E[Zh

t

(logZh

t − logZh−ht

)|X0 = x

]dt

=1

2

∫ ∞

0

exp(−δt)Eh[(

|ht|2 − |ht|

2)|X0 = x

]dt. (39)

B Construction of entropy ball

For a given multiplier, write the value function that solves HJB equation (36) in the form:

υ (x, θ) =1

2

[υ2(θ)x

2 + 2υ1(θ)x+ υ0(θ)]

which gives us

h∗(x, θ) =1

θ[συ2(θ)x+ συ1(θ)] .

We can solve for υ2, υ1, and υ0 by comparing the coefficients for x2, x and the constant

terms, respectively. Solving first for υ2:

υ2(θ) = θ

δ + 2κ−

√(δ + 2κ)2 − 4 |σ|2 ξ2

(θ+1)θ

2 |σ|2

.

−δυ1(θ) + (θ + 1)ξ1 − υ1(θ)κ+1

θ|σ|2υ2(θ)υ1(θ) = 0.

Thus

υ1(θ) = 2(θ + 1)

ξ1

δ +√

(δ + 2κ)2 − 4 |σ|2 ξ2(θ+1)

θ

.

Finally,

−δ

2υ0(θ) +

(θ + 1)

2ξ0 +

1

2υ2(θ)|σ|

2 +1

2θυ1(θ)

2|σ|2 = 0.

Thus

υ0(θ) =1

δ

[(θ + 1)ξ0 + υ2(θ)|σ|

2 +1

θυ1(θ)

2|σ|2].

37

To build the associated entropy ball, we construct

ξ = δminθ≥0

[υ2(θ)x

2 + 2υ1(θ)x+ υ0(θ)]

for X0 = x.

C Asymptotic error rates

Consider an alternative model:

d log Yt = (.01) (µ+Xt) dt+ (.01)α · dWt

dXt = φdt− κXtdt+ σ · dWt.

i) Input values of κ, µ, and φ. Construct the implied h(x) = η1x+ η0 by solving;

[µ− µ

φ− φ

]=

[α′

σ′

]η0

[0

κ− κ

]=

[α′

σ′

]η1

for η1 and η0.

ii) For a given r, construct ζ0, ζ1, ζ2, κ, ψ, φ, and µ from:

(−r + r2)|h(x)|2 = (−r + r

2)|η0 + η1x|2 = −

(ζ0 + 2ζ1x+ ζ2x

2)

κ = (1− r)κ + rκ

ψ = (1− r) + rψ

φ = rφ

µ = (1− r)µ + rµ

iii) Solve

−ψ =−1

2

(ζ0 + 2ζ1x+ ζ2x

2)+ (φ− xκ)(log e)′(x)

38

+(log e)′′(x)

2|σ|2 +

[(log e)′(x)]2

2|σ|2

where log e(x) = λ1x+12λ2x

2. Thus,

λ2 =κ−

√(κ)2 + ζ2|σ|2

|σ|2.

Given λ2, λ1 solves

−ζ1 − κλ1 + λ1λ2|σ|2 + φλ2 = 0

or

λ1 =ζ1 − φλ2λ2|σ|2 − κ

= −ζ1 − φλ2√(κ)2 + ζ2|σ|2

.

Finally,

ψ =1

2ζ0 −

1

2|σ|2λ2 −

1

2|σ|2 (λ1)

2 − φλ1.

iv) Repeat for alternative r’s and maximize ψ as a function of r.

D Robust value function

We solve for υ and θ∗.

D.1 Solving for υ

Consider

0 = minh

− δυ(x) + (.01)(µ+ x) + υ′(x)(φ+ κx) +1

2|σ|2υ′′(x, θ)

+ (.01)α · h+ υ′(x, θ)σ · h+θ

2|h|2 −

θ

2

(ξ2x

2 + 2ξ1x+ ξ0)

Recall that the value function is quadratic

υ (x, θ) = −1

2

[υ2(θ)x

2 + 2υ1(θ)x+ υ0(θ)]

which implies

h(x, θ) = −1

θ[.01α− συ2(θ)x− συ1(θ)] .

39

We can solve for υ2, υ1, and υ0 by matching the coefficients for x2, x and the constant

terms, respectively. Solving first for υ2:

υ2(θ) = θ

δ + 2κ−

√(δ + 2κ)2 − 4 |σ|2 ξ2

2 |σ|2

= θυ2,

where the last equation essentially defines υ2.

υ1(θ) =2

−.01− .01 (α · σ) υ2 + θφυ2 + θξ1

δ +√

(δ + 2κ)2 − 4 |σ|2 ξ2

υ0(θ) =1

δ

[−.02µ+ φυ1(θ) + θ |σ|2 υ2 +

1

θ|.01α− συ1(θ)|

2 + θξ0

]

For convenience, we write

υ1(θ) =υ1,1θ + υ1,0

υ0(θ) =υ0,1θ + υ0,0 + υ0,−1θ−1.

D.2 Solving for θ∗

We want to solve

maxθ

−1

2

(|σ|2

2κθ − υ2 (θ) |σ|2

)υ1 (θ)

2 +θ

2

[log(2κθ − υ2 (θ) |σ|

2)− log (2θκ)]−

1

2υ0 (θ)− θǫ.

θ∗ satisfies

θ∗ =

√|σ|2υ21,0 + (2κ− υ2|σ|2) υ0,−1

|σ|2υ21,1 + (2κ− υ2|σ|2) [− log (2κ− υ2|σ|2) + log (2κ) + υ0,1 + 2ǫ]

40

E Operationalizing Chernoff entropy

Here is how to compute Chernoff entropies for parametric models of the form (10). Because

the h’s associated with them take the form

ht = η(Xt),

these alternative models are Markovian. This allows us to compute Chernoff entropy by

using an eigenvalue approach of Donsker and Varadhan (1976) and Newman and Stuck

(1979). We start by computing the drift of(Zh

t

)r

f(Xt) for 0 ≤ r ≤ 1 at t = 0:

[Gf ](x).=(−r + r

2)

2|η(x)|2f(x) + rf(x)′ση(x)

− f ′(x)κx+f ′′(x)

2|σ|2,

where [Gf ](x) is the drift given that X0 = x. Next we solve the eigenvalue problem

[G(r)]e(x, r) = −ψ(r)e(x, r),

whose eigenfunction e(x, r) is the exponential of a quadratic function of x. We compute

Chernoff entropy numerically by solving:

χ(Zh, x) = maxr∈[0,1]

ψ(r).

To deduce a corresponding equation for log e, notice that

(log e)′(x) =e′(x)

e(x)

and

(log e)′′(x) =e′′(x)

e(x)−

[e′(x)

e(x)

]2.

For a positive f

[Gf ](x)

f(x).=(−r + r

2)

2|h(x)|2 + r(log f)′(x)σ · h(x)− (log f)′(x)κx

+log f ′′(x)

2|σ|2 +

[log f ′(x)]2

2|σ|2. (40)

41

Using formula (40), define[G(log f)

](x) =

[Gf ](x)

f(x).

Then we can solve [G(log e)

](x) = −ψ

for log e to compute the positive eigenfunction e.

These calculations allow us numerically to compute the largest and smallest Chernoff

entropies attained by members of the set Zo.

F Computing shock elasticities

We compute shock price elasticities in four steps:

i) Stochastic impulse response for log S. We solve the recursion:

d logS1t = −.01X1

t dt+ h1t · dWt − h∗t · h1tdt

dX1t = −κX1

t dt

dXt = −κXt + σ · dWt

h∗t = η∗0 + η∗1Xt

h1t = η∗1X1t

where X10 = σ · u and log S1

0 = −.01α · u + h∗(x) · u. The quadratic terms in the

evolution equation of logS make log S1t stochastic.

ii) Deterministic impulse response for logC. We solve the recursion:

d logC1t = .01X1

t dt

dX1t = −κX1

t dt,

where logC10 = (.01)α · u and X1

0 = σ · u. Thus,

logC1t =

.01

κ[1− exp (−κt)]σ · u+ (.01)α · u

42

iii) ComputeE (Mt logS

1t |X0 = x)

E (Mt|X0 = x)

where Mt = St

(Ct

C0

). Note that

d logMt = −δdt+ h∗t · dWt −1

2|h∗t |

2dt.

Let dWt have drift ht and compute expectations conditioned on X0 = x recursively:

d logS1t = −.01X1

t dt+ h1t · (htdt+ dWt)− h∗t · h1tdt

= −.01X1t dt+ h1t · dWt

dX1t = −κX1

t dt

h∗t = η∗0 + η∗1Xt

h1t = η∗1X1t ,

where X10 = σ · u and log S1

0 = −(.01)α · u+ h∗(x) · u. Thus,

E (Mt log S1t |X0 = x)

E (Mt|X0 = x)= −

.01

κ[1− exp (−κt)]σ · u− .01α · u+ h∗(x) · u.

iv) Construct elasticities:

(a) Shock-exposure elasticity for consumption:

E(

Ct

C0logC1

t |X0 = x)

E(

Ct

C0|X0 = x

) =.01

κ[1− exp(−κt)] σ · u+ .01α · u,

which is also the continuous time impulse response for logC.

(b) Shock-price elasticity

E(

Ct

C0logC1

t |X0 = x)

E(

Ct

C0|X0 = x

) −E [Mt (logS

1t + logC1

t ) |X0 = x]

E (Mt|X0 = x)= −

E (Mt log S1t |X0 = x)

E (Mt|X0 = x)

=.01

κ[1− exp(−κt)] σ · u+ .01α · u− h∗(x) · u.

43

which implements formula (32).

44

References

Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent. 1998. Risk and Robustness

in Equilibrium. Available on webpages.

———. 2003. A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk,

and Model Detection. Journal of the European Economic Association 1 (1):68–123.

Ang, Andrew and Monika Piazzesi. 2003. A No-Arbitrage Vector Autoregression of the

Term Structure Dynamics with Macroeconomic and Latent Variables. Journal of Mon-

etary Economics 50:745–787.

Bansal, Ravi and Amir Yaron. 2004. Risks for the Long Run: A Potential Resolution of

Asset Pricing Puzzles. Journal of Finance 59 (4):1481–1509.

Borovicka, Jaroslav, Lars Peter Hansen, Mark Hendricks, and Jose A. Scheinkman. 2011.

Risk-Price Dynamics. Journal of Financial Econometrics 9 (1):3–65.

Campbell, John Y. and John Cochrane. 1999. Force of Habit: A Consumption-Based Expla-

nation of Aggregate Stock Market Behavior. Journal of Political Economy 107 (2):205–

251.

Chen, Zengjing and Larry Epstein. 2002. Ambiguity, Risk, and Asset Returns in Continuous

Time. Econometrica 70:1403–1443.

Chernoff, Herman. 1952. A Measure of Asymptotic Efficiency for Tests of a Hypothesis

Based on the Sum of Observations. Annals of Mathematical Statistics 23 (4):pp. 493–507.

Donsker, Monroe E. and S. R. Srinivasa Varadhan. 1976. On the Principal Eigenvalue

of Second-Order Elliptic Differential Equations. Communications in Pure and Applied

Mathematics 29:595–621.

Duffie, Darrell and Rui Kan. 1994. Multi-Factor Term Structure Models. Philosophical

Transactions: Physical Sciences and Engineering 347 (1684):577–586.

Gilboa, Itzhak and David Schmeidler. 1989. Maxmin expected utility with non-unique

prior. Journal of Mathematical Economics 18 (2):141–153.

Hansen, Lars Peter. 2007. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk.

American Economic Review 97 (2):1–30.

45

———. 2011. Dynamic Valuation Decomposition within Stochastic Economies. Economet-

rica 80 (3):911–967. Fisher-Schultz Lecture at the European Meetings of the Econometric

Society.

Hansen, Lars Peter and Thomas Sargent. 2010. Fragile beliefs and the price of uncertainty.

Quantitative Economics 1 (1):129–162.

Hansen, Lars Peter and Thomas J. Sargent. 2001. Robust Control and Model Uncertainty.

American Economic Review 91 (2):60–66.

———. 2008. Robustness. Princeton, New Jersey: Princeton University Press.

Hansen, Lars Peter, Thomas J. Sargent, and Jr. Tallarini, Thomas D. 1999. Robust Per-

manent Income and Pricing. The Review of Economic Studies 66 (4):873–907.

Hansen, Lars Peter, Thomas J. Sargent, Gauhar A. Turmuhambetova, and Noah Williams.

2006. Robust Control and Model Misspecification. Journal of Economic Theory

128 (1):45–90.

James, Matthew R. 1992. Asymptotic analysis of nonlinear stochastic risk-sensitive control

and differential games. Mathematics of Control, Signals and Systems 5 (4):401–417.

Newman, C. M. and B. W. Stuck. 1979. Chernoff Bounds for Discriminating between Two

Markov Processes. Stochastics 2 (1-4):139–153.

Petersen, I.R., M.R. James, and P. Dupuis. 2000. Minimax optimal control of stochastic

uncertain systems with relative entropy constraints. Automatic Control, IEEE Transac-

tions on 45 (3):398–412.

46

sets of models and prices of uncertainty

Documents

set of models

set of probability models

baseline probability

quantitative models

better models

utility model

thebaseline model

onlyvaguely specified