inference for a class of stochastic volatility models using ...gmartin/forbes_etal_2005.pdf(2001),...
TRANSCRIPT
Inference for a Class ofStochastic Volatility Models Using Option and SpotPrices: Application of a Bivariate Kalman Filter∗
Catherine S. Forbes, Gael M. Martin and Jill Wright
Department of Econometrics and Business Statistics, PO Box 11E,
Monash University, Vic. 3800 Australia. Email: [email protected].
Phone 61 3 9905 1189; Fax 61 3 9905 5474
July 16, 2005
Abstract
In this paper Bayesian methods are applied to a stochastic volatility modelusing both the prices of the asset and the prices of options written on the asset.Posterior densities for all model parameters, latent volatilities and the market priceof volatility risk are produced via a hybrid Markov Chain Monte Carlo samplingalgorithm. Candidate draws for the unobserved volatilities are obtained in blocksby applying the Kalman filter and simulation smoother to a linearization of a state-space representation of the model. Crucially, information from both the spot andoption prices affects the draws via the specification of a bivariate measurementequation. Alternative models nested within the Heston (1993) framework are rankedvia posterior odds ratios, as well as via fit, predictive and hedging performance. Themethod is illustrated using Australian News Corporation spot and option price data.
Keywords: Option Pricing; Volatility Risk; Markov Chain Monte Carlo; Non-linear State Space Model; Multi-Move Sampler.
JEL Classifications: C11, C15, G13.
∗This research has been supported by an Australian Research Council Grant Discovery Grant anda Monash University Grant. The authors would like to thank Vance Martin for some very helpfulsuggestions and comments.
1
1 Introduction
Numerous empirical studies over the past two decades have demonstrated that the con-
stant volatility feature of a geometric diffusion process for an asset price is inconsistent
with both the observed time-variation in return volatility and the excess kurtosis and
skewness evident in the empirical distribution of returns; see Bollerslev, Chou and Kroner
(1992) for a review. The option pricing literature supports this finding of misspecifica-
tion, with certain empirical regularities, such as implied volatility ‘smiles’, being viewed
as evidence that asset prices deviate from the assumption of geometric Brownian motion
that underlies the Black and Scholes (1973) option price; see, for example, Bakshi, Cao
and Chen (1997), Hafner and Herwartz (2001) and Lim, Martin and Martin (2005).
In response to these now standard empirical findings, many alternative specifications
for asset prices have been proposed. In the option pricing context, the specifications that
have received the most attention have been variants of a continuous time stochastic volatil-
ity (SV) model, often augmented by random jump processes. This focus has occurred,
in part, as a consequence of the relative ease with which theoretical option prices can be
produced, with closed form solutions for the option price, up to one-dimensional integrals,
being available for some specifications; see, for example Hull and White (1987), Heston
(1993), Bates (2000) and Pan (2002). This ease of evaluation contrasts with the more
computationally intensive Monte Carlo methods needed to price options based on certain
conditional autoregressive heteroscedasticity (GARCH) formulations, in which the option
price is estimated as an average over many simulation paths; see Hafner and Herwartz
(2001), Bauwens and Lubrano (2002) and Martin, Forbes and Martin (2005), amongst
others.
Although the relative simplicity of pricing options under SV models is being increas-
ingly matched by that of new computationally efficient pricing methods suitable for al-
ternative time-varying volatility specifications (e.g. Heston and Nandi, 2000, Duan and
Simonato, 2001, and Lim, Martin and Martin, 2005) the SV formulation (with and with-
out jumps) remains the preeminent type of model used in the option pricing literature. In
particular, much attention is given to developing methods for extracting estimates of the
parameters of an SV model from observed option prices, including an estimate of the price
of volatility risk factored into those prices by market participants; see, for example, Guo
2
(1998), Chernov and Ghysels (2000), Pan (2002) and Eraker (2004). Of particular con-
cern in that literature is the complexity that results from both the option and spot price
processes being a function of a high-dimensional vector of unobserved latent volatilities.
Different contributions to the literature are distinguished, one from the other, primarily
in terms of how the proposed inferential methods manage that complexity.
This paper proposes a new Bayesian method for conducting inference within the frame-
work of the Heston (1993) class of SV models, using both option prices and spot prices on
the underlying asset. Posterior densities are produced for the parameters of the volatility
model, for the latent volatilities and for the market price of volatility risk. The method
involves augmenting the probability density function for a panel of option prices with the
density function describing the bivariate process for the spot price and the volatility (see
also Polson and Stroud, 2003, and Eraker, 2004). Posterior results are produced via a
hybrid Markov Chain Monte Carlo (MCMC) sampling algorithm. As part of this algo-
rithm, candidate draws of the volatilities are obtained via the application of the Kalman
filter and simulation smoother to a linearization of a non-linear state-space representa-
tion of the model. Information from both the spot and option prices affects the draws
via the specification of a bivariate measurement equation. For the options-based com-
ponent of this bivariate specification, a time series of implied volatilities, produced via
the Black-Scholes (BS) model, is taken as a noisy measurement of the evolution of the
assumed stochastic volatility process. The ‘multi-move’ approach to sampling the latent
volatilities is motivated by the Bayesian state space literature, in which such block sam-
plers are shown to be superior to alternative ‘single-move’ approaches (e.g. Carter and
Kohn, 1994, Frühwirth-Schnatter, 1994, de Jong and Shephard, 1995, and Durbin and
Koopman, 2002), and contrasts with the single-move algorithms of Jones (2003), Polson
and Stroud (2003) and Eraker (2004).
In addition to producing posterior point and interval estimates for the unknown ele-
ments of the full Heston model, we also present methods for comparing this model with
alternative specifications nested in the Heston framework. Specifically, we use various
Bayesian model selection criteria to rank the unrestricted Heston model against models
in which, respectively, volatility risk is not priced and a leverage effect is not accommo-
dated. We also compare these three SV models with the constant volatility BS model,
also obtained as a restricted version of the Heston specification. The selection criteria
3
constitute posterior odds ratios, constructed via Savage-Dickey density ratios (e.g. Koop
and Potter, 1999), as well as measures based on fit, predictive and hedging error densities.
The idea of Bayesian model averaging in an option pricing context is also introduced. The
methodology is applied to spot and option price data for News Corporation, as observed
over the period 1998 to 2001.
The outline of the paper is as follows. In Section 2, we define the European call option
price, contrasting the theoretical value of this price under the BS and Heston models.
In Section 3, we describe our Bayesian inferential method, including the hybrid MCMC
scheme that we adopt. Details of the bivariate Kalman filter and simulation smoother
used in the generation of candidate draws of the volatility vector are outlined. In Section
4 we outline the criteria used to rank the various volatility models, including the constant
volatility BS specification. Section 5 includes a description of the News Corporation data
to be used in the empirical demonstration of the method, followed by an outline of the
numerical results. We provide some concluding comments in Section 6.
2 Option Pricing
2.1 Black-Scholes Option Pricing
An option is an asset that gives the owner the right to either buy (call option) or sell (put
option) the underlying asset at some future point in time, at some pre-specified exercise,
or strike price, K. For the European call option that is the focus of this paper, the owner
has the right to buy the underlying asset (or exercise the option) only when the option
matures, at the future time point T. In this case, the price of the option is given by the
expected value of the discounted payoff of the option,
q = Et
£e−rτ max (S (T )−K, 0)
¤, (1)
where S (T ) denotes the spot price of the asset at the time of maturity, τ is the length
of the option contract, or time to maturity, and r denotes the risk-free interest rate
assumed to hold over the life of the option. Given the assumption that underlies the
BS model, namely that the asset price follows a geometric diffusion, and assuming an
absence of riskless arbitrage opportunities, a portfolio made up of both the option and
the underlying asset can be constructed in such a way that the return on the portfolio
4
is equal to r. The possibility of such a portfolio implies that options are priced as if
investors were risk-neutral, with the conditional expectation in (1) evaluated not under
the empirical, or ‘objective’ distribution, but under the risk-neutral distribution for S (T )
as a consequence; see Campbell, Lo and McKinley (1997, Chp. 9) and Hull (2000, Chp.
11) for more details. Risk-neutral pricing in the BS case effects a shift in the drift term
in the diffusion process for the logarithm of the spot price by a factor of r − µ, where µ
is the actual rate of return on the underlying asset. As such, the risk-neutral conditional
distribution of S (T ), given S (t), inherits the lognormal form of the objective distribution,
but with r replacing µ in the specification of this distribution. In this case, the expectation
in (1) has the form
qBS = S (t)Φ(d1)−K−rτΦ(d2), (2)
where d1 = (ln(S (t) /K) + (r + 0.5σ2) τ) /σ√τ , d2 = d1−σ
√τ , σ is the constant diffusion
(or volatility) parameter, and Φ(.) denotes the cumulative normal distribution. Note that
given an observed option price, the formula in (2) may be inverted to find an implied
value of the volatility parameter σ, as the latter is the only unknown element that affects
qBS.
2.2 Option Pricing Under The Heston SV Model
The assumption of a stochastic process for volatility introduces an additional element of
randomness into the option pricing problem, which needs to be accommodated when pro-
ducing a theoretical option price via the type of arbitrage arguments referred to above; see
Ingersoll (1987) for some general discussion of this issue. Heston (1993) uses a particular
set of arguments to produce a theoretical price of the form
qH = S(t)P1 −Ke−rτP2. (3)
5
The terms P1 and P2 in (3) are functions of the unknown quantities that characterize the
assumed risk-neutral bivariate spot price and volatility process, given by
dS(t) = rS(t)dt+pv(t)S(t)dε2(t) (4)
and
dv(t) = κ[θ − v(t)]dt+ σvpv(t)dε1(t), (5)
where ε1(t) and ε2(t) are correlated Weiner processes with correlation parameter ρ, and θ
is the long-run mean of v(t), to which v(t) reverts at rate κ > 0. The unknown quantities
that impact on qH at time t thus constitute: the parameters of the square root volatility
process in (5), namely κ, θ and σv, the correlation parameter ρ, as well as the unobservable
latent variance, v(t); see Heston for details.
The objective bivariate process associated with (5) is given by
dS(t) = µS(t)dt+pv(t)S(t)dε2(t) (6)
and
dv(t) = κa[θa − v(t)]dt+ σvpv(t)dε1(t), (7)
with the parameters in (5) and (7) related as follows:
θ =κaθa
κa + λ, (8)
κ = κa + λ. (9)
The volatility risk premium, λ(S(t), v(t)), is assumed to be proportional to v(t), that is,
λ(S(t), v(t)) = λv(t). (10)
A negative value for λ and, hence, for the risk premium, is often observed empirically.
With λ < 0, (5) implies slower reversion to a higher long-run mean than is implied by the
actual variance process in (7).
6
Given (8) and (9) it is clear that observed option prices can be used to estimate both
the parameters of the objective process and λ only if additional identifying information
on the actual process is incorporated in the inferential procedure. As is shown in Section
3, we achieve this identification by augmenting the density function associated with the
assumed generating process for the option prices with the density function that describes
the dynamics of the objective process for the spot prices and volatilities.
3 Bayesian Inference for the Heston Model
3.1 Specification of the Joint Posterior Density Function
We begin discussion of the Bayesian inferential method with details of the approach used
to estimate the general Heston model. Having developed a methodology for the general
model, analysis of the nested models follows in a straightforward manner, as detailed in
Section 4.1
Given that Bayesian inferences are to be produced in part from observed market
option prices, option prices need to be assigned a particular distributional model. Letting
Ci represent the ith observed market price of the call option and ri, Ki, τ i and Si represent
the observable factors that affect the ith option price, the option pricing model is specified
as
Ci = β0 + β1qH(ri, Ki, τ i, Si, φ) + ui, i = 1, 2, . . . , N. (11)
The term ui in (11) is an unobservable pricing error, assumed to have a normal dis-
tribution with zero mean and variance, σ2u, N is the number of observed option prices
and qH(ri, Ki, τ i, Si, φ) is the ith theoretical option price as defined in (3).2 The vector
φ = (vt, κa, θa, σv, λ, ρ)
0 comprises all unobservable elements that characterize qH , includ-
ing the latent variance. The index i in (11) indicates both variation over time and variation
across option contracts at a given point in time. However, in specifying the joint density
function for the spot prices and volatilities, as associated with a discretized version of (6)
1As the main aim of the paper is methodological, we choose to focus on the Heston model and threevariants, rather than expanding the model to accommodate the jump processes that may be needed insome empirical settings. Results on the empirical usefulness of allowing for random jumps in the assetprocess and/or the volatility process are rather mixed; see, for example, Eraker (2004).
2More sophisticated models describing the deviations between observed and theoretical option pricesare possible; see, for example, Bates (1996, 2000), Eraker (2004) and Martin, Forbes and Martin (2005).
7
and (7), we use only end-of-day spot prices. As such, the spot price process is assumed to
describe movement in the volatilities and spot prices from day to day. Hence the vector
of latent variances, v = (v1, v2, . . . , vn)0, has dimension equal to the number of days, n
say, in the pooled sample of option price data, where n < N. The tth element of v, vt,
enters the formulae for all qH associated with option contracts on day t. Variation in vt
across the day is not modelled. However, the spot price that enters qH for the ith option
contact is the price that is observed as synchronously as possible with the option price.
Consequently, there are N observations (not necessarily distinct) on Si that are used in
the specification of the generating process for the option prices, whilst only n closing
spot prices are used in the specification of the generating process for the spot prices and
volatilities.
Grouping all observable factors that influence the ith option price together in a vector
zi = (ri,Ki, τ i, Si)0, (11) becomes
Ci = β0 + β1qH(zi, φ) + ui, i = 1, 2, . . . , N, (12)
The presence of ui in (12) reflects the fact that the theoretical option model, qH(zi, φ), is
only an approximation of the process that has lead to the determination of an observed
option price. That is, ui encompasses ‘model error’. It may also encompass ‘market error’,
in which an observed option price differs from its theoretical counterpart as the result of
factors such as, for example, the non-synchronous recording of spot and option prices,
transaction costs and other market frictions.3 The joint density function for the vector of
option prices c = (C1, C2, . . . , CN)0, conditional on the known vector z = (z1, z2, . . . zN)0
and on the unknown φ, is thus given by
p(c|z,φ, β, σu) = (2πσ2u)−N/2NYi=1
exp
µ− 1
2σ2u[Ci − [β0 + β1qH(zi, φ)]]
2
¶, (13)
with β = (β0, β1)0.
As already noted, in order to produce simultaneous inference about the parameters
of both the risk-neutral and objective processes or, equivalently, about the parameters of
3To rule out arbitrage, the distribution of Ci should strictly speaking be truncated at a lower bound oflbi = max{0, Si−e−riτ iKi}. Since experimentation has established that the truncation has only minimalimpact on inferences, we choose to ignore it in the estimation procedure. We do however, invoke thetruncation when producing the predictive densities used in ranking alternative models.
8
the objective process and the market price of volatility risk, information about the way
in which the observed spot prices have evolved needs to be incorporated in the inferential
procedure. We incorporate this information by augmenting the data generating process in
(13) with a discretized version of the bivariate spot price and volatility process in (6) and
(7). This augmentation also serves to identify the process to which the estimated volatil-
ities must adhere. The vector of (closing) spot prices associated with the n days in the
sample is defined as s = (S1, S2, . . . , Sn)0. Suppressing the dependence of p(c|z, φ, β, σu)on all elements of z other than s (including all other intraday synchronous spot prices
that are used to calculate qH(zi, φ) for each i) and defining the vector δ = {φ, µ, β, σu} asthe full set of unknowns in the problem, the joint posterior density for δ is thus specified
as
p(δ|c, s) ∝ p(c|s, v, ωa, λ, ρ, β, σu)× p(s, v|ωa, ρ, µ)× p(ωa, ρ, λ, µ, β, σu)
∝ p(c|s, v, ωa, λ, ρ, β, σu)× p(s|v, ωa, ρ, µ)× p(v|ωa)
×p(ωa, ρ, λ, µ, β, σu). (14)
To simplify notation, the parameters of the objective volatility process in (7) are grouped
as ωa = (κa, θa, σv)0. The form of p(v|ωa) in (14) is determined via a Euler discretization
of (7),
vt = vt−1 + κa(θa − vt−1)∆t+ σv√vt−1√∆tε1t
= κaθa∆t+ (1− κa∆t)vt−1 + σv√vt−1√∆tε1t (15)
ε1t ∼ N(0, 1); 2, 3, . . . n,
with the initial value set equal to the long-run mean, θa. By convention, both the variances
and the parameters of the volatility model enter the theoretical option price formula,
qH(zi, φ), in annualized form. As (15) describes day-to-day movements in the annualized
variance, we set ∆t = 1/252 (years), assuming 252 trading days in the year; see also
Chernov and Ghysels (2000).4
4See Eraker, Johannes and Polson (2003) for evidence that the daily interval is small enough to renderthe discretization bias arising from (15) negligible.
9
The form of p(s|v, ωa, ρ, µ) in (14) is determined via an Euler discretization of the
logarithmic transform of the diffusion in (6), namely
lnSt = lnSt−1 + (µ− 0.5vt−1)∆t+√vt−1√∆tε2t, (16)
ε2t ∼ N(0, 1); 2, 3, . . . n, (17)
conditional on a specified value for lnS1. From (16) and (17), and from the fact that ε1tand ε2t are correlated, it follows that
p(s|v, ωa, ρ, µ) = (2π)−(n−1)/2[(1− ρ2)∆t]−(n−1)/2
×nQt=2
1
St√vt−1
exp
½−1/[2(1− ρ2)∆t](
lnSt − µlnSt.vt√vt−1
)2¾
(18)
where
µlnSt.vt = (lnSt−1 + [µ− 0.5vt−1]∆t) +ρ
σv(vt − [θaκa∆t+ (1− κa∆t)vt−1]). (19)
We further specify that
p(ωa, ρ, λ, µ, β, σu) = p(κa)× p(θa)× p(σv)× p(ρ)× p(λ)× p(µ)× p(β)× p(σu). (20)
That is, it is assumed that all parameters in the model are a priori independent.5
3.2 A Hybrid Gibbs-MH Algorithm
3.2.1 Overview of the Algorithm
Due to the large number of unknowns in the model and the manner in which they are
related, analytical treatment of the joint posterior distribution in (14) is not possible,
and an MCMC algorithm has been developed. To implement the MCMC algorithm, the
parameters in δ are ‘blocked’ into eight groups as follows: v, κa, θa, σv, λ, ρ, µ and
5The assumption of independence between λ and ρ could be questioned, given prior knowledge of thepossible relationship between the signs of λ and ρ. It is maintained, however, for the sake of computationalconvenience.
10
(β, σu).6 Starting values are chosen, and a Gibbs-based MCMC algorithm is then applied
to produce successive draws of the unknowns via the respective conditional posteriors:
1. p(v|ωa, λ, ρ, µ, β, σu, c, s)
2. p(κa|v, θa, σv, λ, ρ, µ, β, σu, c, s)
3. p(θa|v, κa, σv, λ, ρ, µ, β, σu, c, s)
4. p(σv|v, κa, θa, λ, ρ, µ, β, σu, c, s)
5. p(λ|v, ωa, ρ, β, σu, c, s)
6. p(ρ|v, ωa, λ, µ, β, σu, c, s)
7. p(µ|v, ωa, ρ, s)
8. p(β, σu|v, ωa, λ, ρ, c, s)
In the description of each conditional, we make explicit the relevant conditioning elements.
For example, the conditionals of λ and (β, σu) do not depend on µ and the conditional of
µ does not depend on any aspect of the model that relates to the option prices, namely
(β, σu), λ and c.
Given the assumption of normality for the error term in (17), in combination with a
uniform prior for µ, standard algebraic manipulations lead to a conditional posterior for
µ with mean and variance respectively,
bµ =⎛⎜⎜⎝
nPt=2
y∗t x∗t
nPt=2
(x∗t )2
⎞⎟⎟⎠× 1
∆t; σ2µ =
⎛⎜⎜⎝ (1− ρ2)nPt=2
(x∗t )2
⎞⎟⎟⎠× 1
∆t,
6Depending on the size of the sample to which the method is applied, the vector v may be furtherdivided into smaller blocks, in order to increase the acceptance rates associated with this component ofthe algorithm. However, we describe the simulation of v below in terms of the full vector, with details ofthe further blocking that is applied in the context of the empirical application provided in Section 5.
11
where
y∗t =
¡lnSt − µlnSt.vt
¢√vt−1
x∗t =1
√vt−1
,
with µlnSt.vt as defined in (19). Simulated values of µ are thus readily obtainable via the
generation of normal random variates. Using the expression for the joint density of the
vector of option prices in (13) and adopting the following noninformative prior for (β, σu),
p(β, σu) = p(β)× p(σu)
∝ 1
σu, (21)
the joint conditional posterior for (β, σu) has the usual normal-gamma form. The normal
conditional for β, p(β|σu, v, ωa, λ, ρ, c, s), has mean and variance,
bβ = (X 0X)−1X 0c; var(β) = σ2u(X0X)−1, (22)
respectively, where X = (ι, qH), with ι a (N × 1) vector of ones and qH denoting the
(N×1) vector of theoretical option prices with ith element qH(zi, φ). The inverted gammadensity for σu, p(σu|v, ωa, λ, ρ, c, s), has degrees of freedom νσu = (N − 2) and parametersσu =
q(c−Xbβ)0(c−Xbβ)/νσu; see, for example, Zellner (1996, Appendix A.4). Draws
of (β, σu) can be obtained from the joint conditional using standard simulation algorithms.
All of the six other conditionals listed above are non-standard, with MH subchains
being applied to produce draws. An MH algorithm involves sampling from the actual
conditional indirectly via a candidate distribution that is both easy to sample from and
a good match for the conditional distribution. The candidate draw is then accepted as
a draw from the actual conditional with a given probability; see, for example, Chib and
Greenberg (1996) and Geweke (1999). The MH draws are inserted into the outer Gibbs
chain, with convergence of the hybrid chain to the full joint posterior distribution ensured,
given the satisfaction of the usual convergence criteria; see Tierney (1994).
The most difficult conditional to draw from efficiently is p(v|ωa, λ, ρ, µ, β, σu, c, s). This
difficulty derives mainly from the complexity of the functional relationship between the
12
theoretical option price qH and the volatility at each point t which, in turn, produces a
complex non-linear relationship between the vector of observed option prices, c, and the
state vector v. The candidate distribution for v that we propose is based on a particular
linearization of this component of the model, with the BS implied variances used as proxies
for observed option prices. Details of this approach are given below. Draws from the other
nonstandard conditionals are produced via random walk MH algorithms; see Chib and
Greenberg (1996) for details. All posterior quantities of interest are calculated from the
full set of MCMC iterates, excluding those in the burn-in part of the chain. Marginal
posteriors are estimated from the simulated values for each parameter of interest using
kernel smoothing.
3.2.2 A Bivariate Kalman Filter and Smoother for Sampling the Volatilities
The discretized model in (15) and (16) can be written as a nonlinear state space model.
The measurement of the (continuously compounded) spot return, ∆ lnSt, depends upon
the unobservable volatilities vt and vt−1. To introduce information from the option prices,
we augment this model by specifying a second observation equation, in which the implied
BS variance for day t, vBSt , is viewed as a noisy linear measurement of the true stochastic
variance, vt. An implied BS variance is obtained by equating (2) with an observed price
and solving for the parameter σ2. In order to produce a vector of implied BS variances
that matches the dimension of v, the BS variances are averaged over the day, with a vector
of n BS variances produced as a result. The variance of this second measurement is fixed,
and denoted by σ2imp. A state space representation of the augmented model is therefore
given by the following two bivariate equations
∙∆ lnStvBSt
¸=
"ρσv−³∆t2+ ρ(1−κa∆t)
σv
´1 0
# ∙vtvt−1
¸+
∙µ∆t− ρθaκa∆t
σv
0
¸+
∙ pvt−1∆t (1− ρ2)ε2t
σimpε3t
¸(23)
and
∙vtvt−1
¸=
∙1− κa∆t 0
1 0
¸ ∙vt−1vt−2
¸+
∙θaκa∆t0
¸+
∙σv√vt−1∆tε1t0
¸, (24)
13
with ε3t ∼ N(0, 1), 1, 2, . . . n. To eradicate the nonlinear dependence of both the return,
∆ lnSt, and the current stochastic variance, vt, on the previous stochastic variance, vt−1,
we replace vt−1 in the error specification in both (23) and (24) with its long-run mean,
θa. It is this linearized approximation to (23) and (24) that is used in the MH algo-
rithm. Candidate draws of the volatility vector are generated using the Kalman filter
and smoothing algorithm of Carter and Kohn (1994). Note that the evaluation of the
candidate density in the MH ratio at the previously generated volatility vector requires
a rerunning of the Kalman smoothing algorithm using this vector to obtain the updated
means and variances.7
4 Incorporating Model Uncertainty
4.1 Alternative Models
The Heston model nests three alternative models for volatility associated respectively with
λ = 0, ρ = 0 and σv = 0. Setting λ = 0 is equivalent to imposing the assumption that
volatility risk is not priced. This assumption is invoked in the early stochastic volatility
analysis of Hull and White (1987) for computational convenience. It has however been
challenged by more recent work, in which estimates of λ that differ significantly from zero
have been reported (see Guo, 1998 and Eraker, 2004, amongst others). An assessment of
this restricted model thus amounts to an assessment of the attitude to volatility risk that
is implicit in option prices.
The model obtained by setting ρ = 0 implies a lack of skewness in the underlying spot
price process. In particular, it implies a lack of the so-called ‘leverage effect’ (associated
with ρ < 0), whereby negative returns are accompanied by an increase in volatility. An
assessment of this restricted model corresponds to an assessment of whether or not returns
are skewed and/or option prices have factored in skewed returns.
Finally, the restriction σv = 0 implies constant volatility.8 Given the assumption
of normal returns, this restriction equates to the assumption of BS option pricing. An
assessment of the empirical validity of this restriction thus amounts to an assessment of
7The Kalman filter need not be rerun, only the smoothing algorithm.8This restriction only implies constant variance in equilibrium. Hence, we also impose the restriction
in our algorithm that υ0 = θa, i.e. that the initial variance is equal to the equilibrium variance.
14
the validity of the BS model.
We refer to the alternative models corresponding to the restrictions λ = 0, ρ = 0
and σv = 0 as respectively M2, M3 and M4, and to the full Heston model as M1. In this
section, several criteria that are used to rank these alternative models in the Section 5 are
described. These criteria are used to supplement the results obtained by simply estimating
the full model, M1, and testing the restrictions via the construction of interval estimates
for each of the relevant parameters. The concept of averaging across the alternative
models, in particular with a view to improving predictive performance, is also discussed.
In what follows, we refer to the vectors of unobservables associated with the four models,
M1, M2, M3 and M4, as δ1, δ2, δ3 and δ4 respectively. These vectors are in turn defined
as follows:
1. δ1 = {v, ωa, λ, ρ, µ, β, σu}
2. δ2 = {v, ωa, ρ, µ, β, σu}
3. δ3 = {v, ωa, λ, µ, β, σu}
4. δ4 = {θa, µ, β, σu}
Due to the nested structure of these models, we can also re-express δ1 as, alternatively,
δ1 = {δ2, λ}, δ1 = {δ3, ρ} or δ1 = {δ4, v, κa, σv, λ, ρ}.
4.2 Bayes Factors using the Savage-Dickey Density Ratio
In the Bayesian framework, the four alternative models, M1, M2, M3 and M4, can be
ranked according to the magnitude of their respective posterior probabilities. In the
present context, the data on which posterior inference is based comprises the vector of
option prices, c, and the vector of spot prices, s. Hence, the ranking occurs via the
posterior model probabilities, P (M1|c, s), P (M2|c, s), P (M3|c, s) and P (M4|c, s). Theseprobabilities can, in turn, be derived via the set of posterior odds ratios for each model,
relative to the reference model, M1, subject to the restriction that the posterior probabil-
ities add to one. Given equal prior odds for all models, the posterior odds ratio for Mk
versus M1 reduces to the Bayes factor for Mk versus M1,
BFk1 =p(c, s|Mk)
p(c, s|M1); k = 2, 3, 4, (25)
15
where p(c, s|Mk) denotes the marginal likelihood for model Mk and is defined as
p(c, s|Mk) =
Zδk
p (c|s, δk,Mk) p(s|δk,Mk)p(δk|Mk)dδk; k = 1, 2, . . . 4. (26)
In the expression for p(c, s|Mk) in (26), δk denotes the vector of unobservables that char-
acterize modelMk, p (c|s, δk,Mk) denotes the joint density for the option prices underMk,
p(s|δk,Mk) denotes the joint density for the spot prices under Mk and p(δk|Mk) denotes
the prior density for δk under Mk. For models M1, M2 and M3, δk includes the vector
of stochastic variances. Hence, for these models, p(δk|Mk) is equal to the product of the
density for the stochastic variances, given the parameters, and the prior density for the
parameters.
In the present context, there is no closed form expression for p(c, s|Mk). Various
numerical approaches to the estimation of marginal likelihoods have been proposed; see
Geweke (1999). However, such numerical procedures would be particularly burdensome
in the present context, in particular due to the presence of the n−dimensional vector oflatent variances, v, in both the reference model and two of the alternative models,M2 and
M3. Fortunately, as each of the alternative models considered here are nested within the
Heston model,M1, (25) can be expressed in a particularly simple analytical form, namely
as the Savage-Dickey (SD) density ratio; see Verdinelli and Wasserman (1995) and Koop
and Potter (1999). For the case of M2 versus M1, the SD density ratio is given by:
SD21 =p(λ = 0|M1, c, s)
p(λ = 0|M1), (27)
where p(λ = 0|M1, c, s) denotes the ordinate of the marginal posterior for λ under the He-
ston model,M1, evaluated at λ = 0 and p(λ = 0|M1) denotes the ordinate of the marginal
prior for λ under M1, evaluated at λ = 0. For (27) to be an equivalent representation of
the Bayes factor for M2 versus M1, BF21, it must hold that
p(δ2|M1, λ = 0) = p(δ2|M2), (28)
where p(δ2|M1, λ = 0) denotes the density for δ2 in the model M1 with λ = 0 imposed.
The condition in (28) can be interpreted as the requirement that the prior distribution
over all unknowns in the Heston model, M1, conditional on λ = 0, is equivalent to the
prior assigned to these unknowns in the submodel, M2, in which λ = 0 is imposed from
16
the outset. The assumption of a priori independence between λ and all other parameters
in the Heston model (see (20)), means that this condition is satisfied. In addition, the
prior for λ must be proper, and to avoid indeterminacy of the SD21 density ratio, this
prior should have a nonzero ordinate at λ = 0.
For analogous reasons, the Bayes factor for the case of M3 (ρ = 0) versus M1 reduces
to
SD31 =p(ρ = 0|M1, c, s)
p(ρ = 0|M1), (29)
and the Bayes factor for the case of M4 (σv = 0) versus M1 reduces to
SD41 =p(σv = 0|M1, c, s)
p(σv = 0|M1). (30)
4.3 Fit and Predictive Performance
For modelMk with parameter vector δk, the residual associated with fitting the ith option
price, Ci, is given by
resi = Ci − [β0 + β1q(zi, φk)] , i = 1, 2, . . . , N, (31)
where q(zi, φk) denotes the theoretical option price associated with model Mk, k =
1, 2, . . . 4, and δk = {φk, µ, β, σu}. ForM1 the appropriate price is qH(zi, φ1), as defined in
(3). ForM2 andM3, the price is qH(zi, φ2) and qH(zi, φ3) respectively; that is, the Heston
option price, but with λ = 0 and ρ = 0 respectively imposed. For M4, the price is the
theoretical BS option price, given by
qBS(zi, φ4 = θa) = SiΦ (d1)−Kie−riτiΦ (d2) , (32)
with d1 and d2 as defined after (2), and with σ =√θa.
The quantity resi is a nonlinear function of the underlying parameters and latent
volatilities contained in φk, as well as being a linear function of β = (β0, β1)0, conditional
on φk. Hence, its posterior distribution can be derived via the appropriate transformation
of the posterior distribution of φk and β. As outlined in Section 3.2.1, p(β|σu, φk, c, s) isnormal, with mean and variance given in (22). As such, the conditional posterior for resi,
p(resi|σu, φk, c, s), is also normal, with mean
E(resi|σu, φk, c, s) = Ci −hbβ0 + bβ1q(zi, φk)i
17
and variance
var(resi|σu, φk, c, s) = σ2u [1, q(zi, φk)] (X0X)−1 [1, q(zi, φk)]
0 ,
with bβ0 and bβ1 being the elements of the two-dimensional vector bβ in (22) and X as
defined following (22). The marginal posterior of resi is given by
p(resi|c, s) =Zσu
Zφk
p(resi|σu, φk, c, s)p(σu, φk|c, s)dσudφk,
which can, in turn, be estimated via B MCMC draws of σu and φk, σ(j)u and φ
(j)k , j =
1, 2, . . . , B, as
p(resi|c, s) =1
B
BXj=1
p(resi|σ(j)u , φ(j)k , c, s).
For M1, the appropriate algorithm is the full MCMC scheme as described in Section 3.2.
For M2 and M3, the algorithm is a reduced version of that scheme, with λ = 0 and
ρ = 0 respectively imposed. ForM4, iterates of δ4 = {θa, µ, β, σu} are generated from theposterior density for δ4, given by
p(δ4|c, s) ∝ p(c|s,θa, β, σu)× p(s|θa, µ)× p(θa, µ),
where p(c|s,θa, β, σu) is given by
p(c|s,θa, β, σu) = (2π)−N/2σ−NuNYi=1
exp
µ− 1
2σ2u[Ci − [β0 + β1qBS(zi, θ
a)]]2¶
and p(s|θaµ) by
p(s|θa, µ) = (2π)−n/2(θa∆t)−n/2 ×nQt=2
1
St×
exp©(−1/[2θa∆t]) (lnSt − [lnSt−1 + (µ− 0.5θa)∆t])2
ª. (33)
The prior density, p(θa, µ), is specified as being uniform over both parameters, truncated
from below at θa = 0. Draws of δ4 can be produced via the conditionals for µ and
θa respectively. Since the conditional for µ is normal, with mean and variance given
respectively by bµ = Pnt=2(lnSt − lnSt−1 + 0.5θa∆t)
n− 1 × 1
∆t
18
and
σ2µ =
µθa
n− 1
¶× 1
∆t,
draws for µ given θa are readily obtained. Draws of θa conditional on µ can be obtained
via a simple approximation of the one-dimensional conditional distribution function for θ
(i.e. via ‘Griddy Gibbs’), with the boundary at θa = 0 imposed on the draws.
The proportion of 50% and 95% resi intervals that cover zero can be calculated for
each modelMk, with the best fitting model being the one for which this proportion is the
closest to the nominal level. If the observed option prices used to define (31) belong to the
vector c, the fit assessment is within-sample. If not, the fit assessment is out-of-sample.
The latter is the form of fit assessment used in the Section 5.
Given the distributional assumption in (11), for model Mk the predictive density for
an out-of-sample option price, Cf say, is given by:
p(Cf |c, s) =Zδk
p(Cf |c, s, φk, β, σu)p(δk|c, s)dδk, (34)
where p(Cf |c,s, φk, β, σu) is a normal density with mean β0+ β1q(zf , φk) and variance σ2u
and p(δk|c, s) is the joint posterior density for parameters of model Mk. The notation zf
is used to denote the known factors associated with the future option contract f. In order
to impose the lower bound that Cf must exceed in order for arbitrage opportunities to
be avoided, in the construction of (34) we specify p(Cf |c,s, φk) as the truncated normaldensity
p(Cf |c,s,φk, β, σu) =(2πσ2u)
−1/2
(1− Φ(slbf))exp
µ− 1
2σ2u[Cf − [β0 + β1q(zf , φk)]]
2
¶, (35)
where
slbf =lbf − [βo + β1q(zf , φk)]
σu
is the standardized version of the no-arbitrage lower bound, lbf = max{0, Sf−e−rf τfKf};see Hull (2000, Chp. 7). Again using the simulation output from the algorithm appropriate
for modelMk, repeated draws from p(δk|c, s), δ(j), j = 1, 2, . . . , B, can be used to constructan estimate of p(Cf |c, s) as
dp(Cf |c, s) =1
B
BXj=1
p(Cf |c,s, φ(j)k , β(j), σ(j)u ). (36)
19
In Section 5, prediction intervals constructed from (36) are used to rank the predictive
performance of the models.
4.4 Hedging Performance
An important measure of the performance of the alternative volatility models is the extent
to which they produce small hedging errors. In this paper we focus on the errors associated
with single instrument hedge portfolios, in which movements in the underlying spot price,
St, are hedged against by taking the appropriate position in a single option contract, with
price Ct, at time t. In this case the resulting cash position with minimum variance is
Ct −NkSt, (37)
where Nk denotes the number of shares in the underlying asset in which the investor goes
long for every call option in which the investor goes short, assuming model Mk. In the
case of the Heston model, M1, N1 is defined as
N1 =∂qH∂St
+ρσvSt
∂qH∂vt
,
where∂qH∂St
= P1
and∂qH∂vt
= St∂P1∂vt−Ke−rtτ
∂P2∂vt
, (38)
with qH as defined in (3); see Bakshi, Cao and Chen (1997) and Chernov and Ghysels
(2000) for more details. The hedging error over one day, say, for the Heston model, H1, is
defined as the difference between the minimum variance hedge portfolio, as constructed
on day t and invested at the risk-free rate rt, and the value of that same portfolio when
unwound one day later, namely,
H1 = (Ct −N1St) ertτ − (Ct+1 −N1St+1) , (39)
where Ct+1 and St+1 denote respectively the option and spot prices on day t + 1. Since
H1 is a function of the unknown parameters and variances, via N1, the posterior density
of H1, p(H1|c, s), can be estimated from the MCMC output associated with estimation of
20
the Heston model. That is, draws of φ1 can be used to produce draws of H1, which can
then be used to produce a nonparametric kernel estimate of p(H1|c, s). The distributionof hedging errors over any time period can be constructed in an analogous way.
The same sort of exercise can be performed for each of the alternative models, M2,
M3 and M4. For the first two of these alternative models, the relevant posterior densities
for the hedging errors, p(H2|c, s) and p(H3|c, s), can be estimated from the output of theMCMC schemes in which λ = 0 and ρ = 0 are imposed respectively. In the case of M4,
specification of N4 = Φ (d1) , with d1 as defined after (2), renders the portfolio in (37)
delta-hedged; see Hull (2000). For the three SV models, the derivatives with respect to vtwhich enter (38) need to be computed numerically. The best performing model according
to this criterion is the model with the hedging error density most closely concentrated
around zero.
4.5 Model Averaging
As described in Section 4.2, the posterior probability of the four alternative models,
P (Mk|c, s), k = 1, 2, . . . , 4, can be estimated from the option and spot prices. These
posterior probabilities can be used to produce a model-averaged predictive density, which
can, in turn, be used as the tool for prediction rather than the predictive associated with
any one particular model. The typical rationale for model averaging is based on the
acknowledgement that no one model is ever representative of the ‘true’ data generating
process. In the current context, there is a further rationale, namely that option prices are
determined by the interaction of market participants who, themselves, potentially invoke
different distributional assumptions. For both reasons, the model-averaged predictive may
well have better coverage properties than the predictives associated with specific mod-
els.9 Given model-specific predictive densities, p(Cf |Mk, c, s), k = 1, . . . , 4, the averaged
predictive density, pa(Cf |c, s), is defined as:
pa(Cf |c, s) =4X
k=1
p(Cf |Mk, c, s)P (Mk|c, s). (40)
9Model-averaged fit and hedging error densities could also be produced. We focus only on the model-averaged predictive density as it has a clear interpretation as an inferential tool and, hence, better servesto illustrate the potential benefits of model averaging.
21
5 Numerical Illustration: News Corporation OptionPrices
5.1 Data Description
The methodology is demonstrated using data on News Corporation option and equity
trades over the four year period: 1998 to 2001. All option and spot price data has been
obtained from the Australian Stock Exchange. In order to use the option data to produce
inferences on the day to day movements in the underlying spot price process, option trades
are selected from one particular period of time each day, namely the last hour of trading.10
A range of option prices that maximizes the moneyness spread is then selected from the
set of prices observed during this period.11 Only options with maturities of between
15 and 90 days are included in the dataset. A total of 10904 observations are used for
estimation, with a maximum of 60 prices selected from any one day in the within-sample
period. The out-of-sample period constitutes the nine trading days from 11 December,
2001 to 21 December 2001, with 855 trades used for the out-of-sample assessments. Since
equity trades on News Corporation stock occur very frequently, for each option trade
it is possible to obtain a virtually simultaneous equity price: usually recorded within a
few seconds of the option trade. When several equity trades are recorded at exactly
the same time, a weighted average is taken, with the weights determined by the trading
volume. Observations on the discretized process for equity prices are taken as the average
of the spot prices observed during the last hour of trading. We use a single interest rate
observation for each day, rt, t = 1, 2, . . . , n, where rt denotes the 3 month bond rate on
day t.12 The dividends paid on News Corporation shares are typically about 0.1% of share
value, paid 6-monthly. The impact of dividends on share prices is therefore so small that
they have only been taken into account as a constant continuous discount factor. The
assumption of a constant variance for the pricing errors in (11) is justified as no discernible
pattern in the variance, across moneyness in particular, is found.
10That is, we attempt to minimize the variation across the day in Si.11The moneyness of an option contract refers to the relationship between the current spot price and
the strike price (K). Broadly speaking, options are said to be out-of-the-money if Si < K, in-the-moneyif Si > K and at-the-money if Si = K.12The interest rate data can be sourced at the Reserve Bank of Australia website: www.rba.gov.au.
22
5.2 Numerical Results
5.2.1 Posterior Results for the Heston Model
Table 1 presents point and interval estimates of both the parameters and selected variances
associated with the full Heston model (M1). All results are produced by running the
MCMC algorithm for 10000 iterations, with the first 2000 iterations discarded. The
elements of the full vector v are selected in blocks with an average size of 10. The actual
block lengths at each iteration are chosen randomly; see Shepherd and Pitt (1997) and
Strickland, Forbes and Martin (2005) for details. The starting values for the parameters,
chosen via preliminary experimentation with the algorithm, are as follows: κa = 5.0;
θa = 0.15; σv = 0.6; λ = −0.2; ρ = 0.2; µ = 0.16; β = (0, 1)0 and σu = 0.1. A normal
prior distribution is specified for λ, with mean and standard deviation respectively −0.2and 1.0. This prior assigns weight to a wide range of values for λ, in the positive and
negative regions, but centres the distribution in the negative region, as accords with
several empirical estimates of λ reported in the literature. A normal prior distribution
is also imposed for ρ, with mean and standard deviation of −0.2 and 0.3 respectively.These prior parameters imply only a very small prior probability of ρ falling beyond the
bounds of ±1 and, again, centre the distribution in the negative region, as tallies withthe typically negative empirical estimates of ρ , from returns data in particular. The
bounds of ±1 are imposed on the posterior distribution by way of discarding any drawsof ρ that fall beyond them. An exponential prior distribution is specified for σv, with
the single prior parameter specified as follows. Using News Corporation returns data
for the period 1992 to 2001, GARCH(1,1) models are estimated for each year and the
(annualized) conditional variance series for each year produced. The standard deviation
of each of the ten conditional variance series is recorded. The exponential prior for σvis then calibrated such that the maximum of these ten standard deviation values is the
median of the distribution for σv. This calibration produces a value of approximately 0.1
for the prior parameter which, in turn, produces a prior ordinate at zero, p (σv = 0|M1), as
used in the Bayes factor in (30), equal to approximately 10.0. Uniform priors are specified
for κa and θa, bounded at zero from below. As already noted a uniform prior is also
specified for µ, with the noninformative prior used for (β, σu)0 given in (21).
Table 1 reports the mean, mode and (approximate) 95% Highest Posterior Density
23
(HPD) interval13 for each of the parameters and selected random variances.14 The point
estimates of κa correspond to daily persistence measures of (1 − 5.480/252) = 0.9783
and (1 − 5.633/252) = 0.9777 respectively, with the interval estimate translating into
a (daily) persistence interval of (0.976, 0.979). All estimates are thus well within the
realm of typical returns-based estimates of such measures. For instance, the persistence
measure from a GARCH(1,1) model estimated for News Corporation returns over the
1998-2001 period is 0.968. The point estimates of θa correspond to an estimate of long-
run volatility of approximately√0.145 = 0.381. This value is somewhat smaller than the
unconditional mean of volatility associated with a GARCH(1,1) model estimated over the
1998-2001 period, namely 0.453. The point estimates of λ are negative, as is typical for this
parameter. This result is consistent with the market viewing volatility as a ‘negative-beta
investment’; see, for example, Guo (1998) and Bates (2000). The estimated probability of
λ < 0 is approximately equal to 0.76. However, as is clear from the interval estimate of λ,
although support for a negative value for the risk-premium parameter is quite strong, the
interval does cover both positive and zero values for λ, in addition to negative values. The
estimates of σv imply a moderate degree of excess kurtosis in the returns process, whilst
the estimates of ρ indicate that a small degree of (positive) returns skewness is implied
by the joint options and spot datasets.15 The point estimates of µ imply an (annualized)
rate of return on the News Corporation stock of approximately 22− 27%, well in excessof the returns-based mean rate of return of 15% estimated for the 1998 to 2001 period.
However, the interval estimate indicates the extent of the uncertainty associated with
13An HPD interval is one with the specified probability coverage, whose inner ordinates are not exceededby any density ordinates outside the interval. The intervals reported in Table 1 are approximate HPDintervals in that the kernel smoothing procedure used to estimate the marginal posteriors does not alwaysenable the ordinate condition described here to be satisfied exactly. Furthermore, for densities that aremultimodal the 95% intervals are contructed so that the ordinates of the lower and upper bounds areas close as possible to being equal, subject to the restriction that the tail probabilities sum to 5%.This implies that there may be ordinates within the interval that are smaller than ordinates beyond theinterval.14As β and σu are essentially nuisance parameters, we do not devote space to them in the reporting
of results in the text. However, we note that the posterior mean estimates of these parameters arerespectively (−0.063, 1.016) and 0.031.The acceptance rates for the MH subchains are as follows: κa:0.740; θa : 0.220; σv : 0.485; λ : 0.474; ρ : 0.314. The proportion of times that at least one element of thefull vector v is updated is 0.889.15Given that the returns in question are on the S&P500 market index, one might anticipate that the
negative value of λ would be accompanied by a negative value of ρ, given that ρ is a measure of thecorrelation between volatility and the market portfolio; see Bates (2000).
24
Table 1: Marginal Posterior Density Results for the Heston Model
Mode Mean 95% HPD interval
Parameter
κa 5.480 5.633 (5.170, 6.140)θa 0.144 0.147 (0.136, 0.157)σv 0.719 0.713 (0.634, 0.790)λ -0.049 -0.147 (-0.631, 0.247)ρ 0.164 0.163 (0.121, 0.205)µ 0.270 0.219 (-0.220, 0.650)
Variance
v100 0.138 0.145 (0.133, 0.162)v500 0.171 0.175 (0.156, 0.194)v990 0.130 0.127 (0.121, 0.135)
estimation of this parameter, with 95% probability assigned to the range (−0.220, 0.650).The point estimates of the variances reported in Table 1 vary over the sample period, but
are broadly consistent with the estimates of the long-run mean parameter θa.
5.2.2 Model Ranking
In this section we apply the methods discussed earlier to rank the three restricted models,
M2, M3 and M4, and the full model M1. From the interval estimates reported in Table 1,
it is evident that two of the three restricted models,M3 andM4, are rejected by the data,
with neither of the intervals covering the relevant parameter restrictions. The interval
estimate for λ however covers the value of λ = 0, thereby providing some evidence in
favour of M2.
To supplement these results, we estimate the three submodels, assessing the out-of-
sample performance of each along the lines discussed in Sections 4.3 and 4.4, as well as
constructing Bayes factors for each relative to M1. The details of the algorithm used to
25
Table 2: Bayes Factors and Model ProbabilitiesEntry (i, j) Indicates the Bayes Factor
in Favour of Mj versus Mi
M1 M2 M3 M4
(Heston) (λ = 0) (ρ = 0) (σv = 0)
M1 1.000 4.723 6.878× 10−236 2.899× 10−305M2 1.000 1. 456× 10−236 6. 138× 10−306M3 1.000 4. 216× 10−70M4 1.000
P (Mk|c, s) 0.175 0.825 0.000 0.000
estimate M2 and M3 are identical to those used to estimate M1, apart from the obvious
parameter restrictions associated with the nested models. Estimation of the BS model,
M4, in the manner described in Section 4.3, produces a marginal posterior for√θa that is
virtually degenerate around a single value, namely√θa = 0.369. With only this parameter
affecting the theoretical option prices, we report the fit, predictive and hedging results
conditional on this single value.
Table 2 reports the estimated Bayes factors for M2, M3 and M4, with M1 as the
reference model. The bottom row in the table gives the corresponding model probabilities
(reported to three decimal places), based on equal prior probabilities for all four models
and assuming that these four models span the model set. It is clear that the posterior
probabilities support M2, but with some weight also assigned to M1. This result tallies
with the reasonable posterior weight assigned to the region around λ = 0 by the marginal
posterior of λ derived from the full model, M1. Negligible posterior weight is assigned to
both M3 and M4, again a result which tallies with the relevant results reported in Table
1. Most notably, the data provides essentially no support for the constant variance BS
model (M4).
The fit and predictive results reported in Table 3 represent the proportion of times
26
that each criterion is satisfied, for each model, for the 855 out-of-sample observations. If
the model is correctly specified, this proportion should approximate the nominal coverage
level. It should be noted however, that the out-of-sample option contracts do not repli-
cate exactly the maturity and moneyness characteristics of the within-sample contracts,
with this feature expected to produce some deviation between the nominal and empirical
coverages for all models. As is evident, models M1, M2 and M3 are all quite close to the
nominal level in the case of the 95% fit and prediction intervals. All three of these models
overstate the 50% nominal coverage of both the fit and prediction intervals. There is
negligible difference between the out-of-sample performance of these three versions of the
Heston SV model.
In comparison with the SV model variants, the BS model is clearly inferior in terms
of out-of-sample fit, with zero not covered by any of the 50% intervals and only negligible
coverage by the 95% intervals. These results reflect the fact that the theoretical BS prices
used in the construction of the ith residual quantity, as per (31), are relatively inaccurate
estimates of the observed out-of-sample prices, for the hold-out sample considered. The BS
fit intervals are also very narrow, due to the degenerate nature of the estimated posterior
distribution for the BS volatility parameter√θa.16 However, when the additional variance
of the option pricing error is taken into account, the relative inaccuracy of the BS prices
is largely offset, with empirical coverages of the prediction intervals tallying reasonably
closely with the nominal values.
With regard to model-averaging, as the results in Table 2 indicate that only M1 and
M2 have non-negligible posterior probability, the weighting effectively occurs with respect
to the predictives of these models only. As these two models also have extremely sim-
ilar predictive behaviour, the averaging process has a minimal impact, with results not
reported as a consequence.
In Table 4 the hedging error results associated with the four models are reported.
Hedging errors are calculated using (39) for M1 and the version of (39) appropriate for
the remaining models, as described in the text immediately following (39). Errors are
calculated for one day and five days ahead, with the portfolio constructed at the end of
the estimation period and not rebalanced during the entire out-of-sample period. On each
16The only variation in the distribution of resi in the BS case arises from posterior variation in (β, σu).
27
Table 3: Fit and Predictive Performance Measures
Criterion(a) M1 M2 M3 M4
Zero in Interquartile Fit Interval(b) 0.629 0.630 0.625 0.000Zero in 95% Residual Interval(b) 0.991 0.991 0.993 0.002Cf in Interquartile Predictive Interval(b) 0.653 0.648 0.655 0.449Cf in 95% Predictive Interval(b) 0.995 0.995 0.989 1.000
(a) All figures represent proportions of 855.
(b) The (1 − α)% Interval is the interval which excludes α/2% in the lower and upper tails of thefit/predictive distribution. This interval equals the (1− α)% HPD interval only for those distrib-utions which are symmetric around a single mode.
of these days the hedging errors associated with all contracts are calculated. The errors
are then averaged across all contracts.17 It is these averaged hedging errors to which the
summary statistics in Table 4 relate.
The results in Table 4 make it clear that there is very little difference between all four
models according to this criterion. It is also clear that none of the HPD intervals cover
zero. That said, for all four models, the errors associated with the hedge portfolio are
minimal, in terms of both the point and interval estimates. These errors represent less
than 1% of the average magnitude of the option prices on the out-of-sample days. For
M1, M2 andM4, the magnitude of the errors increases over time without any rebalancing
taking place, as would be anticipated. ForM3 however, the errors five days out are slightly
smaller than those one day ahead, although both sets of errors are very small. The BS
model, M4, has the mean hedging error with the largest magnitude, five-days-ahead, and
the second largest mean error one-day-ahead.
17The data was not plentiful enough to construct meaningful hedging statistics for the different mon-eyness categories.
28
Table 4: Hedging Performance of the Different ModelsMeans of (Average) Hedging Error Densities with 95% HPD Intervals in Brackets ($)
M1 M2 M3 M(a)4
One Day Ahead
0.002 0.002 0.0044 -0.003(0.0016, 0.0028) (0.0016, 0.0027) (0.0042, 0.0047) n.a.
Five Days Ahead
-0.004 -0.004 0.0026 -0.007(-0.005, -0.002) (-0.006, -0.003) (0.0024, 0.0028) n.a.
(a) Since only a single point estimate of the BS volatility has been produced, HPD intervals cannotbe constructed.
6 Conclusions
In this paper a new methodology for producing option and spot price-based estimates
of the parameters of a stochastic volatility model is presented. The method has been
developed within the context of the Heston (1993) theoretical option pricing model and
certain variants thereof. The numerical scheme adopted exploits the state-space repre-
sentation of the Heston spot and volatility process. Simulation of the latent volatilities
occurs via the application of a bivariate Kalman filter and smoother, with information
from the option prices impacting on the filter via a measurement equation in which the
BS implied volatilities proxy the option prices. Construction of Bayes factors, in addition
to fit, prediction and hedging intervals, enables the alternative variants of the proposed
model to be ranked on the basis of observed option and spot price data.
Application of the methodology to Australian News Corporation stock and options
data produces estimates of the parameters of the Heston model that imply a very persis-
tent volatility process, with the degree of volatility in volatility indicating that a certain
amount of excess kurtosis characterizes the spot price data and/or has been factored into
the options data. Positive skewness in returns also features in the numerical results. The
29
model that imposes a zero premium for volatility risk is given some support by the data,
with this variant of the Heston model being assigned the highest posterior probability
of all models considered. However, the unrestricted Heston model, in which a non-zero
risk premium is allowed, is also given non-negligible posterior weight, with both point
and interval estimates of the relevant parameter suggesting that a negative risk premium
may have been factored into option prices over this period. The constant volatility BS
model is also clearly rejected by the data in the sense that the point and interval es-
timates of the variance of the Heston volatility process are non-zero, and the posterior
probability assigned to this model is essentially zero. The out-of-sample fit performance
of the BS model is also markedly inferior to that of all variants of the SV model, although
the predictive performance of the BS model is quite reasonable once allowance is made
for the variation in observed option prices. There is little to choose between any of the
models in terms of hedging performance, although a richer data set might allow for more
discrimination between specifications in terms of hedging results associated with different
contracts on the moneyness spectrum.
References
[1] Bakshi, G., C. Cao and Z. Chen. (1997). “Empirical Performance of Alternative
Option Pricing Models.” Journal of Finance, 52, 2003-2049.
[2] Bates, D.S. (1996). “Jumps and Stochastic Volatility: Exchange Rate Processes Im-
plicit in PHLX Deutsche Mark Options.” The Review of Financial Studies, 9, 69-107.
[3] Bates, D.S. (2000). “Post-87 Crash Fears in the S&P 500 Futures Option Market.”
Journal of Econometrics, 94, 181-238.
[4] Bauwens, L. and Lubrano, M. (2002), “Bayesian Option Pricing Using Asymmetric
GARCH Models.” Journal of Empirical Finance, 9, 321-342.
[5] Black, F. and M. Scholes. (1973). “The Pricing of Options and Corporate Liabilities.”
Journal of Political Economy, 81, 637-659.
[6] Campbell, J.Y., Lo, A.W. and MacKinlay, A.C. (1997). The Econometrics of Finan-
cial Markets, Princeton University Press, New Jersey.
30
[7] Carter, C.K. and Kohn, R. (1994), “On Gibbs Sampling for State Space Models,”
Biometrica, 81, 541-553.
[8] Chernov, M. and E. Ghysels. (2000). “A Study Towards a Unified Approach to the
Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options
Valuation.” Journal of Financial Economics, 56, 407-458.
[9] Chib, S. and E. Greenberg. (1996). “Markov chain Monte Carlo simulation methods
in Econometrics.” Econometric Theory, 12, 409-431.
[10] Cox, J.C., J.E. Ingersoll and S.A. Ross. (1985). “A Theory of the Term Structure of
Interest Rates.” Econometrica, 53, 385-408.
[11] De Jong, P. and N.G. Shephard. (1995). “The Simulation Smoother for Time Series
Models.” Biometrika, 82, 339-350.
[12] Duan, J-C. and Simonato, J-G. (2001). “American option pricing under GARCH by
a Markov chain approximation.” Journal of Economic Dynamics and Control, 25,
1689-1718.
[13] Durbin, J. and Koopman, S.J. (2002), “A Simple and Efficient Simulation Smoother
for Time Series Models.” Biometrika, 81, 341-353.
[14] Eraker, B. (2004). “Do Stock Prices and Volatility Jump? Reconciling Evidence from
Spot and Option Prices.” The Journal of Finance, LIX, 1367-1403.
[15] Eraker, B., M.J. Johannes N.G. and Polson. (2003). “The Impact of Jumps in Returnsand Volatility.” Journal of Finance, 53, 1269-1300.
[16] Frühwirth-Schnatter, S. (1994), “Data Augmentation and Dynamic Linear Models.”
Journal of Time Series Analysis, 15, 183-202.
[17] Geweke, J. (1999). “Using Simulation Methods for Bayesian Econometric Models:
Inference, Development and Communication.” Econometric Reviews, 18, 1-73.
[18] Guo, D. (1998). “The Risk Premium of Volatility Implicit in Currency Options.”
Journal of Business and Economic Statistics, 16, 498-507.
31
[19] Hafner, C.M. and Herwartz, H. (2001), “Option Pricing Under Linear Autoregressive
Dynamics, Heteroskedasticity and Conditional Leptokurtosis.” Journal of Empirical
Finance, 8, 1-34.
[20] Heston, S.L. (1993). “A Closed-form Solution for Options with Stochastic Volatility
with Applications to Bond and Currency Options.” The Review of Financial Studies,
6, 327-343.
[21] Heston, S. and Nandi, S. (2000), “A Closed-Form GARCHOption Valuation Model.”
Review of Financial Studies, 13, 585-625.
[22] Hull, J.C. (2000). Options, Futures, and Other Derivative Securities. 4rd ed., New
Jersey: Prentice Hall.
[23] Hull, J.C. and A. White. (1987). “The Pricing of Options on Assets with Stochastic
Volatilities.” Journal of Finance, 42, 281-300.
[24] Ingersoll, J.E.Jr. (1987). Theory of Financial Decision Making. Rowman & Littlefield
Studies in Financial Economics, USA.
[25] Jones, C. (2003). “The Dynamics of Stochastic Volatility: Evidence from Underlying
and Options Markets.” Journal of Econometrics, 116, 181-224.
[26] Koop, G. and S. Potter. (1999). “Bayes Factors and Nonlinearity: Evidence from
Economic Time Series.” Journal of Econometrics, 88, 251-281.
[27] Lim, G.C., Martin, G.M. and V.L. Martin, 2005, “Parametric Pricing of Higher Order
Moments in S&P500 Options.” Journal of Applied Econometrics, 20, 377-404.
[28] Martin, G.M., Forbes, C.S. and V.L. Martin, 2005, “Implicit Bayesian Inference
Using Option Prices.” Journal of Time Series Analysis, 26, 437-462.
[29] Pan, J. (2002). “The Jump-risk Premia Implicit in Options: Evidence from an Inte-
grated Time-series Study.” Journal Of Financial Economics, 63, 3-50.
[30] Polson, N. G. and Stroud, J. R. (2003). “Bayesian Inference for Derivative Prices.”
Bayesian Statistics, 7, 641-650.
32
[31] Shephard, N. and M. Pitt. (1997). “Likelihood Analysis of Non-Gaussian Measure-
ment Time Series.” Biometrica, 84, 653-667.
[32] Strickland, C.M., C.S. Forbes and G.M. Martin. (2005). “Bayesian Analysis of the
Stochastic Conditional Duration Model.” Forthcoming, Computational Statistics and
Data Analysis (Special Issue on Filtering and Signal Extraction).
[33] Tierney, L. (1994). “Markov Chains for Exploring Posterior Distributions.” The An-
nals of Statistics, 22, 1701-1762.
[34] Verdinelli, I. and L. Wasserman. (1995). “Computing Bayes Factors Using a General-
ization of the Savage-Dickey Ratio.” Journal of the American Statistical Association,
90, 614-618.
[35] Zellner, A. (1996). “An Introduction to Bayesian Inference in Econometrics”, Wiley
Classics Library Edition, New York.
33