inference for a class of stochastic volatility models using ...gmartin/forbes_etal_2005.pdf(2001),...

Inference for a Class ofStochastic Volatility Models Using Option and SpotPrices: Application of a Bivariate Kalman Filter∗

Catherine S. Forbes, Gael M. Martin and Jill Wright

Department of Econometrics and Business Statistics, PO Box 11E,

Monash University, Vic. 3800 Australia. Email: [email protected].

Phone 61 3 9905 1189; Fax 61 3 9905 5474

July 16, 2005

Abstract

In this paper Bayesian methods are applied to a stochastic volatility modelusing both the prices of the asset and the prices of options written on the asset.Posterior densities for all model parameters, latent volatilities and the market priceof volatility risk are produced via a hybrid Markov Chain Monte Carlo samplingalgorithm. Candidate draws for the unobserved volatilities are obtained in blocksby applying the Kalman filter and simulation smoother to a linearization of a state-space representation of the model. Crucially, information from both the spot andoption prices affects the draws via the specification of a bivariate measurementequation. Alternative models nested within the Heston (1993) framework are rankedvia posterior odds ratios, as well as via fit, predictive and hedging performance. Themethod is illustrated using Australian News Corporation spot and option price data.

Keywords: Option Pricing; Volatility Risk; Markov Chain Monte Carlo; Non-linear State Space Model; Multi-Move Sampler.

JEL Classifications: C11, C15, G13.

∗This research has been supported by an Australian Research Council Grant Discovery Grant anda Monash University Grant. The authors would like to thank Vance Martin for some very helpfulsuggestions and comments.

1

1 Introduction

Numerous empirical studies over the past two decades have demonstrated that the con-

stant volatility feature of a geometric diffusion process for an asset price is inconsistent

with both the observed time-variation in return volatility and the excess kurtosis and

skewness evident in the empirical distribution of returns; see Bollerslev, Chou and Kroner

(1992) for a review. The option pricing literature supports this finding of misspecifica-

tion, with certain empirical regularities, such as implied volatility ‘smiles’, being viewed

as evidence that asset prices deviate from the assumption of geometric Brownian motion

that underlies the Black and Scholes (1973) option price; see, for example, Bakshi, Cao

and Chen (1997), Hafner and Herwartz (2001) and Lim, Martin and Martin (2005).

In response to these now standard empirical findings, many alternative specifications

for asset prices have been proposed. In the option pricing context, the specifications that

have received the most attention have been variants of a continuous time stochastic volatil-

ity (SV) model, often augmented by random jump processes. This focus has occurred,

in part, as a consequence of the relative ease with which theoretical option prices can be

produced, with closed form solutions for the option price, up to one-dimensional integrals,

being available for some specifications; see, for example Hull and White (1987), Heston

(1993), Bates (2000) and Pan (2002). This ease of evaluation contrasts with the more

computationally intensive Monte Carlo methods needed to price options based on certain

conditional autoregressive heteroscedasticity (GARCH) formulations, in which the option

price is estimated as an average over many simulation paths; see Hafner and Herwartz

(2001), Bauwens and Lubrano (2002) and Martin, Forbes and Martin (2005), amongst

others.

Although the relative simplicity of pricing options under SV models is being increas-

ingly matched by that of new computationally efficient pricing methods suitable for al-

ternative time-varying volatility specifications (e.g. Heston and Nandi, 2000, Duan and

Simonato, 2001, and Lim, Martin and Martin, 2005) the SV formulation (with and with-

out jumps) remains the preeminent type of model used in the option pricing literature. In

particular, much attention is given to developing methods for extracting estimates of the

parameters of an SV model from observed option prices, including an estimate of the price

of volatility risk factored into those prices by market participants; see, for example, Guo

2

(1998), Chernov and Ghysels (2000), Pan (2002) and Eraker (2004). Of particular con-

cern in that literature is the complexity that results from both the option and spot price

processes being a function of a high-dimensional vector of unobserved latent volatilities.

Different contributions to the literature are distinguished, one from the other, primarily

in terms of how the proposed inferential methods manage that complexity.

This paper proposes a new Bayesian method for conducting inference within the frame-

work of the Heston (1993) class of SV models, using both option prices and spot prices on

the underlying asset. Posterior densities are produced for the parameters of the volatility

model, for the latent volatilities and for the market price of volatility risk. The method

involves augmenting the probability density function for a panel of option prices with the

density function describing the bivariate process for the spot price and the volatility (see

also Polson and Stroud, 2003, and Eraker, 2004). Posterior results are produced via a

hybrid Markov Chain Monte Carlo (MCMC) sampling algorithm. As part of this algo-

rithm, candidate draws of the volatilities are obtained via the application of the Kalman

filter and simulation smoother to a linearization of a non-linear state-space representa-

tion of the model. Information from both the spot and option prices affects the draws

via the specification of a bivariate measurement equation. For the options-based com-

ponent of this bivariate specification, a time series of implied volatilities, produced via

the Black-Scholes (BS) model, is taken as a noisy measurement of the evolution of the

assumed stochastic volatility process. The ‘multi-move’ approach to sampling the latent

volatilities is motivated by the Bayesian state space literature, in which such block sam-

plers are shown to be superior to alternative ‘single-move’ approaches (e.g. Carter and

Kohn, 1994, Frühwirth-Schnatter, 1994, de Jong and Shephard, 1995, and Durbin and

Koopman, 2002), and contrasts with the single-move algorithms of Jones (2003), Polson

and Stroud (2003) and Eraker (2004).

In addition to producing posterior point and interval estimates for the unknown ele-

ments of the full Heston model, we also present methods for comparing this model with

alternative specifications nested in the Heston framework. Specifically, we use various

Bayesian model selection criteria to rank the unrestricted Heston model against models

in which, respectively, volatility risk is not priced and a leverage effect is not accommo-

dated. We also compare these three SV models with the constant volatility BS model,

also obtained as a restricted version of the Heston specification. The selection criteria

3

constitute posterior odds ratios, constructed via Savage-Dickey density ratios (e.g. Koop

and Potter, 1999), as well as measures based on fit, predictive and hedging error densities.

The idea of Bayesian model averaging in an option pricing context is also introduced. The

methodology is applied to spot and option price data for News Corporation, as observed

over the period 1998 to 2001.

The outline of the paper is as follows. In Section 2, we define the European call option

price, contrasting the theoretical value of this price under the BS and Heston models.

In Section 3, we describe our Bayesian inferential method, including the hybrid MCMC

scheme that we adopt. Details of the bivariate Kalman filter and simulation smoother

used in the generation of candidate draws of the volatility vector are outlined. In Section

4 we outline the criteria used to rank the various volatility models, including the constant

volatility BS specification. Section 5 includes a description of the News Corporation data

to be used in the empirical demonstration of the method, followed by an outline of the

numerical results. We provide some concluding comments in Section 6.

2 Option Pricing

2.1 Black-Scholes Option Pricing

An option is an asset that gives the owner the right to either buy (call option) or sell (put

option) the underlying asset at some future point in time, at some pre-specified exercise,

or strike price, K. For the European call option that is the focus of this paper, the owner

has the right to buy the underlying asset (or exercise the option) only when the option

matures, at the future time point T. In this case, the price of the option is given by the

expected value of the discounted payoff of the option,

q = Et

£e−rτ max (S (T )−K, 0)

¤, (1)

where S (T ) denotes the spot price of the asset at the time of maturity, τ is the length

of the option contract, or time to maturity, and r denotes the risk-free interest rate

assumed to hold over the life of the option. Given the assumption that underlies the

BS model, namely that the asset price follows a geometric diffusion, and assuming an

absence of riskless arbitrage opportunities, a portfolio made up of both the option and

the underlying asset can be constructed in such a way that the return on the portfolio

4

is equal to r. The possibility of such a portfolio implies that options are priced as if

investors were risk-neutral, with the conditional expectation in (1) evaluated not under

the empirical, or ‘objective’ distribution, but under the risk-neutral distribution for S (T )

as a consequence; see Campbell, Lo and McKinley (1997, Chp. 9) and Hull (2000, Chp.

11) for more details. Risk-neutral pricing in the BS case effects a shift in the drift term

in the diffusion process for the logarithm of the spot price by a factor of r − µ, where µ

is the actual rate of return on the underlying asset. As such, the risk-neutral conditional

distribution of S (T ), given S (t), inherits the lognormal form of the objective distribution,

but with r replacing µ in the specification of this distribution. In this case, the expectation

in (1) has the form

qBS = S (t)Φ(d1)−K−rτΦ(d2), (2)

where d1 = (ln(S (t) /K) + (r + 0.5σ2) τ) /σ√τ , d2 = d1−σ

√τ , σ is the constant diffusion

(or volatility) parameter, and Φ(.) denotes the cumulative normal distribution. Note that

given an observed option price, the formula in (2) may be inverted to find an implied

value of the volatility parameter σ, as the latter is the only unknown element that affects

qBS.

2.2 Option Pricing Under The Heston SV Model

The assumption of a stochastic process for volatility introduces an additional element of

randomness into the option pricing problem, which needs to be accommodated when pro-

ducing a theoretical option price via the type of arbitrage arguments referred to above; see

Ingersoll (1987) for some general discussion of this issue. Heston (1993) uses a particular

set of arguments to produce a theoretical price of the form

qH = S(t)P1 −Ke−rτP2. (3)

5

The terms P1 and P2 in (3) are functions of the unknown quantities that characterize the

assumed risk-neutral bivariate spot price and volatility process, given by

dS(t) = rS(t)dt+pv(t)S(t)dε2(t) (4)

and

dv(t) = κ[θ − v(t)]dt+ σvpv(t)dε1(t), (5)

where ε1(t) and ε2(t) are correlated Weiner processes with correlation parameter ρ, and θ

is the long-run mean of v(t), to which v(t) reverts at rate κ > 0. The unknown quantities

that impact on qH at time t thus constitute: the parameters of the square root volatility

process in (5), namely κ, θ and σv, the correlation parameter ρ, as well as the unobservable

latent variance, v(t); see Heston for details.

The objective bivariate process associated with (5) is given by

dS(t) = µS(t)dt+pv(t)S(t)dε2(t) (6)

and

dv(t) = κa[θa − v(t)]dt+ σvpv(t)dε1(t), (7)

with the parameters in (5) and (7) related as follows:

θ =κaθa

κa + λ, (8)

κ = κa + λ. (9)

The volatility risk premium, λ(S(t), v(t)), is assumed to be proportional to v(t), that is,

λ(S(t), v(t)) = λv(t). (10)

A negative value for λ and, hence, for the risk premium, is often observed empirically.

With λ < 0, (5) implies slower reversion to a higher long-run mean than is implied by the

actual variance process in (7).

6

Given (8) and (9) it is clear that observed option prices can be used to estimate both

the parameters of the objective process and λ only if additional identifying information

on the actual process is incorporated in the inferential procedure. As is shown in Section

3, we achieve this identification by augmenting the density function associated with the

assumed generating process for the option prices with the density function that describes

the dynamics of the objective process for the spot prices and volatilities.

3 Bayesian Inference for the Heston Model

3.1 Specification of the Joint Posterior Density Function

We begin discussion of the Bayesian inferential method with details of the approach used

to estimate the general Heston model. Having developed a methodology for the general

model, analysis of the nested models follows in a straightforward manner, as detailed in

Section 4.1

Given that Bayesian inferences are to be produced in part from observed market

option prices, option prices need to be assigned a particular distributional model. Letting

Ci represent the ith observed market price of the call option and ri, Ki, τ i and Si represent

the observable factors that affect the ith option price, the option pricing model is specified

as

Ci = β0 + β1qH(ri, Ki, τ i, Si, φ) + ui, i = 1, 2, . . . , N. (11)

The term ui in (11) is an unobservable pricing error, assumed to have a normal dis-

tribution with zero mean and variance, σ2u, N is the number of observed option prices

and qH(ri, Ki, τ i, Si, φ) is the ith theoretical option price as defined in (3).2 The vector

φ = (vt, κa, θa, σv, λ, ρ)

0 comprises all unobservable elements that characterize qH , includ-

ing the latent variance. The index i in (11) indicates both variation over time and variation

across option contracts at a given point in time. However, in specifying the joint density

function for the spot prices and volatilities, as associated with a discretized version of (6)

1As the main aim of the paper is methodological, we choose to focus on the Heston model and threevariants, rather than expanding the model to accommodate the jump processes that may be needed insome empirical settings. Results on the empirical usefulness of allowing for random jumps in the assetprocess and/or the volatility process are rather mixed; see, for example, Eraker (2004).

2More sophisticated models describing the deviations between observed and theoretical option pricesare possible; see, for example, Bates (1996, 2000), Eraker (2004) and Martin, Forbes and Martin (2005).

7

and (7), we use only end-of-day spot prices. As such, the spot price process is assumed to

describe movement in the volatilities and spot prices from day to day. Hence the vector

of latent variances, v = (v1, v2, . . . , vn)0, has dimension equal to the number of days, n

say, in the pooled sample of option price data, where n < N. The tth element of v, vt,

enters the formulae for all qH associated with option contracts on day t. Variation in vt

across the day is not modelled. However, the spot price that enters qH for the ith option

contact is the price that is observed as synchronously as possible with the option price.

Consequently, there are N observations (not necessarily distinct) on Si that are used in

the specification of the generating process for the option prices, whilst only n closing

spot prices are used in the specification of the generating process for the spot prices and

volatilities.

Grouping all observable factors that influence the ith option price together in a vector

zi = (ri,Ki, τ i, Si)0, (11) becomes

Ci = β0 + β1qH(zi, φ) + ui, i = 1, 2, . . . , N, (12)

The presence of ui in (12) reflects the fact that the theoretical option model, qH(zi, φ), is

only an approximation of the process that has lead to the determination of an observed

option price. That is, ui encompasses ‘model error’. It may also encompass ‘market error’,

in which an observed option price differs from its theoretical counterpart as the result of

factors such as, for example, the non-synchronous recording of spot and option prices,

transaction costs and other market frictions.3 The joint density function for the vector of

option prices c = (C1, C2, . . . , CN)0, conditional on the known vector z = (z1, z2, . . . zN)0

and on the unknown φ, is thus given by

p(c|z,φ, β, σu) = (2πσ2u)−N/2NYi=1

exp

µ− 1

2σ2u[Ci − [β0 + β1qH(zi, φ)]]

2

¶, (13)

with β = (β0, β1)0.

As already noted, in order to produce simultaneous inference about the parameters

of both the risk-neutral and objective processes or, equivalently, about the parameters of

3To rule out arbitrage, the distribution of Ci should strictly speaking be truncated at a lower bound oflbi = max{0, Si−e−riτ iKi}. Since experimentation has established that the truncation has only minimalimpact on inferences, we choose to ignore it in the estimation procedure. We do however, invoke thetruncation when producing the predictive densities used in ranking alternative models.

8

the objective process and the market price of volatility risk, information about the way

in which the observed spot prices have evolved needs to be incorporated in the inferential

procedure. We incorporate this information by augmenting the data generating process in

(13) with a discretized version of the bivariate spot price and volatility process in (6) and

(7). This augmentation also serves to identify the process to which the estimated volatil-

ities must adhere. The vector of (closing) spot prices associated with the n days in the

sample is defined as s = (S1, S2, . . . , Sn)0. Suppressing the dependence of p(c|z, φ, β, σu)on all elements of z other than s (including all other intraday synchronous spot prices

that are used to calculate qH(zi, φ) for each i) and defining the vector δ = {φ, µ, β, σu} asthe full set of unknowns in the problem, the joint posterior density for δ is thus specified

as

p(δ|c, s) ∝ p(c|s, v, ωa, λ, ρ, β, σu)× p(s, v|ωa, ρ, µ)× p(ωa, ρ, λ, µ, β, σu)

∝ p(c|s, v, ωa, λ, ρ, β, σu)× p(s|v, ωa, ρ, µ)× p(v|ωa)

×p(ωa, ρ, λ, µ, β, σu). (14)

To simplify notation, the parameters of the objective volatility process in (7) are grouped

as ωa = (κa, θa, σv)0. The form of p(v|ωa) in (14) is determined via a Euler discretization

of (7),

vt = vt−1 + κa(θa − vt−1)∆t+ σv√vt−1√∆tε1t

= κaθa∆t+ (1− κa∆t)vt−1 + σv√vt−1√∆tε1t (15)

ε1t ∼ N(0, 1); 2, 3, . . . n,

with the initial value set equal to the long-run mean, θa. By convention, both the variances

and the parameters of the volatility model enter the theoretical option price formula,

qH(zi, φ), in annualized form. As (15) describes day-to-day movements in the annualized

variance, we set ∆t = 1/252 (years), assuming 252 trading days in the year; see also

Chernov and Ghysels (2000).4

4See Eraker, Johannes and Polson (2003) for evidence that the daily interval is small enough to renderthe discretization bias arising from (15) negligible.

9

The form of p(s|v, ωa, ρ, µ) in (14) is determined via an Euler discretization of the

logarithmic transform of the diffusion in (6), namely

lnSt = lnSt−1 + (µ− 0.5vt−1)∆t+√vt−1√∆tε2t, (16)

ε2t ∼ N(0, 1); 2, 3, . . . n, (17)

conditional on a specified value for lnS1. From (16) and (17), and from the fact that ε1tand ε2t are correlated, it follows that

p(s|v, ωa, ρ, µ) = (2π)−(n−1)/2[(1− ρ2)∆t]−(n−1)/2

×nQt=2

1

St√vt−1

exp

½−1/[2(1− ρ2)∆t](

lnSt − µlnSt.vt√vt−1

)2¾

(18)

where

µlnSt.vt = (lnSt−1 + [µ− 0.5vt−1]∆t) +ρ

σv(vt − [θaκa∆t+ (1− κa∆t)vt−1]). (19)

We further specify that

p(ωa, ρ, λ, µ, β, σu) = p(κa)× p(θa)× p(σv)× p(ρ)× p(λ)× p(µ)× p(β)× p(σu). (20)

That is, it is assumed that all parameters in the model are a priori independent.5

3.2 A Hybrid Gibbs-MH Algorithm

3.2.1 Overview of the Algorithm

Due to the large number of unknowns in the model and the manner in which they are

related, analytical treatment of the joint posterior distribution in (14) is not possible,

and an MCMC algorithm has been developed. To implement the MCMC algorithm, the

parameters in δ are ‘blocked’ into eight groups as follows: v, κa, θa, σv, λ, ρ, µ and

5The assumption of independence between λ and ρ could be questioned, given prior knowledge of thepossible relationship between the signs of λ and ρ. It is maintained, however, for the sake of computationalconvenience.

10

(β, σu).6 Starting values are chosen, and a Gibbs-based MCMC algorithm is then applied

to produce successive draws of the unknowns via the respective conditional posteriors:

1. p(v|ωa, λ, ρ, µ, β, σu, c, s)

2. p(κa|v, θa, σv, λ, ρ, µ, β, σu, c, s)

3. p(θa|v, κa, σv, λ, ρ, µ, β, σu, c, s)

4. p(σv|v, κa, θa, λ, ρ, µ, β, σu, c, s)

5. p(λ|v, ωa, ρ, β, σu, c, s)

6. p(ρ|v, ωa, λ, µ, β, σu, c, s)

7. p(µ|v, ωa, ρ, s)

8. p(β, σu|v, ωa, λ, ρ, c, s)

In the description of each conditional, we make explicit the relevant conditioning elements.

For example, the conditionals of λ and (β, σu) do not depend on µ and the conditional of

µ does not depend on any aspect of the model that relates to the option prices, namely

(β, σu), λ and c.

Given the assumption of normality for the error term in (17), in combination with a

uniform prior for µ, standard algebraic manipulations lead to a conditional posterior for

µ with mean and variance respectively,

bµ =⎛⎜⎜⎝

nPt=2

y∗t x∗t

nPt=2

(x∗t )2

⎞⎟⎟⎠× 1

∆t; σ2µ =

⎛⎜⎜⎝ (1− ρ2)nPt=2

(x∗t )2

⎞⎟⎟⎠× 1

∆t,

6Depending on the size of the sample to which the method is applied, the vector v may be furtherdivided into smaller blocks, in order to increase the acceptance rates associated with this component ofthe algorithm. However, we describe the simulation of v below in terms of the full vector, with details ofthe further blocking that is applied in the context of the empirical application provided in Section 5.

11

where

y∗t =

¡lnSt − µlnSt.vt

¢√vt−1

x∗t =1

√vt−1

,

with µlnSt.vt as defined in (19). Simulated values of µ are thus readily obtainable via the

generation of normal random variates. Using the expression for the joint density of the

vector of option prices in (13) and adopting the following noninformative prior for (β, σu),

p(β, σu) = p(β)× p(σu)

∝ 1

σu, (21)

the joint conditional posterior for (β, σu) has the usual normal-gamma form. The normal

conditional for β, p(β|σu, v, ωa, λ, ρ, c, s), has mean and variance,

bβ = (X 0X)−1X 0c; var(β) = σ2u(X0X)−1, (22)

respectively, where X = (ι, qH), with ι a (N × 1) vector of ones and qH denoting the

(N×1) vector of theoretical option prices with ith element qH(zi, φ). The inverted gammadensity for σu, p(σu|v, ωa, λ, ρ, c, s), has degrees of freedom νσu = (N − 2) and parametersσu =

q(c−Xbβ)0(c−Xbβ)/νσu; see, for example, Zellner (1996, Appendix A.4). Draws

of (β, σu) can be obtained from the joint conditional using standard simulation algorithms.

All of the six other conditionals listed above are non-standard, with MH subchains

being applied to produce draws. An MH algorithm involves sampling from the actual

conditional indirectly via a candidate distribution that is both easy to sample from and

a good match for the conditional distribution. The candidate draw is then accepted as

a draw from the actual conditional with a given probability; see, for example, Chib and

Greenberg (1996) and Geweke (1999). The MH draws are inserted into the outer Gibbs

chain, with convergence of the hybrid chain to the full joint posterior distribution ensured,

given the satisfaction of the usual convergence criteria; see Tierney (1994).

The most difficult conditional to draw from efficiently is p(v|ωa, λ, ρ, µ, β, σu, c, s). This

difficulty derives mainly from the complexity of the functional relationship between the

12

theoretical option price qH and the volatility at each point t which, in turn, produces a

complex non-linear relationship between the vector of observed option prices, c, and the

state vector v. The candidate distribution for v that we propose is based on a particular

linearization of this component of the model, with the BS implied variances used as proxies

for observed option prices. Details of this approach are given below. Draws from the other

nonstandard conditionals are produced via random walk MH algorithms; see Chib and

Greenberg (1996) for details. All posterior quantities of interest are calculated from the

full set of MCMC iterates, excluding those in the burn-in part of the chain. Marginal

posteriors are estimated from the simulated values for each parameter of interest using

kernel smoothing.

3.2.2 A Bivariate Kalman Filter and Smoother for Sampling the Volatilities

The discretized model in (15) and (16) can be written as a nonlinear state space model.

The measurement of the (continuously compounded) spot return, ∆ lnSt, depends upon

the unobservable volatilities vt and vt−1. To introduce information from the option prices,

we augment this model by specifying a second observation equation, in which the implied

BS variance for day t, vBSt , is viewed as a noisy linear measurement of the true stochastic

variance, vt. An implied BS variance is obtained by equating (2) with an observed price

and solving for the parameter σ2. In order to produce a vector of implied BS variances

that matches the dimension of v, the BS variances are averaged over the day, with a vector

of n BS variances produced as a result. The variance of this second measurement is fixed,

and denoted by σ2imp. A state space representation of the augmented model is therefore

given by the following two bivariate equations

∙∆ lnStvBSt

¸=

"ρσv−³∆t2+ ρ(1−κa∆t)

σv

´1 0

# ∙vtvt−1

¸+

∙µ∆t− ρθaκa∆t

σv

0

¸+

∙ pvt−1∆t (1− ρ2)ε2t

σimpε3t

¸(23)

and

∙vtvt−1

¸=

∙1− κa∆t 0

1 0

¸ ∙vt−1vt−2

¸+

∙θaκa∆t0

¸+

∙σv√vt−1∆tε1t0

¸, (24)

13

with ε3t ∼ N(0, 1), 1, 2, . . . n. To eradicate the nonlinear dependence of both the return,

∆ lnSt, and the current stochastic variance, vt, on the previous stochastic variance, vt−1,

we replace vt−1 in the error specification in both (23) and (24) with its long-run mean,

θa. It is this linearized approximation to (23) and (24) that is used in the MH algo-

rithm. Candidate draws of the volatility vector are generated using the Kalman filter

and smoothing algorithm of Carter and Kohn (1994). Note that the evaluation of the

candidate density in the MH ratio at the previously generated volatility vector requires

a rerunning of the Kalman smoothing algorithm using this vector to obtain the updated

means and variances.7

4 Incorporating Model Uncertainty

4.1 Alternative Models

The Heston model nests three alternative models for volatility associated respectively with

λ = 0, ρ = 0 and σv = 0. Setting λ = 0 is equivalent to imposing the assumption that

volatility risk is not priced. This assumption is invoked in the early stochastic volatility

analysis of Hull and White (1987) for computational convenience. It has however been

challenged by more recent work, in which estimates of λ that differ significantly from zero

have been reported (see Guo, 1998 and Eraker, 2004, amongst others). An assessment of

this restricted model thus amounts to an assessment of the attitude to volatility risk that

is implicit in option prices.

The model obtained by setting ρ = 0 implies a lack of skewness in the underlying spot

price process. In particular, it implies a lack of the so-called ‘leverage effect’ (associated

with ρ < 0), whereby negative returns are accompanied by an increase in volatility. An

assessment of this restricted model corresponds to an assessment of whether or not returns

are skewed and/or option prices have factored in skewed returns.

Finally, the restriction σv = 0 implies constant volatility.8 Given the assumption

of normal returns, this restriction equates to the assumption of BS option pricing. An

assessment of the empirical validity of this restriction thus amounts to an assessment of

7The Kalman filter need not be rerun, only the smoothing algorithm.8This restriction only implies constant variance in equilibrium. Hence, we also impose the restriction

in our algorithm that υ0 = θa, i.e. that the initial variance is equal to the equilibrium variance.

14

the validity of the BS model.

We refer to the alternative models corresponding to the restrictions λ = 0, ρ = 0

and σv = 0 as respectively M2, M3 and M4, and to the full Heston model as M1. In this

section, several criteria that are used to rank these alternative models in the Section 5 are

described. These criteria are used to supplement the results obtained by simply estimating

the full model, M1, and testing the restrictions via the construction of interval estimates

for each of the relevant parameters. The concept of averaging across the alternative

models, in particular with a view to improving predictive performance, is also discussed.

In what follows, we refer to the vectors of unobservables associated with the four models,

M1, M2, M3 and M4, as δ1, δ2, δ3 and δ4 respectively. These vectors are in turn defined

as follows:

1. δ1 = {v, ωa, λ, ρ, µ, β, σu}

2. δ2 = {v, ωa, ρ, µ, β, σu}

3. δ3 = {v, ωa, λ, µ, β, σu}

4. δ4 = {θa, µ, β, σu}

Due to the nested structure of these models, we can also re-express δ1 as, alternatively,

δ1 = {δ2, λ}, δ1 = {δ3, ρ} or δ1 = {δ4, v, κa, σv, λ, ρ}.

4.2 Bayes Factors using the Savage-Dickey Density Ratio

In the Bayesian framework, the four alternative models, M1, M2, M3 and M4, can be

ranked according to the magnitude of their respective posterior probabilities. In the

present context, the data on which posterior inference is based comprises the vector of

option prices, c, and the vector of spot prices, s. Hence, the ranking occurs via the

posterior model probabilities, P (M1|c, s), P (M2|c, s), P (M3|c, s) and P (M4|c, s). Theseprobabilities can, in turn, be derived via the set of posterior odds ratios for each model,

relative to the reference model, M1, subject to the restriction that the posterior probabil-

ities add to one. Given equal prior odds for all models, the posterior odds ratio for Mk

versus M1 reduces to the Bayes factor for Mk versus M1,

BFk1 =p(c, s|Mk)

p(c, s|M1); k = 2, 3, 4, (25)

15

where p(c, s|Mk) denotes the marginal likelihood for model Mk and is defined as

p(c, s|Mk) =

Zδk

p (c|s, δk,Mk) p(s|δk,Mk)p(δk|Mk)dδk; k = 1, 2, . . . 4. (26)

In the expression for p(c, s|Mk) in (26), δk denotes the vector of unobservables that char-

acterize modelMk, p (c|s, δk,Mk) denotes the joint density for the option prices underMk,

p(s|δk,Mk) denotes the joint density for the spot prices under Mk and p(δk|Mk) denotes

the prior density for δk under Mk. For models M1, M2 and M3, δk includes the vector

of stochastic variances. Hence, for these models, p(δk|Mk) is equal to the product of the

density for the stochastic variances, given the parameters, and the prior density for the

parameters.

In the present context, there is no closed form expression for p(c, s|Mk). Various

numerical approaches to the estimation of marginal likelihoods have been proposed; see

Geweke (1999). However, such numerical procedures would be particularly burdensome

in the present context, in particular due to the presence of the n−dimensional vector oflatent variances, v, in both the reference model and two of the alternative models,M2 and

M3. Fortunately, as each of the alternative models considered here are nested within the

Heston model,M1, (25) can be expressed in a particularly simple analytical form, namely

as the Savage-Dickey (SD) density ratio; see Verdinelli and Wasserman (1995) and Koop

and Potter (1999). For the case of M2 versus M1, the SD density ratio is given by:

SD21 =p(λ = 0|M1, c, s)

p(λ = 0|M1), (27)

where p(λ = 0|M1, c, s) denotes the ordinate of the marginal posterior for λ under the He-

ston model,M1, evaluated at λ = 0 and p(λ = 0|M1) denotes the ordinate of the marginal

prior for λ under M1, evaluated at λ = 0. For (27) to be an equivalent representation of

the Bayes factor for M2 versus M1, BF21, it must hold that

p(δ2|M1, λ = 0) = p(δ2|M2), (28)

where p(δ2|M1, λ = 0) denotes the density for δ2 in the model M1 with λ = 0 imposed.

The condition in (28) can be interpreted as the requirement that the prior distribution

over all unknowns in the Heston model, M1, conditional on λ = 0, is equivalent to the

prior assigned to these unknowns in the submodel, M2, in which λ = 0 is imposed from

16

the outset. The assumption of a priori independence between λ and all other parameters

in the Heston model (see (20)), means that this condition is satisfied. In addition, the

prior for λ must be proper, and to avoid indeterminacy of the SD21 density ratio, this

prior should have a nonzero ordinate at λ = 0.

For analogous reasons, the Bayes factor for the case of M3 (ρ = 0) versus M1 reduces

to

SD31 =p(ρ = 0|M1, c, s)

p(ρ = 0|M1), (29)

and the Bayes factor for the case of M4 (σv = 0) versus M1 reduces to

SD41 =p(σv = 0|M1, c, s)

p(σv = 0|M1). (30)

4.3 Fit and Predictive Performance

For modelMk with parameter vector δk, the residual associated with fitting the ith option

price, Ci, is given by

resi = Ci − [β0 + β1q(zi, φk)] , i = 1, 2, . . . , N, (31)

where q(zi, φk) denotes the theoretical option price associated with model Mk, k =

1, 2, . . . 4, and δk = {φk, µ, β, σu}. ForM1 the appropriate price is qH(zi, φ1), as defined in

(3). ForM2 andM3, the price is qH(zi, φ2) and qH(zi, φ3) respectively; that is, the Heston

option price, but with λ = 0 and ρ = 0 respectively imposed. For M4, the price is the

theoretical BS option price, given by

qBS(zi, φ4 = θa) = SiΦ (d1)−Kie−riτiΦ (d2) , (32)

with d1 and d2 as defined after (2), and with σ =√θa.

The quantity resi is a nonlinear function of the underlying parameters and latent

volatilities contained in φk, as well as being a linear function of β = (β0, β1)0, conditional

on φk. Hence, its posterior distribution can be derived via the appropriate transformation

of the posterior distribution of φk and β. As outlined in Section 3.2.1, p(β|σu, φk, c, s) isnormal, with mean and variance given in (22). As such, the conditional posterior for resi,

p(resi|σu, φk, c, s), is also normal, with mean

E(resi|σu, φk, c, s) = Ci −hbβ0 + bβ1q(zi, φk)i

17

and variance

var(resi|σu, φk, c, s) = σ2u [1, q(zi, φk)] (X0X)−1 [1, q(zi, φk)]

0 ,

with bβ0 and bβ1 being the elements of the two-dimensional vector bβ in (22) and X as

defined following (22). The marginal posterior of resi is given by

p(resi|c, s) =Zσu

Zφk

p(resi|σu, φk, c, s)p(σu, φk|c, s)dσudφk,

which can, in turn, be estimated via B MCMC draws of σu and φk, σ(j)u and φ

(j)k , j =

1, 2, . . . , B, as

p(resi|c, s) =1

B

BXj=1

p(resi|σ(j)u , φ(j)k , c, s).

For M1, the appropriate algorithm is the full MCMC scheme as described in Section 3.2.

For M2 and M3, the algorithm is a reduced version of that scheme, with λ = 0 and

ρ = 0 respectively imposed. ForM4, iterates of δ4 = {θa, µ, β, σu} are generated from theposterior density for δ4, given by

p(δ4|c, s) ∝ p(c|s,θa, β, σu)× p(s|θa, µ)× p(θa, µ),

where p(c|s,θa, β, σu) is given by

p(c|s,θa, β, σu) = (2π)−N/2σ−NuNYi=1

exp

µ− 1

2σ2u[Ci − [β0 + β1qBS(zi, θ

a)]]2¶

and p(s|θaµ) by

p(s|θa, µ) = (2π)−n/2(θa∆t)−n/2 ×nQt=2

1

St×

exp©(−1/[2θa∆t]) (lnSt − [lnSt−1 + (µ− 0.5θa)∆t])2

ª. (33)

The prior density, p(θa, µ), is specified as being uniform over both parameters, truncated

from below at θa = 0. Draws of δ4 can be produced via the conditionals for µ and

θa respectively. Since the conditional for µ is normal, with mean and variance given

respectively by bµ = Pnt=2(lnSt − lnSt−1 + 0.5θa∆t)

n− 1 × 1

∆t

18

and

σ2µ =

µθa

n− 1

¶× 1

∆t,

draws for µ given θa are readily obtained. Draws of θa conditional on µ can be obtained

via a simple approximation of the one-dimensional conditional distribution function for θ

(i.e. via ‘Griddy Gibbs’), with the boundary at θa = 0 imposed on the draws.

The proportion of 50% and 95% resi intervals that cover zero can be calculated for

each modelMk, with the best fitting model being the one for which this proportion is the

closest to the nominal level. If the observed option prices used to define (31) belong to the

vector c, the fit assessment is within-sample. If not, the fit assessment is out-of-sample.

The latter is the form of fit assessment used in the Section 5.

Given the distributional assumption in (11), for model Mk the predictive density for

an out-of-sample option price, Cf say, is given by:

p(Cf |c, s) =Zδk

p(Cf |c, s, φk, β, σu)p(δk|c, s)dδk, (34)

where p(Cf |c,s, φk, β, σu) is a normal density with mean β0+ β1q(zf , φk) and variance σ2u

and p(δk|c, s) is the joint posterior density for parameters of model Mk. The notation zf

is used to denote the known factors associated with the future option contract f. In order

to impose the lower bound that Cf must exceed in order for arbitrage opportunities to

be avoided, in the construction of (34) we specify p(Cf |c,s, φk) as the truncated normaldensity

p(Cf |c,s,φk, β, σu) =(2πσ2u)

−1/2

(1− Φ(slbf))exp

µ− 1

2σ2u[Cf − [β0 + β1q(zf , φk)]]

2

¶, (35)

where

slbf =lbf − [βo + β1q(zf , φk)]

σu

is the standardized version of the no-arbitrage lower bound, lbf = max{0, Sf−e−rf τfKf};see Hull (2000, Chp. 7). Again using the simulation output from the algorithm appropriate

for modelMk, repeated draws from p(δk|c, s), δ(j), j = 1, 2, . . . , B, can be used to constructan estimate of p(Cf |c, s) as

dp(Cf |c, s) =1

B

BXj=1

p(Cf |c,s, φ(j)k , β(j), σ(j)u ). (36)

19

In Section 5, prediction intervals constructed from (36) are used to rank the predictive

performance of the models.

4.4 Hedging Performance

An important measure of the performance of the alternative volatility models is the extent

to which they produce small hedging errors. In this paper we focus on the errors associated

with single instrument hedge portfolios, in which movements in the underlying spot price,

St, are hedged against by taking the appropriate position in a single option contract, with

price Ct, at time t. In this case the resulting cash position with minimum variance is

Ct −NkSt, (37)

where Nk denotes the number of shares in the underlying asset in which the investor goes

long for every call option in which the investor goes short, assuming model Mk. In the

case of the Heston model, M1, N1 is defined as

N1 =∂qH∂St

+ρσvSt

∂qH∂vt

,

where∂qH∂St

= P1

and∂qH∂vt

= St∂P1∂vt−Ke−rtτ

∂P2∂vt

, (38)

with qH as defined in (3); see Bakshi, Cao and Chen (1997) and Chernov and Ghysels

(2000) for more details. The hedging error over one day, say, for the Heston model, H1, is

defined as the difference between the minimum variance hedge portfolio, as constructed

on day t and invested at the risk-free rate rt, and the value of that same portfolio when

unwound one day later, namely,

H1 = (Ct −N1St) ertτ − (Ct+1 −N1St+1) , (39)

where Ct+1 and St+1 denote respectively the option and spot prices on day t + 1. Since

H1 is a function of the unknown parameters and variances, via N1, the posterior density

of H1, p(H1|c, s), can be estimated from the MCMC output associated with estimation of

20

the Heston model. That is, draws of φ1 can be used to produce draws of H1, which can

then be used to produce a nonparametric kernel estimate of p(H1|c, s). The distributionof hedging errors over any time period can be constructed in an analogous way.

The same sort of exercise can be performed for each of the alternative models, M2,

M3 and M4. For the first two of these alternative models, the relevant posterior densities

for the hedging errors, p(H2|c, s) and p(H3|c, s), can be estimated from the output of theMCMC schemes in which λ = 0 and ρ = 0 are imposed respectively. In the case of M4,

specification of N4 = Φ (d1) , with d1 as defined after (2), renders the portfolio in (37)

delta-hedged; see Hull (2000). For the three SV models, the derivatives with respect to vtwhich enter (38) need to be computed numerically. The best performing model according

to this criterion is the model with the hedging error density most closely concentrated

around zero.

4.5 Model Averaging

As described in Section 4.2, the posterior probability of the four alternative models,

P (Mk|c, s), k = 1, 2, . . . , 4, can be estimated from the option and spot prices. These

posterior probabilities can be used to produce a model-averaged predictive density, which

can, in turn, be used as the tool for prediction rather than the predictive associated with

any one particular model. The typical rationale for model averaging is based on the

acknowledgement that no one model is ever representative of the ‘true’ data generating

process. In the current context, there is a further rationale, namely that option prices are

determined by the interaction of market participants who, themselves, potentially invoke

different distributional assumptions. For both reasons, the model-averaged predictive may

well have better coverage properties than the predictives associated with specific mod-

els.9 Given model-specific predictive densities, p(Cf |Mk, c, s), k = 1, . . . , 4, the averaged

predictive density, pa(Cf |c, s), is defined as:

pa(Cf |c, s) =4X

k=1

p(Cf |Mk, c, s)P (Mk|c, s). (40)

9Model-averaged fit and hedging error densities could also be produced. We focus only on the model-averaged predictive density as it has a clear interpretation as an inferential tool and, hence, better servesto illustrate the potential benefits of model averaging.

21

5 Numerical Illustration: News Corporation OptionPrices

5.1 Data Description

The methodology is demonstrated using data on News Corporation option and equity

trades over the four year period: 1998 to 2001. All option and spot price data has been

obtained from the Australian Stock Exchange. In order to use the option data to produce

inferences on the day to day movements in the underlying spot price process, option trades

are selected from one particular period of time each day, namely the last hour of trading.10

A range of option prices that maximizes the moneyness spread is then selected from the

set of prices observed during this period.11 Only options with maturities of between

15 and 90 days are included in the dataset. A total of 10904 observations are used for

estimation, with a maximum of 60 prices selected from any one day in the within-sample

period. The out-of-sample period constitutes the nine trading days from 11 December,

2001 to 21 December 2001, with 855 trades used for the out-of-sample assessments. Since

equity trades on News Corporation stock occur very frequently, for each option trade

it is possible to obtain a virtually simultaneous equity price: usually recorded within a

few seconds of the option trade. When several equity trades are recorded at exactly

the same time, a weighted average is taken, with the weights determined by the trading

volume. Observations on the discretized process for equity prices are taken as the average

of the spot prices observed during the last hour of trading. We use a single interest rate

observation for each day, rt, t = 1, 2, . . . , n, where rt denotes the 3 month bond rate on

day t.12 The dividends paid on News Corporation shares are typically about 0.1% of share

value, paid 6-monthly. The impact of dividends on share prices is therefore so small that

they have only been taken into account as a constant continuous discount factor. The

assumption of a constant variance for the pricing errors in (11) is justified as no discernible

pattern in the variance, across moneyness in particular, is found.

10That is, we attempt to minimize the variation across the day in Si.11The moneyness of an option contract refers to the relationship between the current spot price and

the strike price (K). Broadly speaking, options are said to be out-of-the-money if Si < K, in-the-moneyif Si > K and at-the-money if Si = K.12The interest rate data can be sourced at the Reserve Bank of Australia website: www.rba.gov.au.

22

5.2 Numerical Results

5.2.1 Posterior Results for the Heston Model

Table 1 presents point and interval estimates of both the parameters and selected variances

associated with the full Heston model (M1). All results are produced by running the

MCMC algorithm for 10000 iterations, with the first 2000 iterations discarded. The

elements of the full vector v are selected in blocks with an average size of 10. The actual

block lengths at each iteration are chosen randomly; see Shepherd and Pitt (1997) and

Strickland, Forbes and Martin (2005) for details. The starting values for the parameters,

chosen via preliminary experimentation with the algorithm, are as follows: κa = 5.0;

θa = 0.15; σv = 0.6; λ = −0.2; ρ = 0.2; µ = 0.16; β = (0, 1)0 and σu = 0.1. A normal

prior distribution is specified for λ, with mean and standard deviation respectively −0.2and 1.0. This prior assigns weight to a wide range of values for λ, in the positive and

negative regions, but centres the distribution in the negative region, as accords with

several empirical estimates of λ reported in the literature. A normal prior distribution

is also imposed for ρ, with mean and standard deviation of −0.2 and 0.3 respectively.These prior parameters imply only a very small prior probability of ρ falling beyond the

bounds of ±1 and, again, centre the distribution in the negative region, as tallies withthe typically negative empirical estimates of ρ , from returns data in particular. The

bounds of ±1 are imposed on the posterior distribution by way of discarding any drawsof ρ that fall beyond them. An exponential prior distribution is specified for σv, with

the single prior parameter specified as follows. Using News Corporation returns data

for the period 1992 to 2001, GARCH(1,1) models are estimated for each year and the

(annualized) conditional variance series for each year produced. The standard deviation

of each of the ten conditional variance series is recorded. The exponential prior for σvis then calibrated such that the maximum of these ten standard deviation values is the

median of the distribution for σv. This calibration produces a value of approximately 0.1

for the prior parameter which, in turn, produces a prior ordinate at zero, p (σv = 0|M1), as

used in the Bayes factor in (30), equal to approximately 10.0. Uniform priors are specified

for κa and θa, bounded at zero from below. As already noted a uniform prior is also

specified for µ, with the noninformative prior used for (β, σu)0 given in (21).

Table 1 reports the mean, mode and (approximate) 95% Highest Posterior Density

23

(HPD) interval13 for each of the parameters and selected random variances.14 The point

estimates of κa correspond to daily persistence measures of (1 − 5.480/252) = 0.9783

and (1 − 5.633/252) = 0.9777 respectively, with the interval estimate translating into

a (daily) persistence interval of (0.976, 0.979). All estimates are thus well within the

realm of typical returns-based estimates of such measures. For instance, the persistence

measure from a GARCH(1,1) model estimated for News Corporation returns over the

1998-2001 period is 0.968. The point estimates of θa correspond to an estimate of long-

run volatility of approximately√0.145 = 0.381. This value is somewhat smaller than the

unconditional mean of volatility associated with a GARCH(1,1) model estimated over the

1998-2001 period, namely 0.453. The point estimates of λ are negative, as is typical for this

parameter. This result is consistent with the market viewing volatility as a ‘negative-beta

investment’; see, for example, Guo (1998) and Bates (2000). The estimated probability of

λ < 0 is approximately equal to 0.76. However, as is clear from the interval estimate of λ,

although support for a negative value for the risk-premium parameter is quite strong, the

interval does cover both positive and zero values for λ, in addition to negative values. The

estimates of σv imply a moderate degree of excess kurtosis in the returns process, whilst

the estimates of ρ indicate that a small degree of (positive) returns skewness is implied

by the joint options and spot datasets.15 The point estimates of µ imply an (annualized)

rate of return on the News Corporation stock of approximately 22− 27%, well in excessof the returns-based mean rate of return of 15% estimated for the 1998 to 2001 period.

However, the interval estimate indicates the extent of the uncertainty associated with

13An HPD interval is one with the specified probability coverage, whose inner ordinates are not exceededby any density ordinates outside the interval. The intervals reported in Table 1 are approximate HPDintervals in that the kernel smoothing procedure used to estimate the marginal posteriors does not alwaysenable the ordinate condition described here to be satisfied exactly. Furthermore, for densities that aremultimodal the 95% intervals are contructed so that the ordinates of the lower and upper bounds areas close as possible to being equal, subject to the restriction that the tail probabilities sum to 5%.This implies that there may be ordinates within the interval that are smaller than ordinates beyond theinterval.14As β and σu are essentially nuisance parameters, we do not devote space to them in the reporting

of results in the text. However, we note that the posterior mean estimates of these parameters arerespectively (−0.063, 1.016) and 0.031.The acceptance rates for the MH subchains are as follows: κa:0.740; θa : 0.220; σv : 0.485; λ : 0.474; ρ : 0.314. The proportion of times that at least one element of thefull vector v is updated is 0.889.15Given that the returns in question are on the S&P500 market index, one might anticipate that the

negative value of λ would be accompanied by a negative value of ρ, given that ρ is a measure of thecorrelation between volatility and the market portfolio; see Bates (2000).

24

Table 1: Marginal Posterior Density Results for the Heston Model

Mode Mean 95% HPD interval

Parameter

κa 5.480 5.633 (5.170, 6.140)θa 0.144 0.147 (0.136, 0.157)σv 0.719 0.713 (0.634, 0.790)λ -0.049 -0.147 (-0.631, 0.247)ρ 0.164 0.163 (0.121, 0.205)µ 0.270 0.219 (-0.220, 0.650)

Variance

v100 0.138 0.145 (0.133, 0.162)v500 0.171 0.175 (0.156, 0.194)v990 0.130 0.127 (0.121, 0.135)

estimation of this parameter, with 95% probability assigned to the range (−0.220, 0.650).The point estimates of the variances reported in Table 1 vary over the sample period, but

are broadly consistent with the estimates of the long-run mean parameter θa.

5.2.2 Model Ranking

In this section we apply the methods discussed earlier to rank the three restricted models,

M2, M3 and M4, and the full model M1. From the interval estimates reported in Table 1,

it is evident that two of the three restricted models,M3 andM4, are rejected by the data,

with neither of the intervals covering the relevant parameter restrictions. The interval

estimate for λ however covers the value of λ = 0, thereby providing some evidence in

favour of M2.

To supplement these results, we estimate the three submodels, assessing the out-of-

sample performance of each along the lines discussed in Sections 4.3 and 4.4, as well as

constructing Bayes factors for each relative to M1. The details of the algorithm used to

25

Table 2: Bayes Factors and Model ProbabilitiesEntry (i, j) Indicates the Bayes Factor

in Favour of Mj versus Mi

M1 M2 M3 M4

(Heston) (λ = 0) (ρ = 0) (σv = 0)

M1 1.000 4.723 6.878× 10−236 2.899× 10−305M2 1.000 1. 456× 10−236 6. 138× 10−306M3 1.000 4. 216× 10−70M4 1.000

P (Mk|c, s) 0.175 0.825 0.000 0.000

estimate M2 and M3 are identical to those used to estimate M1, apart from the obvious

parameter restrictions associated with the nested models. Estimation of the BS model,

M4, in the manner described in Section 4.3, produces a marginal posterior for√θa that is

virtually degenerate around a single value, namely√θa = 0.369. With only this parameter

affecting the theoretical option prices, we report the fit, predictive and hedging results

conditional on this single value.

Table 2 reports the estimated Bayes factors for M2, M3 and M4, with M1 as the

reference model. The bottom row in the table gives the corresponding model probabilities

(reported to three decimal places), based on equal prior probabilities for all four models

and assuming that these four models span the model set. It is clear that the posterior

probabilities support M2, but with some weight also assigned to M1. This result tallies

with the reasonable posterior weight assigned to the region around λ = 0 by the marginal

posterior of λ derived from the full model, M1. Negligible posterior weight is assigned to

both M3 and M4, again a result which tallies with the relevant results reported in Table

1. Most notably, the data provides essentially no support for the constant variance BS

model (M4).

The fit and predictive results reported in Table 3 represent the proportion of times

26

that each criterion is satisfied, for each model, for the 855 out-of-sample observations. If

the model is correctly specified, this proportion should approximate the nominal coverage

level. It should be noted however, that the out-of-sample option contracts do not repli-

cate exactly the maturity and moneyness characteristics of the within-sample contracts,

with this feature expected to produce some deviation between the nominal and empirical

coverages for all models. As is evident, models M1, M2 and M3 are all quite close to the

nominal level in the case of the 95% fit and prediction intervals. All three of these models

overstate the 50% nominal coverage of both the fit and prediction intervals. There is

negligible difference between the out-of-sample performance of these three versions of the

Heston SV model.

In comparison with the SV model variants, the BS model is clearly inferior in terms

of out-of-sample fit, with zero not covered by any of the 50% intervals and only negligible

coverage by the 95% intervals. These results reflect the fact that the theoretical BS prices

used in the construction of the ith residual quantity, as per (31), are relatively inaccurate

estimates of the observed out-of-sample prices, for the hold-out sample considered. The BS

fit intervals are also very narrow, due to the degenerate nature of the estimated posterior

distribution for the BS volatility parameter√θa.16 However, when the additional variance

of the option pricing error is taken into account, the relative inaccuracy of the BS prices

is largely offset, with empirical coverages of the prediction intervals tallying reasonably

closely with the nominal values.

With regard to model-averaging, as the results in Table 2 indicate that only M1 and

M2 have non-negligible posterior probability, the weighting effectively occurs with respect

to the predictives of these models only. As these two models also have extremely sim-

ilar predictive behaviour, the averaging process has a minimal impact, with results not

reported as a consequence.

In Table 4 the hedging error results associated with the four models are reported.

Hedging errors are calculated using (39) for M1 and the version of (39) appropriate for

the remaining models, as described in the text immediately following (39). Errors are

calculated for one day and five days ahead, with the portfolio constructed at the end of

the estimation period and not rebalanced during the entire out-of-sample period. On each

16The only variation in the distribution of resi in the BS case arises from posterior variation in (β, σu).

27

Table 3: Fit and Predictive Performance Measures

Criterion(a) M1 M2 M3 M4

Zero in Interquartile Fit Interval(b) 0.629 0.630 0.625 0.000Zero in 95% Residual Interval(b) 0.991 0.991 0.993 0.002Cf in Interquartile Predictive Interval(b) 0.653 0.648 0.655 0.449Cf in 95% Predictive Interval(b) 0.995 0.995 0.989 1.000

(a) All figures represent proportions of 855.

(b) The (1 − α)% Interval is the interval which excludes α/2% in the lower and upper tails of thefit/predictive distribution. This interval equals the (1− α)% HPD interval only for those distrib-utions which are symmetric around a single mode.

of these days the hedging errors associated with all contracts are calculated. The errors

are then averaged across all contracts.17 It is these averaged hedging errors to which the

summary statistics in Table 4 relate.

The results in Table 4 make it clear that there is very little difference between all four

models according to this criterion. It is also clear that none of the HPD intervals cover

zero. That said, for all four models, the errors associated with the hedge portfolio are

minimal, in terms of both the point and interval estimates. These errors represent less

than 1% of the average magnitude of the option prices on the out-of-sample days. For

M1, M2 andM4, the magnitude of the errors increases over time without any rebalancing

taking place, as would be anticipated. ForM3 however, the errors five days out are slightly

smaller than those one day ahead, although both sets of errors are very small. The BS

model, M4, has the mean hedging error with the largest magnitude, five-days-ahead, and

the second largest mean error one-day-ahead.

17The data was not plentiful enough to construct meaningful hedging statistics for the different mon-eyness categories.

28

Table 4: Hedging Performance of the Different ModelsMeans of (Average) Hedging Error Densities with 95% HPD Intervals in Brackets ($)

M1 M2 M3 M(a)4

One Day Ahead

0.002 0.002 0.0044 -0.003(0.0016, 0.0028) (0.0016, 0.0027) (0.0042, 0.0047) n.a.

Five Days Ahead

-0.004 -0.004 0.0026 -0.007(-0.005, -0.002) (-0.006, -0.003) (0.0024, 0.0028) n.a.

(a) Since only a single point estimate of the BS volatility has been produced, HPD intervals cannotbe constructed.

6 Conclusions

In this paper a new methodology for producing option and spot price-based estimates

of the parameters of a stochastic volatility model is presented. The method has been

developed within the context of the Heston (1993) theoretical option pricing model and

certain variants thereof. The numerical scheme adopted exploits the state-space repre-

sentation of the Heston spot and volatility process. Simulation of the latent volatilities

occurs via the application of a bivariate Kalman filter and smoother, with information

from the option prices impacting on the filter via a measurement equation in which the

BS implied volatilities proxy the option prices. Construction of Bayes factors, in addition

to fit, prediction and hedging intervals, enables the alternative variants of the proposed

model to be ranked on the basis of observed option and spot price data.

Application of the methodology to Australian News Corporation stock and options

data produces estimates of the parameters of the Heston model that imply a very persis-

tent volatility process, with the degree of volatility in volatility indicating that a certain

amount of excess kurtosis characterizes the spot price data and/or has been factored into

the options data. Positive skewness in returns also features in the numerical results. The

29

model that imposes a zero premium for volatility risk is given some support by the data,

with this variant of the Heston model being assigned the highest posterior probability

of all models considered. However, the unrestricted Heston model, in which a non-zero

risk premium is allowed, is also given non-negligible posterior weight, with both point

and interval estimates of the relevant parameter suggesting that a negative risk premium

may have been factored into option prices over this period. The constant volatility BS

model is also clearly rejected by the data in the sense that the point and interval es-

timates of the variance of the Heston volatility process are non-zero, and the posterior

probability assigned to this model is essentially zero. The out-of-sample fit performance

of the BS model is also markedly inferior to that of all variants of the SV model, although

the predictive performance of the BS model is quite reasonable once allowance is made

for the variation in observed option prices. There is little to choose between any of the

models in terms of hedging performance, although a richer data set might allow for more

discrimination between specifications in terms of hedging results associated with different

contracts on the moneyness spectrum.

References

[1] Bakshi, G., C. Cao and Z. Chen. (1997). “Empirical Performance of Alternative

Option Pricing Models.” Journal of Finance, 52, 2003-2049.

[2] Bates, D.S. (1996). “Jumps and Stochastic Volatility: Exchange Rate Processes Im-

plicit in PHLX Deutsche Mark Options.” The Review of Financial Studies, 9, 69-107.

[3] Bates, D.S. (2000). “Post-87 Crash Fears in the S&P 500 Futures Option Market.”

Journal of Econometrics, 94, 181-238.

[4] Bauwens, L. and Lubrano, M. (2002), “Bayesian Option Pricing Using Asymmetric

GARCH Models.” Journal of Empirical Finance, 9, 321-342.

[5] Black, F. and M. Scholes. (1973). “The Pricing of Options and Corporate Liabilities.”

Journal of Political Economy, 81, 637-659.

[6] Campbell, J.Y., Lo, A.W. and MacKinlay, A.C. (1997). The Econometrics of Finan-

cial Markets, Princeton University Press, New Jersey.

30

[7] Carter, C.K. and Kohn, R. (1994), “On Gibbs Sampling for State Space Models,”

Biometrica, 81, 541-553.

[8] Chernov, M. and E. Ghysels. (2000). “A Study Towards a Unified Approach to the

Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options

Valuation.” Journal of Financial Economics, 56, 407-458.

[9] Chib, S. and E. Greenberg. (1996). “Markov chain Monte Carlo simulation methods

in Econometrics.” Econometric Theory, 12, 409-431.

[10] Cox, J.C., J.E. Ingersoll and S.A. Ross. (1985). “A Theory of the Term Structure of

Interest Rates.” Econometrica, 53, 385-408.

[11] De Jong, P. and N.G. Shephard. (1995). “The Simulation Smoother for Time Series

Models.” Biometrika, 82, 339-350.

[12] Duan, J-C. and Simonato, J-G. (2001). “American option pricing under GARCH by

a Markov chain approximation.” Journal of Economic Dynamics and Control, 25,

1689-1718.

[13] Durbin, J. and Koopman, S.J. (2002), “A Simple and Efficient Simulation Smoother

for Time Series Models.” Biometrika, 81, 341-353.

[14] Eraker, B. (2004). “Do Stock Prices and Volatility Jump? Reconciling Evidence from

Spot and Option Prices.” The Journal of Finance, LIX, 1367-1403.

[15] Eraker, B., M.J. Johannes N.G. and Polson. (2003). “The Impact of Jumps in Returnsand Volatility.” Journal of Finance, 53, 1269-1300.

[16] Frühwirth-Schnatter, S. (1994), “Data Augmentation and Dynamic Linear Models.”

Journal of Time Series Analysis, 15, 183-202.

[17] Geweke, J. (1999). “Using Simulation Methods for Bayesian Econometric Models:

Inference, Development and Communication.” Econometric Reviews, 18, 1-73.

[18] Guo, D. (1998). “The Risk Premium of Volatility Implicit in Currency Options.”

Journal of Business and Economic Statistics, 16, 498-507.

31

[19] Hafner, C.M. and Herwartz, H. (2001), “Option Pricing Under Linear Autoregressive

Dynamics, Heteroskedasticity and Conditional Leptokurtosis.” Journal of Empirical

Finance, 8, 1-34.

[20] Heston, S.L. (1993). “A Closed-form Solution for Options with Stochastic Volatility

with Applications to Bond and Currency Options.” The Review of Financial Studies,

6, 327-343.

[21] Heston, S. and Nandi, S. (2000), “A Closed-Form GARCHOption Valuation Model.”

Review of Financial Studies, 13, 585-625.

[22] Hull, J.C. (2000). Options, Futures, and Other Derivative Securities. 4rd ed., New

Jersey: Prentice Hall.

[23] Hull, J.C. and A. White. (1987). “The Pricing of Options on Assets with Stochastic

Volatilities.” Journal of Finance, 42, 281-300.

[24] Ingersoll, J.E.Jr. (1987). Theory of Financial Decision Making. Rowman & Littlefield

Studies in Financial Economics, USA.

[25] Jones, C. (2003). “The Dynamics of Stochastic Volatility: Evidence from Underlying

and Options Markets.” Journal of Econometrics, 116, 181-224.

[26] Koop, G. and S. Potter. (1999). “Bayes Factors and Nonlinearity: Evidence from

Economic Time Series.” Journal of Econometrics, 88, 251-281.

[27] Lim, G.C., Martin, G.M. and V.L. Martin, 2005, “Parametric Pricing of Higher Order

Moments in S&P500 Options.” Journal of Applied Econometrics, 20, 377-404.

[28] Martin, G.M., Forbes, C.S. and V.L. Martin, 2005, “Implicit Bayesian Inference

Using Option Prices.” Journal of Time Series Analysis, 26, 437-462.

[29] Pan, J. (2002). “The Jump-risk Premia Implicit in Options: Evidence from an Inte-

grated Time-series Study.” Journal Of Financial Economics, 63, 3-50.

[30] Polson, N. G. and Stroud, J. R. (2003). “Bayesian Inference for Derivative Prices.”

Bayesian Statistics, 7, 641-650.

32

[31] Shephard, N. and M. Pitt. (1997). “Likelihood Analysis of Non-Gaussian Measure-

ment Time Series.” Biometrica, 84, 653-667.

[32] Strickland, C.M., C.S. Forbes and G.M. Martin. (2005). “Bayesian Analysis of the

Stochastic Conditional Duration Model.” Forthcoming, Computational Statistics and

Data Analysis (Special Issue on Filtering and Signal Extraction).

[33] Tierney, L. (1994). “Markov Chains for Exploring Posterior Distributions.” The An-

nals of Statistics, 22, 1701-1762.

[34] Verdinelli, I. and L. Wasserman. (1995). “Computing Bayes Factors Using a General-

ization of the Savage-Dickey Ratio.” Journal of the American Statistical Association,

90, 614-618.

[35] Zellner, A. (1996). “An Introduction to Bayesian Inference in Econometrics”, Wiley

Classics Library Edition, New York.

33

inference for a class of stochastic volatility models using ...gmartin/forbes_etal_2005.pdf(2001),...

Documents