efficient bayesian estimation for garch-type models via sequential … · 2020. 2. 20. ·...

Efficient Bayesian estimation forGARCH-type models via Sequential

Monte Carlo

Dan LiBachelor of Actuarial Studies

Bachelor of Finance

Master of Actuarial Studies

School of Mathematical SciencesScience and Engineering Faculty

Queensland University of Technology

2020

submitted in fulfillment of the requirements for the degree of

Master of Philosophy

Statement of Original Authorship

The work contained in this thesis has not been previously submit-

ted to meet requirements for an award at this or any other higher

education institution. To the best of my knowledge and belief,

the thesis contains no material previously published or written by

another person except where due reference is made.

Signature:

Date: 03 Jan 2020

i

QUT Verified Signature

Abstract

Bayesian statistics is increasing in popularity due to its ability to quantify uncer-

tainty in parameter estimates and model predictions in a principled manner. Se-

quential Monte Carlo (SMC) is an efficient method for sampling from the posterior

distribution in Bayesian statistics. SMC may be preferred over the popular Markov

chain Monte Carlo (MCMC) method for certain classes of problems, since SMC is

easily parallelisable, adaptable and more efficient in representing complex posterior

distributions. However, the useful properties of Bayesian statistics and SMC are

yet to be fully realised in certain areas of Econometrics. This research exploits

the advantages of SMC to develop parameter estimation and model selection meth-

ods for GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) style

models. This approach provides an alternative method for quantifying estimation

uncertainty relative to classical inference. We demonstrate that even with long time

series, the posterior distributions of model parameters are non-normal, highlighting

the need for a Bayesian approach and an efficient posterior sampling method. Ef-

ficient approaches for both constructing the sequence of distributions in SMC, and

leave-one-out cross-validation for long time series data, are also proposed. Finally,

we develop an unbiased estimator of the likelihood for the Bad Environment-Good

Environment model, a complex GARCH-type model, which permits exact Bayesian

inference not previously available in the literature.

iii

Keywords

Markov chain Monte Carlo; Time series analysis; Volatility distribution; Prediction;

Cross-validation; Evidence; Data annealing

v

Acknowledgements

First and foremost, I would like to express my great gratitude to my supervision

team, without their help it would have been difficult to complete this thesis. I

would like to thank my principal supervisor, Associate Professor Chris Drovandi,

for his constant support and guidance throughout this research journey. I am very

much grateful to Chris for his patience, encouragement, enthusiasm, and immense

knowledge. I would like to thank my associate supervisor, Professor Adam Clements,

who has given his time to advise me on this research project and offered me insightful

suggestions and expert guidance.

I would like to acknowledge the financial support of the Australian Research Council

Australian Centre for Mathematical and Statistical Frontiers (ACEMS) scholarship

throughout my research project.

I would like to thank Leah South who has spent much time helping me with my

coding and has encouraged me a lot at the early stage of my research journey. I

would also like to express my appreciation to my other friends at QUT, for their

help, support, caring, and laugher. Finally, I would like to thank my family for their

unconditional love.

vii

Contents

Declaration i

Abstract iii

Keywords v

Acknowledgements vii

Chapter 1 Introduction 1

1.1 Motivation and research problem . . . . . . . . . . . . . . . . . . . . 1

1.2 Aims and objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Research contributions and significance . . . . . . . . . . . . . . . . 3

1.4 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Chapter 2 Background 5

2.1 Bayesian Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Markov chain Monte Carlo (MCMC) . . . . . . . . . . . . . . . . . . 6

2.3 Sequential Monte Carlo (SMC) . . . . . . . . . . . . . . . . . . . . . 8

2.4 GARCH models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4.1 Introduction to GARCH family models . . . . . . . . . . . . 14

2.4.2 Classical GARCH(1,1) model . . . . . . . . . . . . . . . . . . 15

2.4.3 GJR-GARCH(1,1) model . . . . . . . . . . . . . . . . . . . . 16

2.4.4 Bad Environment-Good Environment (BEGE) model . . . . 17

2.4.5 Parameter estimation methods . . . . . . . . . . . . . . . . . 18

2.5 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Chapter 3 A fully Bayesian approach for GARCH-type models 23

ix

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2 Unbiased estimation for the BEGE likelihood . . . . . . . . . . . . . 25

3.3 Efficient sequence construction method . . . . . . . . . . . . . . . . . 27

3.4 Bayesian posterior predictive distribution . . . . . . . . . . . . . . . 28

3.4.1 Bayesian posterior predictive distribution . . . . . . . . . . . 28

3.4.2 Forecast through maximum likelihood estimation (MLE) ap-

proach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.4.3 Forecast evaluation . . . . . . . . . . . . . . . . . . . . . . . . 29

3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.5.1 Posterior Distributions . . . . . . . . . . . . . . . . . . . . . . 32

3.5.2 Posterior Predictive Distribution . . . . . . . . . . . . . . . . 39

3.5.3 Model Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Chapter 4 Conclusion 47

4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.2 Future research directions . . . . . . . . . . . . . . . . . . . . . . . . 48

Appendix Importance sampling estimator of likelihood of the BEGE model 49

References 54

x

Chapter 1

Introduction

1.1 Motivation and research problem

The popularity of Bayesian statistics has increased in recent years because it provides

quantification uncertainty in model parameter estimates and model predictions, and

it tactfully avoids analytically intractable integrals via Monte Carlo methods. How-

ever, it can be challenging to devise an efficient algorithm for sampling from the pos-

terior distribution. Up until now, the most popular and extensively used sampling

method in Bayesian statistics is the Markov chain Monte Carlo (MCMC) method

(Metropolis et al., 1953; Hastings, 1970). An alternative method called sequential

Monte Carlo (SMC) for static Bayesian models (Chopin, 2002; Del Moral et al.,

2006) has been developed in recent times, which has been regarded as a comple-

mentary method to MCMC. SMC may be preferred over MCMC for certain types

of problems, since SMC is easily parallelisable, adaptable, more effective in repre-

senting irregular posterior distributions (South et al., 2019), and the outcomes from

the SMC with a sequence of distributions introduced by data annealing (Chopin,

2002) can be easily utilised for making posterior predictions if needed. Despite the

advantages of SMC being widely known to statisticians, the methods have not been

adopted and studied in some applied fields. The aim of this research is to realise

the potential of SMC in the field of Econometrics.

An important area in Econometrics involves the modelling of financial time series

data in order to undertake empirical analysis of financial market data (Mills and

Markellos, 2008). There are various econometric models developed and one of the

best-known and widely used models is the Generalized AutoRegressive Conditional

1

Heteroscedasticity (GARCH, Bollerslev 1986) process for modelling the dynamics

of volatility of financial asset returns. Frequentist inference based on maximum

likelihood estimation (MLE) is the most common approach to statistical inference

for the GARCH class of models. This standard approach is limited in that it only

provides point estimates of parameters, and the uncertainty in these estimates is

typically based on asymptotic theory where the MLE is assumed to be normally

distributed (Hamilton, 1994). In addition, the frequentist approach can only produce

point estimation of forecasts of volatility.

To address these limitations, we develop a Bayesian approach to statistical inference

for the GARCH class of models. In Bayesian statistics, the unknown parameters

are regarded as random variables, and the corresponding parameter posterior distri-

butions that quantify the uncertainty of parameters are constructed. The MCMC

method has been adopted for Bayesian analysis of the GARCH class of models (see,

for example, Kim et al. (1998), Vrontos et al. (2000), Takaishi (2009)). However,

as MCMC has some drawbacks and limitations, such as the difficulty of tuning

associated parameters, the conventional MCMC algorithm can become less effec-

tive. Therefore, we exploit the advantages of SMC to develop a novel and effi-

cient fully Bayesian approach for the widely used GARCH class of models. This

Bayesian approach is used for parameter estimation, prediction, and model selec-

tion for GARCH-type models. Finally, an unbiased likelihood estimator for the

Bad Environment-Good Environment (BEGE, Bekaert et al. (2015)) model is de-

veloped, which allows exact Bayesian inference for the first time for this popular

model. Specifically, this research addresses the following research questions:

1. How is a fully Bayesian approach developed for GARCH-type time series

models with the SMC?

2. How does this Bayesian approach differ from the existing frequentist ap-

proach and what advantages can be explored from this Bayesian approach?

3. How does the SMC approach outperform the conventional MCMC methods

for GARCH-type models?

2

1.2 Aims and objectives

The overall aim of this research is to apply and investigate the performance of the

SMC methods for Econometric models of interest, so that the awareness of the SMC

methods can be expanded in the Econometrics field. Specifically, in this research

a fully Bayesian approach for the GARCH class of models will be established by

adopting the SMC methods. To achieve this overall aim, the following research

objectives will be pursued.

− To develop efficient parameter estimation and model selection methods for

GARCH style models using SMC, which can quantify estimation uncertainty.

− To develop efficient approaches for making one-step ahead prediction, and

leave-one-out cross-validation that is used for evaluating the predictive perfor-

mance, for long time series data.

− To obtain an unbiased estimator for the likelihood of the BEGE model in order

to perform exact Bayesian inference.

1.3 Research contributions and significance

This thesis contributes in establishing an efficient fully Bayesian approach for the

widely used GARCH-type models using SMC methods, which has obvious advan-

tages over the standard approaches previously applied to such econometrics models.

This approach provides an alternative method for quantifying estimation uncertainty

relative to standard asymptotic assumptions typically made with classical statistical

inference. We demonstrate that even with long time series, the posterior distribu-

tion for our models of interest can be far from normally distributed, underlining the

need for a Bayesian approach and an efficient method sampling from the posterior

distributions. In addition to delivering more information about the parameters, this

approach also provides full predictive distributions of the unobservable volatility,

which offers more information that cannot be provided by the classical approach.

Moreover, an unbiased estimator of the likelihood for the BEGE model is proposed,

which facilitates exact Bayesian inference. We also contribute to the SMC literature

by exploring the most efficient way of constructing the sequence of distributions in

3

SMC for long time series data and showing how SMC can be used for efficient leave-

one-out cross-validation for time series data. Much of the content of this thesis has

been written up as a paper (Li et al., 2019) and is currently being considered for

publication.

1.4 Thesis outline

The structure of the thesis is organized as follows.

Chapter 2 provides background of Bayesian statistics, existing sampling methods,

model selection methods, and the econometrics models considered in the thesis.

Chapter 3 describes how to efficiently estimate parameters, make forecasts, and

select the best models from candidate GARCH-type models through a newly de-

veloped Bayesian approach based on SMC. To make exact Bayesian inference for

the BEGE model, the methods for developing an unbiased estimator of the BEGE

likelihood are provided here. An efficient strategy to choose the sequence of interme-

diate probability distributions in SMC for long time series data is also described. A

comparison of one-step ahead volatility forecasts produced using a fully Bayesian ap-

proach, a classical approach, and a data analytic approach involving a leave-one-out

cross-validation method for time series data for the purpose of evaluating predictive

performance is also provided. Finally, the results from using this fully Bayesian

approach are analysed. Supplementary material for Chapter 3 is provided in the

Appendix at the end of the thesis.

Chapter 4 summarises the methods and results, and also discusses limitations and

avenues for future research.

4

Chapter 2

Background

This chapter provides background of Bayesian statistics, MCMC and SMC sampling

methods, and model selection methods that will be used throughout this thesis.

Three specific GARCH-type models of interest are discussed in more detail in this

chapter.

2.1 Bayesian Statistics

Bayes’ theorem captures learning from observed data and experience. Here, y rep-

resents the observed data and θ represents the set of parameters of a model, which

is believed to have generated y. The likelihood function based on the chosen model

is f(y|θ), which is the probability distribution of the observed data y given the

unknown parameters θ. Before seeing the observed data, the information regarding

the parameters is reflected in a prior distribution π(θ). The posterior distribution

of θ is given by:

π(θ|y) = f(y|θ)π(θ)Z

,

where

Z = f(y) =∫θf(y|θ)π(θ)dθ,

which is the normalising constant of the posterior that is independent of θ. The

normalising constant is difficult to compute in practice, but is often not required

when sampling from the posterior.

There are many advantages of Bayesian statistics over frequentist statistics that

have been discussed in the literature. The Bayesian approach treats the unknown

parameters as random variates and offers quantification of uncertainty through the

5

posterior distribution, which is a full distribution on the parameters and thus it is

possible to make any statistical inference of interest without limiting just to point

estimates and intervals. According to Berger (2013), Bayesian analysis incorporates

prior information with observed data and effectively uses the prior information, es-

pecially when the available prior information is significant. Under the Bayesian

approach, inference is conditional on the actual data, however, the parameter prob-

abilities of frequentist inference are based on all possible random samples that could

have occurred but not the actual ones (Bolstad and Curran, 2016).

When our interest is in generating samples of θ from the posterior, the normalising

constant Z can be ignored and then the posterior function of θ can be expressed as:

π(θ|y) ∝ f(y|θ)π(θ).

Given that directly sampling from the posterior distribution π(θ|y) is generally

infeasible, the Markov chain Monte Carlo (MCMC) method (Metropolis et al., 1953;

Hastings, 1970) described below is the most widely used sampling method.

2.2 Markov chain Monte Carlo (MCMC)

MCMC produces an ergodic Markov chain whose stationary distribution is the pos-

terior distribution itself. If the chain is run for a sufficiently long time, then the

samples produced by the chain can be regarded as correlated samples from the

posterior distribution. One standard MCMC algorithm is the Metropolis-Hastings

(MH) algorithm, which is straightforward to implement. The MH algorithm is pre-

sented in Algorithm 2.1. Under the MH algorithm, a proposal distribution q(θ∗|θ)

should be carefully chosen, where the distribution of the new candidate value θ∗

depends on the current state θ. The decision of accepting or rejecting θ∗ depends

on the following acceptance probability:

α(θ → θ∗) = min(

1, f(y|θ∗)π(θ∗)q(θ|θ∗)f(y|θ)π(θ)q(θ∗|θ)

).

The reader is referred to Chib and Greenberg (1995) for the theoretical details of

MH algorithms.

6

Algorithm 2.1 Metropolis-Hastings1: Initialise θ0

2: for all iterations t ∈ 1...T do

3: Given the current parameter value θ = θt−1, draw proposed θ∗ ∼ q(θ∗|θ)

4: Calculate the acceptance probability:

α(θ → θ∗) = min(

1, f(y|θ∗)π(θ∗)q(θ|θ∗)

f(y|θ)π(θ)q(θ∗|θ)

),

5: Set θt ← θ∗ with probability α(θ → θ∗), otherwise set θt ← θ.

6: end for

When the likelihood f(y|θ) is intractable or it is computationally costly to directly

evaluate the likelihood, a pseudo-marginal MH approach can be used by constructing

an unbiased estimator of the likelihood f(y|θ) (Andrieu and Roberts, 2009). The

unbiased estimator f(y|θ) can be defined as

f(y|θ) := f(y|θ,u),

E[f(y|θ,u)

]=∫f(y|θ,u)f(u)du = f(y|θ),

where u represents the auxiliary random variables with a density function of f(u),

which are required to produce the unbiased likelihood estimator. In most imple-

mentations of pseudo-marginal MH these auxiliary random variables u are sim-

ply random numbers, drawn independently from θ. In addition, by assumption,

f(θ,u) = f(u)π(θ). Then the estimated joint posterior distribution of θ and the

random numbers can be written as

π(θ,u|y) = f(y|θ,u)f(u)π(θ)f(y) .

7

Now, the exact posterior distribution can be obtained by marginalizing u out from

π(θ,u|y): ∫π(θ,u|y)du =

∫f(y|θ,u)f(u)π(θ)

f(y) du

= π(θ)f(y)

∫f(y|θ,u)f(u)du

= π(θ)f(y)f(y|θ)

= π(θ|y).

Accordingly, we can apply MH MCMC to sample from the estimated target distri-

bution π(θ,u|y) and marginalize this joint distribution by abandoning the output

of the generated auxiliary variates u. Under the pseudo-marginal MH approach, the

stationary distribution of the Markov chain will still be the same as the one from the

true likelihood if it had been available. Therefore, the exact likelihood f(y|θ) can be

replaced by its unbiased estimator f(y|θ), with the corresponding MCMC algorithm

still converging to the posterior distribution π(θ|y) as demonstrated above.

Although MCMC is an undoubtedly useful method, it does have some limitations.

The proposal distribution q(θ∗|θ) can be difficult to tune for computationally ef-

ficient performance in the presence of challenging posterior distributions, such as

those that are strongly non-normally distributed and possibly multimodal. Adap-

tive MCMC methods (e.g. Haario et al., 2001, 2006; Roberts and Rosenthal, 2009)

can help to improve the efficiency of MCMC. However, they are delicate from a theo-

retical perspective and optimal acceptance rates targeted by some adaptive methods

are based on strong posterior distribution assumptions, which may not hold in chal-

lenging applications. An alternative method named sequential Monte Carlo (SMC)

was designed, which could be viewed as a complementary method to MCMC. SMC

has several advantages over MCMC that make it more appealing for some appli-

cations. Further, running multiple algorithms targeting the same posterior can be

used to verify the results of each algorithm. A more detailed description of this

method is given in the next section.

2.3 Sequential Monte Carlo (SMC)

Here, we are focusing on the use of SMC for static Bayesian models (Chopin, 2002).

The SMC discussed in this thesis can be viewed as an extension of the importance

8

sampling (IS, see e.g. Neal 2001) method, therefore we will firstly explain the method

of IS here. Suppose that directly sampling from our target posterior distribution

π(θ|y) is difficult, but we can easily sample from an importance distribution g(θ)

constructed to approximate the target distribution. Then we can draw a collection

of independent samples θi ∼ g(·) for i = 1, . . . , N and compute the importance

weights of these samples so that they reflect the target posterior distribution:

wi ∝ f(y|θi)π(θi)g(θi) .

Then we normalize the weights so that they sum to one, W i = wi/∑Nj=1w

j , and

now we have a weighted sample W i,θi from the target distribution. This weighted

sample can be used to estimate the density of our target distribution π(θ|y) and

any expectations with respect to it. The efficiency of this weighted sample can be

measured by the effective sample size (ESS, Liu (2008)), which refers to the num-

ber of independent and identically distributed samples among all the N generated

samples. The ESS is computed by:

ESS = N

1 + CV[w(θ)] ,

where CV[wt(θ)] is the coefficient of variation of the unnormalized weights of the

target distribution. The ESS can be approximated using the normalized weights as

1/∑Ni=1(W i)2. It is obvious that the value of ESS ranges from 1 to N . The value

of ESS is maximized when all the weights are equal, which means the ratio of the

target over the importance distribution equals one (i.e. the importance distribution

is exactly the target density). For IS to work well, the importance distribution

should be a good approximation to our target distribution and provide sufficient tail

coverage of the target. However, it is not always easy to find a good importance

distribution g(·), especially when the target is complex. SMC attempts to overcome

the drawbacks of IS by exploring a sequence of importance distributions.

In SMC, a set of N weighted particles,{W it ,θ

it

}Ni=1, are traversed through a sequence

of posterior distributions. Therefore, we need to build a sequence of distributions

πt(θ|y) for t = 0, . . . , T , which explore the ultimate target distribution gradually

9

and where πT (θ|y) equals π(θ|y). One method to construct the sequence of dis-

tributions is the data annealing strategy (Chopin, 2002). Under this method the

target sequence is created as πt(θ|y1:t) ∝ f(y1:t|θ)π(θ), where y1:t represents the

observation data up to time t. Another method called likelihood annealing (Neal,

2001) can also be used to generate the sequence when the whole observed dataset is

already available. The sequence formed by the likelihood annealing method is given

by:

πt(θ|y) = f(y|θ)γtπ(θ)Zt

∝ f(y|θ)γtπ(θ),

where γt is referred to as the temperature and 0 = γ0 ≤ · · · ≤ γt ≤ · · · ≤ γT = 1.

We use both of the methods in this thesis, however, the data annealing approach

may be preferred when data are observed sequentially. A more detailed analysis and

comparison between these two sequence constructions is demonstrated in Chapter

3. As with MCMC, unbiased likelihood estimators in SMC and exact Bayesian

inference (up to Monte Carlo error) can be produced (Chopin et al., 2013; Duan and

Fulop, 2015).

In order to generate a properly weighted sample for target t, we need to reweight the

particle set at target t − 1,{W it−1,θ

it−1}Ni=1. The reweighting step can be achieved

by IS using:

wit = W it−1w

it = W i

t−1ηt(θit−1|y)ηt−1(θit−1|y)

,

(Chopin, 2002), for i = 1, . . . , N where wit is an unnormalised weight, and ηt(·) is

the unnormalised version of πt(·). The so-called incremental weight wit is calculated

by dividing the unnormalised target density at t, ηt(θit−1|y), by the unnormalised

target density at t− 1, ηt−1(θit−1|y). Unfortunately, the reweighting step generally

leads to reductions in the ESS, so we need to resample the particles periodically by

duplicating particles with high weights and eliminating particles with low weights.

This resets the approximate ESS back to N .

However, the resampling strategy will result in duplication, especially for particle

values with relatively large weight. A moving step is required to increase the number

of unique particles. The moving step is usually the most computationally expensive

part of the SMC process, and the MCMC kernels of invariant distribution πt are

10

commonly employed in the literature. Generally, in static Bayesian models Rt itera-

tions of a multivariate normal random walk MCMC kernel (Chopin, 2002) are taken,

the approach adopted here is designed to diversify the particles. We propose to use

the MH MCMC algorithm in Algorithm 2.1 for the moving step. It is important to

note that the population of particles can be used to automatically tune the param-

eters of the MH proposal density. For example, if a multivariate normal random

walk proposal is used, the covariance matrix may be estimated from the population

of particles. As the proposals of particles can be rejected by the MCMC kernel, it

is then necessary to repeat the MCMC kernel Rt times to maintain a target level of

diversity.

According to Drovandi and Pettitt (2011), with a probability of 1 − c, Rt can

be determined adaptively so that there is at least one accepted proposal: Rt =

ceiling (ln (c)/ ln (1− pt−1)), where pt−1 is the aforementioned MCMC acceptance

probability at the previous move step. This method is straightforward to imple-

ment, but the previous MCMC acceptance probability may not be a good proxy

for the current MCMC acceptance probability. Furthermore, it focuses on accep-

tance rate rather than the distance the particles are moved. Salomone et al. (2018)

propose a more robust method that considers the distance the particles are moved

and using trial MCMC iterations at the current target in SMC to obtaining a more

suitable value of Rt.

Salomone et al. (2018) propose that Rt can be determined adaptively by choosing an

optimal tuning scale for the covariance matrix of the MCMC proposal distribution.

In this thesis the proposal distribution used in the moving step is a multivariate

normal distribution with mean θit and covariance Σ, q(θi,∗t |θit) = N (θit, h2Σt), where

the covariance Σ is estimated by the covariance matrix Σt obtained from the popu-

lation of particles and adjusted by the optimal scale h that needs to be determined

appropriately. The optimal scale is selected based on the expected square jumping

distance (ESJD; Sherlock and Roberts, 2009). We randomly assign each particle one

of several tuning scales and we estimate the ESJD for each particle using a single

MCMC iteration. The tuning scale which resulted in the highest median ESJD value

11

is selected as optimal. The ESJD is given by:

ESJD = α(θit → θ∗)Λit = α(θit → θ∗)(θit − θ∗)T Σ−1t (θit − θ∗),

which is the product of the MH acceptance ratio and the Mahalanobis square jump-

ing distance (Salomone et al., 2018), i = 1, . . . , N , and t = 0, . . . , T . After obtaining

the optimal scale, we keep applying an MCMC kernel to all the particles until a

target number of particles’ ESJD values are higher than a threshold. We set this

threshold at the median value of the Mahalanobis distances between each pair of

particles before the move step. As can be seen, the total number of MCMC itera-

tions can be adaptively attained after meeting the threshold. The SMC sampling

method is summarized in Algorithm 2.2.

12

Algorithm 2.2 SMC Sampler1: Sample θi0 ∼ π(·) and set W i

0 = 1/N for i = 1, . . . , N

2: for t = 1, . . . , T do

3: Reweight: wit = W it−1

ηt(θit−1|y)ηt−1(θit−1|y)

4: Set θit = θit−1 for i = 1, . . . , N

5: Normalization of the weights: W it = wit/

∑Nk=1w

kt for i = 1, . . . , N

6: Computation of ESS: ESS = 1/∑Ni=1(W i

t )2

7: if ESS < αN , with α ∈ (0, 1] then

8: Resample the particles and set W it = 1/N , producing a new weighted par-

ticle set{W it ,θ

it

}Ni=1

9: Compute Σt from the particle set{θit}Ni=1

10: Compute the median value of the Mahalanobis distance Ddesired between

each pair of particles

11: Move each particle with an MCMC kernel with randomly selected scales h

12: Estimate the ESJD for each particle, and set the optimal tuning scale h as

the one that produces the highest median ESJD

13: while The number of the particles that satisfies “ESJD > Ddesired” below

a target number do

14: Move each particle according to an MCMC kernel with the proposal dis-

tribution q(θi,∗t |θit)

15: end while

16: end if

17: end for

13

2.4 GARCH models

2.4.1 Introduction to GARCH family models

Research into GARCH type models remains a highly active area due to their success

in modelling the volatility of financial time series data (see, for example, Giraitis

et al. (2007), Alberg et al. (2008), Kristjanpoller and Minutolo (2015, 2016), and

Dyhrberg (2016)). The standard GARCH model introduced by Bollerslev (1986)

successfully established a dynamic model of time-varying volatility which captures

some of the important characteristics of financial asset returns. The features of

financial asset returns generally consist of unpredictability, numerous extreme values

indicating distributions with heavy tails, and volatility clustering (Engle, 1982). The

standard GARCH model has been shown to capture these features (Engle, 2004).

This thesis is focusing on the application of the GARCH model to stock returns.

Despite the long history of, and extensive literature on the classical GARCH model,

the model is not free of limitations. The classical GARCH model does not account

for the stylised feature of the leverage effect (Black, 1976), an asymmetric impact

of past negative and positive returns on future volatility. In many empirical studies

on stocks (Koutmos, 1998; Bouchaud et al., 2001; Khedhiri and Muhammad, 2008),

volatility increases by a larger amount resulting from a decrease in the stock return

than stock return increasing. However, the classical GARCH model only considers

the magnitude but not the sign of returns when modeling the conditional variance.

The standard GARCH model also imposes a strict non-negativity constraint on

coefficients to ensure a positive variance.

Numerous extensions have been proposed to address the problems and limitations of

the standard GARCH model. The exponential GARCH or EGARCH (Nelson, 1991)

successfully captures the asymmetric response of volatility to decrease and increase

of stock returns. Since the conditional variance under the EGARCH model is in

logarithmic form, the results are on the whole real line and the positivity constraints

of the parameters can be avoided. In this thesis the EGARCH model will not be

investigated in further detail. Instead, the GJR-GARCH (Glosten et al., 1993)

model, which is one of the most popular GARCH-type models and an improvement

over the classical GARCH model, is considered here. The GJR-GARCH model has

14

a simpler expression compared with the EGARCH model, and also accounts for

leverage effect. Engle and Ng (1993) suggest that the GJR-GARCH model captures

the leverage effect in stock returns best among other popular nonlinear models such

as the EGARCH model.

However, the GJR-GARCH model originally assumed a Gaussian distribution of

the return innovation. However, such a specification usually fails to generate a

non-Gaussian distribution for returns which is one of the most important properties

of financial time series (Francq and Zakoıan, 2011). Thus, we also consider an

alternative model that accommodates conditional non-Gaussianity and was based

on the GJR-GARCH model, which is called bad environment-good environment

(BEGE) model (Bekaert et al., 2015). The three GARCH-type models investigated

in this thesis are specifically presented below.

2.4.2 Classical GARCH(1,1) model

GARCH models are observation driven, where the conditional variance is a function

of not only it past values, but past innovations to returns. To begin, define the

innovations in returns, ut as a shock to returns:

rt = µ+ ut, (2.1)

ut = σtεt, εti.i.d.∼ N (0, 1) (2.2)

where rt is the time series of asset returns with fixed mean µ, with ut having a time

varying conditional variance σ2t , with different GARCH models defining different

processes for σ2t . In what follows we present three models within the GARCH family

considered in this paper.

Under the classical GARCH(p, q) model, the conditional variance is given by:

σ2t = α0 +

q∑j=1

αju2t−j +

p∑i=1

βiσ2t−i, t ∈ Z, (2.3)

where α0 > 0, αj > 0, βi > 0, p > 0, q > 0, i = 1, . . . , p, j = 1, . . . , q, and the sta-

tionary condition requires that∑qj=1 αj +

∑pi=1 βi < 1 (Bollerslev, 1986). The wide

success of the parsimonious GARCH(1, 1) model in extensive empirical applications

15

makes this model the workhorse of financial applications (see, for example, Engle

(2001); Alexander and Lazar (2006); Francq and Zakoıan (2011)). The GARCH(1, 1)

model considered here is given by:

σ2t = α0 + β1σ

2t−1 + α1u

2t−1, (2.4)

where α0, α1, and β1 are subject to the constraints discussed above in the general

GARCH(p, q) model. The log-likelihood function of the GARCH(1, 1) model is given

by:

L(r|θGarch) =T∑t=1

12

[− ln σ2

t −u2t

σ2t

− ln 2π], (2.5)

where r is the observed time series of asset returns, and θGarch is a parameter set of

the GARCH(1, 1) model.

2.4.3 GJR-GARCH(1,1) model

The asymmetric GJR-GARCH(1, 1) model is a GARCH-type model with a threshold

indicator function, and has a different expression for conditional volatility σ2t from

the classical GARCH model, which is defined as:

σ2t = α0 + βσσ

2t−1 + φu2

t−1 + φ−u2t−1(Iut−1<0), (2.6)

where I is a dummy variable which takes the value of 1 if the innovation is negative,

and 0 otherwise; and α0 > 0, βσ > 0, φ− > 0 and φ > 0 to ensure the conditional

variance is positive and βσ + φ + 0.5φ− < 1 to ensure the conditional variance is

stationary. In this model, negative shocks increase the conditional variance relative

to positive shocks due to the leverage effect in stock returns. The log-likelihood

function of the GJR-GARCH(1, 1) model has the same expression with the one of

the GARCH(1, 1) model provided in Equation (2.5).

16

2.4.4 Bad Environment-Good Environment (BEGE) model

The BEGE model is based on non-Gaussian innovations and is defined as:

ut = σpωp,t − σnωn,t, where

ωp,t ∼ Γ(pt, 1), and

ωn,t ∼ Γ(nt, 1),

(2.7)

where Γ(k, θ)1 is the so-called centered gamma distributions (Bekaert et al., 2015)

with a zero mean. The innovation ut is a linear combination of a bad environment

component ωn,t with a shape parameter of nt, and a good environment component

ωp,t with a shape parameter of pt. The conditional shock distribution of ut is gen-

erated from two gamma-distributed shocks. The time-varying shape parameters nt

and pt are given by:

pt = p0 + ρppt−1 +φ+p

2σ2p

u2t−1Iut−1≥0 +

φ−p2σ2

p

u2t−1(1− Iut−1≥0), and (2.8)

nt = n0 + ρnnt−1 + φ+n

2σ2n

u2t−1Iut−1≥0 + φ−n

2σ2n

u2t−1(1− Iut−1≥0). (2.9)

The overall conditional variance σ2t of ut in the BEGE model is:

σ2t ≡ vart(rt) = σ2

ppt + σ2nnt, (2.10)

which is derived from the moment generating function of the centered gamma distri-

bution (Bekaert et al., 2015). As with the conditional variance, moments of higher

order such as conditional skewness and kurtosis can be easily derived with tractable

expressions. These higher-order conditional moments are time-varying compared

with traditional asymmetric volatility GARCH models.

The log-likelihood function of the BEGE model is written as:

L(r|θBEGE) =T∑t=1

ln fBEGE(rt|rt−1, . . . , r1;θBEGE), (2.11)

1The probability density function of Γ(k, θ) is f(x) = 1Γ(k)θk (x + kθ)k−1 exp

(− 1θ(x+ kθ)

)for

x > −kθ.

17

where θBEGE is a parameter set of the BEGE model, and the conditional likeli-

hood of observation rt fBEGE(rt|rt−1, . . . , r1;θBEGE) is approximated numerically

according to Bekaert et al. (2015). The cumulative distribution function (CDF) of

fBEGE(rt|rt−1, . . . , r1;θBEGE) at two points just above and below the observation rt

are numerically evaluated. Then a finite difference approximation method is used to

find the derivative of the CDF, which approximates the probability density function

of rt. As a result of the nature of approximation in this deterministic numerical

method, the approximated likelihood of rt is biased. In Chapter 3, we propose an

unbiased estimation for the likelihood of rt of the BEGE model with importance

sampling.

2.4.5 Parameter estimation methods

As researchers explore increasingly complex extensions to the standard GARCH

model, the corresponding parameter estimation problem becomes more challeng-

ing. The generally favored approach for the estimation of GARCH-type models

is maximum likelihood estimation (MLE). The straightforward implementation of

MLE technique makes it popular. Bollerslev and Wooldridge (1992) established the

asymptotic distribution of the quasi-maximum likelihood estimator (QMLE) for the

standard GARCH model under high level assumptions. Traditionally, a Gaussian

innovation is assumed when applying the QMLE technique to estimate the GARCH

model, whilst the true innovation may be far from Gaussian.

However, for many financial time series the empirical evidence indicates that the

distribution of the innovation is asymmetric and heavy-tailed, which leads to the

investigation of QMLE for heavy tailed errors, see e.g. Berkes et al. (2003) and

Hall and Yao (2003). Due to the simplicity principle, the QMLE approach has

been widely adopted in the literature, see e.g. Caporale et al. (2015), Bekaert

et al. (2015), Hansen and Huang (2016) and Guesmi et al. (2019). However, some

difficulties may be encountered in practice when dealing with the MLE of GARCH-

type models (Ardia et al., 2008), such as the cumbersome inequality constraints

are normally considered in the numerical optimization procedure. The Bayesian

approach provides some advantages compared to classical estimation techniques and

offers an attractive alternative.

18

Importance sampling has been used to sample from the posterior distribution of the

parameters of GARCH-type models (e.g. Geweke (1989)). However, the choice of

the importance distribution can be the main issue for applicability. Alternatively,

MCMC methods have been popular for the Bayesian estimation of GARCH-type

models. Ardia and Hoogerheide (2010) state that the simple Gibbs sampling pro-

cedure cannot be used because of the recursive nature of the conditional variance

of GARCH-type models. The Griddy-Gibbs sampler (Ritter and Tanner, 1992) ap-

proach has been adopted by Bauwens and Lubrano (1998, 2002). This approach is

intuitive and easy to implement, but the procedure is extremely time consuming.

MH MCMC is another algorithm used for estimating GARCH-type model parame-

ters. As the proposal distribution in MH MCMC requires to be tuned to produce

reasonable acceptance rates and hence a better Markov chain, the algorithm is not

automatic which makes make it undesirable. The MH MCMC has been used by

Muller and Pole (1998), Vrontos et al. (2000), Chen et al. (2006) and Gerlach and

Chen (2008), among others. Nakatsuma (1998, 2000) propose an automatic MH

algorithm for helping to construct better proposal densities. The reader is referred

to Virbickaite et al. (2015) for a comprehensive review of the Bayesian approach for

GARCH-type models. To the best of our knowledge, from the perspective of the

Bayesian approach, SMC methods have not been established for the GARCH-type

models with fixed parameters in the literature, which stimulates the work of this

thesis.

2.5 Model Selection

Practitioners are not only interested in single model analysis, but also the relative

performance of competing models. Under a fully Bayesian framework, the relative

performance of one model against another candidate model is assessed using the

Bayes factor (Jeffreys, 1935; Kass and Raftery, 1995). Suppose there are M can-

didate models (m = 1, . . . ,M), the posterior model probability P (m|y) is given by

Bayes’ theorem:

P (m|y) = P (y|m)P (m)P (y) ,

19

the posterior odds of two different models, m1 and m2, is then given by:

P (m = m1|y)P (m = m2|y) = P (y|m = m1)

P (y|m = m2)P (m = m1)P (m = m2) .

When the prior probabilities of the two models are equal, then the posterior odds

is just the Bayes factor. The Bayes factor comparing two different models with

parameter vectors θm1 and θm2 can be expressed as:

BFm1,m2 = P (y|m = m1)P (y|m = m2) =

∫P (y|θm1 ,m = m1)P (θm1 |m = m1)dθm1∫P (y|θm2 ,m = m2)P (θm2 |m = m2)dθm2

,

which is just the ratio of normalising constants under each model. As the Bayes

factor transforms the prior odds to posterior odds by observing data, the Bayes factor

indicates how strongly one model is supported by the data compared with another

model. The normalising constant is also referred to as the marginal likelihood or

evidence. The larger the evidence in favour of model m1 relative to m2 is, the greater

the Bayes factor BFm1,m2 . One challenge of implementing Bayes factor is that the

evidence is generally not easy to calculate quickly and accurately. A review of some

commonly used methods of estimating the model evidence is given in Friel and Wyse

(2012).

Fortunately, an estimate of the evidence can be obtained as a by-product from

SMC methods. The estimate of the evidence Z can be computed as Z = ZT /Z0 =∏Tt=1 Zt/Zt−1 with Z0 = 1 (Del Moral et al., 2006). Under likelihood annealing

SMC, the ratio of normalizing constants Zt/Zt−1 is given by:

ZtZt−1

=∫θf(y|θ)γt−γt−1πt−1(θ|y)dθ.

Then the estimate of log evidence lnZ is given by:

lnZ =T∑t=1

ln Eπt−1 [f(y|θ)γt−γt−1 ]

≈T∑t=1

ln(

N∑i=1

wit

),

20

where wit is the unnormalised weight from Algorithm 2.2. A similar estimator under

data annealing can be easily derived. In Chapter 3 we use this estimator to compare

the three GARCH models on various datasets.

Other widely used model selection tools such as the Bayesian information criterion

(BIC) (Schwarz et al., 1978), which provides a crude approximation to the logarithm

of Bayes factor (Kass and Raftery, 1995), and deviance information criterion (DIC)

(Spiegelhalter et al., 2002) have restrictive assumptions and limitations. In the BIC

approach, one chooses the model that minimizes

BIC = −2 lnP (y|θ) + p lnn,

where n is sample size, or equivalently, the number of observations, p is the number

of model parameters, and θ is the maximum likelihood estimate for θ. The BIC is an

asymptotic derivation which means the probability that BIC will select the correct

model approaches one as the sample size goes to infinity (Trevor et al., 2009). When

the implicit prior assumed by BIC is not a multivariate normal distribution, the BIC

will behave poorly (Kuha, 2004).

Spiegelhalter et al. (2002) introduced the DIC as a somewhat Bayesian measure of

model fit and complexity. The DIC is defined as

DIC = −2 lnP (y|θBayes) + 2pD,

pD = 2(lnP (y|θBayes)−Eθ|y (ln p(y|θ))

),

where θBayes = E(θ|y) is the posterior mean, pD is the effective number of param-

eters, and the second term of pD is the posterior expectation of the log-likelihood.

With the rise of simulation methods such as MCMC to obtain the posterior distri-

butions, the DIC becomes more popular as it could be easily computed from the

simulation outputs. Nevertheless, the DIC is only valid when the posterior can be

approximated well with a multivariate normal distribution (Gelman et al., 2014).

However, the fully Bayesian approach based on the Bayes factor, which is adopted

in this thesis, does not rely on the restrictive approximations.

21

Chapter 3

A fully Bayesian approach for GARCH-type models

3.1 Introduction

In many financial applications, volatility, defined as the conditional standard devi-

ation of asset returns, is an important quantity. Many financial applications are

based on volatility forecasts, such as risk management, asset pricing and portfolio

optimization (Engle, 2001). Therefore modelling of volatility has attracted a lot of

research attention. One of the most influential class of models used to describe the

dynamics of volatility is the AutoreRressive Conditional Heteroscedasticity (ARCH)

model by Engle (1982) and the Generalized ARCH (GARCH) model by Bollerslev

(1986), which is a more parsimonious extension of ARCH. The standard GARCH

model of Bollerslev (1986) has successfully captured many characteristics of financial

asset returns, and forms the basis for a wide range of expressions used to capture var-

ious empirical features of returns data. For instance, in order to capture the stylised

fact of the leverage effect (Black, 1976), a range of models such as the exponen-

tial GARCH (EGARCH) model (Nelson, 1991), the Glosten-Jagannathan-Runkle

GARCH model (GJR-GARCH) of Glosten et al. (1993), the absolute value GARCH

(AGARCH) model of Hentschel et al. (1995), and the Bad Environment-Good En-

vironment (BEGE) model (Bekaert et al., 2015) have been developed. Given the

wide range of GARCH type models, it is crucial to develop an efficient statistical

inference framework to select between these models.

Frequentist inference based on maximum likelihood estimation (MLE) is commonly

employed with the GARCH class of models. A limitation of MLE is that it only

produces point estimates of parameters, with the uncertainty of these parameter

23

estimates based on asymptotic theory where parameter estimates are assumed to

be normally distributed (Hamilton, 1994). With the development of new financial

instruments based on volatility, such as volatility options, modelling the full predic-

tive density of volatility is attracting attention (e.g. Corradi et al. (2009, 2011); and

Griffin et al. (2016)). To produce such distributions, here we develop a Bayesian

approach to statistical inference for the GARCH class of models. Under this ap-

proach, the unknown parameters are regarded as random variables, whose posterior

distributions may be far from normal. In contrast to the confidence interval under

classical statistics, the credible interval of Bayesian statistics is easily interpreted

and gives a more direct and intuitive probability statement about uncertainty in

the parameter estimates, (O’Hagan, 2004). Furthermore, our Bayesian approach

can also provide full predictive distributions of the conditional volatility required for

pricing volatility instruments.

An issue with Bayesian estimation is developing an efficient method for sampling

from the posterior distribution based on Monte Carlo methods (Robert and Casella,

2004). The most commonly used approach is MCMC, which has been utilised for

Bayesian analysis of the GARCH class of models (see, for example, Kim et al.

(1998), Vrontos et al. (2000), Takaishi (2009)). Despite the popularity of MCMC,

it can be costly and difficult to tune the associated parameters when using MCMC

(Roberts and Rosenthal, 2009) with conventional MCMC algorithms becoming less

effective when the target distribution is far from normally distributed. In more

recent times, the sequential Monte Carlo (SMC) method for static Bayesian mod-

els (Chopin, 2002; Del Moral et al., 2006) has been developed as a complementary

method to MCMC. SMC traverses a set of particles through a sequence of probabil-

ity distributions, which starts with one that is easy to sample from and ends with

the ultimate target posterior distribution. The sequence of probability distributions

can be attained either using a data annealing strategy (Chopin, 2002) or the method

of likelihood annealing (Neal, 2001). SMC iteratively applies three main steps to

traverse the population of particles, which include re-weighting, re-sampling, and

moving (or mutation). SMC is parallelisable and easy to adapt to become more effi-

cient compared with the MCMC method. It is possible to use parallelisation to save

24

computation costs especially when the likelihood function is computationally inten-

sive, which we demonstrate is easily achievable for the GARCH class of models. In

addition, SMC is efficient at exploring irregular posterior distributions, often found

with GARCH type models. The intermediate samples created by SMC embedded

with data annealing strategy can be reused to perform posterior predictions. Also,

the estimate of evidence, which is a by-product from SMC methods, can be used in

Bayesian model selection.

In this chapter, we utilise SMC methods to develop a novel fully Bayesian approach

for the widely used GARCH class of models. This Bayesian framework can be

used for estimation, prediction and model selection, and generating a full predictive

density for conditional volatility. Our Bayesian SMC approach provides an attractive

alternative to the MCMC or frequentist approaches. This chapter also contributes to

the SMC literature as we demonstrate that SMC data annealing is useful for leave-

one-out cross-validation when dealing with time series data. Finally, we develop an

unbiased likelihood estimator for the BEGE model, which permits exact Bayesian

inference for this model. Bekaert et al. (2015) use an approximation to the likelihood,

which as we show produces biased posterior distributions. South et al. (2019) use

the BEGE model with biased likelihood as an example to illustrate a new SMC

approach for parameter estimation. Our work is a comprehensive demonstration of

the advantages that SMC can produce for GARCH-type models.

In Section 3.2, we provide the methods for deriving an unbiased estimator for the

likelihood of the BEGE model. An efficient strategy to choose the sequence of

intermediate probability distributions of SMC is discussed in Section 3.3. Section 3.4

provides a comparison of one-step forward predictions of volatility that are derived

from the fully Bayesian approach and the classical approach, respectively. A leave-

one-out cross-validation method for time series data is also described for evaluating

the predictive performance of the forecasted volatility. Finally, a summary of the

results is provided in Section 3.5, and concluding discussion is given in Section 3.6.

3.2 Unbiased estimation for the BEGE likelihood

According to Bekaert et al. (2015), the BEGE-distributed variable ut can be rewrit-

ten as ut = ωp,t − ωn,t, where ωp,t ∼ Γ(pt, σp), ωn,t ∼ Γ(nt, σn), and ut = rt − µ.

25

Then the BEGE density can take the form as

fBEGE(ut) =∫ωp,t

fut(ut|ωp,t)f(ωp,t)dωp,t

=∫ωp,t

fωn,t(ωp,t − ut)f(ωp,t)dωp,t.(3.1)

An unbiased estimator of fBEGE(ut) can be constructed using Monte Carlo methods.

Firstly, we sample M independent samples ω1p,t, . . . , ω

Mp,t from f(ωp,t), then fBEGE(ut)

can be estimated as

fBEGE(ut) = 1M

M∑i=1

fωn,t(ωip,t − ut),where ωip,ti.i.d.∼ f(ωp,t), i = 1, . . . ,M. (3.2)

One main issue of the Monte Carlo integration method is that a large number of

Monte Carlo draws, M , may be required to increase the accuracy of the Monte Carlo

estimates, which makes it more computationally intensive. Therefore, we adopt

importance sampling to reduce variance of Monte Carlo estimates and improve the

computational efficiency. When it is hard to sample from our target distribution h(·)

directly, we can construct an importance distribution, g(·), which is easy to sample

from. The following identity is the basis for importance sampling

Eg[h(x)g(x)

]=∫h(x)g(x)g(x)dx =

∫h(x)dx

≈ 1M

M∑i=1

h(xi)g(xi)

, xii.i.d.∼ g(·).

(3.3)

In the case of the BEGE model, the target distribution is h(ωp,t) = fωn,t(ωp,t −

ut)f(ωp,t), and the importance sampling estimator of fBEGE(ut) is given by

fIS(ut) = 1M

M∑i=1

h(ωip,t)g(ωip,t)

, ωip,ti.i.d.∼ g(ωp,t), (3.4)

where a sound proposal distribution g(ωip,t) needs to be selected. To derive the

importance distribution, we firstly determine the mode and curvature of the tar-

get distribution h(ωip,t), and then a parametric importance distribution that shares

the same mode and curvature of the target is developed. The process of construct-

ing an efficient importance distribution for estimating fBEGE(ut) is provided in the

Appendix. The general importance sampling estimator in (3.4) is unbiased.

26

As discussed in Section 2.2, Andrieu and Roberts (2009) show that using an unbiased

likelihood estimator within an MCMC produces a Markov chain that converges

to the idealised posterior that assumes the likelihood can be computed exactly.

This has been demonstrated by marginalizing the auxiliary random variables u out

from the joint posterior π(θ,u|y) in Section 2.2. Chopin et al. (2013) and Duan

and Fulop (2015) show that using an unbiased likelihood estimator in an SMC

algorithm for data annealing and likelihood annealing, respectively, produces an

algorithm that also targets the exact posterior. We adopt a Bayesian statistical

framework for parameter estimation and model selection for GARCH-type models

that are described in Chapter 2. In the following two sections, we develop a Bayesian

statistical framework for model prediction, together with an efficient strategy for

constructing the sequence of distributions of SMC, for GARCH-type models.

3.3 Efficient sequence construction method

Under the SMC approach, the sequence of target distributions can be defined through

both likelihood annealing and data annealing. These two annealing methods provide

similar estimates of posterior distributions for the parameters of the three GARCH-

type models, which are shown in Section 3.5.1. Under the data annealing strategy,

the sequence of target distributions are obtained smoothly when the observations

arrive one at a time. The sequence of posterior distributions constructed by using

data annealing method are given by:

πt(θ|y1:t) ∝ f(y1:t|θ)π(θ),

where t = 0, . . . , T , and y1:t represents the data available up to current time t. The

corresponding unnormalised weight of the particle i at time t is thus:

wit = W it−1

f(y1:t|θit−1)f(y1:t−1|θit−1)

= W it−1f(yt|y1:t−1,θ

it−1),

which needs to be normalised. It is important to note that to update the weights

only the conditional likelihood of the introduced data point needs to be computed.

As more returns become available, the re-weighting step in data annealing does not

27

need to re-process previously collected data. This provides extensive computational

savings, especially when the time series is long.

The data annealing strategy is also useful for performing posterior prediction. One

way to consider if a GARCH-type model is appropriate would be to focus on the

model’s ability to forecast. Our Bayesian approach provides a clear method to

deduce the posterior predictive distribution conditional on currently available data.

The intermediate posterior distribution πt−1(θ|y1:t−1) from data annealing is utilised

when forecasting one-step ahead posterior prediction for data at time t. Meanwhile,

the corresponding posterior predictive density that is used to measure the prediction

accuracy can be estimated. Section 3.4 details the procedure of estimating the

posterior predictive distribution.

3.4 Bayesian posterior predictive distribution

In this section, the fully Bayesian and traditional classical approaches are compared

in the context of generating one-step ahead forecasts of conditional variances. The

distribution of the one-step ahead conditional variances together with 95% prediction

intervals are offered by the fully Bayesian approach, which cannot be provided by

the classical approach. We also provide a fully Bayesian approach for assessing time

series models’ predictive performance, which can be achieved easily with SMC data

annealing.

3.4.1 Bayesian posterior predictive distribution

Under the fully Bayesian approach, the posterior predictive distribution for the

unknown observations can be produced. New observations yt+1 can be drawn from

the posterior predictive distribution given the original data y1:t up to date t:

P(yt+1|y1:t) =∫θ

P(yt+1|θ, y1:t)πt(θ|y1:t)dθ.

The posterior predictive distribution accounts for the uncertainty associated with

the parameters without making asymptotic assumptions. Here, we consider one-step

ahead forecasts for the posterior predictions of the conditional variance σ2t+1 given

its past values and the real data up to time t, σ21:t and y1:t.

28

The posterior predictive distribution is particularly easy to estimate via SMC with

data annealing. Denote the SMC weighted sample from the posterior πt(θ|y1:t) as{W it ,θ

it

}Ni=1. Then, the posterior predictive distribution can be approximated by{

W it , (σ2

t+1)i}Ni=1 where (σ2

t+1)i ∼ p(σ2t+1|y1:t, σ

21:t,θ

it) for i = 1, . . . , N . From the

weighted sample, a point estimate (e.g. mean or median) and predictive interval can

be easily estimated. The computation associated with constructing the approxima-

tion to the predictive posterior can be easily vectorised and parallelised.

3.4.2 Forecast through maximum likelihood estimation (MLE) approach

We compare the Bayesian SMC approach for forecasting with MLE. In order to form

one-step ahead forecasts with MLE, at time t we firstly find the maximum likelihood

estimate θtMLE based on the observations up to time t. MLE parameter estimates

are obtained by maximising each model’s log-likelihood provided in Section 2.4 via

numerical optimisation. After obtaining the MLE parameter estimates at time t, a

point estimate of the prediction of the conditional variance from the three GARCH-

type models at time t+ 1 can be obtained by utilizing one of Equations (2.4), (2.6)

or (2.10) in Section 2.4.

3.4.3 Forecast evaluation

In order to assess which model provides more accurate forecasts, we adopt the leave-

future-out cross-validation (LFO-CV) method (Burkner et al., 2019). Burkner et al.

(2019) state that the common leave-one-out cross-validation (LOO-CV) is not suit-

able for assessing predictive performance when the data is sequentially ordered in

time since future observations may provide some information that affects in-sample

model fit, and thus the predictions may be affected if only leaving one observation

out at a time. Therefore, when making one-step ahead forecasts at time t+1 we use

LFO-CV that leaves out all the future values after time t to assess predictive per-

formance for time series models. To measure the predictive accuracy, the expected

log posterior density (elpd; Vehtari et al., 2017) is approximated by:

elpdLFO =T−1∑t=τ−1

ln p(yt+1|y1:t),

29

(Burkner et al., 2019) where τ is time point that we decide to start assessing the pre-

dictions, T is the number of observations in the dataset, and the density p(yt+1|y1:t)

can be approximated by evaluating an estimated density, which is constructed us-

ing kernel density estimation based on the weighted sample{W it , y

it+1}Ni=1, at the

true observation yt+1. The standard approach to LFO-CV would be to re-compute

the posterior distribution (referred to here as a refit) as each new data point is in-

troduced, however, this would be computationally intensive. Burkner et al. (2019)

propose an algorithm to approximate LFO-CV and reduce the number of refits re-

ducing computation time. However, our data annealing SMC naturally re-estimates

the posterior distribution in a computationally efficient manner as more data are

introduced, and hence, we demonstrate it is valuable for performing LFO-CV. The

weighted sample from the posterior{W it ,θ

it

}Ni=1 based on different subsets of obser-

vations y1:t is a by-product of posterior simulation via data annealing SMC, which

means there is no need to refit the time series model again to perform and test

predictions.

A simulation study is performed in Section 3.5.2 to evaluate the predictive perfor-

mance for conditional variance and explore the relative merits of the fully Bayesian

approach relative to the MLE approach. To assess predictive accuracy we provide

an estimated elpdLFO and a 95% one-step forward prediction interval for each candi-

date model with different simulated datasets. We also illustrate that the predictive

distribution of the conditional volatility can provide us with different levels of con-

fidence about the magnitude of the volatility at the next time point. We count and

record the true conditional variances from the simulated data that fall inside the

2.5% and 97.5% quantiles. We show in Section 3.5.2 that the posterior predictive

interval from each true model, which accounts for all sources of variability in the

model, provides a satisfactory coverage of the true values.

3.5 Results

The plan of this section is as follows. In Section 3.5.1 we present results for the

posterior distributions obtained by SMC algorithms for the GARCH-type models

presented in Section 2.4. The one-step ahead forecasts of conditional variances

by using both classical MLE and Bayesian approaches are illustrated in Section

30

3.5.2. The model selection results obtained from the fully Bayesian approach are

presented in Section 3.5.3. Computer code for implementing our methods is available

at https://github.com/DanLAct/GARCH-SMC.

In this study we useN = 10, 000 particles in SMC to generate posterior distributions.

The time series data we use are monthly log stock returns for the S&P 500 Compos-

ite Index from July 1926 to January 2018 from the Center of Research in Securities

Prices (CRSP). The parameters in the GARCH-type models we aim to estimate

respectively are θGarch(1,1) = (µG, αG0 , α1, β1), θGJR = (µGJR, αGJR

0 , βσ, φ, φ−), and

θBEGE = (µB, p0, n0, ρp, ρn, φ+p , φ

+n , φ

−p , φ

−n , σp, σn). Some constraints on the param-

eters are imposed initially in addition to the stationarity requirements. Such initial

constraints are chosen based on classical estimation results (e.g. Bekaert et al.,

2015). Thus, we set the priors on the parameters of the GARCH(1, 1) model as:

αG0 ∼ U(0, 0.3), α1 ∼ U(0, 0.5),

β1 ∼ U(0, 0.99), µG ∼ U(−0.9, 0.9),

where U(·) denotes a uniform probability density function. To satisfy stationarity

it is necessary to enforce α1 + β1 ≤ 0.9999. The priors on the parameters of the

GJR-GARCH(1, 1) model are given by:

αGJR0 , φ, φ− ∼ U(0, 0.3),

βσ ∼ U(0, 0.99),

µGJR ∼ U(−0.9, 0.9).

To satisfy stationarity we need φ+ 12φ−+βσ ≤ 0.9999. The priors on the parameters

of the BEGE model are given by:

p0, φ+p , φ

−p ∼ U(0, 0.5),

σp, σn ∼ U(0, 0.3), ρp, ρn ∼ U(0, 0.99),

n0 ∼ U(0, 1), φ+n ∼ U(−0.2, 0.1),

φ−n ∼ U(0, 0.75), µB ∼ U(−0.9, 0.9).

31

https://github.com/DanLAct/GARCH-SMC

Additionally, to satisfy stationarity we require that ρp + 12φ

+p + 1

2φ−p ≤ 0.995 and

ρn + 12φ

+n + 1

2φ−n ≤ 0.995. All parameters are assumed to be independent in the

prior.

3.5.1 Posterior Distributions

The univariate posterior distributions obtained by our SMC algorithms for the pa-

rameters of the GARCH(1, 1), GJR-GARCH(1, 1) and BEGE models are shown in

Figures 3.1, 3.3 and 3.5, respectively. The scatterplots of bivariate posterior distri-

butions of the parameters of the three GARCH-type models are shown in Figures

3.2, 3.4, and 3.6, respectively.

0 0.5 1 1.5 2 2.5 3

10-4

0

0.5

1

1.5

2

Den

sity

104 0G

0.05 0.1 0.15 0.2 0.250

5

10

15

20

25

Den

sity

1

0.7 0.75 0.8 0.85 0.9 0.950

5

10

15

20

Den

sity

1

0 0.005 0.01 0.0150

100

200

300

400

Den

sity

G

SMC: Data AnnealSMC: Likelihood Anneal

Figure 3.1: Marginal posterior distributions of the parameters of the GARCH(1,1) modelby using SMC.

The marginal posterior estimates are constructed by using a nonparametric kernel

density estimation method. As can be seen in the figures, each parameter’s pos-

terior distribution derived by data annealing SMC is very close to that obtained

under likelihood annealing SMC. Except for the parameters of the fixed mean value

µ = (µG, µGJR) under GARCH(1, 1) and GJR-GARCH(1, 1) model, all other poste-

rior distributions demonstrate that they are not symmetric and, in the BEGE case,

are far from normally distributed, which contradicts the assumption of normally

distributed parameter estimates under standard asymptotic inference. In addition,

32

the non-normality of the joint posterior distributions are demonstrated in the scat-

terplots of bivariate distributions.

Figure 3.2: Scatterplots of bivariate distributions based on samples drawn from the poste-rior distribution of the parameters of the GARCH(1,1) model by using SMC.

0 1 2 3

10-4

0

5000

10000

15000

Densi

ty

0GJR

0 0.05 0.1 0.15 0.20

5

10

15

20

Densi

ty

0 0.1 0.2 0.30

2

4

6

8

10

Densi

ty

-

0.7 0.8 0.90

5

10

15

20

Densi

ty

0 0.005 0.010

100

200

300

400

Densi

ty

GJR

SMC: Data AnnealSMC: Likelihood Anneal

Figure 3.3: Marginal posterior distributions of the parameters of the GJR-GARCH(1,1)model by using SMC.

Figure 3.5 demonstrates that the approximate likelihood of fBEGE(ut) can lead to

biased posteriors, especially for σp and µB. Even though the unbiased estimators of

33

Figure 3.4: Scatterplots of bivariate distributions based on samples drawn from the poste-rior distribution of the parameters of the GJR-GARCH(1,1) model by using SMC.

0 0.2 0.4 0.60

1

2

3

Densi

ty

p0

-0.01 0 0.01 0.020

100

200

300

Densi

ty

p

0.4 0.6 0.8 10

5

10

15

Densi

ty

p

0 0.1 0.2 0.3 0.40

5

10

Densi

ty

p+

0 0.2 0.4 0.60

5

10

Densi

ty

p-

0 0.5 10

1

2

Densi

ty

n0

0 0.01 0.02 0.03 0.040

100

200

Densi

ty

n

0 0.5 10

2

4

Densi

ty

n

-0.2 -0.1 0 0.1 0.20

5

10

Densi

ty

n+

-0.2 0 0.2 0.4 0.6 0.80

2

4

Densi

ty

n-

0 0.005 0.01 0.0150

200

400

Densi

ty

B

Likelihood Anneal SMC: IS UnbiasedData Anneal SMC: IS UnbiasedData Anneal SMC: MC UnbiasedData Anneal SMC: Biased

Figure 3.5: Marginal posterior distributions of the parameters of the BEGE model by usingSMC.

34

the likelihood produced by Monte Carlo integration and importance sampling pro-

duce similar posterior distributions, the importance sampling estimating method is

more efficient. To test the efficiency of these two estimation methods, we choose

three sets of parameters and 100 independent likelihood estimates for the two esti-

mators with different Monte Carlo sample sizes M . As shown in Table 3.1, the Monte

Carlo integration requires at least 5,000 draws to guarantee the standard deviation

is lower than 1, while the importance sampling only needs 1,000 draws to achieve a

similar level of variance. Therefore, we adopt importance sampling estimator when

using SMC to sample from the posterior for the BEGE model below.

Figure 3.6: Scatterplots of bivariate distributions based on samples drawn from the poste-rior distribution of the parameters of the BEGE model by using SMC.

The likelihood function for the BEGE model is computationally expensive. We sig-

nificantly accelerate the likelihood calculations for the proposed parameters of all

particles in one iteration of the MCMC step of the SMC algorithm by taking ad-

vantage of parallel computing. More specifically, we allocate a set of these proposed

parameters to each core on a multi-core machine. Using the importance sampling

estimator of the likelihood and 12 cores, the SMC algorithm with data annealing

applied to the BEGE model runs 6.7 times faster than a completely serial SMC

35

Table 3.1: Comparison of unbiased Monte Carlo integration estimator and importancesampling estimator

Number of MonteCarlo Iterations Standard Deviation

ParametersMethods Monte Carlo

IntegrationImportanceSampling

Parameter Set 1

100 4.86 1.631000 3.20 0.745000 0.96 0.5010000 0.64 0.39

Parameter Set 2

100 6.09 1.631000 1.30 0.975000 0.59 0.7710000 0.42 0.69

Parameter Set 3

100 7.4 1.451000 1.47 0.845000 0.59 0.6210000 0.41 0.55

Parameter Set 1: p0 = 0.081, σp = 0.004, ρp = 0.857, φ+p = 0.084, φ−

p = 0.151, n0 = 0.817,σn = 0.023, ρn = 0.466, φ+

n = −0.131, φ−n = 0.323, µB = 0.007;


p = 0.155, n0 = 0.700,σn = 0.022, ρn = 0.571, φ+

n = −0.180, φ−n = 0.475, µB = 0.009;


p = 0.155, n0 = 0.792,σn = 0.023, ρn = 0.557, φ+

n = −0.183, φ−n = 0.433, µB = 0.008.

algorithm. Furthermore, the likelihood annealing SMC applied to the BEGE model

runs 6.5 times faster than a completely serial SMC algorithm.

Figure 3.7 shows the posterior distributions for the BEGE model explored by using

MH MCMC and SMC methods. The MH MCMC requires tuning of the proposal

distribution. To make implementing the MH MCMC easier, the final population

of SMC particles are used to form the proposal distribution. More specifically, we

use a multivariate normal random walk with a covariance estimated from the SMC

particles. However, even though we give MH MCMC an advantage by avoiding the

tuning step, the posteriors obtained by MCMC still perform poorly compared with

the ones from the SMC methods in Figure 3.7. For instance, the posterior distribu-

tion for parameter σp obtained by MCMC has an obvious unexpected bump, and

the trace plot in Figure 3.8 shows that MCMC can get stuck and the Markov chain

explores the parameter space slowly. The runtime of the MH MCMC algorithm with

1 million iterations is more than 2.2 times slower than the data annealing SMC that

36

exploits parallel computing. Inefficiency factor, also called the integrated autocor-

relation time, is an MCMC diagnostic and measures the computational inefficiency

of an MCMC sampler. The sampler with the lowest inefficiency factors is the most

efficient one, and a very efficient sampling algorithm has inefficiency factors close to

1. The estimated inefficiency factors are reported in Table 3.2. The table shows that

all the parameters’ inefficiency factors are above 1000, which reinforces inefficiency

of the MCMC algorithm.

0 0.2 0.4 0.60

1

2

3

Dens

ity

p0

-0.01 0 0.01 0.020

100

200De

nsity

p

0.4 0.6 0.8 10

10

20

Dens

ity

p

0 0.2 0.40

5

10

15

Dens

ity

p+

0 0.2 0.4 0.60

5

10

Dens

ity

p-

0 0.5 10

1

2

Dens

ity

n0

0 0.02 0.040

100

200

Dens

ity

n

0 0.5 10

2

4

Dens

ity

n

-0.2 0 0.20

5

10

Dens

ity

n+

-0.2 0 0.2 0.4 0.6 0.80

2

4

Dens

ity

n-

0 0.005 0.01 0.0150

200

400

Dens

ity

B

Likelihood Anneal SMC: IS Unbiased

Data Anneal SMC: IS Unbiased

MCMC: IS Unbiased

Figure 3.7: Marginal posterior distributions of the parameters of the BEGE model by usingSMC and MCMC methods for real data.

Table 3.2: Estimated inefficiency factors (IF)p0 σp ρp φ+

p φ−p n0 σn ρn φ+n φ−n µB

IF 1405.71 1677.05 1272.53 1742.26 1926.63 2748.03 1990.45 2317.56 1587.57 1445.31 1388.45

Both the MH MCMC and SMC methods are used to sample the posterior distribu-

tions for the BEGE model with simulated data, and the results are shown in Figure

3.9. We choose SMC particle parameter value with the highest likelihood to be the

initial values for the MCMC, which are the red dashed lines in Figure 3.9. As can

be seen, the estimates of the posterior distribution produced by MH MCMC are less

smooth than SMC and the point estimates used to generate simulated data are also

37

0 5 10

105

0

0.5

Va

lue

p0

0 5 10

105

0

0.01

0.02

Va

lue

p

0 5 10

105

0.6

0.8

1

Va

lue

p

0 5 10

105

0

0.2

0.4

Va

lue

p+

0 5 10

105

0

0.5

Va

lue

p-

0 5 10

105

0

0.5

1

Va

lue

n0

0 5 10

105

0

0.02

0.04

Va

lue

n

0 5 10

105

0

0.5

1

Va

lue

n

0 5 10

105

-0.2

0

0.2

Va

lue

n+

0 5 10

105

0

0.5

1

Va

lue

n-

0 5 10

105

0

0.01

0.02

Va

lue

B

0 0.2 0.4 0.60

2

4

De

nsity

p0

-0.01 0 0.01 0.020

100

200

De

nsity

p

0.6 0.8 10

10

20

De

nsity

p

0 0.2 0.40

10

20

De

nsity

p+

0 0.2 0.40

5

10

De

nsity

p-

0 0.5 10

1

2

De

nsity

n0

0.01 0.02 0.03 0.040

100

200

De

nsity

n

0 0.5 10

2

4

De

nsity

n

-0.2 0 0.20

5

10

De

nsity

n+

-0.2 0 0.2 0.4 0.6 0.80

2

4

De

nsity

n-

0 0.005 0.01 0.0150

200

400

De

nsity

B

Figure 3.8: Trace plots and marginal posterior distributions of the parameters of the BEGEmodel by using MCMC method for real data.

contained by the posterior distributions. This simulation study can be viewed as a

validation of the effectiveness of the proposed SMC methods.

0 0.2 0.4 0.60

2

4

6

Dens

ity

p0

-0.01 0 0.01 0.020

100

200

Dens

ity

p

0.6 0.8 10

5

10

Dens

ity

p

Likelihood Anneal SMC: IS Unbiased

Data Anneal SMC: IS Unbiased

MCMC: IS Unbiased

Input

0 0.2 0.4 0.60

5

10

Dens

ity

p+

0 0.2 0.40

5

10

Dens

ity

p-

0 0.5 10

2

4

6

Dens

ity

n0

0.01 0.02 0.03 0.040

100

200

Dens

ity

n

0 0.5 10

5

10

Dens

ity

n

-0.2 -0.1 0 0.10

10

20

Dens

ity

n+

0 0.2 0.4 0.6 0.80

2

4

6

Dens

ity

n-

5 10 15

10-3

0

200

400

Dens

ity

t

Figure 3.9: Marginal posterior distributions of the parameters of the BEGE model by usingSMC and MCMC methods for simulated data.

38

3.5.2 Posterior Predictive Distribution

In this section we perform a simulation study which shows the relative merits of the

Bayesian approach over the classical approach for forecasting the conditional vari-

ance. Firstly, we simulate the conditional variance of stock returns σ21:1099 from the

GARCH(1, 1) model, the GJR-GARCH(1, 1) model and the BEGE model, respec-

tively. The parameters used to generate the three simulated data sets are randomly

selected from the final population of SMC particles when modelling real time series

data with the three GARCH-type models. The initial values of conditional vari-

ance at time t = 1 of the three GARCH-type models are (σ21)GARCH = 0.0026,

(σ21)GJR−GARCH = 0.0022 and (σ2

1)BEGE = σ2pp1 +σ2

nn1 = 0.0013. The total number

of simulated data in our study is 1099. We consider the performance of one-step

ahead prediction based on simulated conditional variances t = 201 to t = 1099.

Besides using the true model to fit the simulated data, we also perform a study of

model misspecification by using misspecified models to fit the simulated data and

make predictions.

Here, we present the results for the one-step ahead forecasts of conditional vari-

ances of the three GARCH-type models obtained via data annealing SMC and MLE

approaches. In order to provide a clearer visual interpretation, in Figure 3.10 we

illustrate the results for forecasted conditional volatilities, which are the square roots

of the forecasted conditional variances. Figure 3.10 shows the 95% prediction inter-

val bounds for the last 100 predicted volatilities σ1000:1099 simulated from the three

GARCH-type models by using our Bayesian approach. The 95% prediction interval

is built simply by computing the 0.025 and 0.975 quantiles of the square roots of

the posterior predictive sample of the conditional variances. However, the classical

approach cannot produce such prediction intervals for the conditional volatility. It

is obvious that these prediction intervals capture the simulated data successfully.

We take posterior medians as point estimators for the Bayesian approach. After

illustrating these posterior medians together with the classical point estimators of

three GARCH-type models in Figures 3.10, we can see that the differences between

the Bayesian and classical point estimators of predictions are small.

39

1000 1010 1020 1030 1040 1050 1060 1070 1080 1090 11000.02

0.04

0.06

0.08

0.1

0.12

(a)

1000 1010 1020 1030 1040 1050 1060 1070 1080 1090 11000.02

0.03

0.04

0.05

0.06(b)

1000 1010 1020 1030 1040 1050 1060 1070 1080 1090 1100

0.05

0.1

0.15(c)

Posterior MedianSimulated Volatility from True ModelBayesian 95% Prediction IntervalMLE Point Estimate

Figure 3.10: One-step ahead forecasts for conditional standard deviation of theGARCH(1, 1) model, the GJR-GARCH(1, 1) model and the BEGE model. (a) One-stepahead 95% prediction interval of conditional standard deviation of GARCH(1, 1) modelwith simulated data generated from GARCH(1, 1) model; (b) One-step ahead 95% predic-tion interval of conditional standard deviation of GJR-GARCH(1, 1) model with simulateddata generated from GJR-GARCH(1, 1) model; (c) One-step ahead 95% prediction inter-val of conditional standard deviation of BEGE model with simulated data generated fromBEGE model.

Table 3.3: Frequency of simulated conditional variances falling outside 95% predictioninterval

ModelData Simulated Data

from GARCH(1, 1)

Simulated Datafrom

GJR-GARCH(1, 1)

Simulated Datafrom BEGE

GARCH(1, 1) 0.9% 34.9% 47.8%GJR-GARCH(1, 1) 8.8% 0.8% 28.4%

BEGE 2.8% 0.1% 1.2%

40

The 95% Bayesian prediction intervals for the one-step ahead forecasted conditional

variance of the three GARCH-type models contain true conditional variances with

a satisfactory coverage level. Besides checking the accuracy of Bayesian predictive

distributions for conditional volatility by using the true model, we also use mis-

specified models to fit the simulated data and compare their performance with the

true model. Table 3.3 summarises the frequency of true values of simulated con-

ditional variances t = 201 to t = 1099 falling outside each model’s 95% prediction

interval. For the data simulated from the BEGE model, there are 1.2% of true

squared volatilities falling outside the predicted interval for the true BEGE model,

but the other two misspecified models provide poor predictive performance. For

the data simulated from the GARCH(1, 1) model and GJR-GARCH(1, 1) model,

there are less than 1% of the true conditional variances outside the associated 95%

prediction interval for these true models. Overall, the Bayesian prediction intervals

demonstrate reasonable coverage rates of true conditional volatilities for the three

GARCH-type models in the simulation study, and such a conclusion is reinforced by

considering each model’s estimated elpdLFO in Table 3.4. The model with the high-

est value of elpdLFO is the one that has the best predictive performance. We can see

that there is an agreement between the results of elpdLFO and coverage rates of the

95% prediction intervals used to assess candidate models’ predictive performance.

It is worth noting that the BEGE model performs well for forecasting conditional

variances under the dataset simulated from the GJR-GARCH(1, 1) model, which in-

dicates that the performance of the BEGE model is robust to model misspecification

in this respect. Therefore, even though the real volatility of the real stock returns is

unobservable, this simulation study provides some confidence that the real volatility

can be adequately captured and covered by the prediction interval with adoption of

the Bayesian approach.

Table 3.4: Approximated elpdLFO for one-step ahead forecasts of simulated conditionalvariances from t = 201 to t = 1099

ModelData Simulated Data

from GARCH(1, 1)

Simulated Datafrom

GJR-GARCH(1, 1)

Simulated Datafrom BEGE

GARCH(1, 1) 6531.4 3895.3 -847.9GJR-GARCH(1, 1) 6278.8 6364.0 5492.9

BEGE 6364.4 6393.9 6188.1

41

Figure 3.11 shows the time series for real stock returns and one-step ahead fore-

casts for conditional variances obtained via the Bayesian approach. The Bayesian

posterior median value of conditional variance is set as the point estimate of one-

step ahead prediction. It is immediately apparent that the Bayesian approach can

provide efficient predictions for the three GARCH-type models, which successfully

capture the features of the data such as the evident growth of volatility following

the negative shock of 1987 crash.

1944 1950 1960 1970 1980 1990 2000 2010-0.3

-0.2

-0.1

0

0.1

0.2

Month

ly L

og-r

etu

rn

1944 1950 1960 1970 1980 1990 2000 20100

0.005

0.01

0.015

Conditio

nal V

ariance

GARCH(1,1)

GJR-GARCH(1,1)

BEGE

Figure 3.11: One-step ahead Bayesian forecasts for conditional variance of GARCH(1, 1),GJR-GARCH(1, 1) and BEGE with real time series data by using the Bayesian approach.The prediction period is from March 1943 to January 2018. The top panel shows monthlylog-return series, and the bottom panel shows the results of one-step ahead forecasted con-ditional variances from the three GARCH-type models.

Another promising aspect of the Bayesian approach is reflected in the provision of

the predictive distribution of conditional volatility, which quantifies the uncertainty

in the one-step ahead predicted volatility. However, this predictive distribution

cannot be analytically derived. From the predictive distribution, more information

about the one-step forward unobserved volatility can be extracted. For instance,

42

the probability of the conditional volatility at the next time point being within

any particular range can be computed. Figure 3.12 (a) shows a distribution of the

predicted conditional volatility of the GJR-GARCH(1, 1) model at time t = 586

conditional on real data t = 1 to t = 585. From this distribution, we can see that

the conditional volatilities with values around 0.076 have a high probability of being

observed. In contrast, the predictive distribution in Figure 3.12 (b) indicates that

there is a very low chance of the conditional volatility being greater than 0.076, but

it shows a high chance of a relatively low conditional volatility being observed at

time t = 743 from the BEGE model. Figure 3.12 (c) shows that there is a lot of

uncertainty about the relatively high volatility at time t = 580.

0.064 0.066 0.068 0.07 0.072 0.074 0.076 0.078 0.08 0.082 0.0840

50

100

150

200

GJR-GARCH: t = 586

(a)

0.045 0.05 0.055 0.06 0.065 0.07 0.075 0.08 0.085 0.09 0.0950

20

40

60

80

BEGE: t = 743

(b)

0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.140

10

20

30

40

50

60

BEGE: t = 580

(c)

Figure 3.12: Predictive distribution for real data. (a) One-step ahead predicted conditionalvolatility at time t = 586 of GJR-GARCH(1, 1) model; (b) One-step ahead predicted condi-tional volatility at time t = 743 of BEGE model; (c) One-step ahead predicted conditionalvolatility at time t = 580 of BEGE model.

43

3.5.3 Model Choice

In this section we perform a simulation study to assess the performance of the SMC

estimator of the Bayesian model evidence for model selection for the GARCH-type

models. We first simulate five sets of data from each competing model of interest,

producing 15 datasets in total. The associated parameter values used to generate

simulated data are randomly selected from the SMC posterior samples of the three

GARCH-type models in Section 3.5.1. After obtaining simulated data, we use SMC

to estimate the log evidence for each candidate model given different sets of simulated

data.

Table 3.5: Estimation of Log Evidence

Simulated Data from GARCH(1, 1)

ModelData 1 2 3 4 5

GARCH(1, 1) 1811.4 1578.5 1843.6 1936.2 1728.8GJR-GARCH(1, 1) 1810.2 1576.6 1842.1 1934.2 1727.9

BEGE 1805.1 1572.5 1837.8 1930.1 1719.9Simulated Data from GJR-GARCH(1, 1)

ModelData 1 2 3 4 5

GARCH(1, 1) 1716.0 1804.1 1941.5 2017.7 1778.5GJR-GARCH(1, 1) 1730.1 1811.1 1952.6 2021.2 1784.0

BEGE 1723.8 1805.4 1943.7 2014.3 1777.0Simulated Data from BEGE

ModelData 1 2 3 4 5

GARCH(1, 1) 1931.0 1856.1 1819.2 1876.5 1786.4GJR-GARCH(1, 1) 1945.2 1865.2 1831.0 1886.9 1803.9

BEGE 1978.8 1881.4 1873.1 1915.7 1862.6

ModelData Real Time Series Data

Data Annealing SMC Likelihood Annealing SMCGARCH(1, 1) 1811.5 1811.4

GJR-GARCH(1, 1) 1817.5 1818.0BEGE 1845.4 1844.9

The results of each model’s estimation of log evidence given the simulated data

are presented in Table 3.5. As the data generating process (DGP) is known, the

performance of the evidence can be assessed by checking if the true model is cor-

rectly selected according to the estimated value of the log evidence. As can be seen

44

from Table 3.5, for each simulated dataset the model with the highest estimated

log evidence corresponds to the DGP. This demonstrates the effectiveness of the

fully Bayesian approach for model choice and the SMC evidence estimator for the

GARCH-type models and length of time series investigated here. It is worth not-

ing that for simulated data from the GARCH(1, 1) model, the GJR-GARCH(1, 1)

model has only a slightly smaller log evidence value compared to the GARCH(1, 1)

model. However, when data is generated from the GJR-GARCH(1, 1) model, the log

evidence of the GARCH(1, 1) model is much smaller than the GJR-GARCH(1, 1)

model. When the GARCH(1, 1) model is true, there is a possibility that the GJR-

GARCH(1, 1) model can mimic the behaviour of the GARCH(1, 1) model since the

GARCH(1, 1) model is nested within the GJR-GARCH(1, 1) model.

Moreover, each model’s estimated log evidence for real time series data is also pro-

vided in Table 3.5. There is strong evidence that the BEGE model is preferred over

the other two GARCH-type models for this dataset. This is consistent with the re-

sult from Bekaert et al. (2015) and reflects the fact that the BEGE model endowed

with non-Gaussian innovations outperforms the traditional GARCH model and the

standard asymmetric GJR-GARCH model when applied to equity market returns.

3.6 Discussion

In this chapter, a novel Bayesian framework for the analysis of GARCH-type models

has been developed, that allows for parameter estimation, model selection and pre-

diction. To this end, an efficient fully Bayesian approach has been developed using

SMC. To obtain exact Bayesian inference for the BEGE model, we proposed to use

an unbiased likelihood estimator in place of the biased approximation adopted in

Bekaert et al. (2015). This approach, in comparison to traditional estimation infer-

ence and forecasting techniques, offers a number of important advantages. Overall,

in contrast to traditional methods, the Bayesian framework offers a direct method

for quantifying the uncertainty surrounding both model parameter estimates and

volatility forecasts. It is shown that SMC offers a number of advantages over stan-

dard Monte-Carlo methods. For such applications on long time series data, the

SMC approach proposed here offers an efficient method generating the full predic-

tive distribution of volatility, computation of the evidence for model comparison

45

and undertaking leave-one-out cross-validation. Although this study has focused

on GARCH-type models, the benefits of the proposed approach would translate to

other time series models for alternative economic or financial problems. We have

demonstrated here that SMC data annealing can be an efficient approach for com-

paring the predictive performance of time series models. An interesting avenue for

future research would be to explore this idea more comprehensively across differ-

ent time series models and to draw comparisons with the approximate approach of

Burkner et al. (2019).

46

Chapter 4

Conclusion

4.1 Summary

The aim of this thesis was to develop and demonstrate the benefits of a computa-

tional Bayesian method called sequential Monte Carlo for estimation, prediction and

model selection for GARCH-type models for financial time series data. By complet-

ing this research, the objectives discussed in Chapter 1 have been met, and in doing

so a contribution to the Bayesian literature has been made.

In Chapter 3, we established a fully Bayesian approach to statistical inference for

the GARCH-type models via SMC, with the framework was discussed comprehen-

sively in Chapter 2. Comparing with the classical approach and MCMC Bayesian

approach, the potential advantages of using SMC have been demonstrated through a

novel application of the SMC method to three specific GARCH-type models used to

forecast the asset return volatility. These advantages include handling non-normal

posterior distributions, easy exploitation of parallel computing, and the ability to

perform Bayesian model selection by approximating the model evidence. For the

BEGE model, which has an intractable likelihood function, we developed an unbi-

ased likelihood estimator that permits exact Bayesian inference for the first time.

After deriving an effective importance distribution, we demonstrated that the impor-

tance sampling is a more efficient way to generate an unbiased version of the BEGE

likelihood relative to the standard Monte Carlo method. We demonstrated the effi-

ciency and superiority of SMC data annealing for performing posterior predictions

and comparing the predictive performance of time series models.

47

However, this research also has its limitations. The Bayes factor, which was used as

a Bayesian approach to model selection in this thesis, can be very sensitive to the

prior specification for the parameter of interest (Liu and Aitkin, 2008). In particular,

candidate models with informative priors are typically favored over the ones with

non-informative prior distributions. This drawback can be assessed by conducting a

sensitivity analysis using a wide range of potential priors, which has not been done

in this thesis as it would be very computationally intensive.

4.2 Future research directions

For future work, we expect that the benefits of our fully Bayesian approach can be

realised in alternative econometrics and other time series models. We showed the

potential of using SMC data annealing for leave-one-out cross-validation for time

series data. It would be interesting to investigate this idea more comprehensively

across a wider set of time series models and compare with the approximate approach

of Burkner et al. (2019) and other predictive comparison tools.

One of the main foci of the thesis is on the Bayesian approach for one-step ahead

forecasts, where the parameter uncertainty has been considered. Blasques et al.

(2016) have considered multiple-step ahead predictions with incorporating future

innovation uncertainty, which is related to the unknown true underlying process of

the volatility. It is crucial to consider the uncertainty of future innovations when

making multiple-step ahead forecasts according to Blasques et al. (2016). Therefore,

it would be of interest to derive multiple-step ahead predictions through a Bayesian

approach with taking into account the future innovation uncertainty, and comparing

this with the methods proposed in the literature.

For model selection tools, the widely applicable (or Watanabe-Akaike) information

criterion (WAIC, Watanabe 2010) might be preferable as it is less sensitive to the

prior distribution and easier to compute. Also, it still integrates over the posterior

distribution and does not rely on point estimates like BIC and DIC. However, cur-

rently the WAIC is not available for time series data, and thus it could be an avenue

for future research.

48

Appendix

Importance sampling estimator of likelihood of the BEGE

model

As described in Section 3.2, we need an importance distribution that will produce

a low variance estimator of the likelihood with a small Monte Carlo sample size.

This appendix shows that the likelihood requires to be estimated only when both

pt > 1 and nt > 1, and in other cases an exact likelihood can be computed. For

simplicity, we only consider here the likelihood for a single observation. Four cases

are considered to compute the likelihood fBEGE(u) with the following expression

fBEGE(u) =∫ωpfωn(ωp − u)f(ωp)dωp,where

fωp(ωp) = 1Γ(p)σpp

(ωp − ωp)p−1 exp(− 1σp

(ωp − ωp)), for ωp > ωp

fωn(ωp − u) = 1Γ(n)σnn

(ωp − u− ωn)n−1 exp(− 1σn

(ωp − u− ωn)), for ωp − u > ωn

where p ≥ 1, n ≥ 1, ωp = −pσp, and ωn = −nσn. The lower limit of integration

in the expression for fBEGE(u) is ωp, which must be greater than ωp and ωn + u.

Therefore, we define the lower limit of integration as δ = max (ωp, ωn + u).

Firstly, define σ = 1σp

+ 1σn

. When p = n = 1 an exact expression for fBEGE(u) can

be found by

fBEGE(u) =∫ ∞δ

1σp

exp(− 1σp

(ωp − ωp))

1σn

exp(− 1σn

(ωp − (u+ ωn)))

dωp

= k1

∫ ∞δ

exp (−σωp) dωp

= k11σ

exp(−σδ),where k1 = 1σp

1σn

exp(ωpσp

)exp

(u+ ωnσn

).

49

Then, the log-likelihood of the BEGE model in this case is ln [fBEGE(u)] = ln k1 −

ln σ − σδ. When p = 1 and n > 1, an exact expression for the likelihood can be

found as

fBEGE(u) =∫ ∞δ

1σp

exp(− 1σp

(ωp − ωp))

1Γ(n)σnn


(ωp − u− ωn))

dωp,

define k2 = 1σp

1Γ(n)σnn

exp(ωpσp

), and c1 = k2 exp

(− 1σp

(u+ ωn)), then the likelihood

can be written as

fBEGE(u)

=∫ ∞δ

k2 exp(− 1σn

(ωp − u− ωn))

exp(−ωpσp

+ u+ ωnσp

− u+ ωnσp

)(ωp − u− ωn)n−1 dωp

=∫ ∞δ

c1 exp (−σ (ωp − u− ωn)) (ωp − u− ωn)n−1 dωp

= c1Γ(n)σn

∫ ∞δ

σn

Γ(n) (ωp − u− ωn)n−1 exp (−σ (ωp − u− ωn)) dωp.

We set xp = ωp − u − ωn = ωp − (u + ωn), which gives ωp = xp + u + ωn and

dωp = dxp. The likelihood then can be written as

fBEGE(u) = c1Γ(n)σn

∫ ∞δ−(u+ωn)

σn

Γ(n)xn−1p exp (−σxp) dxp

=

c1

Γ(n)σn , if δ = u+ ωn

c1Γ(n)σn [1− F (ωp − (u+ ωn), n, σ)] , if δ = ωp,

where F is the CDF of a gamma distribution with parameters n and σ over the

interval of (0, ωp − (u+ ωn)). Similarly, in the case of n = 1 and p > 1 an exact

expression for the likelihood can be found as

fBEGE(u) =∫ ∞δ

1σn

exp(− 1σn

(ωp − (u+ ωn))) 1

Γ(p)σpp(ωp − ωp)p−1 exp

(− 1σp

(ωp − ωp))

dωp,

50

define k3 = 1σn

1Γ(p)σpp

exp(u+ωnσn

)and c2 = k3 exp

(−ωpσn

), then the likelihood can be

written as

fBEGE(u)

=∫ ∞δ

k3 exp(− 1σp

(ωp − ωp))

exp(−ωpσn

+ ωpσn− ωpσn

)(ωp − ωn)p−1 dωp

=∫ ∞δ

c2 (ωp − ωn)p−1 exp (−σ (ωp − ωp)) dωp

= c2Γ(p)σp

∫ ∞δ

σp

Γ(p) (ωp − ωn)p−1 exp (−σ (ωp − ωp)) dωp.

We set xp = ωp−ωp, which gives ωp = xp +ωp and dωp = dxp. The likelihood then

can be written as

fBEGE(u) = c2Γ(p)σp

∫ ∞δ−ωp

σp

Γ(p)xp−1p exp (−σxp) dxp

=

c2

Γ(p)σp , if δ = ωp

c2Γ(p)σp [1− F (u+ ωn − ωp, p, σ)] , if δ = u+ ωn,

where F is the CDF of a gamma distribution with parameters p and σ over the

interval of (0, u+ ωn − ωp).

In the case of p > 1 and n > 1, we use importance sampling method to estimate

likelihood of the BEGE model. Here we attempt to find a parametric importance

distribution g(ωp) that matched closely the target distribution h(ωp). Firstly, we

determine the mode and curvature of h(ωp). The expression for h(ωp) is given by

h(ωp) = 1Γ(p)σpp

(ωp − ωp)p−1 exp(− 1σp

(ωp − ωp))

1Γ(n)σnn


(ωp − u− ωn)).

We set m(ωp) = ln [h(ωp)], which is given by

m(ωp) =− ln Γ(p)− p ln σp + (p− 1) ln (ωp − ωp)−1σp

(ωp − ωp)

− ln Γ(n)− n ln σn + (n− 1) ln (ωp − u− ωn)− 1σn

(ωp − u− ωn).

51

The first and second derivatives of m(ωp) are given by

m′(ωp) = p− 1ωp − ωp

+ n− 1ωp − u− ωn

− σ,

m′′(ωp) = 1− p(ωp − ωp)2 + 1− n

(ωp − u− ωn)2 .

The mode ωp of m(ωp) can be found by setting m′(ωp) = 0, then we get

(p− 1)(ωp − u− ωn) + (n− 1)(ωp − ωp)− σ(ωp − ωp)(ωp − u− ωn) = 0,

σω2p + [−σ(u+ ωn + ωp) + (2− n− p)]ωp + [σωp(u+ ωn) + (p− 1)(u+ ωn) + (n− 1)ωp] = 0.

By setting a = σ, b = −σ(u + ωn + ωp) + (2 − n − p), and c = σωp(u + ωn) +

(p − 1)(u + ωn) + (n − 1)ωp, then we can write aω2p + bωp + c = 0 and the mode

ωp = −b+√b2−4ac

2a can be easily derived. The variance of m(ωp) is estimated by using

the second-derivative of m(ωp) around the mode ωp, which is given by

V = −[

1− p(ωp − ωp)2 + 1− n

(ωp − u− ωn)2

]−1

.

We propose a gamma distribution, X ∼ Γ(i, j) for x > 0, with mode ωp and variance

V to be the importance distribution. The mode of this gamma distribution is (i−1)j

and the variance is ij2. As ωp > δ, we need to make the importance distribution

greater than δ and a shift gamma distribution, Y = X + δ for y > δ, is selected. As

x = y − δ, we can show that∣∣∣dxdy

∣∣∣ = 1. Then an expression for the density of this

importance distribution g(y) is given by

g(y) = fX(x(y))∣∣∣∣dxdy

∣∣∣∣ = 1jiΓ(i)(y − δ)i−1 exp(−y − δ

j),

where the mode of this distribution is (i− 1)j+ δ and the variance is ij2. The value

of parameter i and j can be derived by solving (i− 1)j + δ = ωp and ij2 = V , and

the results are given by

i =−bi +

√b2i − 4V 2

2V,where bi = −2V − (ωp − δ)2

j = ωp − δi− 1 .

52

After obtaining the importance distribution g(·), we can unbiasedly estimate fBEGE(u)

by following Equation 3.4. It is worth noting that this importance distribution can

be used only when the mode ωp is greater than δ, which is the lower limit of inte-

gration in the expression for fBEGE(u). For this case, it can be proved that ωp is

always greater than δ, and the details of proof are shown below.

(1) when δ = ωp > (u+ ωn),

(2aδ + b)2 = (2σωp + b)2

= b2 + 4σ2ω2p + 4σωp [−σ(u+ ωn + ωp) + (2− n− p)]

= b2 + 4σ2ω2p − 4σ2ω2

p − 4σ2ωp(u+ ωn) + 4σωp(2− n− p)

= b2 −[4σ2(u+ ωn)ωp + 4σ(n− 1)ωp + 4σ(p− 1)ωp

]= b2 − 4ad,where d = σ(u+ ωn)ωp + (n− 1)ωp + (p− 1)ωp,

∵ c = σ(u+ ωn)ωp + (n− 1)ωp + (p− 1)(u+ ωn), and ωp > (u+ ωn)

∴ d > c, 4ad > 4ac, b2 − 4ad < b2 − 4ac, (2aδ + b)2 < b2 − 4ac, 2aδ + b <√b2 − 4ac,

δ <−b+

√b2 − 4ac

2a

δ < ωp;

(2) when δ = (u+ ωn) > ωp,

(2aδ + b)2 = (2σ(u+ ωn) + b)2

= b2 + 4σ2(u+ ωn)2 + 4σ(u+ ωn) [−σ(u+ ωn + ωp) + (2− n− p)]

= b2 + 4σ2(u+ ωn)2 − 4σ2(u+ ωn)2 − 4σ2ωp(u+ ωn)− 4σ(u+ ωn)(n+ p− 2)

= b2 − 4σ [σ(u+ ωn)ωp + (n− 1)(u+ ωn) + (p− 1)(u+ ωn)]

= b2 − 4ae,where e = σ(u+ ωn)ωp + (n− 1)(u+ ωn) + (p− 1)(u+ ωn),

∵ c = σ(u+ ωn)ωp + (n− 1)ωp + (p− 1)(u+ ωn), and (u+ ωn) > ωp

∴ e > c, 4ae > 4ac, b2 − 4ae < b2 − 4ac, (2aδ + b)2 < b2 − 4ac, 2aδ + b <√b2 − 4ac,

δ <−b+

√b2 − 4ac

2a

δ < ωp.

53

References

Alberg, D., Shalit, H., and Yosef, R. (2008). Estimating stock market volatility using

asymmetric GARCH models. Applied Financial Economics, 18(15):1201–1208. 14

Alexander, C. and Lazar, E. (2006). Normal mixture GARCH(1, 1): Applications to ex-

change rate modelling. Journal of Applied Econometrics, 21(3):307–336. 16

Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte

Carlo computations. The Annals of Statistics, 37(2):697–725. 7, 27

Ardia, D. et al. (2008). Financial risk management with Bayesian estimation of GARCH

models. Springer. 18

Ardia, D. and Hoogerheide, L. F. (2010). Efficient bayesian estimation and combination

of GARCH-type models. Rethinking Risk Measurement and Reporting: Examples and

Applications from Finance, 2. 19

Bauwens, L. and Lubrano, M. (1998). Bayesian inference on GARCH models using the

Gibbs sampler. The Econometrics Journal, 1(1):C23–C46. 19

Bauwens, L. and Lubrano, M. (2002). Bayesian option pricing using asymmetric GARCH

models. Journal of Empirical Finance, 9(3):321–342. 19

Bekaert, G., Engstrom, E., and Ermolov, A. (2015). Bad environments, good environments:

A non-Gaussian asymmetric volatility model. Journal of Econometrics, 186(1):258–275.

55

2, 15, 17, 18, 23, 25, 31, 45

Berger, J. O. (2013). Statistical decision theory and Bayesian analysis. Springer Science &

Business Media. 6

Berkes, I., Horvath, L., and Kokoszka, P. (2003). Garch processes: Structure and estimation.

Bernoulli, pages 201–227. 18

Black, F. (1976). Studies of stock market volatility changes. Proceedings of the American

Statistical Association, the Business and Economics Section. 14, 23

Blasques, F., Koopman, S. J., Lasak, K., and Lucas, A. (2016). In-sample confidence bands

and out-of-sample forecast bands for time-varying parameters in observation-driven mod-

els. International Journal of Forecasting, 32(3):875–887. 48

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of

Econometrics, 31(3):307–327. 2, 14, 15, 23

Bollerslev, T. and Wooldridge, J. M. (1992). Quasi-maximum likelihood estimation and infer-

ence in dynamic models with time-varying covariances. Econometric reviews, 11(2):143–

172. 18

Bolstad, W. M. and Curran, J. M. (2016). Introduction to Bayesian statistics. John Wiley

& Sons. 6

Bouchaud, J.-P., Matacz, A., and Potters, M. (2001). Leverage effect in financial markets:

The retarded volatility model. Physical review letters, 87(22):228701. 14

Burkner, P.-C., Gabry, J., and Vehtari, A. (2019). Approximate leave-future-out cross-

validation for time series models. arXiv preprint arXiv:1902.06281. 29, 30, 46, 48

Caporale, G. M., Ali, F. M., and Spagnolo, N. (2015). Oil price uncertainty and sectoral

stock returns in China: A time-varying approach. China Economic Review, 34:311–321.

18

56

Chen, C. W., Gerlach, R., and So, M. K. (2006). Comparison of nonnested asymmetric

heteroskedastic models. Computational Statistics & Data Analysis, 51(4):2164–2178. 19

Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The

American Statistician, 49(4):327–335. 6

Chopin, N. (2002). A sequential particle filter method for static models. Biometrika,

89(3):539–552. 1, 8, 10, 11, 24

Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2013). SMC2: an efficient algorithm for

sequential analysis of state space models. Journal of the Royal Statistical Society: Series

B (Statistical Methodology), 75(3):397–426. 10, 27

Corradi, V., Distaso, W., and Swanson, N. R. (2009). Predictive density estimators for daily

volatility based on the use of realized measures. Journal of Econometrics, 150(2):119–138.

24

Corradi, V., Distaso, W., and Swanson, N. R. (2011). Predictive inference for integrated

volatility. Journal of the American Statistical Association, 106(496):1496–1512. 24

Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal

of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436. 1, 20,

24

Drovandi, C. C. and Pettitt, A. N. (2011). Likelihood-free Bayesian estimation of multivari-

ate quantile distributions. Computational Statistics and Data Analysis, 55(9):2541–2556.

11

Duan, J.-C. and Fulop, A. (2015). Density-tempered marginalized sequential Monte Carlo

samplers. Journal of Business & Economic Statistics, 33(2):192–202. 10, 27

Dyhrberg, A. H. (2016). Bitcoin, gold and the dollar–A GARCH volatility analysis. Finance

Research Letters, 16:85–92. 14

57

Engle, R. (2001). GARCH 101: The use of ARCH/GARCH models in applied econometrics.

Journal of Economic Perspectives, 15(4):157–168. 16, 23

Engle, R. (2004). Risk and volatility: Econometric models and financial practice. American

Economic Review, 94(3):405–420. 14

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the

variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society,

pages 987–1007. 23

Engle, R. F. and Ng, V. K. (1993). Measuring and Testing the Impact of News on Volatility.

The Journal of Finance, 48(5):1749–1778. 15

Francq, C. and Zakoıan, J.-M. (2011). GARCH Models: Structure, Statistical Inference and

Financial Applications. John Wiley & Sons. 15, 16

Friel, N. and Wyse, J. (2012). Estimating the evidence–a review. Statistica Neerlandica,

66(3):288–308. 20

Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information

criteria for Bayesian models. Statistics and Computing, 24(6):997–1016. 21

Gerlach, R. and Chen, C. W. (2008). Bayesian inference and model comparison for asym-

metric smooth transition heteroskedastic models. Statistics and Computing, 18(4):391.

19

Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration.

Econometrica: Journal of the Econometric Society, pages 1317–1339. 19

Giraitis, L., Leipus, R., and Surgailis, D. (2007). Recent advances in ARCH modelling. In

Long Memory in Economics, pages 3–38. Springer. 14

58

Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the

expected value and the volatility of the nominal excess return on stocks. The Journal of

Finance, 48(5):1779–1801. 14, 23

Griffin, J., Liu, J., and Maheu, J. (2016). Bayesian nonparametric estimation of ex-post

variance. Technical report, University Library of Munich, Germany. 24

Guesmi, K., Saadi, S., Abid, I., and Ftiti, Z. (2019). Portfolio diversification with virtual

currency: Evidence from bitcoin. International Review of Financial Analysis, 63:431–437.

18

Haario, H., Laine, M., Mira, A., and Saksman, E. (2006). DRAM: Efficient adaptive MCMC.

Statistics and Computing, 16(4):339–354. 8

Haario, H., Saksman, E., Tamminen, J., et al. (2001). An adaptive Metropolis algorithm.

Bernoulli, 7(2):223–242. 8

Hall, P. and Yao, Q. (2003). Inference in ARCH and GARCH models with heavy–tailed

errors. Econometrica, 71(1):285–317. 18

Hamilton, J. D. (1994). Time Series Analysis, volume 2. Princeton university press Prince-

ton, NJ. 2, 24

Hansen, P. R. and Huang, Z. (2016). Exponential GARCH modeling with realized measures

of volatility. Journal of Business & Economic Statistics, 34(2):269–287. 18

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their

applications. Biometrika, 57(1):97–109. 1, 6

Hentschel, L. et al. (1995). All in the family: Nesting symmetric and asymmetric GARCH

models. Journal of Financial Economics, 39(1):71–104. 23

59

Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. In

Mathematical Proceedings of the Cambridge Philosophical Society, volume 31, pages 203–

222. Cambridge University Press. 19

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical

Association, 90(430):773–795. 19, 21

Khedhiri, S. and Muhammad, N. (2008). Empirical analysis of the UAE stock market

volatility. 14

Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: likelihood inference and

comparison with ARCH models. The Review of Economic Studies, 65(3):361–393. 2, 24

Koutmos, G. (1998). Asymmetries in the conditional mean and the conditional variance:

Evidence from nine stock markets. Journal of Economics and Business, 50(3):277–290.

14

Kristjanpoller, W. and Minutolo, M. C. (2015). Gold price volatility: A forecasting approach

using the artificial neural network–GARCH model. Expert Systems with Applications,

42(20):7245–7251. 14

Kristjanpoller, W. and Minutolo, M. C. (2016). Forecasting volatility of oil price using an

artificial neural network-GARCH model. Expert Systems with Applications, 65:233–241.

14

Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological

Methods and Research, 33(2):188–229. 21

Li, D., Clements, A., and Drovandi, C. (2019). Efficient Bayesian estimation for GARCH-

type models via sequential Monte Carlo. arXiv preprint arXiv:1906.03828. 4

Liu, C. C. and Aitkin, M. (2008). Bayes factors: Prior sensitivity and model generalizability.

Journal of Mathematical Psychology, 52(6):362–375. 48

60

Liu, J. S. (2008). Monte Carlo strategies in scientific computing. Springer Science & Business

Media. 9

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953).

Equation of state calculations by fast computing machines. The Journal of Chemical

Physics, 21(6):1087–1092. 1, 6

Mills, T. C. and Markellos, R. N. (2008). The econometric modelling of financial time series.

Cambridge University Press. 1

Muller, P. and Pole, A. (1998). Monte Carlo posterior integration in GARCH models.

Sankhya: The Indian Journal of Statistics, Series B, pages 127–144. 19

Nakatsuma, T. (1998). A Markov-chain sampling algorithm for GARCH models. Studies in

Nonlinear Dynamics & Econometrics, 3(2). 19

Nakatsuma, T. (2000). Bayesian analysis of ARMA–GARCH models: A Markov chain

sampling approach. Journal of Econometrics, 95(1):57–69. 19

Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2):125–

139. 9, 10, 24

Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach.

Econometrica: Journal of the Econometric Society, pages 347–370. 14, 23

O’Hagan, A. (2004). Bayesian statistics: principles and benefits. Frontis, pages 31–45. 24

Ritter, C. and Tanner, M. A. (1992). Facilitating the Gibbs sampler: the Gibbs stopper and

the griddy-Gibbs sampler. Journal of the American Statistical Association, 87(419):861–

868. 19

Robert, C. and Casella, G. (2004). Monte Carlo statistical Methods. Springer New York. 24

Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of

Computational and Graphical Statistics, 18(2):349–367. 8, 24

61

Salomone, R., South, L. F., Drovandi, C. C., and Kroese, D. P. (2018). Unbiased and

consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924.

11, 12

Schwarz, G. et al. (1978). Estimating the Dimension of a Model. The Annals of Statistics,

6(2):461–464. 21

Sherlock, C. and Roberts, G. (2009). Optimal scaling of the random walk Metropolis on

elliptically symmetric unimodal targets. Bernoulli, pages 774–798. 11

South, L. F., Pettitt, A. N., and Drovandi, C. C. (2019). Sequential Monte Carlo samplers

with independent Markov chain Monte Carlo proposals. To appear in Bayesian Analysis.

1, 25

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian

measures of model complexity and fit. Journal of the Royal Statistical Society: Series B

(Statistical Methodology), 64(4):583–639. 21

Takaishi, T. (2009). An adaptive Markov chain Monte Carlo method for GARCH model.

In International Conference on Complex Sciences, pages 1424–1434. Springer. 2, 24

Trevor, H., Robert, T., and JH, F. (2009). The elements of statistical learning: data mining,

inference, and prediction. 21

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using

leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432. 29

Virbickaite, A., Ausın, M. C., and Galeano, P. (2015). Bayesian inference methods for

univariate and multivariate GARCH models: A survey. Journal of Economic Surveys,

29(1):76–96. 19

62

Vrontos, I. D., Dellaportas, P., and Politis, D. N. (2000). Full Bayesian inference for GARCH

and EGARCH models. Journal of Business & Economic Statistics, 18(2):187–198. 2, 19,

24

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable

information criterion in singular learning theory. Journal of Machine Learning Research,

11(Dec):3571–3594. 48

63

efficient bayesian estimation for garch-type models via sequential … · 2020. 2. 20. ·...

Documents