
DEAKIN UNIVERSITY

Quantifying Heteroskedasticity Metrics

by

Marwa Hassan Aly Hassan

A thesis submitted in fulfilment for the

degree of Doctorate of Philosophy

In the

Faculty of Science and Technology

Institute for Intelligent Systems Research and Innovation (IISRI)

November 2016


Abstract

Heteroskedasticity is a statistical term that describes the differing variances of the

error terms in a time series dataset. The presence of heteroskedasticity in data

imposes serious challenges to forecasting models. The lack of information, and the limited accuracy of the information that is available, can overwhelm the decision maker, leading to many types of uncertainty. Heteroskedasticity affects the estimated relationship between the predictor variable and the outcome, leading to false positive and false negative decisions in hypothesis testing and invalidating the results of statistical tests.

The available approaches for studying heteroskedasticity, developed thus far, adopt

the strategy of accommodating heteroskedasticity in the time series and consider

it to be an inevitable source of noise. In these solutions, two forecasting models

are prepared for normal and heteroskedastic scenarios, and a statistical test is used to determine whether or not the data is heteroskedastic. In practice, however, it has been observed that time series data such as the S&P 500 Index (a financial time series) have no homogeneity of variance over time. Consequently, if one assumes that the

data is homoskedastic when it is indeed heteroskedastic, then the regression model


assumptions (Gauss-Markov assumptions) will be violated. When this violation

occurs, the ordinary least squares estimates will not feature a minimum variance.

Moreover, one will not be able to determine if the estimated variances are too large

or too small. Thus, heteroskedasticity results in a violation of the Gauss-Markov

assumptions, and it needs to be rectified.

This study takes the heteroskedasticity tests to the next level and proposes a quan-

tification measure of heteroskedasticity in the time series. In this study, two meth-

ods are introduced for quantifying heteroskedasticity, namely Slope of Local Vari-

ance Index (SoLVI) and a statistical divergence method using the Bhattacharyya coefficient. Based on the experiments, both the Bhattacharyya divergence and Slope of Local Variance Index (SoLVI) heteroskedasticity measures provide a quantifiable measure of heteroskedasticity. Both metrics maintain a lower bound and an asymptotic upper bound. The proposed measures can identify how heteroskedastic the data

is and how far the dataset under investigation is from being homoskedastic. The

introduced measures can identify for how long the data can be homoskedastic,

which can help in designing a prediction interval for the time series being exam-

ined. The proposed measurements are obtained by calculating the local variances

using linear filters, estimating variance trends, calculating the changes in variance

slopes, and finally obtaining the average slope angles. Data were drawn from a series of theoretical and real data sets. Finally, the proposed measures showed reliability

in measuring and quantifying heteroskedasticity in comparison to the hypothesis

and numerical tests of heteroskedasticity.

List of Publications

1. Hassan, M. and Hossny, M. and Nahavandi, S. and Creighton, D.; “Quantifying

Heteroskedasticity Using Slope of Local Variances Index,” Proceedings of the 15th

International Conference on Computer Modelling and Simulation, 2013.

2. Hassan, M. and Hossny, M. and Nahavandi, S. and Creighton, D.; “Quantifying

Heteroskedasticity via Binary Decomposition,” Proceedings of the 15th Interna-

tional Conference on Computer Modelling and Simulation, 2013.

3. Hassan, M. and Hossny, M. and Nahavandi, S. and Creighton, D.; “Het-

eroskedasticity Variance Index,” Proceedings of the 14th International Conference

on Computer Modelling and Simulation, 2012.

4. Hassan, M. and Hossny, M. and Nahavandi, S. and Creighton, D.; “Quantifying

Heteroskedasticity via Statistical Divergence Measure,” Submitted to Economics

Letters on the 23rd of June, 2016.

5. Hassan, M. and Hossny, M. and Nahavandi, S. and Creighton, D.; “Quantifying

Heteroskedasticity,” Submitted to IEEE Systems Journal on the 24th of June,

2016.


Research Contributions

This research introduces two novel approaches to quantify heteroskedasticity.

Local Variance Estimation methods use the definition of heteroskedasticity as time-varying variance and derive an estimate of the local variances in the time series. SoLVI is introduced to quantify heteroskedasticity. Two variations were developed in this method: the variance of local variances and the slope of local variances. The variance of local variances gives a reasonable estimate but suffers from quadratic growth and provides an unbounded function. The slope of local variances solves both problems.
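The following is a minimal sketch of the slope-of-local-variances idea, assuming a sliding-window variance estimate, a fixed segment length and an ordinary least-squares trend fit; the function names, window and segment sizes are illustrative assumptions and not the thesis implementation.

```python
# Sketch of the SoLVI idea: estimate local variances with a sliding window,
# fit a linear trend to successive variance segments, and average the
# absolute slope angles. Window and segment sizes are illustrative.
import numpy as np

def local_variances(y, w=64):
    """Local variance of y inside a sliding window of length w."""
    y = np.asarray(y, dtype=float)
    return np.array([y[i:i + w].var() for i in range(len(y) - w + 1)])

def solvi(y, w=64, segment=128):
    """Average absolute slope angle (radians) of the local-variance trend."""
    v = local_variances(y, w)
    angles = []
    for start in range(0, len(v) - segment + 1, segment):
        seg = v[start:start + segment]
        t = np.arange(segment)
        slope = np.polyfit(t, seg, 1)[0]      # least-squares trend of the variances
        angles.append(abs(np.arctan(slope)))  # each angle lies in [0, pi/2)
    return float(np.mean(angles)) if angles else 0.0

rng = np.random.default_rng(0)
homo = rng.normal(0, 1.0, 4000)                    # constant variance
hetero = rng.normal(0, np.linspace(0.5, 5, 4000))  # variance grows over time
print(solvi(homo), solvi(hetero))                  # the heteroskedastic score is larger
```

Because the score averages arctangent slope angles, it is near zero for flat local variances and bounded above by π/2, mirroring the zero lower bound and asymptotic upper bound described for the proposed measures.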

Statistical Divergence methods investigate heteroskedasticity from a different viewpoint. These methods operate under the assumption that, for a time series to be heteroskedastic, the estimated local variances must follow a uniform distribution. Therefore, a statistical distribution for the estimated local variances is derived, and the distance between the derived distribution and the uniform distribution is measured using statistical divergence measures. The Bhattacharyya coefficient is selected as the divergence metric because of its reduced computational complexity and asymptotic upper bound.
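A minimal sketch of the divergence-based measure under stated assumptions: the local variances are estimated with the same sliding window as in the previous sketch, their distribution is approximated by a histogram, and the Bhattacharyya coefficient is computed against a uniform reference on the same support. The window size, bin count and helper names are illustrative, not the thesis code.

```python
# Divergence-based heteroskedasticity sketch: compare the empirical
# distribution of local variances with a uniform reference distribution
# using the Bhattacharyya coefficient.
import numpy as np

def local_variances(y, w=64):
    y = np.asarray(y, dtype=float)
    return np.array([y[i:i + w].var() for i in range(len(y) - w + 1)])

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i) for two discrete distributions."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

def divergence_measure(y, w=64, bins=32):
    v = local_variances(y, w)
    hist, _ = np.histogram(v, bins=bins)
    p = hist / hist.sum()              # empirical distribution of local variances
    q = np.full(bins, 1.0 / bins)      # uniform reference distribution
    return bhattacharyya_coefficient(p, q)  # lies in (0, 1]; 1 means identical

rng = np.random.default_rng(0)
print(divergence_measure(rng.normal(0, 1.0, 4000)))                  # homoskedastic
print(divergence_measure(rng.normal(0, np.linspace(0.5, 5, 4000))))  # heteroskedastic
```

In this sketch a coefficient near 1 indicates that the local-variance distribution is close to the uniform reference, while smaller values indicate that it is concentrated; the coefficient itself is bounded, which is the sense in which the measure maintains a lower bound and an asymptotic upper bound.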


Acknowledgements

Deepest gratitude and immeasurable appreciation for the help and support to make this

study possible and make this dream come true are extended to the following persons:

Professor Saeid Nahavandi: for offering me the opportunity to pursue my academic

dream

Associate Professor Douglas Creighton: for his help and support through this emotional

journey, with words of advice and encouragement

Dr Mohammed Hossny: for his time, help and support with coding, data and priceless

advice

Miss Trish O’Toole: for always making IISRI a home away from home

All the library staff at Deakin University: for keeping us updated with the latest research

in the field


Contents

Declaration of Authorship 1

Abstract 2

List of Publications 4

Research Contributions 5

Acknowledgements 6

List of Figures 10

1 Introduction 1

1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Scope and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Heteroskedasticity in Time Series 7

2.1 Time Series Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 Continuous and Discrete Data Sets . . . . . . . . . . . . . . . . . . 8

2.1.2 Stationary and Non-Stationary Data Sets . . . . . . . . . . . . . . 8

2.1.3 Deterministic and Stochastic Data Sets . . . . . . . . . . . . . . . 9

2.2 Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Descriptive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.2 Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.3 Explanative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.5 Intervention or Control Analysis . . . . . . . . . . . . . . . . . . . 14

2.2.6 ARMA-ARIMA Time Series Forecasting Techniques . . . . . . . . 15

2.3 ARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4 GARCH model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.4.1 GARCH modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.4.2 Restrictions on GARCH model . . . . . . . . . . . . . . . . . . . . 27

2.4.3 Outliers in GARCH model . . . . . . . . . . . . . . . . . . . . . . 28

2.5 Heteroskedasticity Challenges and Tests . . . . . . . . . . . . . . . . . . . 30

2.5.1 Heteroskedasticity Detection . . . . . . . . . . . . . . . . . . . . . 31

2.5.2 The Goldfeld-Quandt test . . . . . . . . . . . . . . . . . . . . . . . 33


2.5.3 Ordinary likelihood ratio test . . . . . . . . . . . . . . . . . . . . . 34

2.5.4 Conditional likelihood ratio test . . . . . . . . . . . . . . . . . . . 34

2.5.5 Modified likelihood ratio test . . . . . . . . . . . . . . . . . . . . . 35

2.5.6 Residual likelihood ratio test . . . . . . . . . . . . . . . . . . . . . 36

2.5.7 Breusch-Pagan Test . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.5.8 White Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.5.9 Levene’s Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.5.10 Spearman’s Rank Correlation Coefficient . . . . . . . . . . . . . . 40

2.5.11 Heteroskedasticity challenges . . . . . . . . . . . . . . . . . . . . . 41

2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3 Critical Review of the Literature 44

3.1 ARCH model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2 GARCH model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.2.1 GARCH modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.2.2 Restrictions on GARCH model . . . . . . . . . . . . . . . . . . . . 52

3.2.3 Outliers in GARCH modeling . . . . . . . . . . . . . . . . . . . . . 53

3.3 Prediction Interval Modelling . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4 SoLVI: Slope of Local Variance Index 59

4.1 Local Estimation of Statistical Parameters . . . . . . . . . . . . . . . . . . 60

4.1.1 Estimation of Local Average . . . . . . . . . . . . . . . . . . . . . 60

4.1.2 Generalisation with Gaussian Filters . . . . . . . . . . . . . . . . . 62

4.1.3 Estimation of Local Variance . . . . . . . . . . . . . . . . . . . . . 62

4.1.4 ARMA Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.2 HVI: Heteroskedasticity Variance Index . . . . . . . . . . . . . . . . . . . 66

4.3 SoLVI: Slope of Local Variance Index . . . . . . . . . . . . . . . . . . . . . 68

4.3.1 Local Variance Regression . . . . . . . . . . . . . . . . . . . . . . . 69

4.3.2 Average Slope of Local Variance . . . . . . . . . . . . . . . . . . . 70

4.3.3 Selection of Kernel Size w . . . . . . . . . . . . . . . . . . . . . . . 71

4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5 Divergence Heteroskedasticity Measure 73

5.1 Mutual Information (MI) . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.1.1 Tsallis Driven Mutual Information (MIα) . . . . . . . . . . . . . . 76

5.2 Jensen-Shannon Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.3 Renyi Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.4 Bhattacharyya Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.4.1 Hellinger Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.4.2 Bhattacharyya Heteroskedasticity Measure . . . . . . . . . . . . . 81

5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6 Conclusions and Future Work 84


References 87

List of Figures

1.1 A comparison between homoskedastic and heteroskedastic data. . . 4

2.1 A comparison between stationary and non-stationary time-series. . . 9
2.2 Fat-tailing Phenomenon. . . . . . . . . . . . . . . . . . . . . . . . . 10

4.1 Local variances of homoskedastic and heteroskedastic samples. . . . 64
4.2 Local variances of a heteroskedastic time series using convolution. . 65
4.3 Local variances of a heteroskedastic time series using autoregression. 66
4.4 Local variance comparison using convolution and autoregressive filtering. . . 67
4.5 HVI with different kernel sizes. . . . . . . . . . . . . . . . . . . . . 68
4.6 SoLVI scores for time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w. . . 70

5.1 Probability density function of local variances for homoskedastic and heteroskedastic time series. . . 73
5.2 The effect of HVI window size on the Bhattacharyya coefficient. The results are of time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w. . . 79
5.3 Bhattacharyya distance of a time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w. . . 80
5.4 Hellinger coefficient of a time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w. . . 82


To my Dad: I am everything I am today because you loved me.

Thank you for being my father. Hope I made you proud.

My Mum: Thank you for always believing in me, always pushing

me towards success every single day.

My brothers, Mohammed and Mostafa: Thank you for being my

support and my rock when it gets hard along the way.

My Husband, Mohammed: words cannot describe how grateful I

am. Thank you for making my dream come true. And above all,

thank you for joining me in this journey.

My beautiful boys, Ahmed and Omar: You are the reason I wake

up every morning determined to be the best I can be. Thank you

for being the joy of my life. Love you to infinity and beyond.


Chapter 1

Introduction

1.1 Overview

Statistical uncertainty is defined as a lack of knowledge. This lack of knowledge,

either of information or in the context, can cause model-based predictions to devi-

ate from reality [1–3]. Some researchers have linked uncertainty to risk, due to the

lack of information or the lack of control [4–6]. This resulted in the assumption that, to minimise risk, the decision maker must minimise uncertainty. On

the other hand, other researchers did not link uncertainty and risk, claiming they

are two different concepts. This led to two different scenarios: either uncertainty

depends on risk, or the other way around, risk depends on uncertainty.

Researchers have developed many strategies to deal with uncertainties. These

strategies range from simply ignoring the presence of the uncertainties, to generating the missing knowledge that causes them, to interaction through communication, negotiation and dialogical learning, and finally to applying a coping strategy

[5, 7, 8]. Uncertainty can be classified according to its predictability into: Aleatory


and Epistemic uncertainties. Aleatory uncertainty is the type of uncertainty caused by sudden disruption. This type of uncertainty is usually formulated by a probability distribution. Aleatory uncertainty is objective, stochastic and

irreducible [5, 7, 9–11]. Epistemic uncertainty is a predictable uncertainty be-

cause it results from missing or confusing knowledge. This can be detected at

the simulation or runtime phases [6, 12, 13]. The system performance can be im-

proved by increasing the knowledge and information. Epistemic uncertainty can

be classified into: model uncertainty, parametric uncertainty and completeness un-

certainty. Model uncertainty includes structural uncertainty, which is formulated using geometric modelling, and behavioural uncertainty, which is formulated using algebraic

models. Parametric uncertainty is the type of uncertainty where the parameters

are known and predictable. It includes possible inaccuracies, due to small data

sets. Finally, completeness uncertainty is identifiable when some of the significant

properties and parameters are not studied in either uncertainty or risk analysis.

Uncertainty can be classified into logical uncertainty, mathematical uncertainty,

probability uncertainty and fuzzy logic. In this research, we are interested in

probability uncertainty, which is represented by confidence intervals. The main

idea is to formulate a prediction interval that can represent the future based on

the examined data [6, 13].

A heteroskedastic time series features unpredictable measures of dispersion. This

uncertainty in statistical distribution parameters imposes a serious challenge to


the forecasting models. In regression analysis, heteroskedasticity affects the re-

sults of hypothesis tests, which may lead to biased inference. Heteroskedasticity of the data leaves the estimate of the relation between the predictor variable and the outcome unbiased, but it renders the associated inference unreliable. This research investigates quantifying the heteroskedas-

ticity of the data and improving the forecasting accuracy of time series.

The study of heteroskedasticity has received much attention from researchers,

especially in the field of regression analysis. Robert Engle, a 2003 Nobel laureate, is one of the pioneers in the field. His research studied the conditional variances in financial return series and led to the development of the widely used Autoregressive

Conditional Heteroskedasticity (ARCH) model. This allowed for capturing the

non-stability of return series because of the persistence in volatility shocks over

time [14–19]. ARCH was a major breakthrough in econometrics modelling, helping

in identifying the stochastic process of the errors when fitted to the empirical data

[20].

The ARCH model and its generalised GARCH model have been recognised as

having potential in financial applications that required forecasting volatility. They

were applied in modelling exchange rates, interest rates and stock index returns.

Bauwens et al. [21] and Bollerslev et al. [22] listed a variety of applications of

volatility models in their survey.

[Figure 1.1 appears here: two panels plotting y(t) against sample at time t.]

Figure 1.1: A comparison between homoskedastic and heteroskedastic data. (a) Homoskedastic (constant variance) sample. (b) Heteroskedastic (varying variance) sample.

The GARCH family has been extended into a variety of GARCH models including IGARCH, EGARCH, GARCH-M, MGARCH, the F-GARCH, the full factor FF-GARCH model and the orthogonal O-GARCH. ARCH and GARCH models are

important because they are widely used in econometrics and finance and they have

general applications. Based on the literature, ARCH and GARCH specifications

of errors and heteroskedasticity of data allow for a more accurate and reliable way of forecasting volatility [2, 22–30]. Each of the previously listed models has enriched the literature of statistical and regression analysis with methods by which

to address the features of time series data for a better presentation. They also

aim to solve such problems in a more comprehensive way.

1.2 Motivation

A variety of statistical tests has been developed over the past three decades to de-

termine whether or not time series feature heteroskedastic behaviour. Regression

analysis, Monte Carlo and other simulation techniques were developed to minimise

the effect of uncertainty on decision-making. Yet, these tests cannot measure the


degree of heteroskedasticity in the data, which can be valuable for long-term planning and decision-making. However, the increasing fre-

quency of global challenges such as economic crises, climate change and epidemics

highlighted the limitations of relying on hypothesis testing with binary results and

justified the need for quantifiable measures for heteroskedasticity. These measures,

however, must satisfy a few constraints, such as a zero-valued lower bound to indicate

homoskedasticity and an asymptotic upper bound for heteroskedasticity.

There have been many attempts to detect the heteroskedasticity in time series.

However, these tests do not quantify the amount of heteroskedasticity in the ex-

amined datasets. On the other hand, quantifying heteroskedasticity can provide

extra information about the behaviour of time series. Studying this behaviour will improve the forecasting of behaviour-dependent time series data.

This research addresses the need to investigate the heteroskedasticity of the data,

the stationarity and non-stationarity of the time series being tested and, finally, to introduce a measure that can quantify heteroskedasticity.

1.3 Scope and Assumptions

This research focuses on the error variance of linear regression models described by the Gauss-Markov theorem in [31]. The Gauss-Markov assumptions state that the errors are uncorrelated and have an equal variance over time (homoskedasticity), as follows.


E(ui | x1, . . . , xn) = 0 (1.1)

Var(ui | x1, . . . , xn) = σ²u, where 0 < σ²u < ∞ (1.2)

E(uiuj | x1, . . . , xn) = 0, i ≠ j (1.3)
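As an illustration only (not drawn from the thesis), the short sketch below simulates error terms that satisfy and violate the equal-variance assumption (1.2); the parameter values are arbitrary.

```python
# Illustrative sketch: error terms that satisfy and violate the
# homoskedasticity assumption Var(u_i | x) = sigma_u^2.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = np.linspace(1, 10, n)

u_homo = rng.normal(0, 2.0, n)        # equal error variance, satisfies (1.2)
u_hetero = rng.normal(0, 0.5 * x, n)  # error variance grows with x, violates (1.2)

# Sample variances of the first and last quarters make the contrast visible.
for name, u in [("homoskedastic", u_homo), ("heteroskedastic", u_hetero)]:
    print(name, round(u[: n // 4].var(), 2), round(u[-n // 4:].var(), 2))
```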

1.4 Thesis Outline

This research is structured as follows: Chapter 2 will address heteroskedasticity

in time series. It discusses time series data sets and the types of time series

analysis. In addition, we discuss the heteroskedasticity literature, how to deal with heteroskedasticity in the data, its implications for data analysis, the different models addressing heteroskedasticity, and the advantages and disadvantages of these models. The available solutions are discussed and critically reviewed in

Chapter 3.

Chapter 4 investigates an approach to quantify heteroskedasticity and explores

the advantages and limitations of the proposed Heteroskedasticity Variance Index

(HVI) model. The experiments and test results will be illustrated and discussed.

The Slope of Local Variance Index (SoLVI) model will be introduced.

Chapter 5 introduces the Bhattacharyya heteroskedasticity measure, which relies on the Bhattacharyya distance and coefficient metrics. Finally, the conclusions and

areas of future work are discussed in Chapter 6.

Chapter 2

Heteroskedasticity in Time Series

Time series analysis is defined as the analysis of observations with equal time in-

tervals. Fields of interest include statistics, signal processing, economics, financial analysis, earthquake prediction and weather forecasting. These time series are characterised by exhibiting various degrees of correlation and/or volatility clustering over time [32]. Many statisticians are devoted to refining the existing models and introducing new ones. Their goal is to provide the best models to meet today's

complex tasks. Let us start our journey by expanding our definition of time series

analysis, its methods, and the types of data that it represents.

2.1 Time Series Data Sets

Another consideration when studying time series analysis is to understand the

various types of time series data sets. Time series data has specific characteristics

that affect long-term planning and forecasting. The main target of analysing

time series is to construct a prediction interval. This prediction interval aims

to predict future values of the series. Time series data sets are classified into three categories:


Continuous versus Discrete time series, Stationary versus Non stationary, and

finally Deterministic versus Stochastic time series [33, 34].

2.1.1 Continuous and Discrete Data Sets

A continuous time series arises when the behaviour of a system is described by a set of linear differential equations. Continuous models are represented by f(t), and changes in the data are reflected over continuous time intervals. A discrete time series, on the other hand, arises when the behaviour is described by difference equations. A difference equation is also known as a "recurrence relation": an equation that defines each term of a sequence as a function of the preceding terms. A discrete model does not treat time as a continuous variable.

2.1.2 Stationary and Non-Stationary Data Sets

A stationary time series is one in which the data fluctuates around a constant mean. It is a stochastic process whose joint probability distribution remains constant over all time intervals. A non-stationary time series is one in which the parameters of the series (length, amplitude and phase) change over time. The joint probability distribution of a non-stationary process depends on the time index. Trends need to be eliminated to avoid their impact on the other features of the time series data.


[Figure 2.1 appears here: four panels plotting readings y(t) against sample (t), each showing the time series data y(t) with its local standard deviation σ: (a) time-invariant mean and variance, (b) time-variant mean, (c) time-variant variance, (d) time-variant mean and variance.]

Figure 2.1: A comparison between stationary (a) and non-stationary (b, c and d) time-series data sets [35].

2.1.3 Deterministic and Stochastic Data Sets

Time series systems can be classified according to whether they are deterministic

or stochastic. A deterministic model is based on the assumption that the outcome

is certain. Ultimately, a change in the exogenous variable (independent) will have

an impact on the endogenous (dependent) variable. A deterministic model can

then be defined as the case in which the previous states of the variable determine how the variables evolve. Consequently, the variables will take the same values, given the same initial conditions. Stochastic models are considered more realistic than deterministic models, within the context of behavioural effects.

Figure 2.2: Levy distribution with α = 0.5, 1.5, 2.0 (left to right) is a well known example featuring the fat-tailing phenomenon. The fat-tailed anomaly is highlighted by the red circles, which show that the tail of a unimodal statistical distribution does not monotonically decrease as X moves further from the mean and mode.

A person’s purchasing power does not depend only on their income, but it is also

affected by their tastes, age, the time of the year and so on. In contrast to the

deterministic model, randomness persists [14, 36].

2.2 Time Series Analysis

Time series analysis is concerned with studying the statistical features of time

series data and extracting meaningful information from them. These features can

be classified into: fat tails (excess kurtosis), volatility clustering, and leverage.

Another aspect of time series analysis is to understand such features for a better

presentation of time series data and enhance the process of future prediction [33,

37–39].

The properties of time series data and their implications are to be explored as

follows:


The fat-tail phenomenon, sometimes referred to in the literature as heavy-tailed

distribution, is a feature of some probability distributions in which excess kurtosis

is exhibited. Kurtosis is a measure of the distribution of a real-valued random variable around the mean. In [40], Plott was the first to address the problem of excess

kurtosis and its implications on the experimental market. However, Plott did not

introduce an explanation of such phenomena. In [3], Harvey addressed the fat

tailed disturbances in implementing their time series models with ARCH distur-

bances. Harvey’s transition equation expanded the scope of the model to cover

disturbances with fat tails. This was considered a breakthrough in the research

field, because the traditional Gaussian ARCH models did not seem to capture this

phenomenon, especially in the financial data [41–44]. Figure 2.2 illustrates the

difference between the normal and heavy tailed distributions.

The variation from a time period to the succeeding one is referred to as volatility

clustering or persistence. Volatility clustering refers to the fact that changes tend

to follow each other in their magnitude in an uncorrelated yet serially dependent

way [14, 18, 45]. It is a property of most heteroskedastic time series processes

used in the fields of finance and economics. Volatility clustering, as a type of

heteroskedasticity, attempts to explain part of the causes of excess kurtosis in the

time series data. Excess kurtosis can be partially a consequence of abnormalities in the return distributions of financial assets that exhibit fat tails.

The final characteristic of time series is measuring the leverage effect. This effect

was best described in the financial applications. It describes the case when the


observed returns on the assets being examined have a negative relationship with

the changes observed in the volatility. For certain assets, a negative relationship

was observed between volatility trends and the expected returns. In other words,

volatility tends to rise when the returns are lower than expected and vice versa.

Time series analysis can accomplish many goals, some of which are descriptive analysis, spectral analysis, explanative analysis, forecasting and control analysis.

2.2.1 Descriptive Analysis

Descriptive analysis explains the trends and patterns experienced by the time

series. This can be illustrated either by plotting or by applying more advanced

statistical techniques. The approach most commonly applied is to plot and analyse

these trends and patterns such as: overall trends of the time series, outliers, turning

points and finally, the cyclic patterns in the data series.

2.2.2 Spectral Analysis

This analysis describes how cyclic components account for variations in time series data. This type of analysis is sometimes referred to as the frequency do-

main, where an estimate of a spectrum over a range of frequencies can be obtained

and periodic components can be separated out from a noisy environment.


2.2.3 Explanative Analysis

Explanative analysis, also known as cross-correlation analysis, describes a mechanism by which a dependent time series can be developed and estimated, given one or more explanatory time series. This type of analysis aims to describe the data generation

in an appropriate statistical model. These models can be either univariate or

multivariate. Therefore, this type of analysis can provide an explanation for the

variations in the series.

2.2.4 Forecasting

Forecasting refers to predicting the future behaviour of the time series based on

how it reacted in the past, within a specified confidence limit. The stochastic cor-

relation between one observation and the succeeding one is to be utilised to predict

the future values based on the past history and the behaviour of the observation

[46–50]. In this case, we can classify time series into two components: the de-

terministic component (the forecast) and the random component (the uncertainty

related to the forecast). This can be illustrated as follows:

Yt = f(t− 1, X) + εt (2.1)

where f(t− 1, X) is the function of the independent component at time t− 1 and

εt represents the disturbances in the mean of Yt.


2.2.5 Intervention or Control Analysis

This technique can be applied to prevent a certain event affecting the time series

from happening, given that we are confident that this event has an impact upon

the time series. We can then define this event and ensure that it will not affect

the time series data being investigated. This can be referred to as ”What if?”

forecasting.

A time series model illustrates that observations close together in time will be

more related than observations further apart. In other words, values for a given

period will be expressed as being driven from past values rather than from future

values. In these cases, the data can be stabilised to normalise the variance by

applying the algorithmic transformation. We can also make the seasonal effects

additive, which will make the effect constant from one year to another. Finally,

we can make the data normally distributed, which will reduce the skewness in the

data to apply appropriate statistics.

A time series can be described as a sequence of correlated random observations.

This correlation is utilised to predict future values based on the historical se-

quence of the observed data under investigation. Statisticians have been spending

a lot of time and effort developing models that lead to better predictions based on

past data. They have been facing challenges of model limitations, errors and dif-

ferent natures of data that affect their calculations. The researchers in [48] offered

a comprehensive evaluation of forecasting techniques associated with volatility.


2.2.6 ARMA-ARIMA Time Series Forecasting Techniques

One of the first efforts in the area of regression analysis is the Box-Jenkins ap-

proach [46]. Their development of the Autoregressive Integrated Moving Average (ARIMA) model, which combined the autoregressive (AR) and moving average (MA) models, was a major breakthrough. This model is specialised in dealing with

the random shocks in the data at each corresponding time. Some assumptions

about these shocks are to be identified by the time series. These assumptions can

be described as follows: zero mean, constant variance, a normal distribution and

finally, no covariance detected between one shock and another. ARIMA (p,d,q)

model covers the following main aspects: Autoregression, Integration and Mov-

ing average. The autoregression [ARIMA(p,0,0)] describes the importance of the

preceding values of the series to the current ones over time. The influence of these past values tends to decrease exponentially over time. Consequently, this effect will continue to decrease until it is nearly zero. It is represented by the following

equation

Yt = φ1Y(t−1) + αt (2.2)

where φ is constrained to lie between -1 and 1.
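A brief, hypothetical sketch of the autoregressive component in equation (2.2): each value depends on the preceding one, and with |φ| < 1 the influence of past values decays geometrically. The parameter values are illustrative.

```python
# Simulate Y_t = phi * Y_{t-1} + a_t for an illustrative AR(1) series.
import numpy as np

def simulate_ar1(phi=0.7, n=500, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    a = rng.normal(0, sigma, n)       # random shocks a_t
    for t in range(1, n):
        y[t] = phi * y[t - 1] + a[t]  # each value depends on the preceding one
    return y

y = simulate_ar1()
```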

The integration process [ARIMA(0,d,0)] aims to remove the trends and drift of the data, which transforms non-stationary data into stationary data.

It works by subtracting the first observation from the second observation and so

on.

The order of the process rarely exceeds one. Although many unit root tests have

been introduced in the last three decades, identifying the integration (d) still

depends on the graph representing the time series. If the data exhibit apparent

deviations from stationarity, it will not be appropriate to assign d a value of zero.

Finally, there is the moving average process [ARIMA(0,0,q)]. This process is

concerned with serially correlated data and can be represented as

Yt = αt − θα(t−1) (2.3)

Generally, the ARIMA model is fitted to a time series to perform one of two basic functions [51, 52]: it is either fitted to the data to provide a better understanding of the data at hand, or to perform forecasting based on the given data.

To identify the most appropriate ARIMA model for a time series, we start by

differencing in order to make the series stationary and eliminate the gross feature

of seasonality. This is the first step in the Box-Jenkins approach and can be referred to as de-trending the series. One may transform a series by

taking the first differences as

Yt = ηt − ηt−1 = (1− β)ηt (2.4)
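The sketch below illustrates the first-differencing operation of equation (2.4) on a synthetic trending series; the data and trend slope are assumptions made for the example.

```python
# First differencing, Y_t = eta_t - eta_{t-1}: removes a linear trend.
import numpy as np

rng = np.random.default_rng(1)
eta = 0.05 * np.arange(1000) + rng.normal(0, 1, 1000)  # trend + noise
y = np.diff(eta)                                       # differenced series

print(eta[:500].mean(), eta[500:].mean())  # mean drifts before differencing
print(y[:500].mean(), y[500:].mean())      # roughly constant afterwards
```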


There are several consequences of the differencing process. If the differenced series Yt is an ARMA process, ηt is known as an ARIMA process. When Yt represents iid random variables, i.e. an ARMA(0,0) process, then ηt is an ARIMA(0,1,0) process. Finally, when the Yt process is a stationary ARMA(p,q) process, ηt is an ARIMA(p,1,q) process, or equivalently an ARMA(p+1,q) process with an AR unit root.

In practice, the differencing technique is not the only way to eliminate the trends

and seasonal effects. One may also eliminate the seasonality by other techniques

such as regression or by estimating a seasonal ARMA model.

The second step in the Box-Jenkins approach is identification. Identifying the

proper ARMA model has never been an easy task. For convenience, many pro-

grams today simply use the 95% confidence level. One might say that these bounds

are used mostly to identify and check the autocorrelations of the white noise. The

primary tools for dealing with autocorrelations are either the autocorrelations or

partial autocorrelations plots. The sample autocorrelation plots are compared to

the theoretical behaviour plots when the order is known. For a stationary and

invertible ARMA(p,q) model, both the autocorrelation and the partial autocorrelation decay to zero and do not have any unexpected cutoff points, which is why the literature considers the process of identifying the ARMA order both arbitrary and difficult [34, 53].

The last step in the Box-Jenkins model is the model estimation. For ARMA

models, we have to estimate the order of the autoregressive process (p), and the


moving average process (q). For the (p) process, the sample autocorrelation should

have an exponentially decreasing nature. However, it might exhibit a mixture of exponentially decreasing and sinusoidal components.

As for the estimation of the MA component of the ARMA model, it is more

involved because the innovations, or white noise part εt, are not observable and can only be computed recursively. The autocorrelation function of an MA(q) process becomes zero at lag q+1 and greater. So, when we are examining the sample autocorrelation

function, we are mainly concerned with where the plot tends to be zero. This can

be done by presuming a confidence level of 95% for the sample autocorrelation

function on the sample autocorrelation plot.
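As an illustration of the identification step, the sketch below plots the sample ACF and PACF with 95% confidence bounds for a synthetic AR(1) series; the use of statsmodels and matplotlib is an assumption, since the thesis does not prescribe a particular tool.

```python
# Inspect sample ACF and PACF plots to guess the ARMA orders (p, q).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(2)
y = np.zeros(1000)
e = rng.normal(size=1000)
for t in range(1, 1000):
    y[t] = 0.6 * y[t - 1] + e[t]     # synthetic AR(1) series for illustration

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=30, ax=axes[0])     # ACF decays geometrically for an AR process
plot_pacf(y, lags=30, ax=axes[1])    # PACF cuts off after lag p
plt.tight_layout()
plt.show()
```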

We can sum up the Box-Jenkins model in the following points. The approach is designed to transform time series that are characterised by seasonal effects and stochastic trends into stationary time series, and the ARMA model is mainly used to represent the transitory dependence. Estimation using the Box-Jenkins model is of such complexity that most of the literature argues that it is preferable to leave the process of parameter estimation to high-quality software [46]. This mainly applies to non-linear estimation, for which the non-linear least squares approach is recommended.

There are other non-linear models to represent the changes of variances in a time

series (Heteroskedasticity) [25, 54]. Examples include but are not limited to:


ARCH, GARCH, E-GARCH and others. These models are concerned with study-

ing heteroskedasticity not as a problem to be corrected, but as a prediction to be

computed. In the field of econometrics, ARCH and GARCH models have been shown in the literature to cover many of the aspects related to heteroskedasticity. They perform in a sophisticated yet comprehensive way. This section aims

to provide a basic comparative study between two of the core models addressing

heteroskedasticity, ARCH and GARCH models. It aims to define the strengths

and weaknesses of each model, the statistical modelling behind each and finally the

researchers’ contributions in each model. This will explain why researchers prefer

one model to the other, and explains why they are still interested in understanding

the statistics behind each of them.

2.3 ARCH Model

Engle [19] was the first to introduce the ARCH model. ARCH stands for Autoregressive Conditional Heteroskedasticity. Autoregressive indicates a feedback mechanism that incorporates past and present observations, while conditional means that the variance relies primarily on the immediate past. ARCH is based on the concept that the unconditional variance is kept constant while the conditional variance is allowed to change. This change is applied over a period of time and is represented as a function of past errors. The process is characterised by a zero mean, by being serially uncorrelated, and by non-constant variances that are conditional on the past error terms, while the unconditional variances remain constant. The ARCH


model has proven successful, especially in identifying and analysing major sophisti-

cated economic phenomena. In Engle [20] and Engle and Kraft [2], applications

on economic inflation were discussed. These models recognised the volatility of

inflation. With a known zero mean, the model can be represented as

Yt = εtXt−1 (2.5)

where Yt is the dependent variable and Xt is the vector of the exogenous variables.

Since the ARCH model is applied when heteroskedasticity is an issue in the time series, one should pay attention to the variances when dealing with the data at hand. The variance was proven to show slow variation over a time span, making

the assumption of equal weights untenable. In other words, the events that are

more recent are given higher weights as they are more relevant. Then, based on

the work of Engle in 1982, these weights were considered the equation parameters

to be estimated. This allowed the ARCH model to be capable of detecting the

best weights for the data under investigation, allowing for the forecasting process

to take place [19] and [54].

The ARCH regression model is generated by assuming the mean of Yt is given as

Xtβ. Then, the regression model can be finally represented as

εt = Yt −Xtβ (2.6)
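A minimal simulation sketch of an ARCH(1) error process, in the spirit of the model described above: the conditional variance depends on the previous squared error while the unconditional variance stays constant. The parameter values and function name are illustrative assumptions, not the thesis implementation.

```python
# Simulate an ARCH(1) error process: h_t = omega + alpha * eps_{t-1}^2,
# with constant unconditional variance omega / (1 - alpha).
import numpy as np

def simulate_arch1(omega=0.2, alpha=0.5, n=2000, seed=3):
    rng = np.random.default_rng(seed)
    eps = np.zeros(n)
    h = np.zeros(n)
    h[0] = omega / (1 - alpha)                  # start at the unconditional variance
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2  # conditional variance
        eps[t] = np.sqrt(h[t]) * rng.normal()   # ARCH(1) error term
    return eps, h

eps, h = simulate_arch1()
print(eps.var(), 0.2 / (1 - 0.5))  # sample variance is near the unconditional value
```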


The ARCH regression model has a range of features, making it more plausible for: 1. Econometric applications: econometric forecasters are capable of forecasting and predicting the future variation in the data from one period to another. ARCH emphasises that future forecasts vary over time and rely on past errors in their prediction. 2. Monetary theory and the theory of finance: portfolios of financial assets depend in their estimation on the rates of return. Therefore, the variations in the expected means and variances of such returns consequently affect the prediction of asset prices. ARCH has shown success in handling the uncertainties of such equities by studying the nature of the variances within the constrained regression model. 3. Forecasting is conditionally deterministic: it does not leave any uncertainty in predicting the squared error at time t given past error terms.

We can claim that the ARCH model analyses the effect of omitting the vari-

ables from the estimated model. The ARCH literature made a major breakthrough in predicting the volatility of economic time series, making it possible to analyse the apparent changes in volatility and to identify the impact of non-linear dependence of the parameters in the economic model.

Applying ARCH models is a straightforward process. They handle the collective errors, they take care of non-linearities and, finally, they adapt to changes in the personal capabilities of the economic forecaster. From the statistical point of view, ARCH models may be considered as specific non-linear time series models. This allows for an exhaustive study of the underlying


dynamics. On the other hand, the ARCH model fails to capture irregular phenomena, especially in the financial and econometric sectors, such as crashes, mergers, news effects or threshold effects. We can claim that ARCH models do not offer the flexibility required for capturing persistence in volatility [22].

Finally, we can summarise the strategic features of ARCH models as follows:

1. ARCH models play a very important role in identifying the stochastic

nature of the error terms, for both linear and non-linear econometric models; and

2. ARCH models allow for the prediction of the average size of error terms when

fitted to empirical data.

Although ARCH modelling showed great capability in handling the stochastic features and nature of time series data, there was an apparent need to generalise

this model to enhance its performance. This consequently led to the introduction

of the generalised ARCH model known as GARCH.

2.4 GARCH model

GARCH is considered a useful generalisation of ARCH modelling. Bollerslev first

introduced it in 1986 [55]. The GARCH model aims to help with the study of past

variances in order to explain future ones. The GARCH model explains the dependence of the time series model on volatility. Generalising Engle's ARCH technique, the GARCH model includes both autoregressive (AR) and moving average (MA) terms. It has the advantage of using fewer parameters, which subsequently increases its computational efficiency.

GARCH follows the understanding that all past errors contribute to forecast volatility. This is to be considered the most general case. The GARCH model has proven successful in the area of predicting conditional variance. GARCH was

able to capture the main features of the series under investigation by describing

the conditional variance. At the same time, it is simple enough to allow for much

investigative study of the available solutions.

The main application of GARCH, by definition, is when the series is heteroskedastic; its variance is observed to change over time. The error terms are practically expected to change from one point to another. They might be large

for some points and small for others.

2.4.1 GARCH modelling

The GARCH model takes into consideration the fat-tail phenomena and volatility

clustering. These are the main basic features of time series data and the main

interest in studying time series analysis. The GARCH model has been widely

implemented in the field of finance. It was widely applied in fields including risk

management, portfolio management, option pricing, and foreign exchange. We can

apply the GARCH model to examine the relationship between long- and short-term interest rates, analyse time-varying risk premiums, and model foreign exchange markets that incorporate fat-tail behaviour. GARCH effects can also be shown

in fields like capital allocation and value at risk (VaR) [23–25, 32, 54, 56–58].

GARCH’s model widely used specification is that the prediction of the variance in

the next period is dependent on weighted average of the long-run average variance.

This variance was statistically captured by the most recently observed squared

residuals. Many GARCH estimations are currently available through commercial

software such as MAtlab, SAS or TSP. These types of software offer a straight

forward process when using GARCH applications. First, we need to define the

parameters. These include ω, α and β. Second, start calculating an estimate of the

variance based on the first observation. Subsequently, it will be easy to identify the

estimate for the second observation. The GARCH updated formula will calibrate

the following parameters: a. A weighted average of the unconditional variance,

b. The squared residual of the first observation, and c. The starting variance

and estimates of the variances of the second observation. This formula will be the

input in estimating the third variance and a long time series variance forecast will

be constructed. A positive relationship was denoted between the residuals and the

time series constructed. A symmetric way to modify the parameters ω, α and β

to obtain the best fit by applying the likelihood function [26, 28, 59].

The most basic GARCH (1,1) model can be represented as follows:

ht+1 = ω + α(rt − mt)² + βht (2.7)

where rt represents the return on a portfolio or an asset, ht is the variance conditional on the past information set, and ω, α and β are constants estimated by econometricians [54]. The model only works if α + β < 1, and requires that α > 0, β > 0 and, finally, ω > 0.
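The recursion in equation (2.7) can be sketched as follows, assuming a constant mean and illustrative parameter values; this is not a fitted model, only a filter that propagates the conditional variance forward.

```python
# GARCH(1,1) variance recursion: h_{t+1} = omega + alpha*(r_t - m)^2 + beta*h_t.
import numpy as np

def garch11_variance(returns, omega=0.05, alpha=0.08, beta=0.9):
    r = np.asarray(returns, dtype=float)
    m = r.mean()                   # constant mean m_t (an assumption of this sketch)
    h = np.empty(len(r))
    h[0] = r.var()                 # start from the sample variance
    for t in range(len(r) - 1):
        h[t + 1] = omega + alpha * (r[t] - m) ** 2 + beta * h[t]
    return h

# Provided alpha + beta < 1, the recursion is mean-reverting towards the
# unconditional variance omega / (1 - alpha - beta).
```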

Although the main application of the model is to forecast just one period ahead, it turns out that a two-period forecast can be built on the one-period forecast. Therefore, long-horizon forecasts can be created. Referring to Engle again, in his work he claimed that for GARCH(1,1) the unconditional variance can be taken to represent the distant-horizon forecast. This condition can be achieved if α + β < 1 for all time periods. This ultimately implies that GARCH models are conditionally heteroskedastic, with a conditional variance driven by the data, yet the unconditional variance is constant.

The GARCH (1,1) model can be expanded to the comprehensive GARCH (p,q)

model, in which additional lag terms are provided. These are useful when long-term time series data, for example several years of data, are investigated.

Heteroskedasticity of errors has not disappeared with the adoption of more advanced statistical data models. The GARCH specification of error terms allows for a more accurate volatility forecast, and addresses the fact that most familiar statistical models exhibit some sort of conditional error heteroskedasticity.

The GARCH success in predicting volatility changes allowed for a wide range

of applications in the fields of economics and finance. Risk management, option


pricing, and asset allocations are some of the popular applications in finance. Their

effects are important in areas like efficient capital allocation and management.

The GARCH model has been intensively investigated by a wide range of re-

searchers. Their goal was to improve its performance and expand the areas of

its applications. The literature has identified some of the basic work and models

in the field. The basic model was developed by Bollerslev in [55], who then in-

troduced the I-GARCH model. Taylor in [60] introduced the TS-GARCH model.

Engle and Ng [61] suggested the NA-GARCH and the V-GARCH. The E-GARCH

and the NGARCH were introduced by Higgins in [27]. Hentschel in [62] introduced the H-GARCH, and the Aug-GARCH was suggested by Duan in [58]. The H-

GARCH and the Aug-GARCH specifications are very flexible and both models

include some of the features of the other previously stated GARCH models, mak-

ing them more reliable and mature.

In conclusion, we can clearly state that GARCH has proven accuracy in forecast-

ing and modelling time-varying conditional variances. It takes into consideration

basic time series features such as volatility clustering and excess kurtosis. GARCH models also led to a fundamental change in the econometric approaches introduced before. They have been considered important in the eyes of academics and practitioners simply because of their ease of use in practice and their richness in addressing theoretical problems, many of which have not yet been solved.


2.4.2 Restrictions on GARCH model

There are some limitations preventing the wide application of GARCH in finance. GARCH functions at its finest under moderately stable market conditions. GARCH fails to capture asymmetrical phenomena in most cases, such as unexpected events, which can lead to substantial change. Finally, heteroskedasticity explains some, but not all, of the fat-tail behaviour in financial series.

One of the recent studies in the field [32] was dedicated to studying the limitations of the GARCH model, particularly in detecting excess kurtosis and persistence in volatility. This was also related to the fact that the GARCH model, especially in financial applications, does not give the required attention to behavioural changes in the market such as crashes and financial crises. This research investigated the

impact of the simultaneous changes in the GARCH model parameters on the re-

turn series and its volatility. The results of this study found that GARCH model

parameters have an impact on both the volatility and the dynamic structure of

the series. The results of the study also concluded that one can get more informa-

tion by investigating the changes in individual model parameters rather than the

collective investigation of the parameters as one unit. This can be done assuming

that both the volatility and excess kurtosis change enduringly. These changes had

permanent effects on the volatility at all times, but only a change in parameters

α and β has that permanent impact on the excess volatility of the series.


2.4.3 Outliers in GARCH model

Outliers can be defined as observations that are numerically distant from the rest of the data.

They are defined as “those that deviate from other members of the sample” [63].

They are often considered an error or noise, though they might carry essential

information. Therefore, we should pay proper attention to investigating outliers.

They contain valuable information about data collection and recording for the

process under investigation at many instances [64].

There are many reasons behind the appearance of outliers. This might be due

to human error, changes in the system behaviour, sample contamination with el-

ements from outside the population, or as a natural deviation in the population.

Also, outliers may appear by chance, error in data transmission or population

transcription or as an indication of measurement error or heavy-tailed distribu-

tions. One should be careful when dealing with outliers and not confuse them with experimental errors.

Outliers might include the sample minimum or the sample maximum or both, but the sample maximum or minimum are not always reported as outliers. Another notable point about outliers is that in larger samples of data, some observations will lie further away from the sample mean than the reasonably accepted range. This could be a consequence of incidental systematic error or a flaw in the theory generating the sample. In this case, outliers can designate faulty data, or a case where a specific theory might be considered invalid. It was also noted that in large samples, a small number of outliers will be detected.

Outliers can be classified into univariate and multivariate outliers. Univariate outliers are those detected within a single variable, either visually by examining the data with a frequency distribution or by using a box plot to identify the extreme and mild outliers, while multivariate outliers are those that occur within the joint combination of two or more variables.

There are numerous problems associated with outlier detection. Determining whether an observation is an outlier is highly subjective. The second problem is that outliers mask each other: the estimated standard deviation is affected by the size of the outlier and shrinks when a large outlier is removed. The problem gets worse as the complexity of the data increases, since some outliers become visible only when others are removed.

The GARCH model has been applied in the area of outlier detection. Based on the work of [65], the GARCH(1,1) model was applied to detect and correct outliers before estimating risk measures in financial markets, such as minimum capital risk requirements. Their study proved successful in generating more accurate minimum capital risk requirements (MCRRs).


2.5 Heteroskedasticity Challenges and Tests

A key assumption of the least squares model is homoskedasticity. Heteroskedasticity, by contrast, arises when the variances of the error terms of the data under investigation are not equal: the error terms are expected to be larger at some points than at others. When heteroskedasticity is modelled explicitly, not only are the deficiencies of the least squares model corrected, but a prediction is also computed for the variance of each error term.

Causes of heteroskedasticity can include the following: 1. Model misspecification.

2. As the value of the independent variable increases, the errors may increase. 3.

As the values of the independent variable become more extreme in either direction, the errors increase. 4. Measurement errors can cause heteroskedasticity to appear in

the data being examined.

The consequences of heteroskedasticity of the data can be illustrated as follows:

1. Standard errors are biased when heteroskedasticity is present, which may lead

to biased test statistics and confidence intervals. 2. In logistic regression, heteroskedasticity can produce biased and misleading parameter estimates.

In 1982, Engle presented two types of heteroskedasticity. The first type was when

the regression errors were heteroskedastic. This is because they rely in their com-

putation on the value of the independent variables. In this case, the average error

was positively related to the independent variable size.


The second type represents the case in which the error terms vary with time

rather than the value of the process variables. Based on the literature in [59,

66], heteroskedasticity of error terms has not completely disappeared with the

acceptance of sophisticated financial and statistical model variables. There was no

evidence of a naturally developed process to model conditional heteroskedasticity.

Various attempts aim to identify different features of the data that seem relatively

important to the analysis. This relative importance and relevance of the data

makes it difficult to develop a more sophisticated and widely applicable model that fits all the available data sets.

2.5.1 Heteroskedasticity Detection

Heteroskedasticity detection represents an important issue when dealing with least squares estimation and linear regression models [67], particularly when het-

eroskedasticity is present in the error terms. This was proven to be influenced by

the sample size. In small samples, a valid test for heteroskedasticity represents a

challenge for analysts to obtain.

Visual detection of heteroskedasticity was first introduced by inspecting the resid-

uals plotted against the fitted values or by plotting the independent variable sus-

pected to be correlated with the variance of the error terms. If the residuals appear

to be roughly the same size for all values of X, or in small samples slightly larger

than the mean of X, then it is safe to assume that heteroskedasticity is not severe

enough to raise concern. On the other hand, if the plot shows uneven presentation


of residuals, so that the width is larger for some values of X than for others, then

a heteroskedasticity test should be conducted.
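As an illustrative sketch of this visual check (the synthetic data, variable names and Python/NumPy/Matplotlib setup below are assumptions of this example, not drawn from the thesis experiments), the residuals of a simple least squares fit can be plotted against the fitted values and inspected for a funnel shape:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)   # error spread grows with x (heteroskedastic)

# Fit a straight line by ordinary least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# A widening "funnel" of residuals across the fitted values suggests heteroskedasticity.
plt.scatter(fitted, residuals, s=8)
plt.axhline(0.0, color="black", linewidth=0.8)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```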

Goldfeld and Quandt [68] have offered great contributions in tackling this prob-

lem. Their work was then modified by Phillips and Harvey in 1974 [67], based on

maintaining the residuals of the least square regression model. As for larger sam-

ples, Glejser [69], Park [70], and Rutemiller and Bowers [71] introduced a number

of tests to estimate certain forms of heteroskedasticity in larger sample sizes.

The starting point when studying heteroskedasticity was assuming the general

linear model

Y = Xβ + u    (2.8)

where Y is an n×1 vector of observations on the dependent variable, X is an n×k matrix of observations on the k independent variables, β is a k×1 vector of regression coefficients, and u is an n×1 vector of stochastic disturbance terms.

Based on the work first introduced by Harvey and Phillips in [67], heteroskedas-

ticity tests address two simple forms of heteroskedasticity

σ_j^2 = σ^2 X_{ji}^2    (2.9)

and

σ_j^2 = σ^2 X_{ji}    (2.10)

We classify heteroskedasticity tests as follows:

2.5.2 The Goldfeld-Quandt test

The Goldfeld-Quandt test is categorised as a test for discrete changes in variances. In this test, the main assumption is that observations are in an ascending order based on increasing values of σ_j^2 [68]. This test is concerned with homoskedasticity in the regression analysis. It operates by comparing the variances

of error terms across discrete subgroups. Under homoskedasticity, all subgroups

should have the same estimated variances. The Goldfeld-Quandt test can be ex-

plained in the following equation:

e_1 = Y_1 − X_1 b_1    (2.11)

where b_1 is a least squares estimate of β. Under the null hypothesis, the test statistic has a χ^2 distribution with

m = l − k    (2.12)

degrees of freedom.

This test offers a simple diagnosis of heteroskedastic errors in both univariate and multivariate regression models, but it does not show robustness to specific types of errors. It cannot distinguish between heteroskedastic errors and specification problems such as an incorrect functional form. It was also noted that omitting the central observations leads to a reduction in the degrees of freedom, which may lead to inaccurate test results as well.
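A minimal sketch of running this test in practice, assuming Python with statsmodels and a synthetic regression whose error variance increases with the regressor (het_goldfeldquandt splits the ordered sample and compares the subgroup residual variances):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 300)
y = 1.0 + 0.8 * x + rng.normal(scale=0.2 * x)   # variance increases with x

X = sm.add_constant(x)                          # design matrix [1, x]
# drop=0.2 removes 20% of the central observations before comparing subgroup variances.
f_stat, p_value, _ = het_goldfeldquandt(y, X, drop=0.2)
print(f"Goldfeld-Quandt F = {f_stat:.2f}, p-value = {p_value:.4f}")
# A small p-value rejects homoskedasticity against the 'increasing variance' alternative.
```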

2.5.3 Ordinary likelihood ratio test

The ordinary likelihood ratio test can be represented as follows:

LT = −2 { L_p(Y; δ_0) − L_p(Y; δ) }    (2.13)

Rutemiller, Bowers and Harvey [71, 72] derived this statistical test by applying specific weighting functions. Basically, their work was based on the χ^2 distribution with q degrees of freedom.

2.5.4 Conditional likelihood ratio test

This test was first introduced by Honda [73] as a modification of Cox and Hinkley's [74] work, in order to tackle the problems of applying the profile likelihood ratio test. The test, however, fails to perform well with a small number of noisy parameters. The conditional likelihood ratio test function can be illustrated as follows:

CLT = −2 { CL_p(Y; δ_0) − CL_p(Y; δ) }    (2.14)


2.5.5 Modified likelihood ratio test

In the conditional likelihood ratio test, there is a lack of orthogonality between the parameter of interest δ and the nuisance parameter σ^2. The modified likelihood ratio test was introduced by Simonoff and Tsai in 1994 [75] to solve these problems.

Their work was influenced by Cox and Reid’s work in 1987 [76].

In practical applications, orthogonal parameters are not that easy to handle or to

obtain. In 1993, Cox and Reid modified their model, in the form of an adjustment

for which explicit orthogonalization of the parameters is not required. In this case,

the resulting modified likelihood ratio test can be illustrated as follows:

ALT = CLT + 2(δ − δ_0) (1/n) Σ_{i=1}^{n} ẇ_{i0}    (2.15)

where ẇ_{i0} = ∂w_i/∂δ, evaluated at δ_0 for i = 1, ..., n.    (2.16)

The ALT was proven successful for the exponential weighting function, regardless

of whether or not the orthogonal parameter transformation is available.


2.5.6 Residual likelihood ratio test

Verbyla 1993 [77] claimed that if the scale and the weighting parameters were

treated as the parameters of interest, the residual likelihood function is the same as

the conditional profile likelihood function, given the maximum likelihood estimates

of θ. The resulting function can be represented as follows:

RL_p(Y; σ^2, δ) = RL_p(Y | θ_{σ^2}; σ^2, δ) = CL_p(Y; δ) − log σ_δ^2 = ML_p(Y; δ) − log γ_δ    (2.17)

Given the fact that we are using the scale parameter as the parameter of interest,

the equation of the residual likelihood ratio test can be modified as follows:

RLT = CLT + 2(log σ_0^2 − log σ^2)    (2.18)

We can take this equation one step further, as follows, when RLT is greater than MLT:

RLT = MLT + 2(log γ_0 − log γ)    (2.19)

2.5.7 Breusch-Pagan Test

Breusch-Pagan test aims to detect for the conditional heteroskedasticity in a linear

regression model [78]. It tests for the continuous changes in variances. It measures

the dependency of the estimated variance of the residuals from the regression on the values of the independent variables. It tests the null hypothesis that all error variances are equal against the alternative that the error variances are a multiplicative function of one or more variables: the larger the predicted value of y, the larger the error variance. It is formulated as follows:

y = β_0 + β_1 x + u    (2.20)

The error term u, also known as the residual, is estimated by the ordinary least squares method (OLS) and has a mean equal to zero. The squared residuals are then regressed on the independent variable:

u^2 = β_0 + β_1 x + v    (2.21)

The Breusch-Pagan test examines nR^2, which has a χ^2 distribution with k degrees of freedom, where n is the number of observations, R^2 is the coefficient of determination from the auxiliary regression of the squared residuals on the independent variables, and k is the number of independent variables. If the test shows that there is a jointly significant dependency between the dependent and the independent variables, then we can reject the null hypothesis of homoskedasticity. When implementing the test, the examiner selects which variables to include in the auxiliary equation as a judgment call; a poor judgment can lead to a poor test.
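A hedged sketch of how the test might be run with statsmodels (the synthetic data and names below are assumptions of this example): het_breuschpagan performs the auxiliary regression of the squared OLS residuals and reports the nR^2 statistic.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 300)
y = 1.0 + 0.8 * x + rng.normal(scale=0.2 * x)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# nR^2 (Lagrange multiplier) statistic from regressing resid**2 on X.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan nR^2 = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```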

The Breusch-Pagan test fails to work well for non-linear forms of heteroskedasticity, for example when the error variances get larger as x becomes more extreme in either direction. It also has problems when the error terms are not normally distributed.

2.5.8 White Test

The White test, named after its founder Halbert White [79], is a direct test of

heteroskedasticity. It is a special case of Breusch-Pagan test. It solves the prob-

lems regarding the execution of the Breusch-Pagan and it is more general. It

adds a lot of terms to test for more types of heteroskedasticity. For example, by

adding the squares of regressors, it detects the non-linearities such as an hour-glass

shape. It operates by assuming that there is no prior knowledge of the existence

of heteroskedasticity in the sample being examined. This test can be illustrated

as follows:

Y_t = β_0 + β_1 X_{t1} + β_2 X_{t2} + u_t    (2.22)

σ_t^2 = α_0 + α_1 X_{t1} + α_2 X_{t2} + · · ·    (2.23)

The White test operates by accepting or rejecting the null hypothesis of the data

being of equal variances. If the null hypothesis is not rejected, we can say that the

residuals in the examined sample are homoskedastic. Alternatively, if we reject the

null hypothesis, the residuals are recognised as heteroskedastic. In order to accept or reject the homoskedasticity hypothesis, we need to calculate nR^2, where n is the size of the sample and R^2 is the unadjusted R-squared from the auxiliary regression of the squared residuals u_t^2 as the dependent variable against a constant, X_{t1} and X_{t2}.
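A short sketch of how the White test could be applied with statsmodels, assuming a two-regressor model like eq. 2.22 and synthetic data (het_white augments the auxiliary regression with the squares and cross-products of the regressors):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(3)
n = 400
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(scale=0.1 + 0.2 * x1)

X = sm.add_constant(np.column_stack([x1, x2]))   # constant, X_t1, X_t2
resid = sm.OLS(y, X).fit().resid

# The auxiliary regression uses X, its squares and cross-products; nR^2 is reported.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"White nR^2 = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```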

2.5.9 Levene’s Test

Levene’s test is a statistical test that assesses the homogeneity of variances in dif-

ferent samples. It uses the median instead of the mean in order to provide a good

robustness against many types of outliers and non-normal data [80]. It depends

in its examination on a null hypothesis claiming that the population variances are equal. If the resulting p-value is less than the chosen significance level (usually 0.05), the null hypothesis is rejected, showing that there is a difference between variances in the examined population and hence indicating heteroskedasticity. Levene's test does not require normality in the underlying data. The test evaluates the following:

W = [(N − K) / (K − 1)] · [ Σ_{i=1}^{K} N_i (Z_{i·} − Z_{··})^2 ] / [ Σ_{i=1}^{K} Σ_{j=1}^{N_i} (Z_{ij} − Z_{i·})^2 ]    (2.24)

Z_{··} = (1/N) Σ_{i=1}^{K} Σ_{j=1}^{N_i} Z_{ij}    (2.25)

Z_{i·} = (1/N_i) Σ_{j=1}^{N_i} Z_{ij}    (2.26)

where K is the number of different groups of samples, N is the total number of samples, N_i is the number of samples in the i-th group, Y_{ij} is the value of the j-th sample from the i-th group, and Z_{ij} is defined as

Z_{ij} = |Y_{ij} − Y_{i·}|, where Y_{i·} is either the mean of the i-th group or the median of the i-th group.    (2.27)
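As a minimal sketch (assuming Python with SciPy and an arbitrary split of a synthetic series into K = 3 consecutive groups), the median-based variant described above can be applied directly with scipy.stats.levene:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(4)
# Synthetic series whose standard deviation changes over three consecutive regimes.
series = rng.normal(scale=np.repeat([1.0, 2.0, 4.0], 200))

groups = np.array_split(series, 3)               # K consecutive subgroups
w_stat, p_value = levene(*groups, center="median")
print(f"Levene W = {w_stat:.2f}, p-value = {p_value:.4f}")
# A p-value below the chosen significance level indicates unequal group variances.
```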

2.5.10 Spearman’s Rank Correlation Coefficient

Spearman’s Rank correlation coefficient is a non-parametric measure. It measures

the statistical dependence between two variables, X representing the independent

variable and Y which is the dependent variable, using a monotonic function. It is

represented by the value ρ which is calculated as follows:

ρ = 1 − [ 6 Σ d_i^2 ] / [ n(n^2 − 1) ]    (2.28)

where d_i = x_i − y_i is the difference between the rankings x_i and y_i of the random variables X and Y, respectively.

The sign of ρ indicates the direction of the relation between X and Y . A positive

coefficient means that Y increases as X increases. On the other hand, a zero

coefficient indicates that there is no correlation between both variables. This is

applied for detecting the existence of heteroskedasticity by simply assuming that

the data can be fitted to a piece-wise linear model [48]. Spearman's correlation coefficient is sometimes regarded as an alternative to Pearson's coefficient because it performs well whenever the relationship between X and Y is monotonic, not only when it is linear.
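A brief sketch of this idea, under the assumption (made for this example only) that the absolute OLS residuals are rank-correlated with the regressor using scipy.stats.spearmanr on synthetic data; a significantly positive ρ suggests that the error spread grows with X:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 300)
y = 1.0 + 0.8 * x + rng.normal(scale=0.2 * x)

# Rank-correlate |residuals| with the regressor.
slope, intercept = np.polyfit(x, y, 1)
abs_resid = np.abs(y - (intercept + slope * x))
rho, p_value = spearmanr(x, abs_resid)
print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.4f}")
```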


2.5.11 Heteroskedasticity challenges

This section addresses the most common approaches that have been utilised in the past when dealing with heteroskedasticity. They are illustrated as follows:

1. Respecify the model or transform the variable: Sometimes, some important

variables are left out of the model. If the reason for heteroskedasticity was found

to be the misspecification of the model, then by checking for heteroskedasticity we

might be able to identify the model specification problems.

2. Use robust standard errors: when heteroskedasticity exists, robust standard

errors tend to be more trustworthy. It addresses the problem of errors that are

not independent and identically distributed. It will not change the coefficient

estimates provided by the Ordinary Least Squares (OLS), but it will change the

standard errors and significance test results. As for outliers, we can use robust regression by using a weighting scheme that causes outliers to have less impact on

the estimates of regression coefficients. In this case, robust regression will produce

different coefficient estimates than the ordinary least squares.

3. Use weighted least squares: generalised least squares is a technique that will

always result in estimators that are Best Linear Unbiased Estimator (BLUE) when

either heteroskedasticity or serial correlations are present. Ordinary Least Squares

works by selecting coefficients that minimise the sum of squared regression resid-

uals

Σ_j (Y_j − Ŷ_j)^2    (2.29)


In the case of heteroskedasticity, observations expected to have error terms with

large variances are given smaller weights than observations thought to have error

terms with small variances. In other terms, the smaller the error variances, the

more heavily the case is weighted. The observations with the smallest error vari-

ances should give the best information about the position of the true regression

line. Even so, OLS remains unbiased but inefficient: its estimators do not have the smallest possible variances, although these variances may still be acceptable given that the estimators remain unbiased.
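A small sketch contrasting OLS with weighted least squares (assuming statsmodels and, for illustration only, that the error variances are known up to the factor sigma used to generate the synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = np.linspace(1, 10, 300)
sigma = 0.2 * x                                  # error standard deviation grows with x
y = 1.0 + 0.8 * x + rng.normal(scale=sigma)

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()

# Weight each observation by the inverse of its error variance,
# so that low-variance observations count more heavily.
wls_fit = sm.WLS(y, X, weights=1.0 / sigma**2).fit()
print("OLS coefficients:", ols_fit.params)
print("WLS coefficients:", wls_fit.params)
```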

2.6 Conclusions

In this chapter, we attempted to identify time series analysis from two different

perspectives: time series data sets and different types of time series analysis. Each

classification is different in addressing a certain aspect in the characteristics of the

data sets being examined and the analysis being carried out. The ARMA-ARIMA

models were discussed in terms of model specifications and applications to the

time series examination. The ARCH and GARCH models were explained from

both historical and application point of views. The GARCH model was widely

explored to examine how it works and the restrictions on its application. The

outliers in the GARCH model were defined and the role of the GARCH model in

outliers detection was identified.

Finally, we discussed the heteroskedasticity challenges facing the statistical analyst

and the approaches that have been utilised to address them. This brought us

to the conclusion that a new model or approach can be introduced to face the heteroskedasticity challenges and enable a more reliable outcome.

Chapter 3

Critical Review of the Literature

ARCH and GARCH models are playing important roles in the analysis of time

series data. The importance of the ARCH and GARCH models is highlighted

when the study aims to analyze and forecast volatility, particularly in the financial

applications.

This section aims to provide a basic comparative study between two of the core

models addressing heteroskedasticity, the ARCH and GARCH models. It aims

to define the strengths and weaknesses of each model, the statistical modelling

behind each and finally the researchers’ contributions in each model. This will

explain why researchers prefer one model to the other, and why they are still interested in understanding the statistics behind each of them.

3.1 ARCH model

Engle [19] was the first to introduce the ARCH model. ARCH stands for Au-

toregressive conditional heteroskedasticity. Autoregressive indicates a feedback


mechanism that incorporates past and present observations, while "conditional" means that the variance relies basically on the immediate past. ARCH is based

on the concept that the unconditional variance is left constant while allowing the

conditional variance to change. This change is applied over a period of time and

is represented as a function of past errors. This process is characterised by a zero mean and serially uncorrelated errors with non-constant variances; these variances are conditional on the past error terms, while the unconditional variances remain constant. The ARCH model has proven successful, especially in identifying

and analysing major sophisticated economic phenomena. In Engle [20] and Engle

and Kraft [2], applications on economic inflation were discussed. These models

recognised the volatility of inflation. With a known zero mean, the model can be

represented as follows:

Y_t = ε_t X_{t−1}    (3.1)

where Yt is the dependent variable and Xt is the vector of the exogenous variables.

As the ARCH model is applied when heteroskedasticity is an issue in the time

series, one should pay attention to the variances when dealing with the data on

hand. The variance was proven to show slow variation over a time span, making

the assumption of equal weights intolerable. In other words, the events that are

more recent are given higher weights as they are more relevant. Then, based on

the work of Engle in 1982, these weights were considered the equation parameters


to be estimated. This allowed the ARCH model to be capable of detecting the

best weights for the data under investigation, allowing for the forecasting process

to take place [19] and [54].

The ARCH regression model is generated by assuming the mean of Yt is given as

Xtβ. Then, the regression model can be finally represented as follows:

ε_t = Y_t − X_t β    (3.2)

The ARCH regression model has a range of features making it well suited for: 1. Econometric applications: econometric forecasters are capable of forecasting and predicting the future variation in the data from one period to another, and ARCH emphasises the fact that future forecasts vary over time and rely on past errors in their prediction. 2. Monetary theory and the theory of finance: portfolios of financial assets depend in their estimation on the rates of return, and variations in the expected means and variances of such returns consequently affect the prediction of asset prices; ARCH has shown success in handling the uncertainties of such equities by studying the nature of the variances within the constrained regression model. 3. Forecasting that is conditionally deterministic: it leaves no uncertainty in the process of predicting the squared error at time t given the past error terms.

We can claim that the ARCH model analyses the effect of omitting the vari-

ables from the estimated model. ARCH literature made a major breakthrough in

predicting the volatility of economic time series, making it possible to analyse the


apparent changes in volatility, and identifying the impact of non-linear dependence

of the parameters in the economic model.

Applying ARCH models is a straightforward process. They handle the collective errors, they take care of non-linearities and, finally, they adapt to changes in the personal capabilities of the economic forecaster. From the statistical point of view, ARCH models may be considered as specific non-linear time series models, which allows for an exhaustive study of the underlying dynamics.

On the other hand, the ARCH model fails to capture irregular phenomena, especially in the financial and econometric sectors, such as crashes, mergers, news effects or threshold effects. We can claim that ARCH models do not offer the flexibility required for capturing persistence in volatility [22].

Finally, we can summarise the strategic features of ARCH models as follows:

1. ARCH models are playing a very important role in identifying the stochastic

nature of the error terms, for both linear and non-linear econometric models.

2. ARCH models allow for the prediction of the average size of error terms when

fitted to empirical data.

Although ARCH model showed some great capabilities handling the stochastic

features and nature of time series data, there was a great need to generalise this

model to enhance its performance. This consequently led to the introduction of

the generalised ARCH model known as GARCH.


3.2 GARCH model

GARCH is considered a useful generalisation of ARCH model. It was first in-

troduced by Bollerslev in 1986 [55]. GARCH model aims to help with the study

of past variances in order to explain future ones. GARCH model explains the

dependence of the time series model on volatility. Generalising Engle’s ARCH

technique, GARCH model included both the autoregressive (AR) as well as mov-

ing average (MA) terms. It has the advantage of using fewer parameters. This

will subsequently increase its computational efficiency.

GARCH follows the understanding that all past errors contribute to forecast

volatility. This is to be considered the most general case. The GARCH model has proven successful in the area of predicting conditional variance. GARCH is able to capture the main features of the series under investigation by describing the conditional variance and, at the same time, it is simple enough to allow for a thorough investigation of the available solutions.

The main application of GARCH, by definition, is when the series is heteroskedastic, that is, when its variance is observed to change over time. The error terms are expected to change from one point to another; they might be large at some points and small at others.


3.2.1 GARCH modelling

The GARCH model takes into consideration the fat-tail phenomenon and volatility clustering. These are the main basic features of time series data and the main interest in studying time series analysis. The GARCH model has been widely implemented in

the field of finance. It was widely applied in fields like risk management, portfolio

management, option pricing, and foreign exchange. We can apply GARCH model

to examine the relationship between long and short term interest rates, analyse

time varying risk premiums, and model foreign exchange markets that incorpo-

rates fat tail behaviour. GARCH effects can also be shown in fields like capital

allocation and value at risk (VaR).

The most widely used GARCH specification is that the prediction of the variance in the next period depends on a weighted average of the long-run average variance.

This variance was statistically captured by the most recently observed squared

residuals. Many of the GARCH estimations are currently available through com-

mercial software such as Matlab, SAS or TSP. These types of software offer a

straight forward process when applying GARCH applications. First, we need to

define the parameters. These include ω, α and β. Second, start calculating an

estimate of the variance based on the first observation. Subsequently, it will be

easy to identify the estimate for the second observation. The GARCH updated

formula will calibrate the following parameters: a. A weighted average of the

unconditional variance, b. The squared residual of the first observation, and c.

The starting variance and estimates of the variances of the second observation.


This formula becomes the input for estimating the third variance, and in this way a long time-series variance forecast is constructed. A positive relationship was noted between the residuals and the constructed time series. The parameters ω, α and β are then modified to obtain the best fit by applying the likelihood function [56].

The most basic GARCH (1,1) model can be represented as follows:

h_{t+1} = ω + α(r_t − m_t)^2 + β h_t    (3.3)

where r_t represents the return on a portfolio or an asset, m_t its mean, h_t the variance relative to the past information set, and ω, α and β are constants estimated by the econometrician [54]. The model holds only if α + β < 1, requiring that α > 0, β > 0 and ω > 0.
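A minimal sketch of this recursion (assuming Python/NumPy, a zero conditional mean, a synthetic return series and hand-picked parameters; in practice ω, α and β would be estimated by maximising the likelihood, for instance with a dedicated package):

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta, mean=0.0):
    """Recursively evaluate h_{t+1} = omega + alpha*(r_t - mean)**2 + beta*h_t."""
    h = np.empty(len(returns) + 1)
    h[0] = np.var(returns)                       # starting variance estimate
    for t, r in enumerate(returns):
        h[t + 1] = omega + alpha * (r - mean) ** 2 + beta * h[t]
    return h

rng = np.random.default_rng(7)
r = rng.normal(scale=0.01, size=1000)            # placeholder return series
h = garch11_variance(r, omega=1e-6, alpha=0.08, beta=0.90)   # alpha + beta < 1
print("one-step-ahead variance forecast:", h[-1])
```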

Although the main application of the model is to forecast just one period, it

turned out that based on one period, a two period forecast can be developed.

Therefore, long-horizon forecasts can be created. Referring to Engle again, in

his work he claimed that for GARCH (1,1) the unconditional variance can be

addressed to represent the distant-horizon forecast. This condition can be achieved

if α + β < 1 for all time periods. This ultimately implies that GARCH models are conditionally heteroskedastic: the conditional variance is driven by past information, yet the unconditional variance remains constant.

The GARCH (1,1) model can be expanded to the comprehensive GARCH (p,q)


model. This is a model where additional lag terms are provided. They are useful

when a long-term time series data is investigated. For example, several years of

data.

Heteroskedasticity of errors has disappeared with the adoption of more advanced

statistical data models. GARCH specification of error terms allowed for a more

accurate volatility forecast, and overcame the fact that most familiar statistical

models exhibit some sort of conditional error heteroskedasticity.

The GARCH success in predicting volatility changes allowed for a wide range

of applications in the fields of economics and finance. Risk management, option

pricing, and asset allocations are some of the popular applications in finance. Their

effects are important in areas like efficient capital allocation and management.

The GARCH model has been intensively investigated by a wide range of re-

searchers. Their goal was to improve its performance and expand the areas of

its applications. The literature has identified some of the basic work and models

in the field. The basic model was developed by Bollerslev in [55], who then in-

troduced the I-GARCH model. Taylor in [60] introduced the TS-GARCH model.

Engle and Ng [61] suggested the NA-GARCH and the V-GARCH. The E-GARCH

and the NGARCH were introduced by Higgins in [27]. Hentschel in [62] introduced

the H-GARCH and the Aug-GARCH was suggested by Duan in [58]. The H-

GARCH and the Aug-GARCH specifications are very flexible and both models

include some of the features of the other previously stated GARCH models, mak-

ing them more reliable and mature.


In conclusion, we can clearly state that GARCH has proven accuracy in forecast-

ing and modelling time-varying conditional variances. It takes into consideration basic time series features such as volatility clustering and excess kurtosis. GARCH models also led to a fundamental change in the econometric approaches introduced before. They have been considered important in the eyes of academics and practitioners simply because of their ease of use in practice and their richness in theoretical problems, many of which have not yet been solved.

3.2.2 Restrictions on GARCH model

There are some limitations preventing the wide application of GARCH in finance. GARCH functions best under moderately stable market conditions and, in most cases, fails to capture asymmetric phenomena such as unexpected events, which can lead to substantial change. Finally, heteroskedasticity explains some, but not all, of the fat-tail behaviour in financial series.

One of the recent studies in the field [32] was dedicated to studying the limitations of the GARCH model, particularly in detecting excess kurtosis and persistence in

volatility. This was also related to the fact that GARCH model, especially in

the financial applications, does not give the required attention to the behavioural

changes in the market such as crashes and financial crisis. They investigated

the impact of the simultaneous changes in the GARCH model parameters on the

return series and its volatility. They found that they have an impact on both


the volatility and the dynamic structure of the series. They also concluded that

one can get more information by investigating the changes in individual model

parameters rather than the collective investigation of the parameters as one unit.

This can be done assuming that both the volatility and excess kurtosis change

enduringly. These changes had permanent effects on the volatility at all times,

but only a change in parameters α and β has that permanent impact on the

excess volatility of the series.

3.2.3 Outliers in GARCH modeling

Outliers can be defined as observations that are numerically distant from the data.

They are defined as “Those that deviate from other members of the sample” [63].

They are often considered an error or noise, though they might carry essential

information. Therefore, we should pay proper attention to investigating outliers.

They contain valuable information about data collection and recording for the

process under investigation at many instances [50, 64].

There are lots of reasons behind the appearance of outliers. This might be due

to human error, changes in the system behaviour, sample contamination with el-

ements from outside the population, or as a natural deviation in the population.

Also, outliers may appear by chance, error in data transmission or population

transcription or as an indication of measurement error or heavy-tailed distribu-

tions. One should be careful when dealing with outliers and not confuse them with experimental errors.


Outliers might include sample minimum or sample maximum or both, but sample

maximum or minimum are not always accordingly reported as outliers. Another

remark about outliers is that, in larger samplings of data, some data will be noticed to be farther away from the sample mean than the reasonably accepted range. This could

be a consequence of incidental systematic error or flaw in the theory generating

the sample. In this case, outliers can designate faulty data, or an application is

where a specific theory might be considered invalid. It was also denoted that in

large sampling, a small number of outliers will be detected [65].

Outliers can be classified into univariate and multivariate outliers. Univariate are

the outliers that are detected within a single variable either visually by examining

the data with a frequency distribution or by using a Box plot to identify the

extreme and mild outliers. While the multivariate outliers are those that occur

within the joint combination of two or more variables.

There are lots of problems associated with outliers detection. Defining whether an

observation is an outlier is very subjective. The second problem is that outliers

mask each other. The estimated standard deviation is affected by the size of the

outlier. The estimated standard deviation shrinks when a large outlier is removed.

It gets worse by increasing the complexity of the data. The outliers become visible

when some others are removed.

The GARCH model has been applied in the area of outliers detection. Based on

the work of [65], the GARCH (1,1) modelling was applied to detect and correct

outliers before estimating the risk measures in financial markets such as minimum

capital risk requirements. Their study has proven successful in generating more accurate minimum capital risk requirements (MCRRs).

3.3 Prediction Interval Modelling

Statistically speaking, a prediction interval is defined as an estimate interval in

which the future observations will fall, within a certain probability. It depends

in its mechanism on the already observed data, or what we call the past observa-

tions. It predicts the distribution of the individual future points rather than the

true population mean or other quantities that cannot be observed in this manner. The latter quantities can be detected and analysed by applying confidence intervals, which take into consideration the unobservable population

parameters [22].

Many recent papers utilise prediction intervals to forecast the future values of

time series under examination. In [52], Pellegrini has been studying the impact

of heteroskedastic time series with stochastic trends on the prediction intervals.

They applied their study on a conditionally heteroskedastic model with a level of

stochasticity. In their research, they applied ARIMA-GARCH model to generate

the prediction intervals. They claimed that the lengths of the prediction intervals

will vary depending on whether the conditional heteroskedasticity affects the long-

run or short-run component.

However, their study was based on a moderate sample size. It showed that if the

source of heteroskedasticity was well explained, the uncertainty of the parameter


estimation will not have an influence on the construction of prediction intervals.

Yet, they did not succeed in establishing a mechanism to identify how to eliminate

the impact of heteroskedasticity when establishing prediction intervals for large-

sized financial data.

The problems associated with prediction intervals and their impacts have been

addressed by Baillie and Bollerslev in [81]. Their research focused on defining a

prediction for the variance in GARCH (p,q) model. Their work resulted in an

assumption of that by increasing the forecast horizon, the available information

become less significant and the optimal forecast converts to the unconditional

variance. With respect to the integrated GARCH (1,1) or IGARCH(1,1) models,

having α1+β1=1, the present piece of information becomes more relevant and

important for building up forecasts.

There were attempts to construct prediction intervals in [52]. Their study was

based on the ARIMA models for constructing a prediction interval. Their results

showed that due to the presence of unit root in this model, the prediction intervals

based on the ARIMA model always depend on excess volatility. Whether the

excess volatility is positive or negative determines if the prediction interval created

is either wide or narrow when compared to intervals built up by other competing

models. In that particular study, the alternative was the unobserved component model. Neural networks were also a key player in constructing prediction intervals. In [82–85], leading techniques such as the Bayesian, delta and bootstrap models were reviewed in terms of their contribution to constructing a prediction model for


future forecasting. Problems of uncertainties were also discussed and accounted

for to build the optimal neural-network based framework.

3.4 Conclusions

The ARCH regression model is an approximation to a more complex regression

which has non-ARCH disturbances. The ARCH specification might be picking up the effect of variables omitted from the estimated model.

A major contribution of the ARCH literature is the finding that apparent changes

in the volatility of economic time series may be predictable and result from a

specific type of non-linear dependence rather than exogenous structural changes

in variables. ARCH models are simple and easy to handle.They take care of

clustered errors, take care of non-linearities and can also take care of changes in

the econometrician’s ability to apply the forecasting models and techniques.

On the other hand, there are limitations and assumptions on the underlying ARCH

models. ARCH models assume a rather stable environment and fail to capture irregular phenomena such as crashes, mergers, news effects or threshold effects when studying irregular settings such as financial market applications [60].

The GARCH model is used for time series that exhibit time-varying volatility: periods of swings followed by periods of relative calm.

A wider exploration of ARCH and GARCH models was introduced in this chapter. A historical and critical investigation of both the ARCH and GARCH models was implemented. The mechanism by which both ARCH and GARCH models operate was illustrated, with the restrictions of the GARCH model being discussed.

Outlier detection and its impact on GARCH modelling were explained. The GARCH model addresses the problem of heteroskedasticity and volatility clustering of the time series by handling the outliers in the data and errors in the outcome [21].

Chapter 4

SoLVI: Slope of Local Variance Index

Heteroskedasticity, by definition, is the phenomenon of having time varying vari-

ance in a time series [19, 55, 67]. As discussed in Chapter 2, measuring het-

eroskedasticity relies on estimating the change in variance relative to time. In

order to do that, a function should be derived to estimate local variance at a

certain time t within the time series y(t). This chapter starts with deriving equa-

tions for local estimation of mean μy(t) and variance σ2y(t) statistical parameters

of a time series y(t). Then two heteroskedasticity quantifying methods will be

presented. The first method, called the heteroskedasticity variance index (HVI), relies on estimating the variance of local variances [35]. The second method, the Slope of Local Variance Index (SoLVI), evaluates the change in the first derivative of the local variance function, (d/dt) σ_y^2(t) [35].


4.1 Local Estimation of Statistical Parameters

Local estimation of statistical parameters is a well known technique in image

and signal processing [86–88]. The estimation is obtained by moving a kernel

window W of size N samples over the values of the time series y(t) and performing the estimation procedure. The most common locally estimated statistical parameters are the mean μ_y(t) and the variance σ_y^2(t).

4.1.1 Estimation of Local Average

Calculating the local average is fairly simple. Let y(t) be the time series and μ_y^N(t) be the average of the first N samples of y(t) at time t. The local average at time t is then defined as

μ_y^N(t) = (1/N) Σ_{i=t}^{N} y(i)    (4.1)

This step is then repeated at subsequent samples y_{t+1}, y_{t+2}, ..., y_M, where M is the total number of samples. In order to simplify and generalise the local calculation procedure, the signal processing convolution operator can be employed here. The convolution operator ∗ : R^M × R^N → R^{M+N−1} takes a time series y_t and an N-sized kernel h as inputs and is formulated as

(y ∗ h)(t) = Σ_{i=−∞}^{∞} y(t − i) · h(i)    (4.2)


This formulation allows different local operations on a time series to be modelled by simply changing the kernel h. The equations can be further simplified using matrix structures, by modelling y(t)|_{t=0..∞} as an infinite vector y_t = [y_0, y_1, ..., y_∞], a time-limited time series y(t)|_{t=0..M} as a finite vector y_t = [y_0, y_1, ..., y_M], and an N-sized kernel h_t^N = [h_0, h_1, ..., h_N]. Equation 4.2 can then be rewritten as

(y ∗ h)(t) = Σ_{i=0}^{M−1} y(t − i) · h(i)    (4.3)

According to the formulation in eq. 4.3, an N sized averaging kernel is then for-

mulated as a 1×N vector

h_μ^N = [1/N, 1/N, 1/N, ..., 1/N]    (4.4)

or simply

h_μ^N = (1/N) [1, 1, 1, ..., 1]    (4.5)

Finally, the local average estimation equation can be summarised as

μ_y^N(t) = (y ∗ h_μ^N)(t)    (4.6)

where h_μ^N = (1/N) [1, 1, ..., 1] is the N-sized averaging kernel.
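A compact sketch of eq. 4.6 (assuming Python/NumPy; np.convolve with mode="same" is used here as one possible way of handling the series boundaries):

```python
import numpy as np

def local_mean(y, N):
    """Local average of y via convolution with the uniform kernel h = (1/N)[1, ..., 1]."""
    h = np.ones(N) / N
    return np.convolve(y, h, mode="same")

rng = np.random.default_rng(8)
y = rng.normal(size=1000).cumsum()               # an arbitrary test series
mu = local_mean(y, N=50)
print(mu[:5])
```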


4.1.2 Generalisation with Gaussian Filters

In order to generalise the averaging kernel, a Gaussian formulation was used. Gaus-

sian generalisation replaces the averaging kernel h_μ^N with a zero-mean Gaussian kernel as follows:

g_μ^{N,σ}(t) = (1 / (σ√(2π))) e^{−t^2 / (2σ^2)}    (4.7)

where g_μ^{N,σ} is an averaging kernel with standard deviation σ that gives more weight to the central sample y_{N/2} and less weight to the samples y_{N/2 ± i}, i = 1, ..., N/2; accordingly, h_μ^N = g_μ^{N,∞}.

4.1.3 Estimation of Local Variance

The locally estimated variance of a time series y(t) is a signal processing technique that derives a time-based function of local variances σ^2(t); it estimates the variance at every spatial sample. The local variance signal is calculated using the expected value notation Var(X) = E(X^2) − E(X)^2. The expected values are calculated by convolving y(t) with a moving average filter h^N. Let h^N be a 1×N averaging linear shift-invariant filter; the local averages are then calculated as follows:

μ_y^N(t) = h^N(t) ∗ y(t)    (4.8)

μ_{y^2}^N(t) = h^N(t) ∗ [y(t)]^2    (4.9)


where y(t)^2 = y(t) · y(t) is calculated using the Hadamard (element-wise) product. The local variance function σ_y^{2,N}(t) is then calculated as follows:

σ_y^{2,N}(t) = μ_{y^2}^N(t) − [μ_y^N(t)]^2    (4.10)

where [μ_y^N(t)]^2 = μ_y^N(t) · μ_y^N(t) is also calculated using the Hadamard product.

In order to generalise the equations, the classic averaging filter h^N can be replaced by a Gaussian filter g_σ^N, where σ controls the width of the Gaussian bell shape and adjusts the weights for every spatial sample. The generalised local variance function σ_y^{2,N}(t) with an averaging window is then defined as follows:

σ_y^{2,N}(t) = g_σ^N(t) ∗ y(t)^2 − [g_σ^N(t) ∗ y(t)]^2    (4.11)
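A small sketch of eq. 4.11 (assuming Python with SciPy; gaussian_filter1d plays the role of the Gaussian averaging kernel g_σ^N, and a clip guards against tiny negative values caused by numerical round-off):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def local_variance(y, sigma=25.0):
    """Local variance via Var = E[y^2] - E[y]^2 with Gaussian-weighted local averages."""
    ey = gaussian_filter1d(y, sigma)             # local E[y]
    ey2 = gaussian_filter1d(y * y, sigma)        # local E[y^2]
    return np.clip(ey2 - ey**2, 0.0, None)

rng = np.random.default_rng(9)
scales = np.repeat(np.linspace(1.0, 10.0, 10), 500)
y = rng.normal(scale=scales)                     # synthetic heteroskedastic series
var_t = local_variance(y)
print(var_t[::1000])
```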

Figure 4.1 illustrates local variance function overlaid on a homoskedastic (top) and

heteroskedastic (bottom) time series.

4.1.4 ARMA Filter

Selecting the proper size N of the convolution kernel is challenging. It relies on

the data size, the variability of data and the variance σ2 of Gaussian distribution

used in preparing the weighted average filter gNσ . On one hand minimising the

kernel size moves the assessment of the local variances towards simple time series

sampling and derives a zero-valued local variance as demonstrated in Figure 4.2.

On the other hand, maximising N yields a smoother local mean and variance


signals which, in turn, yield an unrealistic estimation of the local variances and of the quantified heteroskedasticity in general, as shown in Figure 4.2.

Figure 4.1: Local variances of homoskedastic (left) and heteroskedastic (right) samples. [Plots of y(t) and σ_y(t) against the sample index t.]


Figure 4.2: Local variances of a heteroskedastic time series using convolution, for kernel sizes w = 5 and w = 100. [Plot of σ_y(t) against the sample index t.]

Another alternative would be to use an autoregressive moving average (ARMA) filter. The ARMA filter performs autoregression on the samples within the kernel of size w to estimate a trajectory of the growth of the function.

While ARMA-driven filters provide a stable trajectory estimate, they fall short in estimating the actual local variances at much smaller kernel sizes, as illustrated in Figure 4.3. Figure 4.4 demonstrates the stability comparison between the ARMA and classic convolution local variance filters.


Figure 4.3: Local variances of a heteroskedastic time series using autoregression, for kernel sizes w = 5 and w = 100. w controls the autoregressive moving average (ARMA) window size; ARMA is used to obtain the mean instead of the arithmetic mean applied by filtering the time series.

4.2 HVI: Heteroskedasticity Variance Index

Statistical heteroskedasticity tests focus mainly on accepting or rejecting a null

hypothesis [35, 79, 80, 89]. However, these tests do not state how far the time series

is from being homoskedastic or how heteroskedastic it is. This section presents the

heteroskedasticity variance index (HVI) that measures how variant are the local

variances in the examined data. HVI consists of two major steps. First, local

variances σ2y(t) of the time series y(t) are estimated using convolution operator


and the expected value formulation Var(X) = E(X^2) − E(X)^2, as illustrated in eq. 4.10. Second, the variance of the local variances is calculated to capture the variations in the calculated local variances across time. The overall heteroskedasticity index σ^2_{σ_y^2} is simply defined as the variance of the locally computed variances, obtained using the inner product with a uniformly distributed averaging filter g_∞^∞, as follows:

σ^2_{σ_y^{2,N}}(t) = ⟨ g_∞^∞(t), σ_y^2(t)^2 ⟩ − ⟨ g_∞^∞(t), σ_y^2(t) ⟩^2    (4.12)

Figure 4.4: Local variance comparison using convolution and autoregressive filtering. [Plot of the local variance σ^2 against sample t for the convolution and ARMA filters.]

where ⟨·, ·⟩ is the inner product and g_∞^∞(t) is an infinite uniformly distributed averaging filter satisfying

∫_{−∞}^{∞} g_∞^∞(t) dt = 1    (4.13)

Figure 4.5: HVI with different kernel sizes.
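As a rough sketch of eq. 4.12 (with the assumptions that the Gaussian local-variance estimator above is reused and that a plain sample variance stands in for the inner product with g_∞^∞), HVI reduces to the variance of the local variance signal:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def hvi(y, sigma=25.0):
    """Heteroskedasticity Variance Index: variance of the locally estimated variances."""
    ey = gaussian_filter1d(y, sigma)
    ey2 = gaussian_filter1d(y * y, sigma)
    local_var = np.clip(ey2 - ey**2, 0.0, None)
    return np.var(local_var)

rng = np.random.default_rng(10)
homo = rng.normal(scale=2.0, size=5000)
hetero = rng.normal(scale=np.repeat(np.linspace(1, 10, 10), 500))
print("HVI homoskedastic  :", hvi(homo))
print("HVI heteroskedastic:", hvi(hetero))
```

The heteroskedastic series scores far higher, but the value itself is unbounded, which motivates the bounded SoLVI metric below.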

4.3 SoLVI: Slope of Local Variance Index

Quadratic growth, boundlessness and the choice of a proper kernel size imposed a series of challenges on HVI. As HVI, technically, calculates the variance of local variances, it features a quadratic growth as the number of different local variances increases. The variance of local variances also causes HVI, as a metric, to be boundless. The choice of the kernel size N remains a tradeoff factor that governs the smoothness of the local variance curve (Figure 4.5). A smaller kernel size produces a very noisy local variance function. On the other hand, a large kernel size derives a very smooth local variance function, causing HVI to record a very small value implying a homoskedastic behaviour. These issues have been addressed in the Slope of Local Variance Index (SoLVI) [90]. The main modification of SoLVI over HVI is performing regression on the local variance function and calculating the average slope of the local variance function in degrees.

4.3.1 Local Variance Regression

Deriving a regression function of the local variances relaxes the effect of the kernel size N. As mentioned before, a small N derives a noisy local variance function. However, a regression function smoothes the noisy local variance function and produces a trend function highlighting the growth in the number of distinct variances in the time series. In order to add a regression component R to the estimation of the local variance function, eq. 4.10 is changed to

σ_{y,N,R}^2(t) = R( μ_{y^2}^N(t) − [μ_y^N(t)]^2 )    (4.14)


Figure 4.6: SoLVI scores for time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w.

4.3.2 Average Slope of Local Variance

To qualify as an index, the proposed metric is then mapped between 0° and 90° using tan^{−1}. The change of local variances is measured by estimating

m_{σ_y^2} = (d/dt) σ_{y,N,R}^2(t)    (4.15)


and then heteroskedasticity is quantified by calculating the average tangent angle of the local variance function as

μ_{θ(σ_y^2)} = (1/N) ∫_{t=1}^{N} [ tan^{−1}( (d/dt) R(σ_y^2(t|w)) ) ] dt    (4.16)

where θ(σ_y^2) is the function of local tangent angles of σ_y^2(t|w), N is the length of the time series, and μ_{θ(σ_y^2)} is the average of the local tangent angles of the same function, which correlates theoretically with the change of local variances and hence quantifies heteroskedasticity.

heteroskedasticity. Figure 4.6 shows SoLVI scores of synthesised heteroskedastic

data with number of standard deviations ranging from 1 to 64.
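A hedged sketch of eq. 4.16 under the following assumptions made for this example: a polynomial fit of user-chosen degree stands in for the regression component R, numerical differentiation replaces d/dt, and the Gaussian local-variance estimator from earlier is reused.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def solvi(y, sigma=25.0, degree=6):
    """Slope of Local Variance Index: average tangent angle (in degrees)
    of the regressed local variance trend."""
    ey = gaussian_filter1d(y, sigma)
    ey2 = gaussian_filter1d(y * y, sigma)
    local_var = np.clip(ey2 - ey**2, 0.0, None)

    t = np.arange(len(y), dtype=float)
    poly = np.polynomial.Polynomial.fit(t, local_var, degree)   # regression component R
    trend = poly(t)
    slope = np.gradient(trend, t)                               # d/dt of the trend
    return np.degrees(np.mean(np.arctan(np.abs(slope))))        # bounded in [0, 90)

rng = np.random.default_rng(11)
hetero = rng.normal(scale=np.repeat(np.linspace(1, 10, 10), 500))
homo = rng.normal(scale=2.0, size=5000)
print("SoLVI heteroskedastic:", solvi(hetero), "degrees")
print("SoLVI homoskedastic  :", solvi(homo), "degrees")
```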

4.3.3 Selection of Kernel Size w

The application domain dictates the selection of the kernel size w. The economist investigating the time series should have an idea about the range of the number of variances in the time series. As illustrated in Figure 4.6, higher kernel sizes linearise the SoLVI score graph. The kernel size w then serves as a parameter to zoom in and out on a particular number of sigmas.

4.4 Conclusions

Based on our study, there are two types of metrics to measure heteroskedasticity:

local variance based metrics and statistical based metrics. Local variance based

metrics were the core of our early attempts to quantify heteroskedasticity. They

calculate the local variances in the time series. Both the Heteroskedasticity Variance Index (HVI) and the Slope of Local Variance Index (SoLVI) techniques can be classified under this category. HVI uses the global variance of the obtained local sigmas. The SoLVI model performs regression on the local sigmas and calculates the average slope. The proposed index provides more than just a hypothesis test. The main

advantage of the proposed metric is providing a quantifying method to test how far

a heteroskedastic series is from being homoskedastic. The results show consistency

between the proposed index and the widely popular heteroskedasticity tests.

Chapter 5

Divergence Heteroskedasticity Measure

An alternative approach to measure heteroskedasticity is to sample the estimated

local variances in the time series. By doing this, a probability distribution pσ2 of

the local variances can be derived. In theory, a homoskedastic time series should

have a consistent local variance σ2 over time. Consequently, the probability dis-

tribution of a homoskedastic time series should be unimodal and centred around

σ2. On the other hand, a heteroskedastic time series should, in theory, approach

a uniform distribution covering a wide range of local variances. The ultimate

heteroskedastic time series should, in theory, feature a uniform distribution U(0, ∞).

Figure 5.1: Probability density function of local variances for a homoskedastic [left], a heteroskedastic [middle] and a theoretically ultimate heteroskedastic [right] time series.

Therefore, measuring the distance between the probability distribution

of the local variances pσ2 and the uniform distribution provides a quantified mea-

sure of heteroskedasticity. In this section, we propose heteroskedasticity measures

based on probability distribution metrics. The heteroskedasticity quantified mea-

sure is defined as follows:

H(y) = Δp

(P (σ2

y),U(0,∞))

(5.1)

where Δp : P2 → [0, 1] is a distribution distance function of the estimated local

variances σ2y .

Many probability distribution metrics are available. However, most of them rely on

entropies, joint probability density functions and sigma algebra. In this section a

justification for excluding three of the most famous probability distribution metrics

is discussed.

5.1 Mutual Information (MI)

Mutual information between two random variables X and Y derives a cross entropy between the joint probability distribution p(x, y) and the ultimate scenario of complete mutual independence p(x) · p(y) as follows:

MI(X; Y) = ∫∫ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx dy    (5.2)


While MI can be used to measure the information shared between X and Y, and equals zero when X and Y are completely independent, as follows:

MI(X; Y) = ∫∫ p(x) p(y) log( p(x) p(y) / (p(x) p(y)) ) dx dy    (5.3)

= ∫∫ p(x) p(y) log 1 dx dy = 0    (5.4)

it does not, however, provide a good solution for quantifying heteroskedasticity, because it is only bounded by the maximum entropy of X or Y, as follows:

MI(X; X) = ∫∫ p(x, x) log( p(x, x) / (p(x) p(x)) ) dx dy    (5.5)

= ∫ p(x) log( p(x) / (p(x) p(x)) ) dx    (5.6)

= ∫ p(x) log( 1 / p(x) ) dx    (5.7)

= ∫ p(x) log( 1 / p(x) ) dx    (5.8)

= ∫ p(x) log p(x)^{−1} dx    (5.9)

= − ∫ p(x) log p(x) dx = H(X)    (5.10)

where H(X) is the entropy of X.


5.1.1 Tsallis Driven Mutual Information (MIα)

Another variation of mutual information was proposed by Cvejic et al. in [91].

They proposed to use the tunable Tsallis entropy [92] described below.

MI_α(X, Y) = (1 / (1 − α)) ( 1 − ∫∫ p(x)^α p(y)^{1−α} dx dy )    (5.11)

where α ∈ R − {1} and MI_α(X, Y) → MI(X, Y) as α → 1. This was proven by applying l'Hôpital's rule to eq. 5.11 and substituting α = 1.

MI_1(X, Y) = lim_{α→1} MI_α(X, Y)    (5.12)

= lim_{α→1} [ 1 − ∫∫ p(x)^α p(y)^{1−α} dx dy ] / (1 − α)    (5.13)

= lim_{α→1} (d/dα)[ 1 − ∫∫ p(x)^α p(y)^{1−α} dx dy ] / (d/dα)[ 1 − α ]    (5.14)

= lim_{α→1} ∫∫ [ p(x)^α p(y)^{1−α} ln p(x) ] / [ p(x)^α p(y)^{1−α} ln p(y) ] dx dy    (5.15)

= lim_{α→1} ∫∫ p(x)^α p(y)^{1−α} [ ln p(x) / ln p(y) ] dx dy    (5.16)

= ∫∫ p(x) [ ln p(x) / ln p(y) ] dx dy    (5.17)

= MI(X, Y)    (5.18)

5.2 Jensen-Shannon Divergence

The Jensen-Shannon divergence metric uses sigma algebra [93] to derive an intermediate random variable M = ½(X + Y), which serves as a reference point from which to measure the distance of X and Y using mutual information, as follows:

JSD(X, Y) = ½ MI(X, M) + ½ MI(Y, M)    (5.19)

While this metric is bounded to 0 ≤ JSD(X, Y ) ≤ 1, deriving the mixture distri-

bution of the random variable M is computationally intensive.

5.3 Renyi Divergence

Renyi divergence [94] uses a generalised form of Shannon, Hartley, min-, and

collision-entropies [95, 96] and is formulated as follows:

H_α = (1 / (1 − α)) log( ∫ p(x)^α dx )    (5.20)

Renyi's divergence metric is then formulated as follows:

R_α(X, Y) = (1 / (1 − α)) log( ∫∫ p(x)^α p(y)^{1−α} dx dy )    (5.21)

As the Renyi entropy generalises many entropies, its divergence metric also generalises many divergence metrics. For example, when α → 1 the Renyi entropy converges to Shannon's entropy and the divergence metric converges to the Mutual Information metric, as can be shown by applying l'Hôpital's rule as follows:


H_1(X) = lim_{α→1} log( ∫ p(x)^α dx ) / (1 − α)    (5.22)

= lim_{α→1} [ ∫ p(x)^α log p(x) dx / ∫ p(x)^α dx ] / (−1)    (5.23)

= lim_{α→1} − ∫ p(x)^α log p(x) dx / ∫ p(x)^α dx    (5.24)

= − (1 / ∫ p(x) dx) ∫ p(x) log p(x) dx    (5.25)

= − ∫ p(x) log p(x) dx = H(X)    (5.26)

R_1 = ∫∫ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx dy = MI(X, Y)    (5.27)

Additionally, the Renyi divergence also correlates with the Bhattacharyya coefficient when α = ½, as follows:

R_{1/2}(X, Y) = (1 / (1 − ½)) log( ∫∫ p(x)^{1/2} p(y)^{1−1/2} dx dy )    (5.28)

= −2 log( ∫∫ √(p(x) p(y)) dx dy ) = −2 log BC(X, Y)    (5.29)

= 2 Δ_p^B(X, Y)    (5.30)

5.4 Bhattacharyya Distance

Bhattacharyya-based metrics rely on deriving the Bhattacharyya Coefficient BC

[97]. The BC coefficient measures the closeness between two probability distribu-

tions p and q by measuring how disjoint they are as follows:

BC(p, q) = Σ_{x∈X} √( p(x) q(x) )    (5.31)

Figure 5.2: The effect of the HVI window size on the Bhattacharyya coefficient. The results are for time series generated using 64 different sigmas; the graphs demonstrate different kernel sizes w.

Figure 5.2 shows the Bhattacharyya coefficient as the number of local variances in the dataset increases. The Bhattacharyya coefficient has an upper bound of 1, which is attained if and only if p(x) = q(x).

This coefficient is then used to derive the Bhattacharyya distance as follows:

Δ_p^B(p, q) = − ln BC(p, q)    (5.32)

Figure 5.3: Bhattacharyya distance of a time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w.

However, this distance function has no upper bound and does not satisfy the triangle inequality. Figure 5.3 demonstrates the Bhattacharyya distance.


5.4.1 Hellinger Distance

Finally, Hellinger [98] provided a sound Bhattacharyya-based divergence metric that is bounded and satisfies the triangle inequality. The Hellinger metric is derived from the Bhattacharyya coefficient as:

\Delta_H(p, q) = 1 - \sqrt{1 - BC(p, q)}    (5.33)

Figure 5.4 shows the effect of the window size on the Hellinger divergence metric.
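For discrete distributions, the Bhattacharyya coefficient, the Bhattacharyya distance of eq. 5.32 and the bounded variation of eq. 5.33 can be computed directly. The sketch below is illustrative only and is not the thesis implementation.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Eq. 5.31: closeness of two discrete distributions (equals 1 iff p == q)."""
    return np.sum(np.sqrt(p * q))

def bhattacharyya_distance(p, q):
    """Eq. 5.32: unbounded and does not satisfy the triangle inequality."""
    return -np.log(bhattacharyya_coefficient(p, q))

def hellinger_variation(p, q):
    """Eq. 5.33: the bounded Bhattacharyya-based variation used in this chapter."""
    return 1.0 - np.sqrt(1.0 - bhattacharyya_coefficient(p, q))

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])
print(bhattacharyya_coefficient(p, q),
      bhattacharyya_distance(p, q),
      hellinger_variation(p, q))
```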

5.4.2 Bhattacharyya Heteroskedasticity Measure

Since a heteroskedastic time series is, by definition, derived from systems with different variances, the probability distribution p(σ) of the local variances of a heteroskedastic time series must approach a uniform distribution U. On the other hand, a homoskedastic time series will have a probability distribution of local variances that is further from the uniform distribution U. To guarantee a bounded function, we chose the Bhattacharyya coefficient over the Renyi-driven metric in eq. 5.21. The Bhattacharyya heteroskedasticity measure is then formulated as follows:

H_B(y) = \sum_{x \in X} \sqrt{P(\sigma^2_y) \, U(\sigma^2_y)}    (5.34)

where P(σ²_y) is the probability distribution function of the estimated local variances σ²_y. A Hellinger variation can also be derived using the same concept, as follows:

H_H(y) = 1 - \sqrt{1 - \sum_{x \in X} \sqrt{P(\sigma^2_y) \, U(\sigma^2_y)}}    (5.35)

Figure 5.4: The Hellinger coefficient of a time series generated using 64 different sigmas. The graphs demonstrate different kernel sizes w.
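A minimal end-to-end sketch of how eqs. 5.34 and 5.35 could be evaluated is given below. It assumes a simple rolling-window estimator for the local variances and a histogram-based estimate of P(σ²_y) compared against a discrete uniform reference over the same bins; the window length, bin count and function names are illustrative choices and not the thesis's implementation.

```python
import numpy as np

def local_variances(y, w=32):
    """Rolling-window estimates of the local variances of a series y."""
    return np.array([np.var(y[i:i + w]) for i in range(len(y) - w + 1)])

def bhattacharyya_heteroskedasticity(y, w=32, bins=16):
    """Sketch of eqs. 5.34 and 5.35: compare P(sigma^2_y) against a uniform reference U."""
    sigma2 = local_variances(y, w)
    counts, _ = np.histogram(sigma2, bins=bins)
    p = counts / counts.sum()                # empirical P(sigma^2_y)
    u = np.full(bins, 1.0 / bins)            # discrete uniform reference U
    hb = np.sum(np.sqrt(p * u))              # eq. 5.34
    hh = 1.0 - np.sqrt(max(0.0, 1.0 - hb))   # eq. 5.35 (Hellinger variation)
    return hb, hh

# illustrative synthetic series: constant sigma vs. time-varying sigma
rng = np.random.default_rng(0)
homo = rng.normal(0.0, 1.0, 4096)
hetero = rng.normal(0.0, np.linspace(0.5, 5.0, 4096))
print(bhattacharyya_heteroskedasticity(homo))
print(bhattacharyya_heteroskedasticity(hetero))
```

In such a sketch, higher values indicate that the empirical distribution of local variances is closer to uniform, which the chapter associates with stronger heteroskedasticity.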

5.5 Conclusions

In this chapter, divergence-based heteroskedasticity measures have been investigated. The motivation was that most of the available probability distribution metrics rely on entropies, joint density functions and sigma algebra.


Mutual information, Jensen-Shannon divergence and Renyi divergence were excluded. Consequently, the Bhattacharyya distance was adopted to introduce the Bhattacharyya heteroskedasticity measure. The main reason for preferring the Bhattacharyya over the Renyi divergence model was to guarantee a bounded function. The Bhattacharyya heteroskedasticity measure was then formulated, together with its Hellinger variation.

Chapter 6

Conclusions and Future Work

This chapter discusses the conclusions drawn from the work presented in this thesis and identifies room for future improvements and open research problems.

This study presents a novel strategy for identifying the presence of heteroskedasticity in a time series and quantifying it, by deriving a quantitative measure of heteroskedasticity. The proposed measure relies on the common definition of heteroskedasticity as a time-variant variance in the time series. This framework enables pre-existing models to predict this type of uncertainty, making them more beneficial over long-term prediction horizons. In addition, better forecasts can be achieved, allowing for more reliable decision-making and future planning.

Traditionally, heteroskedastic features in time series have become a factor around which forecasting models have to work and errors for which they must compensate. In this work, we have addressed the heteroskedasticity problem from a different perspective. The objective of this work is to quantify heteroskedastic behaviour, as opposed to merely detecting it with heteroskedasticity tests. First, an argument justifying the need for heteroskedasticity quantification rather than detection was presented. Then, we characterised the different features that make a time series test positive in heteroskedasticity tests. This characterisation was then used to generate synthetic test data with different levels of heteroskedasticity. Finally, we derived two families of solutions to quantify heteroskedasticity.

The first family relies on local temporal estimation of statistical parameters, which are used in two schemes. In the first scheme, the time-variant change of the estimated parameters is used to quantify heteroskedasticity. The HVI and SoLVI algorithms described in Chapter 3 were developed under this scheme. The proposed index serves as a cost function to be optimised in many fields, such as ecology, machine learning and finance. The implementation of the proposed heteroskedasticity index is straightforward: it utilises statistical and signal processing functions available in all technical computing packages, such as Matlab™, Mathematica™ and R. Future improvements of this work will apply different nonlinear regression methods (e.g. polynomial and neural network) to estimate local variances, in order to resolve the current limitations of heteroskedasticity tests on series with unpredictable means. Similar techniques can also be applied to quantify time-variant statistical distributions, which have gained more interest recently because of climate change and its effects on different ecosystems.

The second scheme derives the distribution of the estimated local statistical parameters and employs Kullback-Leibler divergence metrics [99] for quantification.


Bhattacharyya and mutual information metrics were employed in Chapter 5. The second family relies on applying heteroskedasticity tests to split the time series into homoskedastic parts. This method facilitates the study of heteroskedastic patterns.

During this study, we discovered that homoskedastic signals with time-varying statistical distributions do test positive in heteroskedasticity tests while maintaining stationarity. Quantifying heteroskedasticity via binary decomposition [100, 101] was investigated and will be the core of future work and implementations. Further applications in the fields of ecological studies and financial markets are to be explored.

References

[1] T. Bollerslev, “A conditionally heteroskedastic time series model for speculative prices and rates of return,” The Review of Economics and Statistics, vol. 69, no. 3, pp. 542–547, 1987.

[2] R. Engle and D. F. Kraft, “Multiperiod forecast error variances of inflation estimated from ARCH models,” Applied Time Series of Economic Data, pp. 293–302, 1983.

[3] A. Harvey, E. Ruiz, and E. Sentana, “Unobserved component time-series models with ARCH disturbances,” Journal of Econometrics, vol. 52, no. 1-2, pp. 129–157, 1992.

[4] T. Nilsen and T. Aven, “Models and model uncertainty in the context of risk analysis,” Reliability Engineering & System Safety, vol. 79, no. 3, pp. 309–317, 2003.

[5] W. D. Rowe, “Understanding uncertainty,” Risk Analysis, vol. 14, no. 5, pp. 743–750, 1994.

[6] G. W. Parry, “The characterisation of uncertainty in probabilistic risk assessments of complex systems,” Reliability Engineering & System Safety, vol. 54, no. 2, pp. 119–126, 1996.

[7] J. Helton, “Treatment of uncertainty in performance assessments for complex systems,” Risk Analysis, vol. 14, no. 4, pp. 483–511, 1994.

[8] G. Raadgever, C. Dieperink, P. Driessen, A. Smit, and H. Van Rijswick, “Uncertainty management strategies: Lessons from the regional implementation of the water framework directive in the Netherlands,” Environmental Science & Policy, vol. 14, no. 1, pp. 64–75, 2011.

[9] S. C. Hora, “Aleatory and epistemic uncertainty in probability elicitation with an example from hazardous waste management,” Reliability Engineering & System Safety, vol. 54, no. 2, pp. 217–223, 1996.



[10] F. O. Hoffman and J. S. Hammond, “Propagation of uncertainty in risk assessments: the need to distinguish between uncertainty due to the lack of knowledge and due to variability,” Risk Analysis, vol. 14, no. 5, pp. 707–712, 1994.

[11] S. Rai, D. Krewski, and S. Bartlett, “A general framework for the analysis of uncertainty and variability in risk assessment,” Human and Ecological Risk Assessment, vol. 2, no. 4, pp. 972–989, 1996.

[12] M. E. Pate-Cornell, “Uncertainties in risk analysis: six levels of treatment,” Reliability Engineering & System Safety, vol. 54, no. 2, pp. 95–111, 1996.

[13] W. L. Oberkampf, K. V. Diegert, K. Alvin, and B. M. Rutherford, “Variability, uncertainty, and error in computational simulation,” American Society of Mechanical Engineers - Publications - Heat Transfer Division, vol. 32, no. 2, pp. 135–154, 1998.

[14] D. Hobson, “Stochastic volatility model,” Mathematical Finance, vol. 14, no. 4, pp. 537–556, 2004.

[15] R. Cont, Frontiers in quantitative finance: volatility and credit risk modeling, ser. Wiley finance. Hoboken, N.J.: John Wiley & Sons, 2009.

[16] R. Cont and P. Tankov, Financial modelling with jump processes, ser. financial mathematics series, 2004.

[17] A. C. Harvey, Dynamic models for volatility and heavy tails: with applications to financial and economic time series, ser. Econometric society monographs, 2013.

[18] R. Cont, Volatility Clustering in Financial Markets: Empirical Facts and Agent-based Models. Springer Berlin Heidelberg, 2007, pp. 289–309.

[19] R. Engle, “Autoregressive conditional heteroskedasticity with estimates of the variance of united kingdom inflation,” Econometrica, vol. 50, no. 4, pp. 987–1007, 1982.

[20] ——, “Estimates of the variance of u.s. inflation based upon the ARCH model,” Journal of Money, Credit and Banking, vol. 15, no. 3, pp. 286–301, 1983.

[21] L. Bauwens, C. M. Hafner, and S. Laurent, Handbook of volatility models and their applications. Wiley, March 2012.


[22] T. Bollerslev, R. Y. Chou, and K. F. Kroner, “ARCH modelling in finance: a review of the theory and empirical evidence,” Journal of Econometrics, vol. 52, no. 1-2, pp. 5–59, 1992.

[23] C. Alexander, “Principal component models for generating large GARCH covariance matrix,” Economic Notes, vol. 31, no. 2, pp. 337–349, 2002.

[24] L. Bauwens, S. Laurent, and J. V. K. Rombouts, “Multivariate GARCH models: a survey,” Journal of Applied Econometrics, vol. 21, no. 1, pp. 79–109, 2006.

[25] R. F. Engle and M. E. Sokalska, “Forecasting intraday volatility in the us equity market: multiplicative component GARCH,” Journal of Financial Econometrics, vol. 10, no. 1, pp. 54–83, 2012.

[26] P. R. Hansen and A. Lunde, “A forecast comparison of volatility models: does anything beat a GARCH (1,1)?” Journal of Applied Econometrics, vol. 20, no. 7, pp. 873–889, 2005.

[27] M. Higgins, “A class of nonlinear ARCH models,” International Economic Review, vol. 33, no. 1, pp. 137–158, 1992.

[28] S. Hwang, V. Pereira, and L. Pedro, “The effects of structural breaks in ARCH and GARCH parameters on persistence of GARCH models,” Communications in Statistics - Simulation and Computation, vol. 37, no. 3, pp. 571–578, 2008.

[29] C. Francq and J.-M. Zakoian, GARCH models: structure, statistical inference, and financial applications. Chichester, West Sussex: Wiley, 2010.

[30] R. Engle, S. M. Focardi, and F. J. Fabozzi, ARCH/GARCH models in Applied Financial Econometrics. Wiley, 2008.

[31] R. Plackett, “Some theorems in least squares,” Biometrika, vol. 37, no. 1-2, pp. 149–157, 1950.

[32] P. Galeano and R. S. Tsay, “Shifts in individual parameters of a GARCH model,” Journal of Financial Econometrics, vol. 8, no. 1, pp. 122–153, 2010.

[33] J. D. Hamilton, Time Series Analysis. Princeton University Press, 1994.


[34] A. C. Harvey, Time series models, 2nd ed. Cambridge, Mass.: MIT Press, 1993.

[35] M. Hassan, M. Hossny, S. Nahavandi, and D. Creighton, “Heteroskedasticity variance index,” IEEE International Conference on Modelling and Simulation 2012, pp. 135–141.

[36] J. Duan, S. Luo, and C. Wang, Recent development in stochastic dynamics and stochastic analysis, ser. Interdisciplinary mathematical sciences. Singapore; London: World Scientific, 2010.

[37] C. Chatfield, Problem solving: a statistician's guide, 2nd ed., ser. Chapman & Hall texts in statistical science series. London; New York: Chapman & Hall, 1995.

[38] ——, The analysis of time series: an introduction, 6th ed., ser. Texts in statistical science. Boca Raton, FL: Chapman & Hall/CRC, 2004.

[39] G. Box, G. M. Jenkins, and G. C. Reinsel, Time series analysis: forecasting and control, 4th ed., ser. Wiley series in probability and statistics. Hoboken, N.J.: John Wiley, 2008.

[40] C. R. Plott and S. Sunder, “Efficiency of experimental security markets with insider information: An application of rational-expectations models,” The Journal of Political Economy, vol. 90, no. 4, pp. 668–698, 1982.

[41] G. Samorodnitsky and M. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, 1994.

[42] J. Nolan, “Numerical calculation of densities and distribution functions,” Stochastic Models, vol. 13, no. 4, pp. 759–774, 1997.

[43] J. P. Nolan, Stable Distributions - Models for Heavy Tailed Data. Boston: Birkhauser, 2015.

[44] J. Chambers, C. Mallows, and B. Stuck, “A method for simulating stable random variables,” J. Amer. Statist. Assoc., vol. 71, pp. 340–344, 1976.

[45] J. L. Knight and S. Satchell, Forecasting volatility in the financial markets, 2007.

[46] G. Box and G. Jenkins, Time Series Analysis: Forecasting and Control, 4th ed. Wiley, 1970, vol. 734.


[47] E. M. Lin, C. W. Chen, and R. Gerlach, “Forecasting volatility with asymmetric smooth transition dynamic range models,” International Journal of Forecasting, vol. 28, no. 2, pp. 384–399, June 2012.

[48] T. J. Brailsford and R. W. Faff, “An evaluation of volatility forecasting techniques,” Journal of Banking and Finance, vol. 20, no. 3, pp. 419–438, 1996.

[49] T. Bollerslev and T. G. Andersen, “Modelling and forecasting realised volatility,” Econometrica, vol. 71, no. 2, pp. 579–625, 2003.

[50] J. Engelberg, C. F. Manski, and J. Williams, “Comparing the point predictions and subjective probability distributions of professional forecasters,” Journal of Business and Economic Statistics, vol. 27, no. 1, pp. 30–41, 2009.

[51] S. Pellegrini, E. Ruiz, and A. Espasa, “Conditionally heteroskedastic unobserved component models and their reduced form,” Economics Letters, vol. 107, no. 2, pp. 88–90, 2010.

[52] ——, “Prediction intervals in conditionally heteroskedastic time series with stochastic components,” International Journal of Forecasting, vol. 27, pp. 308–319, April-June 2011.

[53] A. C. Harvey, The econometric analysis of time series, 1st ed., ser. LSE handbooks in economics. Cambridge, Mass.: MIT Press, 1990.

[54] R. Engle, “GARCH 101: The use of ARCH/GARCH models in applied econometrics,” Journal of Economic Perspectives, vol. 15, no. 4, pp. 157–168, 2001.

[55] T. Bollerslev, “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, vol. 100, no. 1, pp. 307–327, 1986.

[56] T. Busch, “A robust LR test for the GARCH model,” Economics Letters, vol. 88, pp. 358–364, 2005.

[57] D. B. Nelson and C. Q. Cao, “Inequality constraints in the univariate GARCH model,” Journal of Business and Economic Statistics, vol. 10, no. 2, pp. 229–235, 1992.

[58] J. Duan, “Augmented GARCH (p,q) process and its diffusion limit,” Journal of Econometrics, vol. 79, no. 1, pp. 97–127, 1997.


[59] C. G. Lamoureux and W. D. Lastrapes, “Heteroskedasticity in stock return data: Volume versus GARCH effects,” Journal of Finance, vol. 45, no. 1, pp. 221–229, 1990.

[60] S. J. Taylor, Modelling Financial Time Series. World Scientific, 2008.

[61] R. F. Engle and V. K. Ng, “Measuring and testing the impact of news on volatility,” The Journal of Finance, vol. 48, no. 5, pp. 1749–1778, 1993.

[62] L. Hentschel, “All in the family: Nesting symmetric and asymmetric GARCH models,” Journal of Financial Economics, vol. 39, pp. 71–104, 1995.

[63] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Wiley, 1994.

[64] H. Liu, S. Shah, and W. Jiang, “On-line outlier detection and data cleaning,” Computers and Chemical Engineering, vol. 28, no. 9, pp. 1635–1647, 2004.

[65] Y. Chen, D. Miao, and H. Zhang, “Neighborhood outlier detection,” Expert Systems with Applications, vol. 37, pp. 8745–8749, 2010.

[66] I. Morgan, “Stock prices and heteroskedasticity,” Journal of Business, vol. 49, pp. 496–508, 1976.

[67] A. Harvey and G. Phillips, “A comparison of the power of some tests for heteroskedasticity in the general linear model,” Journal of Econometrics, vol. 2, no. 4, pp. 307–316, 1974.

[68] S. Goldfeld and R. Quandt, “Some tests for homoskedasticity,” Journal of the American Statistical Association, vol. 60, no. 310, pp. 539–547, 1965.

[69] H. Glejser, “A new test for heteroskedasticity,” Journal of the American Statistical Association, vol. 64, no. 325, pp. 316–323, 1969.

[70] R. E. Park, “Estimation with heteroskedastic error terms,” Econometrica, vol. 34, no. 4, 1966.

[71] D. A. Bowers and H. C. Rutemiller, “Estimation in a heteroskedastic regression model,” Journal of the American Statistical Association, vol. 63, no. 332, pp. 552–557, 1968.

[72] J. Harvey, M. K. Johnson, and J. Harvey, Modern economics. Study guide and workbook, 5th ed. Basingstoke: Macmillan Education, 1989.


[73] Y. Honda, “Testing the error components model with non-normal disturbance,” The Review of Economic Studies, vol. 52, no. 4, pp. 681–690, 1985.

[74] D. Cox and D. Hinkley, Theoretical Statistics, 1st ed., 1979.

[75] J. D. Lyon and C.-L. Tsai, “A comparison of tests for heteroskedasticity,” Journal of the Royal Statistical Society Series D (The Statistician), vol. 45, no. 3, pp. 337–349, 1996.

[76] D. Cox and N. Reid, “Parameter orthogonality and approximate conditional inference,” Journal of the Royal Statistical Society Series B: Methodological, vol. 49, no. 1, pp. 1–39, 1987.

[77] A. P. Verbyla, “Modelling variance heterogeneity: Residual maximum likelihood and diagnostics,” Journal of the Royal Statistical Society Series B: Methodological, vol. 55, no. 2, pp. 493–508, 1993.

[78] T. S. Breusch and A. R. Pagan, “Simple test for heteroskedasticity and random coefficient variation,” Econometrica, vol. 47, no. 5, pp. 1287–1294, 1979.

[79] H. White, “A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity,” Econometrica, vol. 48, no. 4, pp. 817–838, 1980.

[80] H. Levene, “In contributions to probability and statistics: Essays in honour of Harold Hotelling,” Stanford University Press, pp. 278–292, 1960.

[81] R. T. Baillie and T. Bollerslev, “Prediction in dynamic models with time-dependent conditional variances,” Journal of Econometrics, vol. 52, pp. 91–113, 1992.

[82] A. Khosravi, S. Nahavandi, and D. Creighton, “Construction of optimal prediction intervals for load forecasting problems,” IEEE Transactions on Power Systems, vol. 25, pp. 1496–1503, 2010.

[83] ——, “A prediction interval-based approach to determine optimal structures of neural network metamodels,” Expert Systems with Applications, vol. 37, no. 3, pp. 2377–2387, 2010.

[84] A. Khosravi, S. Nahavandi, D. Creighton, and A. Atiya, “Comprehensive review of neural network-based prediction intervals and new advances,” IEEE Transactions on Neural Networks, vol. 22, pp. 1341–1356, 2011.


[85] A. Khosravi, S. Nahavandi, D. Creighton, and R. Naghavizadeh, “Uncertainty quantification for wind farm power generation,” Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 309–314, June 2012.

[86] M. Hossny, S. Nahavandi, and D. Creighton, “Comments on information measure for performance of image fusion,” Electronics Letters, vol. 44, no. 28, pp. 1066–1067, 2008.

[87] M. Hossny and S. Nahavandi, “Measuring the capacity of image fusion,” in IEEE International Conference on Image Processing Theory, Tools and Applications (IPTA), October 2012, pp. 415–420.

[88] M. Hossny, S. Nahavandi, and D. Creighton, “An evaluation mechanism for saliency functions used in localized image fusion quality metrics,” in IEEE Conference on Computer Modelling and Simulation, March 2012, pp. 407–415.

[89] T. S. Breusch and A. R. Pagan, “The Lagrange multiplier test and its applications to model specification in econometrics,” The Review of Economic Studies, vol. 47, no. 1, pp. 239–253, 1980.

[90] M. Hassan, M. Hossny, S. Nahavandi, and D. Creighton, “Quantifying heteroskedasticity using slope of local variances index,” IEEE International Conference on Modelling and Simulation, pp. 107–111, April 2013.

[91] N. Cvejic, C. Canagarajah, and D. Bull, “Image fusion metric based on mutual information and Tsallis entropy,” Electronics Letters, vol. 42, no. 11, pp. 626–627, 2006.

[92] C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,” Journal of Statistical Physics, vol. 52, pp. 479–487, 1988.

[93] T. Fischer, “On simple representations of stopping times and stopping time sigma-algebras,” Statistics and Probability Letters, vol. 83, no. 1, pp. 345–349, 2013.

[94] A. Renyi, “On measures of information and entropy,” Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561, 1960.

[95] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.


[96] R. König, R. Renner, and C. Schaffner, “The operational meaning of min- and max-entropy,” IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4337–4347, 2009.

[97] A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distributions,” Bulletin of the Calcutta Mathematical Society, vol. 35, pp. 99–109, 1943.

[98] E. Hellinger, “Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen,” Journal für die reine und angewandte Mathematik, vol. 136, pp. 210–271, 1909.

[99] R. Dahlhaus, “On the Kullback-Leibler information divergence of locally stationary processes,” Stochastic Processes and their Applications, vol. 62, no. 1, pp. 139–168, March 1996.

[100] M. Hassan, M. Hossny, S. Nahavandi, and D. Creighton, “Quantifying heteroskedasticity via binary decomposition,” IEEE International Conference on Modelling and Simulation, pp. 112–116, 2013.

[101] M. Hossny, S. Nahavandi, D. Creighton, and M. Hassan, “Image fusion metrics: Evolution in a nutshell,” IEEE International Conference on Computer Modelling and Simulation, pp. 443–450, 2013.