multiple time series analysis of freight rate indices

IN DEGREE PROJECT MATHEMATICS,SECOND CYCLE, 30 CREDITS

, STOCKHOLM SWEDEN 2020

Multiple Time Series Analysis of Freight Rate Indices

SIMON KOLLER

KTH ROYAL INSTITUTE OF TECHNOLOGYSCHOOL OF ENGINEERING SCIENCES

Multiple Time Series Analysis of Freight Rate Indices

SIMON KOLLER

Degree Projects in Financial Mathematics (30 ECTS credits)

Degree Programme in Applied and Computational Mathematics

KTH Royal Institute of Technology year 2020

Supervisor at RISE Research Institutes of Sweden AB: Johannes Hüffmeier

Supervisor at KTH: Boualem Djehiche

Examiner at KTH: Boualem Djehiche

TRITA-SCI-GRU 2020:384

MAT-E 2020:091

Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

Abstract

In this master thesis multiple time series of shipping industry and financial data are analysed in

order to create a forecasting model to forecast freight rate indices. The data of main interest which

are predicted are the two freight rate indices, BDI and BDTI, from the Baltic Exchange. The project

investigates the possibilities for aggregated Vector Autoregression(VAR) models to outperform sim-

ple univariate models, in this case, an Autoregressive Integrated Moving Average(ARIMA) with

seasonal components. The other part of this thesis is to model market shocks in the freight rate

indices, given impulses in the other underlying VAR-model time series using the impulse response

function. The main results are that the VAR-model forecast outperforms the ARIMA-model in fore-

casting the tanker freight rate index (BDTI), while the the bulk freight rate index(BDI) is better

predicted by the simple ARIMA when calculating the forecast mean square error.

Keywords: VAR, Vector Autoregression, Impulse Response Function, Forecasting, Freight Rates,

Shipping, Seaborne trade, BDI, BDTI.

Sammanfattning

I denna avhandling analyseras multipla tidsserier over rederinarings- och finansiell data i syfte att

skapa en prognosticerande modell for att prognosticera fraktratsindex. Dataserierna som i huvud-

sak prognosticeras ar fraktratsindexen BDI och BDTI fran Baltic exchange. I projektet undersoks

om en aggregerad Vektor Autoregressiv(VAR) modell overtraffar en univariat modell, i detta fall

en Autoregressive Integrated Moving Average(ARIMA) med sasongsvariabel. I andra delen av

denna avhandling modelleras chocker i fraktratsindexen givet impulser i de andra underliggande

tidsserierna i de aggregerade VAR-modellerna. Huvudresultaten ar att VAR-modellens prognos

overtraffar ARIMA-modellen for tankerraterna (BDTI), medan bulkraterna(BDI) battre prognos-

ticeras av ARIMA-modellen, i avseende pa prognosernas beraknade mean square error.

Svensk titel: Multipel Tidsserieanalys av Fraktratsindex

Keywords: VAR, Vektor Autoregression, Impulse Response Function, Prognostisering, Fraktrater,

Sjofart, Sjoburen Frakt, BDI, BDTI.

Contents

1 Introduction 3

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 Summary of Shipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.2 Freight Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.1.3 The Covid-19 Pandemic and Freight Market . . . . . . . . . . . . . . . . . . . 6

1.2 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Relevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.6 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.7 Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Theoretical Framework 10

2.1 Multiple Time Series Analysis and Forecasting . . . . . . . . . . . . . . . . . . . . . 10

2.2 Vector Autoregression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3 Stable VAR(p) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.4 Stationary Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.5 Autoregressive Integrated Moving Average-Model . . . . . . . . . . . . . . . . . . . . 14

2.6 Autocovariance and Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.7 Estimations of a VAR(p)-process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.8 Order Selection of VAR(p) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.9 Normality of Residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.10 Expectation-Maximization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.11 Impulse Response Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.12 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.13 Forecast Error Variance Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Data 20

3.1 Time Series in Oil-Tanker-Rates-Model . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Time Series Bulk-Iron-Rates-Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3 Imputations and EM-algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.4 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Method 25

4.1 R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.2 Lag-Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3 Model Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3.1 Serial Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.3.2 Heteroskedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.3.3 Test of Normal Residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.3.4 OLS-Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1

4.4 Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.5 ARIMA-Model Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.6 Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


5 Results 28

5.1 Final Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.1.1 Final VAR(p)-models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.1.2 Final ARIMA-models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.2 Impulse responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5.2.1 Oil-Tanker-Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5.2.2 Iron-Bulk-Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.3 Model Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.3.1 Forecast Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


6 Discussion 36

6.1 Analysis of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6.2 Analysis of Final Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6.3 Analysis of Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.4 Analysis of Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.5 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7 Conclusion 40

Appendix 41

Bibliography 46

2

1 Introduction

In this first section, the aim is to give the reader a brief background of the market of seaborne

trade and to assess the relevance and the problem formulations of this thesis. The reader who

is even more interested in the shipping business is referred to Martin Stopford’s Maritime Eco-

nomics, 2009.

1.1 Background

1.1.1 Summary of Shipping

In the year 2019, the maritime sector transported goods that totally weighed more than 11.09

billion tons. This ist amount of cargo is moved by a world fleet that in January the same year had

a total tonnage of 1.98 billion dwt (dead-weight tonnage, measure of the total cargo and loading

capacity of a ship). It is hard to estimate the total economic value of all cargoes that are shipped

by the shipping sector around the world, but the UNCTAD suggests 14 trillion US dollars, and

it is growing by an average of 2− 3% every year. In monetary value, the seaborne trades alone

carries around 70% of the total global trade. As of January 2019, among the biggest ship owning

nations Greece is the biggest at 18%, then Japan, China and Singapore, with 11%, 11% and 6%

respectively.1

Among the most common types of goods transported, ships accommodated for bulk freight

and tanker freight make up more than half of the capacity of the shipping market as can be seen

in Figure1. 2019 the total amount of goods that is freighted by tankers is 3169 million tons, and

the total amount of goods that is freighted by bulk ships is 3225 million tons, which sums up to

around 60% of the market. 2

Figure 1: Distribution of different types of cargo vessels of the world fleet over time.(UNCTAD,2019)

1UNCTAD, 20192UNCTAD, 2020

3

The tanker market can be decomposed into different parts. The minor parts of the tanker market

are chemical tankers, refined petroleum tankers, LNG-tankers. These together make out around

40% of the market of tankers. The major part, around 60%, is the crude oil tankers, also know

as dirty tankers, that transports crude oil from oil producing sites to refineries and so on. The

usual oil is transported in different sizes of tankers depending on routes, and some types that

can be mentioned is the Very Large Crude Carrier(VLCC, 200, 000 − 320, 000 dwt), Suezmax

crude tanker, (125, 000–199, 999 dwt) and the Aframax crude tanker (85, 000–124, 999 dwt). In

terms of volume oil transported, a fully loaded VLCC can carry 2 million barrels of oil. The year

2019, the biggest quantity, 32%, of the worlds oil is produced in the Middle East, and the biggest

consumption is in the East Asia, 36%. Since the big shale oil revolution of the US, they are no

longer the biggest oil importing country, but still a major player. Given that peak oil demand

might come sooner rather than later, the tanker market may need to look for new businesses. 3

Oil can also be defined as a type of bulk, often called liquid bulk, but in this thesis, bulk will

always mean the five major bulks. These are, in order of size, iron ore, grain, coal, phosphates

and bauxite. Earlier coal was the singular bulk goods which was shipped in the greatest quan-

tity, but since the markets demand for steam coal for energy has declined, and the distinction

between coking coal for steel industry and the steam coal for energy is made, iron ore and grain

are the biggest goods types regarding quantity in the bulk sector. Other minor bulks that can

be mentioned in this context are the forest and steel industry outputs. 4

The major bulk freight is made with iron ore, in fact 1500 million tons of iron ore was shipped

around the world with bulk freighters 2019. 57% of it comes from Australian mines, and 23%

from Brazilian. The worlds biggest demand for iron ore comes from China, where over 51% of

all the worlds steel is made. However, the Chinese infrastructural projects may not go on for

eternal time, so the bulk market may in the near future go through some structural changes.

The biggest and most usual types of bulk freights are the Capesize bulk carrier (≥ 100, 000 dwt)

and the Panamax bulk carrier (65, 000–99, 999 dwt). 5

1.1.2 Freight Rates

The biggest economic mechanisms in the shipping business are the freight rates, which have been

of utmost importance through all the historic economical shipping cycles. To put it simply, a

shipowner has two main tasks, one is to operate the ships, and the other is to make money from

the shipping business. The later task takes some skill since it is a question of being in the right

place in the right time, and to time the market cycles. A freight rate can be defined as the sum

of money paid to a shipowner or shipping line company for the carriage of each unit of goods

(tonne, cubic metre, barrel or container load) between specified ports. In this market there are

also other sorts of contracts for shipping, Time Charter Equivalent (TCE), which can be seen as

some sort of spot price of the freight rate, and the Forward Freight Agreement(FFA), which can

been seen as the shipping markets equivalent to options.6

Figure 2 gives a good overview of the main drivers of both the demand and the supply to

the maritime economy. The model is somewhat simplified, since like all parts of the world

economy, the maritime economy is very complex. In reality, some of the mechanisms are simply

3UNCTAD, 20204Stopford, 2009, p 645UNCTAD, 20206Stopford, 2009, p 135

4

grasped. The world economy decides the demand for the seaborne trade, and some developments

in particular commodity trades could change the trends of the economic growth (e.g. some

development in the steel industry demand because of the infrastructure expansion may influence

the iron ore trade).7

Figure 2: An overview and summary given by Stopford of the shipping markets supply anddemand model.(Stopford, 2009, p. 137, figure 4.1)

Some usual ways to trade the freight rates are via the freight rate indices. These indices are

compiled as any stock market index. They are a weighted average of several trade routes with

different ship sizes, and can be traded daily on the Baltic Exchange, for example. There are a

variety of compiled indices for the shipping market, but the focus in this thesis will be on the

Baltic Dry Index, an index over the main bulk freight in the world, and the Baltic Dirty Tanker

Index, which compiles the most common crude oil trade routes. The indices are a good measure

7Stopford, 2009, p 135

5

for the equilibrium of the shipping markets demand and supply. For more specific definitions,

the reader is referred to the Baltic Exchange’s website.8

As the reader can see in figure 2, the model that is illustrated is the supply and demand model,

with some more intricate definitions and specifications. Since this is a thesis in applied mathemat-

ics and not economics, the details of this model will not be discussed. Two specific phenomenons

are worth mentioning in this context, seasonality and the systems sensitivity to shocks. Season-

ality plays an important role in the seaborne commodities trade, since it can create short term

volatility. Specifically the grain trade can give up to 50% seasonal impact on the bulk freight

rates, since the crops are often harvested in September on the northern hemisphere. In the same

way, Oil and energy is traded seasonally with regard to temperature and weather season, seeing

that more oil is shipped on the northern hemisphere during the Autumn and early winter months

than the during the spring and summer. Shocks on the other hand are unpredictable events that

impact the market with great force. Shocks in the shipping market can come from all sorts of

directions, weather, politics, financial crisis, war and so on. Because of the unpredictability and

the size of the shocks, demand in shipping can change fast, and if it is timed correctly, vast

fortunes can be made(or lost), like the Onassis during the Suez-crisis. During the invasion of

Kuwait, the tanker market went into a boom as demand for tankers to use as storage(floating

storage) for crude oil rallied. Ergo, not all crises are bad for the shipping companies earnings.9

1.1.3 The Covid-19 Pandemic and Freight Market

2019 was a weak year for shipping, trade wars and weaker sentiment led to a decline in demand.

The covid-19 pandemic that overwhelmed the world in the beginning of 2020 led to an economic

crisis which greatly affected the shipping market. UNCTAD estimates that the demand for

shipping will shrink with 4.1% 2020, meaning billions of US dollars. The world was hit by a

simultaneous demand and supply shock which had not been seen before.10

The lockdowns that followed the pandemic caused the demand for crude oil to dive, and prices

for the same commodity followed. Following Saudi Arabia’s failure to agree with OPEC:s pro-

duction cuts, and their ”opening of the taps and valves” was the final blow for the crude oil,

which totally collapsed, and a situation of super contango arose. This led to a surge in tanker

demand, as traders rushed to hire tankers in order to store their oil in order to sell the oil for a

higher price in the future, called floating storage. All tanker segments saw huge peaks in freight

rates and TCE:s during March and April in 2020. As fast as the market rallied, the decline

came all ready in may when the demand for floating storage when the tonnage of floating storage

slumped from 45 million dwt at the end of April to 30 million dwt at the end of May. Demand

for Oil remained significantly lower in mid 2020, and since the OPEC agreement led to lower

output of crude, the tanker market outlook is still looking gruesome, and freights remain low

and very volatile.11

The Bulk market during the covid-19 pandemic was seeing a big slump due to the declining

demand for shipping and overcapacity. Especially during the early outbreak of the pandemic,

since China is the major importer of many bulk goods globally, including iron ore and grains.

8Baltic Exchange, 20209Stopford, 2009, p 143

10UNCTAD, 202011UNCTAD, 2020

6

The bulk market had already seen a tough year 2019, specifically because of lower Brazilian

iron ore export due to the Vale dam collapse. This chain of negative events caused high market

volatility, and the Capesize index of the Baltic Exchange(which is one of the components of the

Baltic Dry Index) went negative(−243) during February and March. Already in June, the same

index had risen to 2267 because of the easing of lockdowns in China and the rallying imports of

iron ore to the same country. The UNCTAD outlook for this market is not very positive either,

overcapacity is still a big problem, and China is mentioned as one of the key features for market

recovery since they are the biggest importers of bulk goods, such as iron ore and grains. 12

1.2 Aim

On account of the events that occurred during the spring of 2020, and the covid-19 pandemic, i.e.

counterintuitive movements in the freights rates in regard to the underlying commodities, the

idea behind this thesis is to analyse these markets. The aim of this project will be to assess time

series over different data, which are believed to hold important information about the maritime

economy and shipping business, in order to create a Vector Autoregression model. This VAR-

model will be used to forecast and develop an analysing tool to predict freight rates and simulate

shocks. In today’s shipping business there are many different freight rates for various types of

cargo, from Very Large Crude Carriers rates to Dry Bulk rates. These rates depend on many

different variables and are entwined with data such as crude oil price, economic growth, world

trade and the second hand market. This thesis will aim to assess these connections.

1.3 Relevance

The shipping business is a trillion dollar business, and it provides work for millions of people.

In today’s global world, the seaborne transport is highly necessary for the global economy.

The relevance in this project comes from the necessity for shipping companies to forecast their

incomes, and the importance of the oil price for the whole world trade according to UN, since it

is both the biggest goods type and affects as bunker fuel for propulsion. The same holds for the

iron ore and the development of the economies and infrastructure, and it is the most important

seaborne bulk goods.

1.4 Problem Formulation

This thesis seeks to analyse and model the following questions

• Will a VAR-process forecast freight rates for oil tankers and bulk better than the simple

univariate ARIMA, when taking into account data such as the Crude oil prices, Iron ore

price, and some other econometric/financial data?

• How will the impulse response function simulate shocks on the freight rates? Which un-

derlying data will have the biggest influence on the freight rates given an impulse?

1.5 Limitations

The limitations of the this thesis are that the focus will be on the bulk and tanker market. Specif-

ically the Baltic Dirty Tanker Index and the Baltic Dry Index will be analysed in the period of

2009-05 to 2020-11, and will be the variables of interest in their respectively model. Although

12UNCTAD, 2020

7

the container shipping market is growing each year, the market is not as easily modelled, and

is therefore not taken into consideration. In general the models that are created consists of a

number of time series, which are analysed together in order to create a good forecast. Although

the models are constructed with macro economic theory as a base, the models will be totally

unbiased in relation to assumed relationships between variables. This means that all the results

from this thesis are based strictly on mathematics and statistics rather than macro-economical

assumptions. The data which is used comes from open source sources, and can be easily retrieved.

The starting point was to also include some sort of data which tracked environmental actions

or sustainability within the shipping business, but since such data has either very low time fre-

quency, or has not been observed for especially long time, this goal is dropped. In the future

when society is more enlightened in these areas, some sort of analysis can perhaps be made, to

investigate if sustainability affects the freight rate pricing. Some other trials are made to include

an exogenous dummy variable in an attempt to model legislation for sulphuric emissions from

the shipping industry in certain areas, but this is also dropped because of lack of time and bad

effects on model diagnostics.

All the mathematical framework used in this thesis achieved their goal. Although it should

be noted that the VAR-modelling is the most basic form of analysing multiple time series. Other

such models which can be of interest are for example VECM, Vector Error Cointegration Model,

or BVAR, Bayesian Vector Autoregression. These are also both often used in econometric mod-

elling. The VAR-model is often used to analyse big macro economic systems, like the relationship

between mortgage rates, housing prices and GDP.

1.6 Previous Work

Two previous work are the main sources for investigation and inspiration for ideas for this thesis.

The first is ”A Novel Approach to Forecasting the Bulk Freight Market” by Vangelis Tsioumas

et al.. In this article the authors try to model the bulk shipping market with BDI as the variable

of interest, some data regarding Chinese steel output, some sort of compiled index of their own

device and the bulk fleets development. The analysis of the VAR-model is performed in first

differences rather than levels. This model is then compared to an ARIMA-model of the BDI, and

the conclusion is that the VAR-model is solid and creates a better forecast than the univariate

ARIMA. The analysis is performed on monthly data between 1999 and 2014. 13

The other article, ”Quantifying the relationship between seaborne trade and shipping freight

rates: A Bayesian vector autoregressive approach” from 2020, is written by Michail and Melas,

and it tries to quantify the relationships between the Baltic Dry Index, the Baltic Dirty Tanker

Index, and the Baltic Clean Tanker Index and their respective underlying commodities. The

method used is a BVAR, Bayesian Vector Autoregression, and the data is on monthly basis.

They found that the commodity trade has a strong impact on the BDI and BDTI, but the Clean

Tanker Index is not so much impacted, and the conclusion is that they can operate on a more

diverse market. They also found a negative correlation between brent oil and the two tanker

indices, and explained that with the use of tankers as floating storage. 14

13Tsioumas et al. 201714Michail and Melas, 2020

8

1.7 Expectations

During the start of this thesis project, a interview was conducted in an informal capacity with

a person working in the shipping business on a chartering company in Sweden. The 30 years of

work experience from the person working at the chartering company said that the models should

be hold simple, i.e. he meant that if the models hold to many time series of arbitrary data, they

would not perform well in forecasting. The limitation of the set of underlying time series should

be to for example oil price, fleet growth and perhaps production/import data according to his

suggestions.

9

2 Theoretical Framework

This section aims to describe the most important mathematical theories that are needed to

understand and implement the models which are used. The main overall theoretical framework

in this section is based on the work of H. Lutkepohl in New Introduction to Multiple Time Series

Analysis. Some various other sources are used when complementary theory is needed.

2.1 Multiple Time Series Analysis and Forecasting

In order to respond and to decide which policy to employ, decision makers at many structural

levels make use of predictions and forecasts of economic of financial variables. If observations of

a time series of interest are available, and that data from the past contains information about

the future, it is possible to use the data in order to forecast the future events. One of the most

common approach and data is to forecast the unemployment rate for a country or a region using

economic data at hand. This approach to forecast may formally be expressed with the following

formula

yT+h = f(yT , yT−1....), (1)

where f(·) is some suitable function or transform of the past observations yT , yT−1.... .

When handling economic variables, the univariate model is seldom enough, since it probably in

addition depends of past values of sometimes multiple other time series. If many time series are

related together, it is reasonable to use this additional information in order to forecast. A formal

expression for this can be

y1,T+h = f1(y1,T , y2,T , ...., yK,T , y1,T−1, y2,T−1....yK,T−1.....), (2)

by denoting the variables y1t, y2t.....yKt and the forecast y1,T+h. A set of time series ykt, k =

1, ...,K, t = 1, ..., T is often called Multiple time series, and y1,T+h the forecast of variable 1.

It is often interesting to investigate the dynamics and impact that different time series can have

between each others. One mathematical model which can be used to model a system of multiple

time series, and their mutual influence on each other, is VAR, Vector Autoregression.15

2.2 Vector Autoregression

The VAR modelling aims to assess the time series which depend on past behaviour of multiple

variables. In the univariate case above, we have that f(·) is a linear function which models the

forecast of the near future (i.e. h=1), the model becomes

yT+h = ν + α1yT + α2yT−1 + ....+ αpyT−p+1 + uT+1 (3)

where it is assumed that the model is using a finite value of past observations p. Further the value

of yT+h will not be the same as the real yT+h. That is why there is an error term, also known

as the forecast error, uT+1 = yT+h − yT+h. In general the assumption is that the observations

in the model are realizations of random variables, and that the same distribution for the data

generating process holds for each period T . With these assumptions, the univariate form of the

VAR is obtained, which is the autoregressive process, AR,

15Luthkepohl, 2005, p. 1

10

yt = ν + α1yt−1 + α2yt−2 + ....+ αpyt−p + ut (4)

where yi and u are random variables for all t. It is also assumed that ut, the forecast errors, are

uncorrelated for different periods T . From this, the general extension into multiple time series is

yt = ν +A1yt−1 +A2yt−2 + ....+Apyt−p + ut (5)

where yi’s are regarded as random vectors on the form yt := (y1,t, ..., yK,t)′ and ν = (ν1, ....νK)

and

A :=

a11,i . . . a1K,i

.... . .

...

aK1,i . . . aKK,i

. (6)

In the model, it is assumed that ut := (u1,t, ..., uK,t)′ is a vector of independent and identically

distributed variables with zero mean.16

2.3 Stable VAR(p) Process

From above it has been established that the general VAR(p), i.e. the VAR process with order

p,is given by the following

yt = ν +A1yt−1 +A2yt−2 + ....+Apyt−p + ut, t = 0,±1,±2, ...., (7)

in which yt is a (K×1) random vector, the Ai:s are deterministic (K×K) coefficient matrices and

ν is a fixed (K×1) vector of model intercept allowing for non-zero mean E(yt) of the model. ut is

a K-dimensional white noise process with the following properties: E(ut) = 0, E(utu′t) = Σu and

E(utu′s) = 0 ∀s 6= t. The covariance matrix Σu is assumed to be non-singular if not otherwise

stated.

In order to get an intuitive sense about this model, the VAR(1) is considered

yt = ν +A1yt−1 + ut. (8)

If the data generation of this process starts at a given time, t = 1, the following system arises

y1 = ν +A1y0 + u1

y2 = ν +A1y1 + u2 = ν +A1(ν +A1y0 + u1) + u2 =

= (IK +A1)ν +A21y0 +A1u1 + u2

...

yt = (IK +A1 + ...+At−11 )ν +At1y0 +

t−1∑i=0

Ai1ut−1

...

(9)


11

From this, it is clear that the vectors y1, ..., yt are all uniquely determined by y0, u1, ...., ut and

also the joint distribution of y1, ..., yt is determined by the joint distribution of y0, u1, ...., ut.

If all the eigenvalues of A1 have a modulus of < 1, the sequence Ai1 will be absolutely summable,

i.e. the infinite sum∑∞i=0A

i1ut−1 exists in mean square. The proof for this can be found in

the reference material of this section but is here omitted. From this fact, the VAR(1)-process

becomes the well-defined stochastic process

yt = µ+

∞∑i=0

Ai1ut−1. (10)

are

µ = (IK −A1)−1ν. (11)

The distributions and joint distributions of the yt are uniquely determined by the distributions

of the ut process. The first and second moments are known to be

E(yt) = µ ∀t ∈ R (12)

and

Γy(h) = E[(yt − µ)(yt−h − µ)′] =∞∑i=0

Ah+i1′ΣuA

i1′. (13)

since E(utu′s) = 0 ∀s 6= t and E(utu

′t) = Σu ∀t ∈ R.

A VAR-process is deemed stable if all eigenvalues of A1 have modulus of < 1 or expressed

mathematically as the following condition

det(Ik −A1z) 6= 0 ∀|z| ≤ 1. (14)

These concepts above can easily be extended to the general VAR(p) process, since for any p > 1 a

VAR(p)-process can be written on the VAR(1)-form. If yt is a VAR(p)-process as defined above,

a corresponding Kp-dimensional VAR(1)-process can be defined as

Yt = ν + AYt−1 + Ut (15)

where

Yt :=

yt

yt−1

...

yt−p+1

, ν =

ν

0...

0

,

A =

A1 A2 · · · Ap−1 Ap

IK 0 · · · 0 0

0 IK 0 0...

. . ....

...

0 0 · · · IK 0

, Ut :=

ut

0...

0

.(16)

12

Following the same structure from earlier in this section, Yt is stable if

det(IKp−Az) 6= 0 ∀|z| ≤ 1. (17)

The first and second moments for the model are given by

µ := (IKp −A)−1ν (18)

and

ΓY (h) =

∞∑i=0

Ah+i′ΣUAi′. (19)

where ΣU := E(UtU′t). Using the K × Kp matrix

J :=[IK : 0 · · · : 0

], (20)

the process yt is obtaind as yt = JYt. Since Yt is a well-defined stochastic process, the same

holds for yt. The mean is E(yt) = Jν, which is constant for all t, and the autocovariance matrix

is given by Γy(h) = JΓY (h)J ′, which is also independent of t. yt is defined as stable if

det(IKp−Az) = det(IK −A1z −A2z

2 − .....−Apzp) 6= 0 ∀|z| ≤ 1, (21)

which is also known as the stability condition. To summarize, a VAR(p)-process is stable if (21)

holds, and

yt = JYt = Jµ + J

∞∑i=0

AiUt−1. (22)

By the definitions stated above, it can be easily seen that the process yt is determined by its

white noise.17

2.4 Stationary Model

A stochastic process is called stationary when its first and second moments are assessed as time

invariant. I.e. a stochastic process, yt is stationary if

E(yt) = µ ∀t ∈ R (23)

and

E[(yt − µ)(yt−h − µ)′] = Γ(h) = Γ(−h)′, ∀t ∈ R, h = 0, 1, 2, ... (24)

The first condition means that yt has a mean vector that is constant for all t. In the same

way, the autocovariance matrix is also independent of t and only depends on the time difference

step, h. If not stated, all quantities are assumed to be finite. It follows from above that a

stable VAR(p)-process also is stationary, i.e. stability implies stationarity, but not necessarily

the other way around. Stable VAR-processes has under usual assumptions time invariant means

and variances and covariances.18

If one of the underlying time series in the VAR(p)-process is deemed non-stationary (i.e. with

I(1) or a trend) by some of the tests that exists, the solution is often to make the process trend-

stationary. This is done by taking the first order differences. First order difference means to

17Luthkepohl, 2005, p. 2418Luthkepohl, 2005, p. 13

13

create a new time series by the following formula y′t = yt−yt−1. The problem is that it is advised

to chose to either estimate a VAR(p)-process in levels (without any differencing at all), or to

estimate it in first differences (all underlying time series are differentiated one time), it is seldom

advised to to have both leveled and differentiated time series in the same VAR-process, only if

there is no co-integration. Here Luthkepohl advises to not differentiate the non-stationary time

series in the VAR-process, since their is a risk that one throws away essential information. If the

VAR-process in levels is stable, then by assumption it is also stationary and further analysis can

be done.19

2.5 Autoregressive Integrated Moving Average-Model

In time series analysis there arises a need to model not only stationary time series, but non-

stationary time series as well. A general model which can handle a big variations of non-stationary

processes is the ARIMA-process, i.e processes that are reduced to ARMA-processes when differ-

entiated finitely many times.

By definition, if d is a non-negative integer, then Xt is an ARIMA(p,d,q) process if :

Yt := (1−B)dXt (25)

is a causal ARMA(p,q) process. This gives that the process Xt satisfies the difference equation

φ∗(B)Xt ≡ φ(B)(1−B)dXt = θ(B)Zt, Zt ∼WN(0, σ2) (26)

where φ(z) and θ(z) are polynomials of degree p and q, and that φ(z) 6= 0 for | z |≤ 1. 20

2.6 Autocovariance and Autocorrelation

As shown above, one of the requirements for a time series to be deemed stationary is that the

value of the time series at different times only depend on the time difference between two points.

In higher order VAR(p) this is illustrated by

yt − µ = A1(yt−1 − µ) + ...+Ap(yt−p − µ) + ut (27)

and the autocovariance function is denoted by Γy(h). For h = 0,

Γy(0) = A1Γy(1)′ + ...+Ap(Γy(p) + Σu (28)

and with h > 0

Γy(h) = A1Γy(h− 1)′ + ...+Ap(Γy(h− p) + Σu. (29)

These equation are know as the Yule-Walker equations, an can be used to calculate Γ(h).21

Because of the unit dependence in the Autocovariance function, it is often more convenient

to use the Autocorrelation function instead when modelling, since it is scale invariant of the

linear dependencies in a model. The Autocorrelation function is given by

Ry(h) = D−1Γy(h)D−1 (30)

19Luthkepohl, 2011, p. 520Brockwell et al, 2002, p. 18021Luthkepohl, 2005, p. 26

14

where D is the matrix with the standard deviations of the components of yt on the main diagonal,

i.e. the main diagonal elements are the square roots of the diagonal elements of Γy(0).22

2.7 Estimations of a VAR(p)-process

In order to estimate the parameters within a VAR(p)-process, the least squares estimate β, which

minimizes the error, needs to be calculated.

Defnintions:Y := (y1.....yT ) (K × T ),

B := (ν,A1, ....., Ap) (K × (Kp+ 1)),

Zt :=

1

yt...

yt−p+1

(Kp+ 1),

Z := (Z0, ...., ZT−1) ((Kp+ 1)× T ),

U := (u1, ....., uT ) (K × T ),

y := vec(Y ) (KT × 1),

β := vec(B) ((K2p+K)× 1),

b := vec(B′) ((K2p+K)× 1),

u := vec(U) (KT × 1),

(31)

where vec is the vector stacking operator. When using the notation above, and t = 1, 2, ...., T ,

the VAR-model can be written in this compact way

Y = BZ + U. (32)

From this the normal equation of the least squares follows as

(ZZ ′ ⊗ Σ−1u )β = (Z ⊗ Σ−1

u )y. (33)

Solving the above stated equation gives the least squares estimator in the following way

β = ((ZZ ′)−1Z ⊗ IK)y (34)

where ⊗ is the Kronecker operator. 23

2.8 Order Selection of VAR(p)

When modelling a set of multiple time series, the problem of selecting the optimal lags, p, arises.

In theory there is not just one right answer to this problem, because if yt is a VAR(p)-process,

it is also formally a VAR(p+1)-process. This holds for the previous assumption that coefficient

matrices which are zero, are not excluded from the model. Henceforth, it is now defined that a

VAR(p)-process, is a process for which all Ap 6= 0 and Ai = 0, i > p. I.e. p is the smallest order

possible for the model.24

When forecasting is the main goal of the VAR(p)-process, it is reasonable to choose order in

22Luthkepohl, 2005, p. 3023Luthkepohl, 2005, p. 6924Luthkepohl, 2005, p. 136

15

a way that reduces the Mean Square Error (MSE). Akaike(1969,1971) suggested to base the

VAR-order selection on a one-step ahead forecast MSE, given by

Σy(1) =T +Km+ 1

TΣu (35)

where m is the lag order for the selected VAR-process, T is the sample size, and K is the

number of time series in the VAR-process. From this it is easy to see that a VAR(p)-process

should be chosen over a VAR(p+i)-process, since the forecast MSE is an increasing function

of the order of the fitted model. In practice, the lag order is selected from different criterion.

The most common among these criterion are the Akaike’s information criterion(AIC), Final

prediction error(FPE), Bayesian/Schwarz information criterion (BIC/SC) and the Hannan-Quinn

information criterion(HQ). Of these, AIC and the FPE are more suited to use when having

samples of up to 60 observations and/or when the main focus for the estimated model is to

forecast, while SC and HQ are more suited when having samples of more than 120 observations

and/or estimating the with the notion of consistency of the model. Infact, Shibata(1980) showed

that also for univariate models the FPE and AIC minimizes the one-step-ahead forecast MSE

asymptotically. 25

2.9 Normality of Residuals

When estimating a VAR(p)-process, the need of normality arises in the underlying data gen-

eration. When estimating the normality of a model, one often investigate the normality of

the model residuals. This is important when performing the impulse response analysis on the

VAR(p)-model.

A stationary VAR(p)-process is Gaussian distributed if and only if the residuals white noise

processes ut are Gaussian. There are different tests to investigate the normality of the residu-

als, one is to investigate the third and fourth moments (i.e. Skewness and Kurtosis) as in the

Jarque-Bera-test.

If the assumption of Gaussian distributed White-Noise residuals cannot be made, different boot-

strap method can be used in order to create new sets of residuals from the data, by drawing

repeatedly from the estimated residuals.26

2.10 Expectation-Maximization Algorithm

When only a subset of the data of interest is available, the Expectation-Maximization(EM) algo-

rithm can be used as an iterative procedure to compute the maximum likelihood estimator. The

EM-algorithm is one of the methods used for imputation of data sets with missing observations.

Let (X1, X2, .......Xn be the sample data set, where observations are missing, with the conditional

density fX|Θ(x|θ) given Θ = θ. From this the log-likelihood function is written below

l(θ;X) = logfX|Θ(X|θ). (36)

Since data is missing, it is assumed that the data X consists of the subsets Y = (Y1, Y2, ...Yk)

which are the observed observations, and Z = (Z1, ...., Zn−k) which are the missing (or latent)

observations of the data set. With X = (Y,Z) the log likelihood for the observed Y is given by

the following

25Luthkepohl, 2005, p. 15126Luthkepohl, 2005, p. 17

16

lobs(θ;Y ) = log

∫fX|Θ(Y, z|θ)νz(dz). (37)

A problem can arise when calculating the integral in order to maximize the log-likelihood, since

it might not be possible to find a closed form of the lobs(θ;Y ). In order to maximize the lobs(θ;Y )

with respect to θ an iterative process with two steps is done, the E-step and the M-step, where

θ(i) is the estimate of Θ and after the ith step. Then the two steps in the (i+ 1)th iteration are

E-step: Compute Q(θ|θ(i)) = Eθ(i))[l(θ;X)|Y ].

M-step: Maximize Q(θ|θ(i)) with respect to θ and put θ(i+1) = argmax Q(θ|θ(i)).

This procedure is repeated until the model convergence on the integral is achieved.27

2.11 Impulse Response Analysis

When applying a VAR(p)-process on a data set of multiple time series, it is often of interest to

see how one variable reacts to an impulse in another variable of interest within a model with a

number of variables, e.g macro-economical or financial variables.

Consider the problem when isolating the shock impact of one variable on another variable, at

a time t = 0. It is supposed that all the variables hold their mean value prior to time t = 0,

yt = µ. Now at time t = 0 one variable s is increased by one unit, to u1,0 = 1. It is now possible

to trace what happens with the system in the periods t = 1, 2...., n if it is assumed that no other

shocks occur, i.e. u2,0 = 0, u3,0 = 0 and so on. In such an event, the mean of the system is of no

interest, but the variations of the variables around their means are, the means of the variables

are set to zero, and ν = 0.

The first column of Ai1 are the yi = (yk,i)′. The elements of Ai1 represent the effect of a

unit impulse in the variables of the model after i periods from t = 0. Because the variables

in a model often have differing scales it may be extended to shocks of different magnitudes by

rescaling the impulse responses. For instances, instead of analysing an unexpected unit shock in

some variable, a shock of a standard deviation in magnitude is used instead. This gives a clearer

visualization of the impulses dynamic and relationship between the variables in the model. In

the impulse response graph, their is often a calculated confidence-interval which is calculated

from the normality of the VAR-model. If the VAR-model is non-normal, the confidence intervals

are bootstrap several times.28

2.12 Forecasting

From earlier theory we know that the optimal h-step forecast is given by

yt(h) = ν +A1yt(h− 1) + ....+Apyt(h− p) (38)

where yt(j) = yt+j , i.e. the timestep as a function of j. If in this equation, the true coefficients

B = (ν,A1, ...., Ap are replaced with their estimates B = (ν, A1, ...., Ap, we get the estimated

forecast

yt(h) = ν + A1yt(h− 1) + ....+ Apyt(h− p) (39)

27Hult, 2019, p. 9028Luthkepohl, 2005, p. 53

17

where yt(j) = yt+j , and thus, the forecast error can be calculated as

yt+h − yt(h) = [yt+h − yt(h)] + [yt(h)− yt(h)] =

h−1∑i=1

Φiut+h−i + [yt(h)− yt(h)](40)

where Φi are the coefficient matrices of the canonical MA-innovations representation of yt, which

for shortness sake is omitted. Under general conditions of the VAR(p)-process forecast, it can

be assumed that the mean of the forecast is zero, i.e. E[yt+h − yt(h)] = 0. From this it is

understood that the forecast is unbiased even with the coefficient estimates. All the us and yt

are uncorrelated if the estimations are done up to a period t, and from this we can calculate the

MSE matrix, Mean Square Error, for yt(h) as the following

Σy(h) := MSE[yt(h)] = E[[yt+h − yt(h)][yt+h − yt(h)]′]

= Σy(h) +MSE[yt(h)− yt(h)](41)

in which

Σy(h) =

h−1∑i=0

ΦiΣuΦi′. (42)

From these equations a result which minimizes the forecast MSE is calculated, in order to retrieve

the best predictor in conformity with earlier results and requirements on the VAR(p)-process.29

2.13 Forecast Error Variance Decomposition

If the innovations which drive the VAR-process could be assessed, another tool for examining

the model is available. The MA(Moving Average)-representation of the process can be written

as

yt = µ+

∞∑i=0

ΦiPP−1ut−1 = µ+

h−1∑i=1

Θiwi (43)

where Θi = ΦiP and wi = P−1ut−1. In this the Σu = PP ′, where P is a lower triangular

non singular matrix with positive diagonal elements. In equation (43) the components of wt =

(w1t, ..., wKt) are uncorrelated white noise with have unit variance, and the covariance matrix

Σw = P−1ΣuP−1′ = IK , where IK is the identity matrix with (K ×K)-elements.

Giving the denotation θmn,i for the mn-th element of the Θi-matrix the h-step forecast error of

the j-th component of yt is

yj,t+h − yj,t(h) =

K∑k=1

(θjk,0wk,t+h + ...+ θjk,h−1wk,t+1). (44)

From equation 44 we get that the forecast error of the j-th component could consist of all

innovations wt = (w1t, ..., wKt). Although obviously some of the components mn,i could be = 0,

so that some of the components disappears. Since the components of wt are uncorrelated, the

MSE of yj,t(h) becomes

E[yj,t+h − yj,t(h)]2 =

K∑k=1

(θ2jk,0 + ...+ θ2

jk,h−1). (45)


18

From this

θ2jk,0 + ...+ θ2

jk,h−1 =

h−1∑i=0

(e′jΘiek)2 (46)

which is often interpreted as contribution of innovations in variable k to the MSE of the h-step

forecast of variable j. Here ei is the i-th column of IK . Dividing equation 46 by the MSE[yj,t(h)]

gives

ωjk,h =

∑h−1i=0 (e′jΘiek)2

MSE[yj,t(h)]. (47)

The last equation gives the proportion of the h-step forecast error variance of variable j through

the innovations wkt. If wkt can be related to variable k, wjk,h is the representation of the

properties of the h-step forecast error variance, which is accounted for by innovations in variable

k. Thus, the forecast error variance is decomposed into parts which are accounted for by the

innovations in the different variables of the model. The h-step forecast MSE-matrix is given by

Σy(h) = MSE[yt(h)] =

h−1∑i=0

ΘiΘi′ =

h−1∑i=0

ΦiΣuΦi′. (48)

in which the diagonal elements are the MSE:s of the yjt variables which is shown before.30


19

3 Data

For this thesis, two sets of time series are made, one for the Tanker-Oil-model, and one set for the

Bulk-Iron-model. The goal in the data search is to find as equal and similar explanatory market

data for each model as possible. But, as the reader may note, there is not always a perfect data

counterpart available between the different parts of the industry or markets.

All data series are measured between 05-2009 and 06-2020, striving to have weekly measurements.

The forecasts are then made between 06-2020 and 11-2020, 26 weeks forward. The reason why

weekly data is chosen over monthly or quarterly is that the big movements(i.e. volatility) in oil

and freight markets that occurred during the financial crisis and the covid-19-pandemic would

not have been so well captured with lower time resolution. In the cases where only monthly

measurements are available, statistical imputations are performed in order to increase the fre-

quency of the data series. As many relevant data series as possible are sought in order to create

a VAR(p)-model that is representative for the underlying market. The final number of observa-

tions used to create the models is n = 595, which is divided into n = 569 for model creation,

and n = 26 for validation/forecast comparison of the indices. The data is retrieved from several

different sources, the main part from Baltic Exchange and Clarkson’s, and some open source

financial data.

3.1 Time Series in Oil-Tanker-Rates-Model

The following nine time series are used in the VAR(p)-process that is created in order to forecast

and analyse the Tanker-Oil-market.

Baltic-Dirty-Tanker-Index(BDTI)

The main variable of the Oil-Tanker-model. Compiled index of the most common trade routes

for dirty tankers. Published every day by the Baltic Exchange as a market benchmark.31

Brent Oil Price

Spot price of Brent Oil in US-dollars per barrel, the most commonly traded oil contract. Priced

continuously during market hours, data extracted on weekly basis from EIA (Energy Information

Administration).32

Tanker Earnings

Average of shipping companies tanker earnings compiled over different sizes of tankers. Measured

in dollars per day, weekly data is extracted from the Clarkson register. 33

Dollar Index

Index or measure of the US dollars strength compared to a basket of foreign currencies. A high

index indicates a strength in US dollars, and a low index indicates weakness. Priced continuously

during market hours, data extracted on weekly basis from Investing.com34

31Baltic Exchange, 202032EIA, 202033Clarksons, 202034Investing.com, 2020

20

Libor

London Interbank Offered Rate, is the 3-month daily updated investment rate. Libor is the most

standard investment rate and is used as a global benchmark for short term investment rates. The

time series consist of the logarithmic returns of the LIBOR-rate. 35

Second Hand Tanker Price

Weekly assessments of the price of a five year old vessel measured in US-dollar per tonne. In this

case the Tanker types that are considered are VLCC- and Aframax-size, extracted from Baltic

exchange.36

Singapore Tanker Arrivals

Time series over the monthly tonnage of oil-tanker arrivals, only vessels that are more than 75

gross tons are included. Measured in thousands tons. compiled by the ”Maritime and Port

Authority of Singapore”.37

US Oil Imports

Assessment of the total weekly imports of crude oil to the United States. Measured in thousand

barrels and compiled by the EIA (Energy Information Administration).38

Fleet Growth

Relative change of the growth of the total world fleet from month to month. Data collected from

the Clarkson’s register.39

Time Series Min Max Mean Standard DeviationBDTI 431.0 3194.0 1020.3 440.863

Brent Oil Price 14.24 141.07 76.05 27.10494Tanker Earnings 5752 98094 18137 11659.07

Dollarindex 72.10 103.01 87.89 8.499Libor -0.458 0.478 -0.002 0.050

Second Hand Tanker Price 51.93 162.90 73.12 18.77Singapore Tanker 35418 74675 50574 8051

US imports 4900 11153 7919 970Fleet Growth -0.013 0.012 -0.003 0.003

Table 1: Table of summary statistics for the time series in the Tanker-Oil-model.

3.2 Time Series Bulk-Iron-Rates-Model

The following time series are used in the VAR(p)-process that is created in order to forecast and

analyse the Bulk-Iron-market.

35macrotrends.net, 202036Baltic Exchange, 202037MPA Singapore, 202038EIA, 202039Clarksons, 2020

21

Baltic-Dry-Index(BDI)

The main variable of the Bulk-Iron-model. The Baltic-Dry-Index is a composite index of the

weighted dry bulk timecharter averages of the Capesize (40%), Panamax (30%) and Supramax

(30%) indices. Published every day by the Baltic Exchange as a market benchmark.40

Iron Ore Price

Iron ore prices used in this thesis are Iron Ore Fine China Import 62 percent grade Spot Cost.

This contract is delivered in Tianjin China and the ore is used to make steel for infrastructure

and other construction projects. Priced continuously during market hours, data extracted on

weekly basis from Investing.com 41

Bulk Earnings

Average of bulk earnings compiled over different sizes of bulk. Measured in dollars per day,

weekly data is extracted from the Clarkson’s register. 42

Dollar Index

Index or measure of the dollars strength compared to a basket of foreign currencies. Priced

continuously during market hours, data extracted on weekly basis from Investing.com43

Libor

London Interbank Offered Rate, is the 3-month daily updated investment rate. Libor is the most

standard investment rate and is used as a global benchmark for short term investment rates. 44

Second Hand Bulk Price

Weekly assessments of the price of a five year old vessel measured in US-dollar per tonne. In this

case the Bulk types that are considered are Cape-size and Panama-size, extracted from Baltic

exchange.45

Singapore Dry-Bulk Arrivals

Time series over the monthly tonnage of bulk-freighter arrivals, only vessels that are more than

75 gross tonnes are included. Measured in thousands tonnes. Compiled by the ”Maritime and

Port Authority of Singapore”.46

Australian Metal Bulk Export

Aggregated data from Australia considering the value in Australian Dollars of ”Metalliferous

ores and metal scrap” that is exported every month. Data is collected by the Australian Bureau

of Statistics and published every month on their webpage.47

40Baltic Exchange, 202041Investing.com, 202042Clarkson’s, 202043Investing.com, 202044macrotrends.net, 202045Baltic Exchange, 202046MPA Singapore, 202047ABS, 2020

22

Fleet Growth

Relative change of the growth of the total world fleet from month to month. Data collected from

the Clarkson’s register.48

Time Series Min Max Mean Standard DeviationBDI 290.0 9379.0 1490.9 798.2

Iron Ore Price 38.28 188.25 100.14 37.4Bulk Earnings 3504 57804 10634 4819.6Dollar index 72.10 103.01 87.89 8.499

Libor -0.458 0.478 -0.002 0.050Second Hand Bulk Price 16.11 50.89 30.38 8.38

Singapore Dry Bulk 37888 78651 58640 8799Australian Export 3003 12904 7147 1739

Fleet Growth -0.013 0.012 -0.003 0.003

Table 2: Table of summary statistics for the time series in the Bulk-Iron-model.

3.3 Imputations and EM-algorithm

In order to elevate the time frequency to weekly in the time series which are only possible to find

in monthly frequency, the Expectation-Maximization-algorithm is applied. This specific method

assumes that the data which is imputed is asymptotically normally distributed, and therefore all

the time series which are imputed via this EM-algorithm are first investigated via the QQ-plot.

Results from this can be found in Appendix B. The following time series are imputed using the

EM-algorithm,

• Australian Ore Export

• Fleet Growth

• Singapore Tanker Arrivals

• Singapore Oil-Bulk Throughput

with successful normal EM-convergence for all of them. All time series are imputed 5 times, and

then the average of all 5 imputations are taken in order to make a new time series with weekly

time-frequency. The graphs over the fitted distributions via the EM-algorithm can be seen in

the Appendix B of this thesis.

3.4 Stationarity

Following the mathematical framework when establishing a VAR(p)-model, the necessity for a

stationary VAR(p)-process is given. In order to investigate all the underlying time series in the

VAR(p)-processes, a test for stationarity is performed on them one by one. The test that is used

is the Phillips-Perron test, which does not require a selected lag before testing for unit roots.

For the following time series the null-hypothesis, i.e. that the time series has a unit-root, could

not be rejected on p ≤ 0.05 significance level

• Dollar index

48Clarksons, 2020

23

• Brent Oil

• Iron Ore

• Second Hand Price Tankers

• Second Hand Price Bulk.

As before mentioned, the VAR(p)-processes stationarity is not completely dependent on the sta-

tionarity of each of the underlying time series but on the systems stability. No transforms is

exercised on the data and the plots of the original time series and their residuals can be found

in the Appendix A of this thesis.

24

4 Method

The time series that are part of the two VAR(p)-models that are estimated are chosen because

of the sought connection between them. In the light of this, the VAR(p)-models are completely

unbiased in that regard, and simply takes into account the observations of the input data.

The two ARIMA-models which are also estimated, are purely estimated from the two respective

freight rate indices, and therefore also independent and completely unbiased.

4.1 R

The modelling and mathematical analysis of the data is performed in the statistical compu-

tational software R, specifically R version 4.0.1. Specific important packages that are used to

perform the analysis are the following

• VAR Estimation package: vars49

Package for estimations of VAR-models, diagnostics of models, the impulse response func-

tion and forecasting.

• EM-algorithm package: Amelia50

Package for multiple imputations via the Expectation-Maximization algorithm.

4.2 Lag-Selection

In order to select the lag order for the models the different criterion are applied after binding

together the time series. It is important to weigh between model adequacy and not choosing to

many lags. Akaike’s information criterion(AIC), Finalprediction error(FPE), Bayesian/Schwarz

information criterion (BIC/SC) and the Hannan-Quinn information criterion(HQ) are all anal-

ysed for both models. As written before, for VAR(p)-processes which are estimated in order to

mainly do forecasts, minimizes the forecast MSE when selecting lags given by Akaikie’s Infor-

mation Criterion and Forecast Prediction Error, and thus these criterion will be dimensioning in

this case.

4.3 Model Diagnostics

When the lag order is selected for each model respectively, a number of diagnostic measures

are applied to the model, in order to check if the theoretical assumptions holds. The following

sections will give brief descriptions of them.

4.3.1 Serial Autocorrelation

A test is performed in order to investigate whether the models suffers from serial autocorrelations.

The test computes the multivariate Portmanteau-test to check the models serially correlated

errors. The null hypothesis is that there is no serial autocorrelation in the model, and the

significance level is p = 0.05. Since serial autocorrelation is undesirable one wants to confirm the

null hypothesis.

49Pfaff et al, 201850Honaker et al. 2019

25

4.3.2 Heteroskedasticity

Heteroskedasticity or ARCH(Autoregressive conditional heteroskedasticity)-effects in the models

are undesirable, since it means that different observations have different variances over time. In

order to test whether the models suffer from this, an ARCH-LM(Lagrange Multiplier)-test is

performed. The null hypothesis is that the model does not suffer from heteroskedasticity, on the

significance level p = 0.05.

4.3.3 Test of Normal Residuals

For the impulse response analysis the residuals of the fitted models needs to be normally dis-

tributed, otherwise the need of bootstrapping the confidence intervals for the impulse response

arises. In order to test the normality, the Jarque-Bera test is applied to the residuals of the

estimated VAR-model and the null hypothesis is that the residuals are non-normally distributed

with significance level p = 0.05.

4.3.4 OLS-Estimates

In order to check the stability of the VAR(p)-models, it is checked if the eigenvalues of the

Ordinary-Least-Squares estimates have a modulus of < 1. This is investigated graphically. This

step is important since it decides whether the theoretic assumptions about stability(and station-

arity) are feasible for these models.

4.4 Impulse Responses

The impulse responses is calculated from giving the underlying time series an impulse of one

standard deviation and then plotting the response in the variable of interest, i.e. the freight

rates. The response is the magnitude of the deviation from the mean of the freight rate. The

confidence intervals are bootstrapped and the number of bootstrap runs is set to n = 200, and

the confidence intervals are set to 95%. The number of lags for which the impulse response is

analysed is set to h = 26 in order to correspond to half a year in real time.

4.5 ARIMA-Model Estimation

The univariate time series of the BDTI and the BDI are estimated and fitted with an ARIMA-

model. In order to find the magnitudes of the lags, integrations and any seasonality, (p, d, q)(P,D,Q)m,

a grid search with all possible configurations is performed with the Akaike information criterion

(AIC) as the dimensioning measure in order to retrieve the best estimations.

4.6 Forecasts

Both the estimated VAR-models and the ARIMA-models are used to create forecasts of the

freight rate indices BDTI and BDI. Forecasts are performed with both a mean forecast, and the

95% confidence intervals. The number of lags on which the forecasts are performed is h = 26,

which in weeks represent half a year. The forecasts are done from June 2020 forward to the last

week of November 2020. The forecast means are then compared with the real data for the same

period, and plotted. The RMSE (Root Mean Squared Error) and MAE(Mean Absolute Error)

are calculated in order to assess the goodness of fit on the real data, and to see if the VAR-model

or ARIMA-model for the respective indices produces the best forecast.

26


In order to investigate the contribution of each time series to the VAR-models, a Forecast error

Variance Decomposition(FEVD) of each VAR-model is made. These two decompositions are

made to see how much each of the underlying time series effects the forecasts of the BDTI and

the BDI. The FEVD:s are made for h = 12 lags forward. The results are presented as how many

percent of the forecast variance each time series contribute with at the different lags.

27

5 Results

5.1 Final Models

Here follows the results for the final models and their diagnostics.

5.1.1 Final VAR(p)-models

After performing the lag-selection analysis, the following results in the table are given for the

two models respectively.

Model AIC HQ BIC FPEOil-Tanker-rates (BDTI) 7 2 2 7

Iron-Bulk-rates (BDI) 5 2 2 5

Table 3: Table of results and suggested lags from performed lag-selection for each model withfour different criterion.

From this it is decided that the Oil-Tanker-rates model is given the lag p = 7, i.e., VAR(7) follow-

ing the Akaike Information Criterion and the Forecast Prediction Error, and the Iron-Bulk-rates

model is given the lag p = 5, i.e. VAR(5) with the same reasoning.

After deciding the respective lag for each model, the diagnostics results are assessed, and the

results from those are presented in the table below.

Model Serial Autocorrelation ARCH-test NormalityOil-Tanker-rates p = 0.4029 p = 1 p < 2.2 · 10−16

Iron-Bulk-rates p = 0.4258 p = 1 p < 2.2 · 10−16

Table 4: Table of results from performed diagnostic tests on the two models.

From the diagnostics tests it is clearly seen that neither model suffers from serial auto-correlation

or any ARCH-effects, and further analysis with given theoretical assumptions could be done with-

out violation. As can also be seen, neither model has normally distributed residuals, and therefore

the confidence intervals of the impulse-response-functions are bootstrapped several times.

The stability test rendered that all OLS-estimates eigenvalues have a modulus of < 1, which can

be seen in the graphs in Appendix C.

5.1.2 Final ARIMA-models

From the performed grid search with Akaike Information Criterion as the dimensioning measure,

the following orders for the ARIMA-models are estimated

Time Series p d q P D Q mTanker Index(BDTI) 1 0 1 0 0 1 52Dry-Bulk Index(BDI) 1 0 3 2 0 0 52

Table 5: Table of results from the AIC order selection for the SARIMA-models of the freightrates.

The models retrieved from the ARIMA-estimations are both SARIMA:s with a seasonal

component of m = 52. The table of coefficients are

28

Time Series φ1 θ1 θ2 θ3 Φ1 Φ2 Θ1 µTanker (BDTI) SARIMA 0.8878 0.1066 - - - - 0.1167 772.8054

Bulk (BDI) SARIMA 0.9605 0.3535 0.1790 0.0153 0.1066 0.0341 - 1362.3372

Table 6: Table of coefficients of the estimated SARIMA-models.

5.2 Impulse responses

5.2.1 Oil-Tanker-Model

Each underlying time series in the models is given an impulse of magnitude one standard de-

viation, and the impact on the freight rate is estimated. The plots can be seen below for the

Oil-Tanker-Model.

Figure 3: Plots of the impulse responses of Baltic-Dirty-Tanker-Index given shocks of one stan-

dard deviation in all the underlying time series of the model. The red dashed lines are the 95%

confidence intervals of the estimated shocks.

29

5.2.2 Iron-Bulk-Model

Figure 4: Plots of the impulse responses of Baltic-Dry-Index given shocks of one standard devia-

tion in all the underlying time series of the model. The red dashed lines are the 95% confidence

intervals of the estimated shocks.

30

5.3 Model Forecasts

The graphical results of the forecasts of the VAR- and ARIMA-processes are shown below.

Figure 5: VAR(7)-forecast of the BDTI, h = 26 lags forward, with confidence intervals at 95%.The real data for the period 2009-05 to 2020-06 in black and the forecast in blue.

Figure 6: The ARIMA-forecast and real data of the BDTI. The forecast is h = 26 lags forward,with confidence intervals of 95%

31

Figure 7: VAR(5)-forecast of the BDI, h = 26 lags forward, with confidence intervals at 95%.The real data for the period 2009-05 to 2020-06 in black and the forecast in blue.

Figure 8: The ARIMA-forecast and real data of the BDI. The forecast is h = 26 lags forward,with confidence intervals of 95%

32

5.3.1 Forecast Comparisons

In this section, the ARIMA-models forecasts and the VAR-models forecasts are compared with

the real data for the period of June-2020 to November-2020, 26 weeks of data. The real data in

the graphs stretches the whole of 2020 to show the impact of the covid-19 on the Freight rates.

The forecast are from the first week of June 2020 and 26 weeks forward.

Figure 9: Actual BDTI-data compared to the two forecast models made with ARIMA and VARrespectively. The real data displayed is fore the whole of 2020.

Figure 10: Actual BDI-data compared to the two forecasts models made with ARIMA and VARrespectively. The real data displayed is fore the whole of 2020.

33

The Root Mean Square Error(RMSE) and Mean Absolute Error(MAE) between the estimated forecasts and

the real data is calculated.

BDTI RMSE MAEVAR-Forecast 119.46 105.40

ARIMA-Forecas 288.74 268.03

BDI RMSE MAEVAR-Forecast 796.52 744.29

ARIMA-Forecast 566.19 467.81

Table 7: Table of RMSE and MAE for the estimated forecasts and real data.


The results from the Forecast Error Variance Decomposition follows in both graphical and tabular forms for

better understanding.

Figure 11: Forecast Error Variance Decompostion of the BDTI, h = 12 lags forward.

Lags BDTI Earnings Brent US Libor Dollar Second Growth Singaporeh = 1 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000h = 2 0.7258 0.2625 0.0080 0.0007 0.0011 0.0003 0.0002 0.0003 0.0011h = 3 0.6889 0.2702 0.0114 0.0067 0.0115 0.0032 0.0005 0.0069 0.0007h = 4 0.6362 0.2997 0.0117 0.0153 0.0194 0.0031 0.0006 0.0135 0.0006h = 5 0.5847 0.3101 0.0189 0.0242 0.0405 0.0069 0.0013 0.0127 0.0006h = 6 0.5489 0.2956 0.0269 0.0319 0.0712 0.0114 0.0026 0.0109 0.0006h = 7 0.5238 0.2974 0.0306 0.0434 0.0716 0.0135 0.0086 0.0103 0.0008h = 8 0.4915 0.3023 0.0312 0.0604 0.0690 0.0209 0.0137 0.0100 0.0009h = 9 0.4743 0.2889 0.0313 0.0786 0.0732 0.0243 0.0184 0.0099 0.0009h = 10 0.4598 0.2764 0.0312 0.0948 0.0770 0.0256 0.0242 0.0100 0.0010h = 11 0.4495 0.2648 0.0315 0.1088 0.0777 0.0280 0.0281 0.0105 0.0011h = 12 0.4418 0.2558 0.0313 0.1193 0.0780 0.0304 0.0311 0.0108 0.0014

Table 8: The specific results for the Forecast Error Variance Decomposition of the BDTI for all underlying timeseries.

34

The results from the Forecast Error Variance Decomposition follows in both graphical and tabular forms for

better understanding.

Figure 12: Forecast Error Variance Decomposition of the BDI, h = 12 lags forward.

Lags BDI Earnings Second Libor Iron Dollar Singapore Australia Growthh = 1 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000h = 2 0.8404 0.1509 0.0006 0.0038 0.0011 0.0000 0.0001 0.0008 0.0022h = 3 0.7345 0.2483 0.0011 0.0058 0.0033 0.0006 0.0022 0.0010 0.0031h = 4 0.6638 0.3070 0.0027 0.0083 0.0045 0.0013 0.0074 0.0018 0.0032h = 5 0.6156 0.3414 0.0024 0.0093 0.0048 0.0012 0.0181 0.0020 0.0050h = 6 0.5872 0.3535 0.0022 0.0084 0.0049 0.0012 0.0330 0.0028 0.0069h = 7 0.5695 0.3551 0.0020 0.0075 0.0052 0.0010 0.0464 0.0044 0.0089h = 8 0.5571 0.3506 0.0018 0.0073 0.0058 0.0010 0.0591 0.0067 0.0105h = 9 0.5466 0.3434 0.0017 0.0076 0.0069 0.0013 0.0702 0.0095 0.0128h = 10 0.5364 0.3344 0.0016 0.0082 0.0084 0.0021 0.0808 0.0122 0.0158h = 11 0.5268 0.3253 0.0016 0.0087 0.0101 0.0034 0.0898 0.0151 0.0193h = 12 0.5181 0.3166 0.0016 0.0091 0.0117 0.0051 0.0971 0.0179 0.0228

Table 9: The specific results for the Forecast Error Variance Decomposition of the BDI for all underlying timeseries.

35

6 Discussion

6.1 Analysis of Data

There is always a trade off between information and time resolution when handling time series.

It is only recently, with the big data trend, that there has been an increase in data collection,

i.e. increasing the collecting frequency. Given that much happened during short time intervals

both during the financial crisis and the covid-19-pandemic, the choice to perform the analysis

on weekly data is justified with the information gain. Since the results from the EM-algorithm

are satisfactory, and also the overall performance of the models, the data used is deemed good

enough. The choice to impute the data instead of interpolate is reasonable in the regard that

there is a higher chance of some noise in the data than a straight or polynomial line between the

monthly data points.

Another trade off is if the input time series in the models should be in levels or first differ-

ences, depending on if any of them has a high probability of non-stationarity. Either there is a

completely trend-stationary model but interesting data is thrown away, and vice versa. This is

debated intensively, and Lutkepohl is one of those proposing to analyse in levels as long as the

VAR-process is stable, which is the case with these models. VAR:s that are used in economics

to determine econometric long term relationship are also more sensitive since the interest often

lies in specific point estimates rather than general forecasts. The choice to perform this analysis

in levels, with some non-stationary time series is sound since the VAR-models that are created

are proven to be stable.

6.2 Analysis of Final Models

The final VAR(7) and VAR(5)-processes are deemed solid from a mathematical point of view.

The model diagnostics give stable results, and the models can be used for their purposes. Even

though the reader may notice that the proposed lags used for the model are a bit high, it must

be taken into consideration that these models are based on weeks, and not months which is more

common. Basing the lag-selection on the AIC and FPE is also sensible since this thesis main

task is to produce forecasts, albeit that the BIC and HC perhaps would give ”better” model

OLS-estimates. In the end the creation of the models is about getting reliable models for a given

problem, and not best estimates of coefficient matrices.

After producing the first forecasts, sub models could have been investigated in order to see

which data that contributed more, and which subset of time series that gave the best forecasts.

Guidance for this procedure could have been taken from the FEVD, perhaps by throwing out

the time series which had to low values. This is although not a solid method, but could be a

way to reduce the models and perhaps get better forecasts. Since there is not a smooth way to

perform a grid search of the time series contributions to the VAR-processes, this is not performed.

The ARIMA-models that are retrieved from the AIC grid search are deemed stable and solid for

the task of forecasting. The seasonality of one year(52 weeks) in both models is easily under-

stood, since much economic procedure is based on cyclic seasonality of our society. Both of them

lack any order of integration, echoing the tests of stationarity that showed that both their data

generating processes are at least trend stationary according to the Phillips-Perron test.

36

6.3 Analysis of Impulse Responses

Given the results from the Impulse Response Analysis, it can be said that all time series in this

model have an impact, more or less, on the freight rates. Given the extreme market volatility

that occurred during the covid-19 pandemic it is interesting to see the two market differences,

in the sense that shocks in similar underlying time series, or the three that both models have in

common, affect the models differently. Since their are 16 impulse responses, all will be briefly

commented, but all are interesting in their own right. In almost all om them the zero line in the

impulse response is contained via the confidence intervals, and given that, their can not be an

exclusion that the shock can be zero. In summary the answer to the research question is that it

is plausible to estimate market shocks impacts on the freight rates indices.

It is interesting to see that both oil price and iron ore price are not the impulse responses which

yields the shock with highest amplitude. They both peter out at around 20-25 lags. Although it

can clearly be seen even mathematically that the BDTI are higher for the first 10 lags, showing

probably the effects of the higher demand indicated by increase in the price of oil. While iron

ore price on the other hand jumps up and than decays, this could symbolise the higher demand

in iron ore, and therefore a higher demand to transport it. The Vale dam collapse that forced

negotiated trade routes to be changed since the ore is needed to be found elsewhere, and prices of

ore increasing is a typical example. The oil price increase does also create costs for the shipping

companies, since fuel oil is the main propellant for ships today. This should give a positive effect

on BDTI, since all shipping companies will have higher costs. It could be argued that the oil

price should also be included in the bulk model, to investigate the effects fuel costs have on the

Bulk-market.

The difference between the US oil import shock and the Australian ore export is interesting,

since it explains two significant market situations. When the US slowed down its own oil import,

the tanker market(and oil producers) found other routes to transport and sell the oil on, i.e. the

market recuperates faster since it is more adapted. What we can see is that a positive shock

in oil import gives a negative response in the freight rate, probably because of over supply in

the US, and that leads to lower need of imports the following period. This effect maxes out at

week 10, and then slowly retracts to zero. On the other hand, Australia is over 50% of the ore

exporting market of ore, and the shock in its export slowly lower the freight rate and then after

about 10 weeks hold the same level for the rest of the impulse response period. This could also

be explained with short term oversupply, and that iron ore is probably more slowly consumed

than crude oil. We can also look at the ratio between standard deviation and mean for both

US imports and Australian ore export, and it is clearly seen that the Australian ore exports has

bigger standard deviation, which could result in a higher probability of longer term oversupply

given a shock, resulting in lower demand for bulk freight the following time period. Both the US

oil independence and Vale dam collapse probably well estimated with impulse responses, since

US imports can vary depending on domestic production, and the import they still have saturates

the market rather quickly. The Vale dam collapse gave an over supply of bulk ships that where

chartered on the specific routes, decreasing the ore export from Brazil drastically and increasing

it from Australia in turn.

The reaction to shocks in financial data are as expected. A shock in Libor increase the freight

rates on short term, and then peters out, the Bulk index decreasing to under zero while the

37

Tanker stays above. An increase in Libor means higher loan rates, and therefore probably less

investments, which is a mechanism that probably impacts the bulk market more, since that mar-

ket is more driven by big investment cycles rather than consumption. The same hold for dollar,

since many investments world wide are made in dollar, an increase in dollar means higher cost

for domestic investments, while for the oil market, higher dollar index means that for example

OPEC countries get paid more for their oil in domestic currencies since oil is almost always

priced in dollars.

The impulses in Earnings simulate approximately the same movements, which is not surprising.

The reader can clearly see that the earnings times series and the freight rate indices covaries,

which is obviously because the higher the freight rates, the higher the earnings for the shipping

companies.

The fleet growth and second hand price shocks show symptoms of the same real pricing mech-

anisms in the freight rates. If the demand for second hand vessels grow, or the orderbooks for

new ships are filled, that usually indicates that the markets demand for seaborne trade is high,

and their is no supply-demand equilibrium. Freight rates increase in the short term, and then

when the demand either stabilizes or recesses, the freight rates slumps because of an oversupplied

market.

For the impulse responses regarding the data from Singapore, i.e. tanker tonnage arrivals and

dry-bulk arrivals, the suggested impacts for the shocks differs completely. The Tanker arrivals

data seems to have a very low contribution to the model in total, and therefore its impulse re-

sponses is disregarded on that note. The Bulk arrivals to Singapore on the other hand has the

greatest negative impact of all impulse responses. This is perhaps driven by the same market

mechanisms as in the case with the Australian ore export, but that is more a conjecture.

6.4 Analysis of Forecasts

As can clearly be seen from the results of the measures used to estimated the forecast precision,

their are deviating results for the different market models. This is not surprising if one inspects

the Forecast Error Variance Decompositions also, and contemplate the differences between the

markets. The VAR-model used for forecasting the BDTI has overall better performance from the

underlying time series than the VAR for the BDI. In the BDTI almost all variables contribute

with at least ≥ 1% of the forecast variance, except the Singapore oil data. This is not echoed in

the BDI forecast, where several contributions are ≤ 1%. Even the iron ore price has surprisingly

low contributions. In the end, all endogenous variables in these models contribute more or less to

the forecasts. It is also interesting to notice that in the freight rates only around 50% of forecasts

error variance comes from its own innovations.

Neither the ARIMA:s or the VAR:s performs a perfect forecast of the freight rates, but there are

some interesting observations. The actual BDTI does not exhibit any big volatile movements

during the forecast period, while the BDI exhibits rather big movements in short time.

The Tanker-Oil-market seems to have stabilized during the summer and fall. Given that many

countries, especially China and the US, almost hoarded oil, both for commercial and strategic

reserves during the spring 2020, the overall demand has slumped. The ARIMA-model probably,

38

by experience from the cyclically seasonal data from before, expects a increasing demand for

oil tankers during the fall, due to the colder seasons kicking in. This did not happen, since the

stores are more or less at full capacity, and because of lockdowns, the aviation industry is also

grounded, decreasing demand for oil further. The performance of the VAR is very satisfactory,

especially during these extreme market conditions. In the end, the dirty tanker market is only

dependent of oil as commodity, which makes it easier to aggregated a VAR-model, since the

input data is more obvious and easily determined.

The Bulk-market seems to rally during the beginning of the second half of 2020, around the

same time as the lockdowns in several countries eased over the summer. The ARIMA-forecast,

precisely as in the Tanker-forecasts, expects the seasonal increase in freight demand, because of

harvest season and other seasonal economic indicators, pushing it slightly upwards. During the

same period, the VAR-model is decreasing down to new lows. The VAR-models weak perfor-

mance here is obvious. The bulk market is more dependent on several different commodities, and

not just one as in the dirty tanker market. The iron ore only contributes with about a third of

the overall goods that is transported via seaborne bulk, and a more solid model should have per-

haps had more focus on several commodities rather than other economic and financial indicators.

Regarding the information and suggestions from the informal interview, the results differ between

the different markets, i.e. it is probably better with a simple model focusing on commodities for

the bulk market, but not in the tanker market. Although they are anecdotal, it is interesting

that from a mathematical point of view, the suggestions to keep the models simple is probably

right, in bulk especially.

6.5 Further Research

Further research that is suggested is to perhaps do some more advanced multiple time series

analysis. Perhaps perform a Vector Cointegration Model, and to see if there are long run re-

lationships between the time series, and if it would be better performed in first differences or

levels. A BVAR would also be interesting to test, assigning prior probabilities rather than just

estimate from given data. Another interesting development would be to take a more machine

learning approach to create a forecasting model, for example an LSTM or some other Artificial

Neural Network method.

On the data front, some data for sustainability or environmental impact would be highly in-

teresting to use in these models, given that this is a more and more important subject today

than it is 15 years ago. Could a shipping company increase their freight rates if they have a

sustainable edge? Here it could also be included to lift the question of different shipping fuels

and their impact on the forecasting and impulse responses, since emission friendly fuels often

cost more.

There could also be some comparisons between models with different time resolutions. For

example, does a VAR-model perform better on weekly basis or monthly. Given the inconsistent

way we reference time, there could possibly be some error sources in this. There is also room for

a VAR (or more advanced model tools) on daily basis, since many of the used time series in this

model are measured/priced every day (e.g. BDTI/BDI, oil price/iron ore price, libor, dollar).

39

7 Conclusion

In conclusion the results from this thesis are interesting and satisfies both expectations and the

research questions. The models are solid and properly assessed, and they work given the purpose,

even though the market conditions are the most volatile in many years.

All the impulse responses corresponds in plausible ways to real market mechanisms and expec-

tations, and not just depend on their own mathematical innovations. This makes them plausible to

be used efficiently as an instrument of analysis for the shipping industries, since shocks are among

the main features driving the market, especially in the wake of the covid-19 pandemic, which made

markets conditions extreme.

For the BDTI, the VAR-model is superior in forecasting, while in case of the BDI, the ARIMA-

model outperformed the VAR. With different results for the two different markets regarding the

forecasts, it is deduced that the dirty tanker market is simpler to forecast due to its obvious depen-

dence on only one commodity while the dry bulk market is more intricate and depends on several

commodities, and not just iron ore. It is also deduced that the commodities pricing in themselves

are not the most important indicator for forecasts, rather than other more market describing data.

Given that the main drivers of these markets are seasonality and black swan events, it will always

be very hard to give a fair and completely accurate forecast of these indices. The forecasts are never

expected to describe this volatile market perfectly rather than give guidance.

In future, further research is needed, and the dependence on these models for market references

are limited, but they can give an understanding of which data that drive the seaborne freight indus-

try.

40

Appendix A: Original Time Series

Figure 13: Plots of all the time series in the models, with there first difference and ACF:s andPACF.

41

Figure 14: Plots of all the time series in the models, with there first difference and ACF:s andPACF.

42

Appendix B: EM-Algorithm Results

QQ-plots of Imputed Time Series

Figure 15: QQ-plots of the distributions of the data series that are imputed, plotted against anormal distribution.

43

Distributions of Fitted Imputations

Figure 16: Distributions of the fitted imputations from the EM-algorithm. Blue is the distribu-tion of the existing data, and red is the fitted latent data.

44

Appendix C: OLS-Stability

Figure 17: Modulus of the eigenvalues of the Oil-Tanker-model.

Figure 18: Modulus of the eigenvalues of the Iron-Bulk-model.

45

Bibliography

Books

Stopford, Martin, Maritime Economics, 3rd ed., Routledge 2 Park Square, Milton Park, Abingdon, Oxon

OX14 4RN, 2009.

Lutkepohl, Helmut, New Introduction to Multiple Time Series Analysis, 1st ed., Springer-Verlag Berlin

Heidelberg 2005

Brockwell, Peter J., Davis, Richard A., Introduction to Time Series and Forecasting, 2nd ed., 2002,

Springer-Verlag New York, Inc.

Reports, Articles, Lecture Notes

UNCTAD, Review of Maritime Transport 2020. New York: United Nations Publications 405 East 42nd

Street New York, NY 10017, 2020. https://unctad.org/system/files/official-document/rmt2020/ en.pdf

(Retrieved 2020-12-01).

UNCTAD,2019 Handbook of Statistics. New York: United Nations Publications 405 East 42nd Street

New York, NY 10017, 2020. https://unctad.org/system/files/official-document/tdstat44/ en.pdf (Re-

trieved 2020-12-01).

Jackson, Mark, Lad, Deepa, Baltic Exchange, Guide to Market Benchmarks, Version 4.1 – March 2020

Baltic Exchange Information Services Ltd.

Vangelis Tsioumas, Stratos Papadimitriou, Yiannis Smirlis, Shaher Zahran Zahran, A Novel Approach

to Forecasting the Bulk Freight Market, The Asian Journal of Shipping and Logistics, Volume 33, Issue

1, 2017, Pages 33-41, ISSN 2092-5212, https://doi.org/10.1016/j.ajsl.2017.03.005.

(http://www.sciencedirect.com/science/article/pii/S2092521217300056)

Nektarios A. Michail, Konstantinos D. Melas, Quantifying the relationship between seaborne trade and

shipping freight rates: A Bayesian vector autoregressive approach, Maritime Transport Research, 2020,

100001, ISSN 2666-822X, https://doi.org/10.1016/j.martra.2020.100001.

(http://www.sciencedirect.com/science/article/pii/S2666822X20300010)

Lutkepohl, Helmut, Vector Autoregressive Models, European University Institute, Department of Eco-

nomics, EUI Working Paper ECO 2011/30, 2011.

Hult, Henrik, Lecture Notes in Statistical Inference, lecture 7, EM-algorithm, KTH Royal Institute of

Technology, Department of Mathematics , 2019, pdf, retrieved 2020-12-01.

Web Pages

Pfaff B, Stiegler M. Package VARS. R-Project.org. Published 2015 Feb 20. Retreived from cran.r-

project.org/web/packages/vars/vars.pdf

Honaker J, King G, Blackwell M. Package Amelia. R-Project.org. Published 2019 Nov 24. Retreived

from cran.r-project.org/web/packages/Amelia/Amelia.pdf

46

Raw Data

Baltic Exchange (2020), Baltic Dirty Tanker Index, 2009-2020, https://www.balticexchange.com/en/index.html,

extracted 2020-12-01.

US Energy Information Administration (2020), Brent Oil Price, 2009-2020,

https://www.eia.gov/dnav/pet/hist/RBRTED.htm, extracted 2020-09-15.

Clarksons World Fleet Register Time Series(2020), Tanker Earnings, 2009-2020,

https://www.clarksons.net/n//wfr/timeseries/advanced/50/1, extracted 2020-09-15

Investing.com (2020), Dollar Index, 2009-2020, https://se.investing.com/currencies/us-dollar-index, ex-

tracted 2020-09-15

Macrotrends (2020), LIBOR 3-month, 2009-2020, https://www.macrotrends.net/2520/3-month-libor-

rate-historical-chart,

Baltic Exchange (2020), Baltic Second Hand Tanker Price, 2009-2020,

https://www.balticexchange.com/en/index.html, extracted 2020-09-15.

Maritime and Port Authority of Singapore(2020), Tanker Arrivals, 2009-2020,

http://www.mpa.gov.sg/web/portal/home/port-of-singapore/port-statistics, extracted 2020-09-15

US Energy Information Administration (2020), Total Crude Imports,

https://www.eia.gov/dnav/pet/pet move impcus a2 nus ep00 im0 mbbl m.htm, extracted 2020-09-15

Clarksons World Fleet Register Time Series(2020), Ship Orderbook, 2009-2020,


Baltic Exchange (2020), Baltic Dry Index, 2009-2020, https://www.balticexchange.com/en/index.html,


Investing.com (2020), Iron ore fines 62% Fe CFR, 2009-2020, https://se.investing.com/commodities/iron-

ore-62-cfr-futures, extracted 2020-09-15

Clarksons World Fleet Register Time Series(2020), Bulk Earnings, 2009-2020,


Baltic Exchange (2020), Baltic Second Hand Bulk Price, 2009-2020, https://www.balticexchange.com/en/index.html,


Maritime and Port Authority of Singapore(2020), Vessel Arrivals, 2009-2020,

http://www.mpa.gov.sg/web/portal/home/port-of-singapore/port-statistics, extracted 2020-09-15

Australian Bureau of Statistics(2020), Preliminary merchandise exports, standard international trade

classification, 2009-2020, https://www.abs.gov.au/statistics/economy/international-trade/international-

merchandise-trade-preliminary-australia/latest-release, extracted 2020-09-15

47

TRITA -SCI-GRU 2020:384

www.kth.se

multiple time series analysis of freight rate indices

Documents