
Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
University of Iceland
2020


Prediction of time series for electricity generation

Guðmundur Smári Guðmundsson

PREDICTION OF TIME SERIES FOR ELECTRICITY GENERATION

Guðmundur Smári Guðmundsson

60 ECTS thesis submitted in partial fulfillment of a Magister Scientiarum degree in Computer Science

Advisors
Helmut Neukirchen
Morris Riedel
Ólafur Pétur Pálsson

External Examiner
Sebastian Lührs

Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
School of Engineering and Natural Sciences
University of Iceland
Reykjavik, May 2020

Prediction of time series for electricity generation

60 ECTS thesis submitted in partial fulfillment of an M.Sc. degree in Computer Science

Copyright © 2020 Guðmundur Smári Guðmundsson
All rights reserved

Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
School of Engineering and Natural Sciences
University of Iceland
Hjarðarhagi 2-6
107 Reykjavik
Iceland

Telephone: 525 4000

Bibliographic information:
Guðmundur Smári Guðmundsson, 2020, Prediction of time series for electricity generation, M.Sc. thesis, Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland.

Printing: Háskólaprent, Fálkagata 2, 107 Reykjavík
Reykjavik, Iceland, May 2020

For Sunna, Heiðbjört, and Dio

Abstract

Electric energy meters are used for measuring how much energy is generated per hour in a power station. These measurements are time series which are typically only available at the end of each month; nevertheless, the data needs to be available as soon as possible. In this thesis, using near real-time data, two methods are presented for time series predictions: a ratio method and a deep learning Long Short-Term Memory (LSTM) neural network method. The results from these methods are compared by two error metrics, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), with an emphasis on a lower RMSE. Both methods are applied to three hydropower stations: Ljósafoss, Hrauneyjafoss, and Fljótsdalur. The best acquired RMSE values for each station are 0.066, 1.651, and 2.667, respectively. While the ratio method achieves a low RMSE for one station, the LSTM method achieves the lowest RMSE for all three power stations. The results suggest that the LSTM method is a good choice for time series predictions for other hydropower stations, improving the speed of data analysis by making data predictions available in near real-time.

Útdráttur

Raforkumælar eru notaðir í aflstöðvum til að mæla hversu mikil orka er unnin á hverri klukkustund. Mælingarnar eru tímaraðir sem hægt er að nálgast í lok hvers mánaðar en þrátt fyrir það þurfa gögnin að vera aðgengileg eins fljótt og auðið er. Í þessari ritgerð eru notuð nær rauntímagögn til að framkvæma tvær mismunandi aðferðir fyrir tímaraðaspár: hlutfallsaðferð (e. ratio method) og djúpnámsaðferð (e. deep learning method). Niðurstöðurnar úr þessum aðferðum eru bornar saman með ferningsmeðaltalsrótarskekkju (e. Root Mean Square Error) og meðaltals beinni skekkju (e. Mean Absolute Error) með áherslu á lægri ferningsmeðaltalsrótarskekkju. Aðferðirnar eru heimfærðar á þrjár vatnsaflsstöðvar: Ljósafossstöð, Hrauneyjafossstöð og Fljótsdalsstöð. Lægstu gildi ferningsmeðaltalsrótarskekkju sem fengust fyrir hverja af þremur aflstöðvunum eru, í sömu röð: 0.066, 1.651 og 2.667. Á meðan hlutfallsaðferðin nær fram lágri ferningsmeðaltalsrótarskekkju fyrir eina aflstöð, þá nær djúpnámsaðferðin fram lægstu skekkjunni fyrir allar þrjár aflstöðvarnar. Af niðurstöðunum má álykta að djúpnámsaðferðin sé góð fyrir tímaraðaspár fyrir aðrar vatnsaflsstöðvar, sem betrumbætir hraða gagnagreininga með því að hafa tímaraðaspár aðgengilegar á nær rauntíma.


Contents

List of Figures

List of Tables

Abbreviations

Acknowledgments

1. Introduction
   1.1. Motivation
   1.2. Problem Statement, Approach and Research Question
   1.3. Outline

2. Foundations
   2.1. Selected Power Stations and The Transmission Grid
        2.1.1. Ljósafoss Power Station
        2.1.2. Hrauneyjafoss Power Station
        2.1.3. Fljótsdalur Power Station
   2.2. Power Station Data File Format
   2.3. Time Series
   2.4. Machine Learning
        2.4.1. Neural Networks
        2.4.2. Long Short-Term Memory (LSTM)
   2.5. Software Tools
        2.5.1. Python
        2.5.2. Jupyter Notebook
        2.5.3. NumPy
        2.5.4. Pandas
        2.5.5. Matplotlib
        2.5.6. Plotly
        2.5.7. Microsoft SQL Server
        2.5.8. PostgreSQL
        2.5.9. KairosDB
        2.5.10. Scikit-learn
        2.5.11. Keras

3. Methods
   3.1. Data Understanding
   3.2. Data Preparation
   3.3. Ratio Method
   3.4. LSTM Method

4. Design and Implementation
   4.1. Data Structures, Storage and Implemented Algorithms
        4.1.1. Ratio Method
        4.1.2. LSTM Method
   4.2. Library Dependencies
   4.3. Keras Models

5. Results and Evaluation
   5.1. Ratio Method
   5.2. LSTM Method
        5.2.1. Ljósafoss Power Station
        5.2.2. Hrauneyjafoss Power Station
        5.2.3. Fljótsdalur Power Station
   5.3. Discussion and Comparison

6. Related Work
   6.1. Hydrological Time Series Predictions
   6.2. Solar Irradiance Time Series Predictions
   6.3. Financial Time Series Predictions

7. Summary and Outlook
   7.1. Summary
   7.2. Outlook

Bibliography

A. Long Short-Term Memory (LSTM) Parameter Search Experiments

List of Figures

2.1. High voltage national grid for Iceland. Source: https://www.map.is/landsnet

2.2. Power stations: Ljósafoss in Sog, Hrauneyjafoss in Þjórsá/Tungnaá, and Fljótsdalur in Kárahnjúkar. Source: https://www.openstreetmap.org

2.3. Ljósafoss power station in Sog. Source: https://www.landsvirkjun.com

2.4. Transmission/distribution lines Grafningslína, Skálholtslína, Skólalína, and Grímsneslína. Source: https://www.rarik.is/dreifikerfi-kortasja

2.5. Hrauneyjafoss power station in Thjorsá/Tungnaá region. Source: https://www.landsvirkjun.com

2.6. Transmission lines HR1 and SI2. Source: https://map.is/landsnet

2.7. Fljótsdalur located inside Valthjófsstadur Mountain. Source: https://www.landsvirkjun.com

2.8. Transmission lines FL2, FL3, FL4, and KR2. Source: https://map.is/landsnet

2.9. A Command-Line Interface (CLI) client for downloading SVEF files from Amper

2.10. Supervised learning process. Source: [81]

2.11. Underfitting, Appropriate capacity, and Overfitting. Source: [17]

2.12. A Venn diagram describing Artificial Intelligence (AI), Machine Learning (ML), Artificial Neural Network (ANN), and Deep Learning (DL). Source: [83]

2.13. A multi-layer feedforward neural network. Source: [73]

2.14. A Jupyter Notebook running on JURON, a High Performance Computing (HPC) cluster at Juelich

2.15. Web User Interface on KairosDB

3.1. Generation, grid input and correction ratio per hour in the year 2016 for Ljósafoss (LJO) power station

4.1. Data flow for electrical generation and grid input

4.2. Keras model with three layers

4.3. Keras model with two layers

4.4. Keras model with four layers

4.5. Boxplot of training time per epoch

5.1. Keras session batches for Ljósafoss power station

5.2. Keras session batches for Hrauneyjafoss power station

5.3. Keras session batches for Fljótsdalur power station

List of Tables

2.1. Generation meters for Ljósafoss power station

2.2. Grid input meters for Ljósafoss power station

2.3. Generation meters for Hrauneyjafoss power station

2.4. Grid input meters for Hrauneyjafoss power station

2.5. Generation meters for Fljótsdalur power station

2.6. Grid input meters for Fljótsdalur power station

5.1. Results for the ratio method and statistics

5.2. Unchanged hyperparameters in the top 10 session batches for Ljósafoss power station

5.3. Keras session batches with configured hyperparameters for Ljósafoss power station

5.4. Unchanged hyperparameters in the top 10 session batches for Hrauneyjafoss power station

5.5. Keras session batches with configured hyperparameters for Hrauneyjafoss power station

5.6. Unchanged hyperparameters in the top 10 session batches for Fljótsdalur power station

5.7. Keras session batches with configured hyperparameters for Fljótsdalur power station

A.1. LJO Keras LSTM parameter search experiments, sorted descending by Root Mean Square Error (RMSE)

A.2. HRA Keras LSTM parameter search experiments, sorted descending by RMSE

A.3. FLJ Keras LSTM parameter search experiments, sorted descending by RMSE

Abbreviations

AI Artificial Intelligence

ANN Artificial Neural Network

API Application Programming Interface

ARIMA Autoregressive integrated moving average

avg average

CLI Command-Line Interface

CNTK Microsoft Cognitive Toolkit

CNN Convolutional Neural Network

CPU Central Processing Unit

CRISP-DM CRoss-Industry Standard Process for Data Mining

CSV Comma-Separated Values

DL Deep Learning

DNN Deep Neural Network

GPU Graphics Processing Unit

GSU Generator Step-Up transformer

GUI Graphical User Interface

HBM High Bandwidth Memory

HDF Hierarchical Data Format


HPC High Performance Computing

IDE Integrated Development Environment

JSC Jülich Supercomputing Centre

JSON JavaScript Object Notation

kWh Kilowatt hours

LJO Ljósafoss

LR Linear Regression

LSTM Long Short-Term Memory

MAE Mean Absolute Error

ML Machine Learning

MSE Mean Squared Error

MWh Megawatt hours

NaN Not a Number

NoSQL Not Only SQL

PNG Portable Network Graphics

ReLU Rectified Linear Unit

REPL Read-Eval-Print Loop

REST REpresentational State Transfer

RMSE Root Mean Square Error

RNN Recurrent Neural Network

SCADA Supervisory Control And Data Acquisition

SCP Secure Copy Protocol


sd standard deviation

SOCKS Socket Secure

stdout standard output

SQL Structured Query Language

SSH Secure Shell

TSO Transmission System Operator


Acknowledgments

Many thanks to Dr. Helmut Neukirchen, Dr. Morris Riedel, and Dr. Ólafur Pétur Pálsson, all of whom are professors in the Faculty of Industrial Engineering, Mechanical Engineering and Computer Science at the University of Iceland, for instructing the project and for very valuable guidance. Many thanks go to Landsvirkjun and Landsnet for providing the data, especially my supervisor Eggert Guðjónsson for his understanding and support. Special gratitude to Sebastian Lührs for his contribution as the external examiner of this thesis. Special thanks and gratitude go to my family: my parents for their support and constant positivity, my sisters for always being there, my lovely wife Sunna for her constant epic support and for always reminding me to attend to the business in hand, and our daughter Heiðbjört for making sure we take a play break now and then.

Research leading to these results has in part been carried out on the Human Brain Project PCP Pilot Systems at the Juelich Supercomputing Centre, which received co-funding from the European Union (Grant Agreement no. 604102).


1. Introduction

1.1. Motivation

Electric energy meters are used widely for settling how much energy is fed into or out of a distribution network. The meters allow the electrical energy company to charge a customer for the electricity used by a home or a company. Electric energy meters are also used for the high voltage transmission network in Iceland to measure how much energy is fed into or out of the transmission network.

Grid input is a variable used for settlement of high voltage electric energy provided by a power station and delivered to the national grid. This variable consists of measurements (measured in Kilowatt hours (kWh) or Megawatt hours (MWh)) from all grid input meters related to a power station. Total generation of a power station is a variable related to grid input, i.e. the power and energy is first generated and then fed as grid input. The grid input and the total generation are typically not validated until the end of each month.

Unfortunately, the metered data can have missing values and can be unreliable. In addition, the metered data needs to be available as soon as possible: a delay of one or two days is acceptable, but the sooner the better; in particular, needing to wait for settlement data until the end of the month is sub-optimal. Metered data for grid input is very handy for error checking a monthly settlement, and metered data for energy generation can be error checked against grid input.

In 2017, discussions were had at Landsvirkjun [46], Iceland's largest electricity generator (producing 75% of all electricity used in Iceland) owned by the Icelandic state, where ideas for a prediction system were shared. This prediction system was imagined to be a time series system capable of creating derived time series from other time series, either by figuring out dependencies or by allowing the user to define these dependencies. By this means, it would be possible to get near real-time predictions of data derived from other available data, e.g. predict grid input from energy generation data. Time series predictions or forecasts are used widely, e.g. weather forecasts, tourism forecasts, electricity load forecasts, etc. The first thoughts were to develop a distributed system with a web application which would serve as a Graphical User Interface (GUI) for its users. In the web application users could query data, view recordsets, add or update time series in the system, and describe a relationship against other databases that would be used as a source for calculations. These calculations would be for every power station owned by Landsvirkjun, but the implementation would need to be portable, so that other users (outside of Landsvirkjun) could create arbitrary calculated time series that could depend on other systems, files or databases.

The national Transmission System Operator (TSO) [42], Landsnet, operates a database which gathers values of generated power with a time resolution of both 5 minutes and 30 minutes; the values are available almost instantly after each 5 and 30 minute interval has passed, every day. The data can give insights into operations in real time and can be used as valuable information for making predictions.

1.2. Problem Statement, Approach and Research Question

Currently the grid input is only available to the power station operator on a monthly basis, and by predicting this data it becomes available (as predicted values) in near real-time. The generation data is used to predict grid input for a power station.

Two methods are presented in this thesis, which are applied to the three power stations and are used to provide predictions. One method uses simple ratio calculations and the other method uses deep learning neural networks.

The research question put forward is: Which of the two methods presented in this thesis provides a lower prediction error, considering Root Mean Square Error (RMSE) and Mean Absolute Error (MAE)?

1.3. Outline

Chapter 2 discusses the foundations for the three selected power stations, the transmission grid, data file format, machine learning, and software tools. Chapter 3 describes the two methods used to deliver predictions for each power station. Chapter 4 covers development and method specific implementations, storage of data, dependencies, and deep learning models. Chapter 5 presents results for both methods and a comparison of these methods. Chapter 6 covers related work regarding time series predictions. Chapter 7 provides a thesis summary and descriptions of future work and outlook. Appendix A shows tables of LSTM parameter search experiments along with results.


2. Foundations

This chapter provides information about the power stations considered in this thesis, the national transmission grid, software tools, and the data used.

2.1. Selected Power Stations and The Transmission Grid

Due to electricity law 65/2003 [62], Landsvirkjun's division of transmission was extracted to a new company named Landsnet in 2005. It is co-owned by other energy companies.

In the year 2019, Landsnet was owned by Landsvirkjun (64.73%), Rarik (22.51%), Orkuveita Reykjavíkur (6.78%), and Orkubú Vestfjarða (5.98%) [45]. Discussions started in the year 2019 about the government buying shares in Landsnet [43]. Today Landsnet is responsible for transferring electricity through high voltage transmission lines between parts of Iceland and has a monopoly license from the government to do so, bound by electricity law 65/2003 [62].

Landsnet, the Transmission System Operator (TSO), owns and operates the national high voltage grid and has about 70 substations all over the country [42]. Figure 2.1 shows the national grid of Iceland. Lines colored green are 220 kV, red lines are 132 kV, and blue are 66 kV. Yellow boxes are substations, dotted lines are underground cables.

Landsvirkjun generates electric power and provided approximately 72% of all electric energy generated in Iceland in the year 2018 [44]. Landsvirkjun competes on a world-wide market to provide electricity to energy-intensive companies (e.g. aluminum smelters, ferrosilicon plants, data centers) [47]. These companies are all located in Iceland since there is no transmission cable connecting Iceland to other countries [53]. In 2019, the company operates 18 power stations and two wind turbines. This portfolio of power stations consists of 15 hydropower stations and 3 geothermal power stations [51].


Figure 2.1: High voltage national grid for Iceland. Source: https://www.map.is/landsnet

Electric power and energy is provided to the Icelandic national grid that Landsnet is responsible for. The delivery point from a power station to a substation is right after a Generator Step-Up transformer (GSU) (based on a corresponding voltage requirement from the TSO); that is, a GSU is an essential link to the grid [59]. The GSU is owned and operated by the power station. Electricity is fed from a power station into the grid through a substation owned by Landsnet. Grid input consists of the accumulated sum of measurements from meters for transmission lines or GSUs. An electrical substation can be assigned to stepping up or stepping down high voltage. This is done with the help of transformers, switchgear, circuit breakers and associated devices [12]. There is loss of energy during this process, and that loss is different for each power station. This is because power stations have different sizes, different equipment, and setup. These losses will be covered later in the thesis.

There is also loss in transmission lines that depends on e.g. voltage, length of a transmission line, and equipment on the send/receive end of that line. However, this loss in transmission lines is beyond the scope of this thesis (see [86] for a short, basic introduction and see [2] for further reading).

Generation meters are used to measure how much energy is generated for each installed unit. Grid input meters are used to measure how much energy is fed into the grid for transmission. These meters may have different accuracy (see tables 2.1, 2.2, 2.3, 2.4, 2.5, and 2.6) with measurements in Kilowatt hours (kWh) or Megawatt hours (MWh). Total energy generated by a power station (or fed into the grid) is an accumulated sum of values from relevant meters.

Data for this project was acquired from three hydropower stations, owned by Landsvirkjun: Ljósafoss, Hrauneyjafoss, and Fljótsdalur (see Figure 2.2). These power stations have different characteristics. Ljósafoss is old and relatively small in terms of power, Hrauneyjafoss is medium in terms of power, and Fljótsdalur is large in terms of power. These are described in the next sections.

Figure 2.2: Power stations: Ljósafoss in Sog, Hrauneyjafoss in Þjórsá/Tungnaá, and Fljótsdalur in Kárahnjúkar. Source: https://www.openstreetmap.org


2.1.1. Ljósafoss Power Station

The power station Ljósafoss is located in the south-west of Iceland in the river Sog, close to Þingvellir (see Figure 2.2). Ljósafoss (see Figure 2.3) is a run-of-river power station and has almost no water storage facility [24]. It is Landsvirkjun's oldest power station, and started operation in 1937 with two turbines. At that time, the power station quadrupled available electricity in the capital region. This made it possible to use electrical kitchen stoves instead of charcoal stoves [50]. The station is located between two other power stations, Steingrímsstöð and Írafoss. Water flowing to Ljósafoss power station comes from Úlfljótsvatn and most of the water flowing into Úlfljótsvatn comes from Steingrímsstöð. Water flow is around 100 m³/s on average from Þingvallavatn [52]. Water flowing from Ljósafoss goes to Írafosslón and from there to Írafoss power station.

Figure 2.3: Ljósafoss power station in Sog. Source: https://www.landsvirkjun.com

Currently Ljósafoss power station has a generation capacity of 105 GWh per year [51]. It is equipped with three Francis turbines, two 4.4 MW and one 6.5 MW.

Figure 2.4 shows Ljósafoss power station and transmission lines. From this power station there are four transmission lines. On the figure, one line is positioned to the left of the power station. The transmission line that goes to the left, over the dam on the figure, is the so-called Grafningslína (also known as Þingvallalína) [11]. The other three transmission lines are called Grímsneslína, Skólalína (also known as Mjóanes), and Skálholtslína. They are positioned to the right of the power station. The bottom right line is Grímsneslína, Skólalína is the top right line, and the middle right line is Skálholtslína.


Figure 2.4: Transmission/distribution lines Grafningslína, Skálholtslína, Skólalína, and Grímsneslína. Source: https://www.rarik.is/dreifikerfi-kortasja

Table 2.1: Generation meters for Ljósafoss power station

Generation meter Unit Accuracy

G1  MWh  0.5%
G2  MWh  0.5%
G3  MWh  0.5%

Table 2.2: Grid input meters for Ljósafoss power station

Grid meter Voltage Unit Accuracy

GSU1  66 kV  kWh  0.2%
GSU4  11 kV  kWh  0.2%
Grafningslína  11 kV  kWh  0.2%
Skálholtslína  11 kV  kWh  0.2%
Skólalína  11 kV  kWh  0.2%
Grímsneslína  11 kV  kWh  0.5%

Table 2.1 lists the accuracy of generation meters for Ljósafoss. In Table 2.2 one can see the accuracy of grid input meters. As the table shows, the grid input is a combination of two GSUs and four transmission lines, all with an accuracy of 0.2% except Grímsneslína 11 kV, which has an accuracy of 0.5%. All of these transmission lines deliver to the medium voltage (11 kV) distribution grid owned by Rarik [71].


2.1.2. Hrauneyjafoss Power Station

Hrauneyjafoss power station, located in South Iceland (see Figure 2.2), harnesses the Tungnaá river. The power station (see Figure 2.5) is Landsvirkjun's third largest station and the fourth largest in the country [60], in terms of power. The station started to generate power into the national grid in 1981. It has a generation capacity of 1300 GWh per year [51] and is equipped with three Francis turbines, each with an installed capacity of 70 MW. Generation values fluctuate a lot due to diurnal variations, and units are often stopped during the night. This is because it is not ideal for the national grid to have too much spinning reserve. By having reasonable reserves spinning it is possible to lower the risk of generators running below limits if there are outages in the system. The power station has a small reservoir which is able to store 33 Gl [49].

The station is one of seven power stations in the same region. Vatnsfell is at the top and gets water from Þórisvatn. Water from Vatnsfell goes to Sigalda, and water from Sigalda goes to Hrauneyjafoss. Water from Hrauneyjafoss goes to Búðarháls, and water from Búðarháls to Sultartangi. Sultartangi provides water to both Búrfell and Búrfell II.

Transmission line Hrauneyjalína 1 (HR1) (see Figure 2.6) connects Hrauneyjafoss to the substation at Sultartangi, with transmission line Búðarhálslína 1 (BU1) connecting in the middle. Transmission line Sigöldulína 2 (SI2) connects Hrauneyjafoss to the substation at Sigalda.

Figure 2.5: Hrauneyjafoss power station in Thjorsá/Tungnaá region. Source: https://www.landsvirkjun.com


Figure 2.6: Transmission lines HR1 and SI2. Source: https://map.is/landsnet

Table 2.3: Generation meters for Hrauneyjafoss power station

Generation meter Unit Accuracy

G1  MWh  0.5%
G2  MWh  0.5%
G3  MWh  0.5%

Table 2.4: Grid input meters for Hrauneyjafoss power station

Grid meter Voltage Unit Accuracy

Hrauneyjafosslína 1 - HR1  220 kV  kWh  0.2%
Sigöldulína 2 - SI2  220 kV  kWh  0.2%

Table 2.3 shows the accuracy of generation meters for Hrauneyjafoss. All three generation meters have an accuracy of 0.5%. In Table 2.4 one can see the accuracy of grid input meters. Both grid meters are transmission lines with an accuracy of 0.2%.


2.1.3. Fljótsdalur Power Station

Fljótsdalur power station is located in the east of Iceland (see Figure 2.2). Construction of the Kárahnjúkar dam started in 2003, and four years later Fljótsdalur power station (see Figure 2.7) was brought online in 2007 [48]. The station is by far the largest power station in Iceland in terms of power [60]. It has six Francis turbines, each with an installed capacity of 115 MW, and a total generation capacity of 4800 GWh per year [51]. The station has a large reservoir which is able to store 2100 Gl [48]. Units are not stopped unless there are unexpected outages or if there is scheduled maintenance.

Figure 2.7: Fljótsdalur located inside Valthjófsstadur Mountain. Source: https://www.landsvirkjun.com

The accuracy of generation meters for Fljótsdalur is shown in Table 2.5. In Table 2.6 one can see the accuracy of grid input meters. In this case the GSU meters are measured in MWh (just like all six generation meters in Table 2.5), but the others in kWh.

In Figure 2.8 one can see four transmission lines: Fljótsdalslína 2 (FL2), Fljótsdalslína 3 (FL3), Fljótsdalslína 4 (FL4), and Kröflulína 2 (KR2). Also shown in this figure is a yellow box with a lightning bolt symbol inside. This is the substation at Fljótsdalur.

In Table 2.6 there are six GSU meters, one for each generator. In this case the generated power is stepped up from 11 kV to 220 kV or stepped down from 220 kV to 11 kV. Landsvirkjun is responsible for the GSUs and the point of delivery to the TSO is right at the end of each GSU. The generated electricity is transmitted through these six GSUs, either from or to the substation located near Fljótsdalur power station. From there it is transmitted from or to the high voltage transmission lines mentioned above (FL2, FL3, FL4, and KR2). In Fljótsdalur there are meters on each GSU, which is not the case for Ljósafoss (see Table 2.2) and Hrauneyjafoss (see Table 2.4). As explained in Section 2.1, red lines are 132 kV, green lines are 220 kV, and dotted lines are underground.

Figure 2.8: Transmission lines FL2, FL3, FL4, and KR2. Source: https://map.is/landsnet

Table 2.5: Generation meters for Fljótsdalur power station

Generation meter  Unit  Accuracy
G1  MWh  0.5%
G2  MWh  0.5%
G3  MWh  0.5%
G4  MWh  0.5%
G5  MWh  0.5%
G6  MWh  0.5%

Table 2.6: Grid input meters for Fljótsdalur power station

Grid meter  Voltage  Unit  Accuracy
GSU1  220 kV  MWh  0.2%
GSU2  220 kV  MWh  0.2%
GSU3  220 kV  MWh  0.2%
GSU4  220 kV  MWh  0.2%
GSU5  220 kV  MWh  0.2%
GSU6  220 kV  MWh  0.2%
Office and workshop usage  11 kV  kWh  0.2%
Station usage  11 kV  kWh  0.2%
Office and workshop usage  400 V  kWh  0.2%

Transmission lines FL3 and FL4 connect to the aluminum smelter in Reyðarfjörður, Alcoa Fjarðaál. Transmission line FL2 connects to the Hryggstekkur substation. KR2 connects to Krafla, a geothermal power station in the north of Iceland. KR2 has a very important role for the grid in preventing islanding, i.e. the grid being split into separate partitions (see [84] for more information on grid islanding). Landsnet intends to connect another transmission line, Kröflulína 3 (KR3), on 220 kV along the side of KR2 for the most part [10].

2.2. Power Station Data File Format

This section explores the data format used for receiving raw data related to electric energy time series.

Raw input data and valid outputs for training are in the SVEF/XX [82] file format. These files are flat text files with tab separated columns. Values should be represented with 3 decimal digits; empty lines and comments are allowed. Each measurand (time series) should be sorted in ascending order by the date time column (column number two, counting from the leftmost side) [82]. The header consists of:

• period size: 15, 30, 60, D, M, or Y. While it is not explicitly mentioned in the SVEF/XX documentation, if period size is set to an integer, the number refers to minutes. D is defined as days, M is months, and Y is years

• time and date: time and date when data was created, sampled or written to file, in the format DD.MM.YY HH:MI:SS

• unit: one of the following: kW, MW, kWh, MWh, kVAr, MVAr, kVArh, MVArh, kVA, MVA, kVAh, MVAh. Other units are allowed but should be inside quotes

• LocalTime(DST): set to 0 or 1 where 0 is normal time and 1 is local time


• StartOrEndTime: STARTTIME or ENDTIME. (A timestamp of a sample in a time series is in start time if it marks the beginning of the interval up to the next timestamp in the series, and in end time if it marks the end of the interval since the previous timestamp.)

For each line in a SVEF/XX [82] file, a tab is used for separating columns and the order of columns matters (see Listing 2.1 for examples of value lines):

• first column is measurand: the identification of a measurand

• second column is timedate: time and date in the format DD.MM.YY HH:MI, where DD is the day of the month [01-31], MM is the month [01-12], YY is the year with 2 digits [80,36], HH is the hour [00-23], and MI is the minute, which must be one of the following [00, 15, 30, 45]

• third column is status: an integer [0-9], where

0. Value manually set

1. Not used

2. Normal value

3. Temporary value

4. Smeared value

5. Estimated value

6. Uncertain value

7. Missing value

8. Not used

9. Invalid value

• fourth column is value: a decimal number with 3 decimal digits. A comma or a dot is allowed as the decimal separator

Once every day these files are downloaded and written into a database with metadata information such as at what date and time each file arrived, at what date and time each file is written into the database, the date and time of measurement of each measurand in each file, value, and unit. Most of the time, the files have more than one measurand.


Listing 2.1 shows an example of SVEF/XX measurements: 5 hours of grid input at Fljótsdalur power station. As can be seen in the header, the period size is 60 minutes, the time and date is November 25th 2019 at 09:05:14, the unit is kWh and the measurand is in end time. Since this measurand is in end time, the first value shown in Listing 2.1 at 18.11.19 01:00 is sampled between 18.11.19 00:00 and 18.11.19 01:00. The measurand identification is 20010032 and the status is set to 2, i.e. normal.

Listing 2.1: An example SVEF file for FLJ grid input

SVEF/XX:1/60/25.11.19 09:05:14/kWh/0/ENDTIME
20010032    18.11.19 01:00    2    541837.000
20010032    18.11.19 02:00    2    531450.000
20010032    18.11.19 03:00    2    552121.000
20010032    18.11.19 04:00    2    544316.000
20010032    18.11.19 05:00    2    553636.000
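To make the column layout concrete, the following minimal sketch (not part of the thesis; it assumes pandas is available and uses a hypothetical file name) reads the tab-separated value lines of such a file into a pandas DataFrame:

import pandas as pd

def read_svef_values(path):
    # Parse the tab-separated value lines of a SVEF/XX file; the header line,
    # empty lines and anything else without exactly four columns is skipped.
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 4:
                continue
            measurand, timedate, status, value = parts
            rows.append({
                "measurand": measurand,
                "timedate": pd.to_datetime(timedate, format="%d.%m.%y %H:%M"),
                "status": int(status),
                "value": float(value.replace(",", ".")),  # comma or dot decimal separator
            })
    return pd.DataFrame(rows)

# hypothetical usage: df = read_svef_values("flj_grid_input.svef")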

The files are downloaded with the help of a Command-Line Interface (CLI) tool (see Figure 2.9) provided by the TSO. Files can be retrieved from the non-public website of the TSO (called Amper).

Figure 2.9: A CLI client for downloading SVEF files from Amper


2.3. Time Series

Chatfield [7] defines time series as follows:

A time series is a set of observations measured sequentially through time.

In practice, time series data can be on e.g. an hourly resolution, daily resolution, yearly resolution, or arbitrary intervals.

2.4. Machine Learning

Machine learning is one form of applying statistics with emphasis on using computers to estimate complicated functions and less emphasis on proving confidence intervals for these functions [17]. This form of applying statistics is convenient today since computers have never been so powerful before and never has so much data been available.

In this project, supervised learning will be used. First, machine learning and training a model will be explored, along with what that involves.

Machine Learning can be divided into two main categories [17]:

• Supervised learning: when there is a known pattern in a dataset that can be learned by training on valid inputs and outputs.

• Unsupervised learning: when a pattern is not known and clustering is used to create groups based on similarity and distribute samples into these groups.

Figure 2.10 shows an example flow of solving a problem with supervised learning. First there is raw data that might need cleaning or manipulation to form a valid dataset that is desired by a supervisor. The valid dataset and desired outputs are fed to a chosen algorithm and then processed by a machine. After processing, the outputs are returned [81].

Figure 2.10: Supervised learning process. Source: [81]

The goal of a machine learning model is to perform well on new unseen inputs, not just the data that was used for training. To achieve that, the provided data is split in two sets: a training dataset and a test dataset (validation dataset). By doing this it is possible to measure both the training error and the test error (i.e. generalization error) [17]. How the two sets are split is an implementation decision with no right or wrong answer, but it makes sense to have the training dataset larger than the test dataset. This is because the idea is to extract as much knowledge as possible from the data and feed it to a model.
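As an illustration only (the 80/20 ratio and the array below are placeholders, not the configuration used in this thesis), a chronological split of a time series into a training and a test set could look like this:

import numpy as np

values = np.arange(100, dtype=float)   # placeholder hourly series

split = int(len(values) * 0.8)         # e.g. 80% for training, 20% for testing
train, test = values[:split], values[split:]

# For time series the data is usually not shuffled before splitting,
# so the test set contains the most recent observations.
print(len(train), len(test))           # 80 20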

Training is when a model is asked to create a function, e.g. f(x), to fit outputs from a function f*(x); that is, for each known value x and y, the model is fed with both x and the valid y value. The model should produce a value close to y (the target) for x, where x is input and y is output. The closer it is to the correct value, the smaller the training error gets [17].

The performance of a machine learning model can be determined by the ability to minimize the training error and to minimize the gap between training error and test error. This introduces two problems that need to be avoided, underfitting and overfitting [17].

Underfitting is when a model is not learning enough from the provided data, with the symptom of a too high training error. Overfitting is when the gap between training error and test error is too large [17]. Here it is up to the supervisor to decide what is considered too high and too large to define underfitting and overfitting. In Figure 2.11 are examples of underfitting, overfitting and appropriate capacity.

The capacity of a model [17] is the model's ability to fit a variety of functions. A model with lower capacity finds it more difficult to fit a training dataset, which results in underfitting. A model with higher capacity can learn properties from a training dataset that are too complex, which may result in overfitting.

Figure 2.11: Underfitting, Appropriate capacity, and Overfitting. Source: [17]

In the case of underfitting in Figure 2.11, a linear function is not believed to be sufficient to capture the structure of the data. An appropriate capacity is in this case where a quadratic function is used to visit each sample in the data and represents a curved line. Overfitting in this case is where a more complex function visits the points exactly but fails to extract the desired structure of the data [17].

2.4.1. Neural Networks

An Artificial Neural Network (ANN) is made of artificial neurons. These so-called neurons are processors for calculating sequences of on-off signaling real-value functions called activations. Input neurons are activated through the environment of the network; that is, other neurons are activated with weights (coefficients) from previous neurons. Neurons can have an effect on the environment with actions, and neurons can be used for learning. Neural networks with few stages are not new [74] and have been around since at least the 1960s and 1970s.

There are many methods and algorithms that make use of neural networks to achieve a certain goal, such as extracting information from an image [17], for example text or object detection.

Deep learning [17] is one approach to gather knowledge from experience to avoid the need for humans to specify all the knowledge needed by the computer. This can enable the computer to learn complicated concepts by creating them out of simpler concepts. A graph showing these concepts built on top of each other has many layers, which is why it is called deep learning [17]. Figure 2.12 shows a Venn diagram which describes the relationships between Artificial Intelligence (AI), Machine Learning (ML), Artificial Neural Network (ANN), and Deep Learning (DL).

Figure 2.12: A Venn diagram describing AI, ML, ANN, and DL. Source: [83]

Most neural network algorithms involve optimization that maximizes or minimizes a loss function f(x) (also known as a cost function or error function) with respect to the input variables x and y [17].

A feedforward neural network [17] is made of combinations of many different functions. For example, in the case of five functions, f(x) would be:

f(x) = f^{(5)}(f^{(4)}(f^{(3)}(f^{(2)}(f^{(1)}(x)))))

These functions are called layers in neural networks. The number of layers is defined as the depth of a deep learning network. The final layer (f^{(5)}) is defined as the output layer and the first layer (f^{(1)}) is defined as the input layer. Figure 2.13 shows a typical architecture of a multi-layer feedforward neural network.

Since the output of the intermediate layers is not shown, they are called hidden layers [17]. An ANN with two or more hidden layers is a Deep Neural Network (DNN) [18].

Figure 2.13: A multi-layer feedforward neural network. Source: [73]

2.4.2. Long Short-Term Memory (LSTM)

Normal neural networks are not able to learn based on previous knowledge [63]. Humans, for example, can learn and build on top of previous knowledge. When learning from time series, the information contained is relevant (i.e. previous knowledge).

Recurrent Neural Networks (RNNs) are sets of neural networks specialized for processing sequences of values [17]. Most RNNs are able to process sequences of variable length. They use loops to allow information to persist in the neural network [63]. However, the gap to the relevant information needs to be very small since they rely only on short-term memory and have a hard time with long-term dependencies. In other words, previous information is used in the present task, and the more recent the information is in the sequence, the more relevant and valuable the RNN evaluates it to be, compared to information further back in the sequence [21].

A Long Short-Term Memory (LSTM) [21] neural network is a type of RNN that is designed to avoid problems regarding long-term dependencies within reasonable time. LSTM networks are able to handle big gaps to the relevant information in the short-term memory but can also handle long-term dependencies, hence the name Long Short-Term Memory. LSTMs address the vanishing gradient problem [20], i.e. where the ANN struggles to learn because updates to weights become very small (or blow up to be very big) per iteration of training [20].

2.5. Software Tools

The following sections introduce the programming libraries and tools used. These are databases, plotting tools, web application and neural network tools. All tools presented in this section are free and open-source software, except for the proprietary Microsoft SQL Server on-premise at Landsvirkjun.


2.5.1. Python

Python [15] is an interpreted programming language which makes it easy to do a Read-Eval-Print Loop (REPL), and it has a simple and effective approach to object-oriented programming. It is a core dependency for most of the tools covered in this section.

Python uses whitespace/indentation (spaces or tabs) instead of the brackets and semicolons which are common to see in other programming languages [55]. The reason for this is to improve code readability, simplicity and explicitness.

This project is implemented in Python version 3.6.

Py_compile [14] is a Python module used to generate a byte-code file (pyc file) from a source file (py file). The module is packaged with Python and therefore there is no need to install it specifically.

2.5.2. Jupyter Notebook

Jupyter Notebook [28] is a web application that allows editing and execution of notebooks. A notebook interface is an environment for literate programming. It also includes a web server with a file manager which makes it really easy to upload/download/rename/move files in the browser of the user's choice. Notebooks support over 40 programming languages that can be interactively edited and executed within a notebook. Being a web application Integrated Development Environment (IDE), it is able to provide interactive tools to manipulate output as well after execution. It is also simple to execute bash commands within a notebook. Jupyter is born out of IPython [67] as a kernel for interactive execution of scripts.

Jupyter is used as a convenient file manager in the browser; by default it is configured to serve all files from a defined root directory. This comes in very handy for manipulating files (e.g. downloading/uploading files) on a High Performance Computing (HPC) supercomputing cluster which only allows connections through Secure Shell (SSH). This is possible with a local SSH tunnel and a Socket Secure (SOCKS) proxy connection through localhost.

Jupyter can also be used for file editing, for execution of bash scripts (see Figure 2.14) and Python scripts, for interacting with a scheduler, etc.


Figure 2.14: A Jupyter Notebook running on JURON, a HPC cluster at Juelich

2.5.3. NumPy

NumPy [61] (short for Numerical Python) is a scientific computing library for Python. Many Python libraries integrate with NumPy [55] as it provides a multidimensional array object, i.e. ndarray, and fast, element-wise computations with arrays or mathematical operations between arrays, to name a few. NumPy is the primary container for data to be passed between algorithms, as using NumPy arrays is more efficient than other built-in Python data structures when it comes to storing and manipulating numerical data.
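A small sketch of such element-wise array computations (the values are purely illustrative):

import numpy as np

generation = np.array([541.8, 531.4, 552.1, 544.3])   # e.g. hourly MWh values
grid_input = np.array([536.2, 526.0, 546.5, 538.9])

loss = generation - grid_input    # element-wise subtraction, no explicit loop
ratio = grid_input / generation   # element-wise division
print(loss.mean(), ratio.mean())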

2.5.4. Pandas

Pandas [64] is a common Python library for data analysis and data structures. Pandas combines high performance array-computing from NumPy (see Section 2.5.3) with flexible data manipulation, similar to what can be done with relational databases (through Structured Query Language (SQL)), for example aggregations, selecting subsets and pivoting.


Pandas includes two types of data structures: Series and DataFrames. A Series is an ordered, one dimensional object containing an array of data and an index of that data. The stored data can be of any NumPy data type: bool_, int_, intc, intp, int8, int16, etc. If indexes are not provided when the object is created, they are created in an incremental order of integers starting from 0. Each value is then preserved with an index-value link [55].

A DataFrame is a two dimensional tabular column-oriented data structure with labels, much like a spreadsheet [55]. DataFrames are commonly used in this project.
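A minimal sketch of the two data structures (the values are illustrative only):

import pandas as pd

# A Series with a default integer index starting at 0
s = pd.Series([541.8, 531.4, 552.1])

# A DataFrame: two-dimensional, column-oriented, with column labels
df = pd.DataFrame({
    "generation": [541.8, 531.4, 552.1],
    "grid_input": [536.2, 526.0, 546.5],
})
print(s.index.tolist())          # [0, 1, 2]
print(df["generation"].mean())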

2.5.5. Matplotlib

Matplotlib [23] is a plotting library for Python. It is a popular Python library for producing plots and other 2D visualizations [55]. It can be used to show plots inside Jupyter notebooks (it integrates with IPython with inline plotting) or to save graphs to a file.

2.5.6. Plotly

Plotly for Python [68] provides interactive plotting online and offline for analysis purposes. The library has all matplotlib plot types and more and can use D3 [8] and WebGL [25] for visualization.

2.5.7. Microsoft SQL Server

Data for grid input is stored in Microsoft's SQL Server 2017 [57] database on-premise at Landsvirkjun. That is the accumulated sum of grid input as a time series, which originates from SVEF/XX files (see Section 2.2). Unique constraints are set to protect data integrity against duplication of a value written to the same hour for a particular power station. Related metadata is also stored in a table in the same database, e.g. unit of measurements, identification of a measurand, datetimes, and the relevant power station.


2.5.8. PostgreSQL

Data for generation is stored in a PostgreSQL [58] database on-premise at Landsnet. That is generation data for each generator as a time series, which originates from a Supervisory Control And Data Acquisition (SCADA) system. The data values are written to the database in real-time. Related metadata is stored in a table in the same database, e.g. unit of measurements, identification of a measurand, datetimes, and the power station. The interface for acquiring the raw data is an SQL query, and the data is available in near real-time.

2.5.9. KairosDB

KairosDB [29] is a time series management system implemented in Java. It provides REpresentational State Transfer (REST) and Telnet Application Programming Interfaces (APIs) to interact with the system. The system also includes a web user interface (see Figure 2.15) that makes it easy to create queries against the REST API. Data for generation is prepared and then stored in KairosDB on-premise at Landsvirkjun.

As its datastore, KairosDB can either use Cassandra or H2. Cassandra [19] is a column-based key-value store, i.e. a Not Only SQL (NoSQL) database. It is a distributed and decentralized database system: it can run on one server or multiple servers, but to the end user it is unified and serves as a single endpoint like one server, and it has no single point of failure because every node is identical. Joins are not possible within Cassandra but can be achieved manually on the client side of an implementation. In contrast to Cassandra, H2 is easier to set up (but slow) and a good option if development is desired without the overhead of setting up and running Cassandra.

2.5.10. Scikit-learn

Scikit-learn [66] is a machine learning library in Python. It includes many tools for data mining and data analysis, including classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

In preprocessing for training with deep learning models, data values are typically normalized. Scikit-learn supports this via the MinMaxScaler class [75] for scaling between desired values. Normalizing the inputs by scaling can speed up training time and minimize bias within the neural network [26].
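A minimal sketch of such normalization with scikit-learn (the feature range (-1, 1) matches what is used later in Section 3.2; the input values here are placeholders):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[100.0], [250.0], [400.0], [325.0]])   # placeholder hourly values

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(values)         # values mapped into [-1, 1]
restored = scaler.inverse_transform(scaled)   # predictions can be scaled back later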


Figure 2.15: Web User Interface on KairosDB

2.5.11. Keras

The Keras library [34] presents a high level approach to deep learning which makes it easier for people to dive into the field. Keras, which means horn in Greek, is built on top of other lower level neural network libraries: Microsoft Cognitive Toolkit (CNTK) [56], Theano [80], or TensorFlow [1]. It needs at least one of these three neural network backends to run. In the project covered in this thesis, TensorFlow is used. Keras can run seamlessly on CPU and GPU and supports both Convolutional Neural Networks (CNNs) and RNNs.

For modelling with Keras there are two options. One is using the already available Sequential model [33] and the other is to implement a new model. The Sequential model implements a linear stack of layers. Within this model object there are a couple of methods:

• Layers are added to the model by calling the add method of the model instance in Keras

• The fit method has arguments x and y. Argument x is for input data and argument y is for target data; that is, for historical data there are some inputs and outputs that can be evaluated as valid. On calling the fit method on the model, it initializes an optimization function of our choosing. Training is only done for a predefined, fixed number of iterations on a dataset. The method returns a History [31] object, consisting of information on losses for each epoch for the training dataset and the test dataset

• The predict method generates output predictions. This method takes input data and returns NumPy array(s) [55] of predictions

Hyperparameters (i.e. parameters set before learning begins, as opposed to parameters determined during training) are used to tune the models. Some examples of hyperparameters include the following:

• Epochs [39] are a fixed number of iterations on a training dataset, that is, the number of times to pass over the training dataset [32]

• Units (i.e. neurons) is a positive integer number which represents the dimensionality of the output space

• Batch is a set of N samples (sample meaning one value per hour in this case) of a training dataset. One caveat to having a large batch is that only one update is done to the model per batch [32]

• Activation [35] is a function to use for the LSTM layer of our network (see Section 2.4.1). If no activation is applied (by passing in None), it is set as linear, i.e. f(x) = x

• Loss function (cost function/objective function/optimization score function/criterion/error function) is the function that the optimizer tries to minimize. Keras requires users to choose a loss function for each model [36]

• Batch size is the number of samples for every gradient update

• Optimizer is an algorithm chosen to compile a Keras model. Keras requires users to choose an optimizer for each model [37]

• Shuffle is a Boolean parameter [39]; if set to true, training data is shuffled after each epoch

Two different layers are used in the Sequential model:

• Dense layer implements y = activation(x · weights + bias), where

– activation is a chosen activation function passed to Dense as an argument

– weights is a weights matrix created by the layer. Weights will be learned by the learning algorithm by backpropagation through time [17]

– bias is a bias factor created by the layer. This can be disabled, but here the bias factor will always be enabled

• The LSTM layer implements the Long Short-Term Memory layer defined by Hochreiter and Schmidhuber in 1997 [21]:

– The LSTM layer offers two possible implementations, that is, mode 1 [21] or mode 2. Mode 1 will structure operations as a larger number of smaller dot products and additions. Mode 2 will batch those in fewer, larger operations [38]

– The recurrent activation function is by default hard_sigmoid with implementation 1 [21] until Keras version 2.3.0, which then uses sigmoid as the recurrent activation function and implementation 2 by default [38]

Adam [41] (the name is derived from adaptive moment estimation) is an optimization algorithm used for stochastic optimization. This optimization algorithm attempts to solve a problem with small memory requirements and computes individual adaptive learning rates (alpha/stepsize) for different parameters from estimates of gradients. The idea is to combine advantages of two popular optimization algorithms: AdaGrad and RMSProp [41]. Both of these algorithms are available in Keras as optimizers; in fact, Keras offers seven different optimizers [37]. Adam updates are estimated on a running average of the first and second moment of a gradient.
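The following sketch ties these pieces together. It is not the actual model used in this thesis (layer sizes, input shapes and hyperparameter values are placeholders); it only illustrates the Sequential model, the LSTM and Dense layers, and the fit and predict methods described above, compiled with the Adam optimizer:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 24, 1                          # e.g. 24 hourly values, one feature
x_train = np.random.rand(200, timesteps, features)   # placeholder inputs
y_train = np.random.rand(200, 1)                     # placeholder targets (grid input)

model = Sequential()
model.add(LSTM(units=50, activation="tanh", input_shape=(timesteps, features)))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")

history = model.fit(x_train, y_train, epochs=10, batch_size=32,
                    validation_split=0.2, shuffle=False)
predictions = model.predict(x_train[:5])             # returns a NumPy array of predictions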


3. Methods

This chapter describes the methods developed in the project described in this thesis. The first section is an introduction to measurements regarding each power station and the second section describes the preparation done for the data. The last two sections describe the Ratio method and the LSTM method developed in this thesis for time series prediction. These are essentially the steps Data Understanding (Section 3.1), Data Preparation (Section 3.2), and Modeling (sections 3.3 and 3.4) of the CRoss-Industry Standard Process for Data Mining (CRISP-DM) [78].

3.1. Data Understanding

At first, the focus was on implementing prediction methods for the hydropower station Fljótsdalur, since that station could cause the most significant errors among Landsvirkjun's power stations. Later, it was decided that it would be good to add two more power stations, perhaps older or with a different structure than Fljótsdalur, to be able to evaluate the applicability of the developed approaches for power stations that have different characteristics. By reviewing other stations it was decided that Ljósafoss and Hrauneyjafoss would be representative examples of other types of hydropower stations, and they have thus been added to the project described in this thesis.

At each power station there are two important variables considered, i.e. the gridinput and the power generation at the power station. Both variables are derivedfrom summing up hourly measurements. Grid input measurements are for monthlysettlements of electric energy, i.e. how much electricity is fed in to the national gridoperated by the TSO Landsnet (see Section 2.1).

Generation measurements are less stable, i.e. they are more likely to fail and morelikely to include missing data, compared to the grid input. Reasons for this canbe many, e.g. it is possible that after generator maintenance, measurements arenot accounted for and the generation meters could still be disconnected after thegenerator is available again. There are also cases of IT network communicationproblems resulting in missing values, e.g. not being able to connect to a databasefor gathering generation values.


Total generation values are always greater than the values for grid input at the same hour if the clocks on the meters are in sync. The difference between these values is caused by loss of energy between the generation and the grid input. Typically there is less loss of energy in the summer time. These losses are mainly in the GSUs (see Section 2.1). In addition, most power stations use electricity for lighting, pumps, etc. that is drained from online generators within the power station. In cases when generators are not providing enough electricity for own usage, it is drained from the grid. In this case, the measured grid input is even lower than zero (negative grid input). That said, it is possible that within a power station that has online generators, the generation measurements are above zero, while the grid input is negative.

Desired results from both the ratio method (see Section 3.3) and the LSTM method (see Section 3.4) are predictions of total grid input for a particular power station by making use of generation values of that power station, from KairosDB (see Section 2.5.9).

The hourly data of the total generation values and the grid input varies between hours. This is because typically an electrical load profile for each day needs to be met by generation (eventually, grid input) by a power station. Of course this needs to be in context as some power stations have a more stable electricity generation than others and other factors need to be taken into account, e.g. generator maintenance.

3.2. Data Preparation

Missing data is replaced by front fill, i.e. propagating the last valid observation forward to the next valid value. Front fill is the chosen method rather than back fill (from the next valid observation back to the last valid one) or an interpolation because in practice, the next value is not yet available.

The MinMaxScaler class is used for normalizing the data, on the scale from -1 to 1, for all three power stations before fitting (training) the data to the model with Keras.
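As a minimal sketch of these two preparation steps, assuming a Pandas DataFrame with hourly values (the column names and example values are illustrative, not the actual station data):

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# hypothetical hourly measurements with missing values
df = pd.DataFrame(
    {"generation": [12.1, None, 12.4, 12.3], "grid_input": [11.9, 12.0, 12.1, None]},
    index=pd.date_range("2016-01-01", periods=4, freq="H"),
)

# front fill: propagate the last valid observation forward
df = df.fillna(method="ffill")

# normalize all values to the closed interval [-1, 1] before fitting the model
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df.values)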

Data for generation values is prepared by a query using KairosDB's REST API (see Section 2.5.9). The raw generation values from each generator, stored in PostgreSQL (see Section 2.5.8), are aggregated from a 30 minute resolution to hourly resolution by an SQL query and stored in KairosDB. Cassandra is used under KairosDB's hood and no experimentation was done with H2 since Cassandra was already set up and running on-premise at Landsvirkjun. As part of this project, KairosDB is set up where other systems (Microsoft SQL Server, PostgreSQL, and Cassandra) existed already.


The query in Listing 3.1 is a Python dictionary object which is used for gathering hourly values as an accumulated sum of generation values from all the generators of a power station in KairosDB. The generation values are already stored as hourly values for each generator for a particular power station (set by the station variable). The keys start_absolute and end_absolute are the absolute date and time of a chosen period in POSIX time [54], i.e. epoch time format (in the listing the values for these keys are set by the variables fromdate and todate). The metrics key is composed of three keys: name, tags, and aggregators. The name key refers to the name of a metric (where a metric in this case can refer to multiple time series) and the tags key is a filter within that metric, e.g. a power station. The aggregators key defines an aggregator [30], in this case the accumulated sum on hourly resolution for a power station.

Data integration is applied where all values are gathered into a Pandas DataFrame object for each power station and written to a Comma-Separated Values (CSV) file, i.e. one file per power station. The files are copied to the file system of the relevant server for each of the two methods.

Listing 3.1: KairosDB REST query

{
    "start_absolute": int(fromdate.timestamp())*1000,
    "end_absolute": int(todate.timestamp())*1000,
    "metrics": [{
        "name": "OSK.orkuvinnsla",
        "tags": {"station": station},
        "aggregators": [{
            "name": "sum",
            "align_sampling": True,
            "sampling": {
                "value": "1",
                "unit": "hours"
            }
        }]
    }]
}
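The query dictionary is sent to KairosDB over its REST API. The sketch below shows one way this could be done with the requests library and how the response could be turned into an hourly Pandas Series; the host name and the exact handling of the response structure are assumptions for illustration and not the code used at Landsvirkjun.

import datetime
import pandas as pd
import requests

# assumed KairosDB host; the query dictionary is the one from Listing 3.1
KAIROSDB_URL = "http://kairosdb.example.local:8080/api/v1/datapoints/query"

fromdate = datetime.datetime(2016, 1, 1)
todate = datetime.datetime(2017, 1, 1)
station = "LJO"

query = {
    "start_absolute": int(fromdate.timestamp()) * 1000,
    "end_absolute": int(todate.timestamp()) * 1000,
    "metrics": [{
        "name": "OSK.orkuvinnsla",
        "tags": {"station": station},
        "aggregators": [{
            "name": "sum",
            "align_sampling": True,
            "sampling": {"value": "1", "unit": "hours"},
        }],
    }],
}

response = requests.post(KAIROSDB_URL, json=query)
response.raise_for_status()

# KairosDB returns (epoch milliseconds, value) pairs; the nesting below is assumed
values = response.json()["queries"][0]["results"][0]["values"]
generation = pd.Series(
    [v for _, v in values],
    index=pd.to_datetime([ts for ts, _ in values], unit="ms"),
    name="generation",
)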


3.3. Ratio Method

The ratio method is a simple method that can be used for predictions. That is, by looking at the power station generation in relation to grid input, both measures can be calculated for arbitrary periods of time, e.g. per hour. It is possible to define the relationship as

r(t) = f(t)/g(t) (3.1)

where r(t) is the correction ratio, f(t) is the total grid input at time t, and g(t) is the total generation of a power station at time t.

In Figure 3.1 one can see an orange line representing generated energy in Ljósafoss power station (see Section 2.1.1) and a blue line which shows grid input, where values are according to the left y axis, using the unit “MW”. The difference between the two is mainly due to loss of power in GSUs and the power usage in the power station. It is also worth mentioning that a portion of the difference is due to different accuracy in the different meters. The green line in the figure is the calculated correction ratio for each hour, where values are according to the secondary axis on the right side in the figure, labeled as “Ratio”. The figure shows seasonality, i.e. there is less loss of energy in the summer time. A steep decline can be seen in the calculated correction ratio on August 18th and 19th. Speculating, a likely cause is maintenance in Írafoss, which is another power station nearby in Sog, i.e. the same river as Ljósafoss power station. As indicated in Equation 3.1 the ratio is a function of time as it can have e.g. seasonality, which will not be considered in this thesis but it is assumed to be constant throughout the year, for simplicity. In addition, the correction ratio is different between power stations and therefore needs to be calculated for each station.

The correction ratio between grid input and generation is used for test evaluation. Two constants are presented, for simplicity. For the first constant the ratio is calculated for the last hour in the year 2016 (December 31st at time 23:00). For the second constant the yearly average of the ratio in the year 2016 is calculated. The results in Section 5.3 show that both constants have good performance when comparing the two methods. Each constant is used for the whole test period, which is the year 2017. For predicting grid input according to electrical generation, each constant is multiplied by the electrical generation according to KairosDB at a given hour, i.e.

fcorrected(t) = r31.12.2016;23:00 ∗ g(t)

fcorrected(t) = r̄2016 ∗ g(t)

The ratio method can be very effective and efficient (in terms of computing power), but one has to keep in mind that it is a naive approach and the acquired solution could possibly not be accurate enough.

The method could be expanded and explored further to return more accurate results, e.g. by studying the seasonality or by defining a moving average instead of using the last hour in the training dataset.

Figure 3.1: Generation, grid input and correction ratio per hour in the year 2016 for Ljósafoss (LJO) power station

The ratio method is chosen for simplicity since the relation of the total generation and the grid input is known to be very close. The LSTM method described in the next section is chosen to see what results can be achieved by making use of the most recent improvements in neural networks for time series predictions, at the cost of a lot more computing power in comparison to the ratio method.

3.4. LSTM Method

The LSTM method is more complex than the ratio method. It needs a lot more computing power and it takes time to train a model. It is interesting to consider that information is fed into a model without analysing or defining the distribution of the data itself before modeling. As with all ANN-based approaches there are many parameters (model layer setup and hyperparameters) to explore in this method. This project only explores and experiments with a tiny portion of this parameter space and focuses on hyperparameter adjustments.


An additional feature is added to the training dataset in order to provide more information to the model about the training dataset. An initial experiment was to set a seasonality identifier, i.e. a number to represent a season of the year (an integer in the closed interval [1, 4]). The conclusion was that, at hourly resolution, this value changes too infrequently between samples to be useful. Instead it was decided to implement a feature representing the hour from the datetime of the measurement (an integer in the closed interval [0, 23]).

Before the training dataset is fed to the model, it is preprocessed by a MinMaxScaler (see Section 2.5.10), which normalizes all values of the set to a floating point value in the closed interval [−1, 1]. Prediction results from the model are also returned on a scale between -1 and 1. The results are then inverted by the MinMaxScaler to retrieve the desired values, i.e. without any normalization.
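A minimal sketch of the hour feature and the normalization step, assuming a DataFrame with a DatetimeIndex; the column names and the placeholder prediction are illustrative only:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# hypothetical hourly dataset
df = pd.DataFrame(
    {"generation": np.linspace(10.0, 15.0, 48), "grid_input": np.linspace(9.5, 14.4, 48)},
    index=pd.date_range("2015-01-01", periods=48, freq="H"),
)

# additional feature: the hour of the measurement, an integer in [0, 23]
df["hour"] = df.index.hour

# separate scalers for the inputs (generation, hour) and the output (grid input)
x_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler = MinMaxScaler(feature_range=(-1, 1))
X = x_scaler.fit_transform(df[["generation", "hour"]].values.astype("float32"))
y = y_scaler.fit_transform(df[["grid_input"]].values.astype("float32"))

# model predictions come back on the [-1, 1] scale and are mapped back afterwards
y_pred_scaled = y                              # placeholder for the model output
y_pred = y_scaler.inverse_transform(y_pred_scaled)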

The model selected is a Sequential model (see Section 2.5.11). The model implementation is simple for most of the cases, that is, two layers: LSTM and Dense. In the model it is common to have these layers in the order Input → LSTM → Dense → Output. An experimental parameter search was done by changing one or more of the following parameters (see Section 2.5.11 for information on these hyperparameters):

• Number of epochs (int): argument of fit/training

• Batch size (int): argument of fit/training

• Shuffle (bool): argument of fit/training

• Number of LSTM units (int): argument of the LSTM layer

• Number of Dense units (int): argument of the Dense layer

• Activation function (string): argument of the Dense layer (default is linear) and the LSTM layer (default is tanh)

• Loss function (string): argument of compile

Hyperparameters are explored by editing one or more of the above (e.g. incrementing or decrementing the number of epochs), which are configured in a dictionary object in Python for a session of experimentation; a sketch of such a configuration is shown below. To run the changed source code it is compiled to byte-code with py_compile (see Section 2.5.1). This is done to catch syntax errors before submitting a job to the scheduler of the HPC cluster used for training.
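The sketch below illustrates how such a configuration dictionary and the Input → LSTM → Dense structure could be put together with the Keras Sequential API; the concrete values are examples and not one of the recorded experiment configurations. Compiling the script beforehand, e.g. with python -m py_compile KerasLSTM.py, catches syntax errors before the job is submitted.

from keras.models import Sequential
from keras.layers import LSTM, Dense

# example hyperparameter configuration for one experiment session (values illustrative)
config = {
    "epochs": 150,
    "batchsize": 24,
    "shuffle": True,
    "units": 2,
    "firstdenseunits": 1,
    "firstactivation": "linear",
    "loss": "mse",
    "optimizer": "adam",
}

timesteps, features = 1, 2      # one timestep; generation value and hour of measurement

model = Sequential()
model.add(LSTM(config["units"], input_shape=(timesteps, features)))   # LSTM activation defaults to tanh
model.add(Dense(config["firstdenseunits"], activation=config["firstactivation"]))
model.compile(loss=config["loss"], optimizer=config["optimizer"])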

Each experiment is recorded and labeled by an identifier, i.e. the date and time of the experiment, the model itself (with architecture, weights, training configuration, loss configuration, optimizer configuration, and the state of the optimizer), configured hyperparameters, and predictions for the test period.

The optimizer is set to Adam (see Section 2.5.11) for all runs. The reason for choosing this algorithm is mainly popularity [3]. Other optimizers were tested as well but did not yield better results. Activation functions (see Section 2.5.11) which have been tested are:

• ReLU(x) = max(0, x)

• linear(x) = x

• tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

• sigmoid(x) = 1 / (1 + e^(−x))

• softmax(x)_i = e^(x_i) / Σ_j e^(x_j)

Loss functions which have been tested are:

• mae(x, y) = (1/n) Σ_(i=1..n) |y_i − x_i|

• mse(x, y) = (1/n) Σ_(i=1..n) (y_i − x_i)^2

• logcosh(x, y) = Σ_(i=1..n) log cosh(x_i − y_i) = Σ_(i=1..n) log((e^(x_i − y_i) + e^(−(x_i − y_i))) / 2)

Each power station has a specific profile, e.g. capacity, manufacturer, type, etc., and thus a model is trained for each station.

The models are saved to a file and can therefore be loaded for later use, i.e. for training each model even further by feeding more historical data to the model as time passes.


4. Design and Implementation

This chapter describes some aspects of the design and implementation. Data structures, storage of data, and implemented algorithms for the ratio method and the LSTM method are described in Section 4.1. Section 4.2 addresses library dependencies of the implementation. Section 4.3 describes details of the models used in the LSTM method.

4.1. Data Structures, Storage and Implemented Algorithms

Figure 4.1 shows a visual representation of the data flow. This data flow describes how near real-time raw electrical generation data and settlement data for grid input is retrieved from either a database or files, processed, and written to a database. The data flow shown in the figure is further described in the following paragraphs. A PostgreSQL database stores near real-time raw electrical generation data for each generator of the three power stations. This database is hosted and maintained at the TSO Landsnet (see Section 2.5.8). A computer server which has access to the PostgreSQL database runs a scheduled job which is executed once per hour to update KairosDB (see Section 3.2) with the aforementioned electrical generation data. KairosDB provides a REST API which is used for reading and writing data, i.e. only depending on low-level network libraries. KairosDB has an underlying database, Cassandra, a NoSQL database which makes reading and writing the data fast. But this dependency is internal to KairosDB and not exposed to the outside. This data flow can be seen on the left side of Figure 4.1.

As described in Section 2.2, the raw SVEF/XX files, containing settlement data for grid input, are downloaded once per month from a third party, the TSO Landsnet, and stored in a directory on a file system. A couple of hours later, another scheduled job takes care of gathering the files from the directory, parsing the files and writing the contents of each file to an on-premise SQL Server database at Landsvirkjun (see Section 2.5.7). This data flow can be seen on the right side of Figure 4.1.


Figure 4.1: Data flow for electrical generation and grid input

Grid input is queried from an on-premise SQL Server database at Landsvirkjun (as mentioned in the previous paragraph) and the energy generation time series are queried from KairosDB. These energy generation and grid input time series are then combined, using the Pandas library (see Section 2.5.4), into flat CSV files. CSV files are convenient for portability, human readability, and for easy serialization and deserialization. By storing the electrical generation and the grid input in files it is possible to simply load the data for every experiment from a file instead of querying the database multiple times. This is also important for easy file transfers from one server to another server via the Secure Copy Protocol (SCP) [72].

The CSV format integrates nicely with Pandas. The CSV files are parsed to DataFrames (deserialized) with Pandas, which are split into a training dataset and a test dataset (this can also be done by Keras by passing the validation_split argument to the fit method on the model). The dataset is a set of three years, i.e. 2015–2017. It is sliced into two years of data for training (years 2015–2016) and one year for testing (year 2017).

The dataset, consisting of three years, is stored in CSV files, one file for each power station. The predictions are saved to CSV files with the calculated correction ratio used on hourly resolution.


4.1.1. Ratio Method

The data structure used for the ratio method is simple, i.e. DataFrames. The dataset for all three years is loaded from the CSV file to a DataFrame.

Pre-processing is done by taking all Not a Number (NaN) values and imputing them. This is done by calling the fillna method on the DataFrame object with ffill [65] as the chosen method, i.e. propagating the last valid observation forward to the next valid value.

Next the ratio itself is calculated (i.e. the correction ratio, see Section 3.3) to provide two constants, where the first constant is calculated for the last hour of the training period (in 2016, December 31st at time 23:00) and the second constant is the yearly average of the ratio in 2016, assigned as a column of datatype float32 within the DataFrame. This is because the DataFrame object allows multiplying or dividing one column by another column, which results in a Series (see Section 2.5.4) object with the result for each row based on the associated index value, with the same time resolution.
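A minimal sketch of these steps, assuming a DataFrame loaded from one of the per-station CSV files; the file name and the column names for grid input and generation are hypothetical:

import pandas as pd

# hourly grid input and generation for 2015-2017; file and column names are illustrative
df = pd.read_csv("station.csv", index_col=0, parse_dates=True)
df = df.fillna(method="ffill")

ratio = (df["grid_input"] / df["generation"]).astype("float32")

# constant 1: the ratio at the last hour of the training period
r_last_hour = ratio.loc["2016-12-31 23:00"]

# constant 2: the yearly average of the ratio over 2016
r_yearly_avg = ratio.loc["2016"].mean()

# predicted grid input for the test period (2017) from the generation values
test = df.loc["2017"]
pred_last_hour = r_last_hour * test["generation"]
pred_yearly_avg = r_yearly_avg * test["generation"]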

4.1.2. LSTM Method

For the LSTM method all three years are loaded from the CSV file, as described in the previous section, to a DataFrame and pre-processed by calling fillna with ffill. All values except the index are extracted from the DataFrame to a list. The list is cast to type float32 to make sure the same type is used for all inputs and all outputs.

Next, the MinMaxScaler (see Section 2.5.10) is used to normalize values for the input data (generation from KairosDB and the hour of measurement) and returns a NumPy array [76]. The same is done for the output data (grid input).

The dataset is then split into a training dataset and a test dataset and is reshaped into two three-dimensional NumPy array objects to be passed to the model. The first layer in the model is the LSTM layer which expects the input shape of the input data to be three-dimensional [38] and needs to know the number of timesteps (i.e. the number of timesteps is the number of input values used to predict one output value [5]) and the number of features beforehand, which is passed to the layer as a tuple. The next step in the algorithm is the neural network design which consists of add calls for the two layers in the model with arguments of the configured number of LSTM units, shape of training data (i.e. the input_shape argument), and the configured activation function.
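A short sketch of the reshape, with placeholder data standing in for the normalized input values:

import numpy as np

# placeholder for normalized inputs of shape (samples, 2): generation value and hour feature
X = np.random.uniform(-1, 1, (100, 2)).astype("float32")

timesteps, features = 1, X.shape[1]

# the LSTM layer expects input of shape (samples, timesteps, features)
X = X.reshape((X.shape[0], timesteps, features))
print(X.shape)   # (100, 1, 2)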


In most of the LSTM parameter search experiments, an activation function is chosen and passed to the Dense layer, and the activation function in the LSTM layer is left at its default, tanh. Since Graphics Processing Units (GPUs) are used, a multi_gpu_model [40] (only available when TensorFlow is chosen as a backend in Keras) is compiled with a chosen loss function and the optimizer Adam.

The configured batch size needs to be a multiple of the number of GPUs. This is because the model's inputs are split into multiple sub-batches of the configured batch size for each GPU to process. After processing each sub-batch on a GPU, they are combined again to the configured batch size by a Central Processing Unit (CPU) (i.e. following the MapReduce [9] model).

The training is initiated by calling the fit method and passing in the three-dimensional input data (i.e. reshaped as (normalized values, timesteps, features)), stored in a NumPy array which holds the normalized input values. It is set to print on verbosity level 2, i.e. print one line per epoch (see Section 2.5.11 for information on epochs). When the training is finished, a History [31] object is returned from the fit method.

The History object holds valuable information on the training error (loss) and test error (a key defined as val_loss) per epoch in the history attribute of that object (i.e. History.history). These are plotted by the matplotlib library (see Section 2.5.5) and saved to a PNG file.
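A self-contained sketch of the training and plotting steps described above, using placeholder data and a minimal model; in the actual setup the model is additionally wrapped with keras.utils.multi_gpu_model before compiling, and the output file name carries the session identifier.

import numpy as np
import matplotlib
matplotlib.use("Agg")                  # headless plotting on the HPC node
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import LSTM, Dense

# placeholder normalized data shaped (samples, timesteps, features); the real data comes from the CSV files
X_train = np.random.uniform(-1, 1, (200, 1, 2)).astype("float32")
y_train = np.random.uniform(-1, 1, (200, 1)).astype("float32")
X_test = np.random.uniform(-1, 1, (50, 1, 2)).astype("float32")
y_test = np.random.uniform(-1, 1, (50, 1)).astype("float32")

model = Sequential([LSTM(2, input_shape=(1, 2)), Dense(1, activation="linear")])
model.compile(loss="mse", optimizer="adam")

history = model.fit(X_train, y_train, epochs=10, batch_size=24, shuffle=True,
                    validation_data=(X_test, y_test), verbose=2)

# History.history holds the training loss and the test error (val_loss) per epoch
plt.plot(history.history["loss"], label="training error")
plt.plot(history.history["val_loss"], label="test error")
plt.legend()
plt.savefig("history.png")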

The results from the LSTM method are saved as session batches (i.e. for each experiment). Each session uses predefined hyperparameters (i.e. parameters set before the learning process is initiated, see Section 2.5.11) and is tested against each power station. This makes it easier to review results from a certain session and evaluate the results. Every session batch has an identification string which is the time of execution formatted as YYYYmmDDHHMMSS (i.e. year, month, day, hour, minutes, and seconds), where the string is included in the filename unless otherwise stated. This is to keep track of the files that are related to each LSTM parameter search experiment. Every session batch includes the following files:

• a configuration file in JavaScript Object Notation (JSON) [27] format that holds valuable information on LSTM settings such as the number of epochs, batch size, activation function, loss function, etc.

• an error results file in JSON format that holds the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) for each power station

• a Hierarchical Data Format (HDF) (version 5, i.e. h5 file extension) model file saved by the Keras library, i.e. one for each power station (see Section 3.4)


• a CSV file for every power station with inputs, timestamps, and prediction values by the model

• a CSV file for every power station with inputs, timestamps, and prediction values by the model (the file name for this file does not include the identification string described above; instead the string newest is included in the file name and the file is overwritten if it already exists)

• a PNG image file with a plot of History.history for the session, i.e. training error and test error per epoch

4.2. Library Dependencies

Python modules are installed in a virtual environment, which is invoked before execution. The implementation depends on the following libraries:

• Python 3.6.1

• Keras 2.1.3

• h5py 2.7.0

• TensorFlow 1.4.1

• CUDA 8.0.61

• cuDNN 6.0

• GCC 4.8

• NumPy 1.15.2

• matplotlib 3.1.1

• sklearn 0.20.0

Since the Keras version used is 2.1.3, hard_sigmoid is used as the recurrent activation function and the implementation mode is set to 1 (see Section 2.5.11) in the LSTM layer, which is the default in that version.


4.3. Keras Models

The recurrent activation function used is set to the default value in Keras 2.1.3, i.e. hard_sigmoid, and implementation mode 1 (see Section 2.5.11 for information on changes to this in other, more recent versions of the Keras library).

Forty-four LSTM parameter search experiments are recorded for each of the three power stations, where each experiment changes one or more of the configured parameters (see Section 3.4 for information on the hyperparameters and the relevant objects for each hyperparameter).

All LSTM parameter search experiments that were recorded for Ljósafoss, Hrauneyjafoss and Fljótsdalur are in Appendix A (see Table A.1 for experiments with Ljósafoss data, Table A.2 for experiments with Hrauneyjafoss data, and Table A.3 for experiments with Fljótsdalur data). The recorded parameter keys for each LSTM parameter search experiment are the following (see Section 2.5.11 for more information):

• epochs represents the number of epochs defined

• batchsize represents the size of each batch to train (i.e. each GPU will train one sub-batch at a time, see Section 4.1.2)

• shuffle indicates if training data should be shuffled between epochs

• units represents the number of units in the LSTM layer

• LSTMactivation represents the activation function applied in the LSTM layer (if it is not present, the default is applied, i.e. tanh)

• firstdenseunits represents the number of units in the Dense layer subsequent to the LSTM layer (if it is not present, it is set to 1)

• seconddenseunits represents the number of units in the second Dense layer, subsequent to the first Dense layer (if it is not present, there is no second Dense layer in the model)

• firstactivation represents the activation function applied to the first Dense layer

• secondactivation represents the activation function applied to the second Dense layer (if it is not present, there is no second Dense layer in the model)


• loss represents the applied loss function to optimize

• optimizer represents the chosen optimizer

Models with three layers are the most common model structure in the experiments. Models with two layers and four layers were explored as well, for evaluating the influence of the number of layers. Figure 4.2 shows one of the many trained models, plotted by calling the plot_model function (from keras.utils [40]) with the model object as an argument, for one of the many LSTM parameter search experiments (see Section 3.4). The boxes on the far left show the name of each layer and its type (i.e. <layer name>:<layer type>). The boxes on the far right are tuples that describe the shape of the data, i.e. the dimensions of the NumPy arrays [76]. The output shape of one layer needs to match the input shape of the next layer.

Figure 4.2: Keras model with three layers

The shape of the input NumPy arrays passed to the InputLayer (see Figure 4.2), which is always the first layer, is (None, 1, 2) in all of the recorded test cases. Breaking the tuple down, the first element (i.e. None) describes the length of the array, which is arbitrary (i.e. if set to None, any batch size is allowed), the second element in the tuple is the number of timesteps (see the previous Section 4.1.2), and the third element is the number of features (i.e. the generation value and the hour of the sample, see Section 3.4 for further information). The layer then reshapes the output to the number of units defined as one of the hyperparameters.

Figure 4.3 shows another model implemented with two layers, i.e. one added to the model subsequent to the default InputLayer (lstm_1_input). In this case the parameter LSTMactivation mentioned in the details column is the activation function passed to the LSTM layer. That parameter is only used in models with two layers in total, as shown in Figure 4.3.

Figure 4.4 shows a model implemented with four layers. In this case there are four relevant parameters in the details column worth mentioning:


Figure 4.3: Keras model with two layers

• firstdenseunits is the number of units passed to the first Dense layer, i.e. subsequent to the LSTM layer (e.g. lstm_2)

• seconddenseunits is the number of units passed to the second Dense layer, i.e. subsequent to the first Dense layer (e.g. dense_3)

• firstactivation is the applied activation function to the first Dense layer

• secondactivation is the applied activation function to the second Dense layer

The parameters seconddenseunits and secondactivation are only used in models with four layers in total, as shown in Figure 4.4.

Figure 4.4: Keras model with four layers

The process of reshaping the NumPy arrays passed to the InputLayer was challenging at first. Listing 4.1 shows a ValueError exception raised at runtime for one of the LSTM parameter search experiments. The error is raised by Keras, which makes sure that the shape of the inputs is according to expectations. This is important for the outputs of the current layer, which are passed as inputs to subsequent layers. The ValueError reports that a layer with the name lstm_1 expected a three-dimensional array but an array of only two dimensions was passed to that layer. The absolute paths for each file shown in the listing are omitted.

Listing 4.1: ValueError exception on shape of data

Traceback (most recent call last):
  File "/.../KerasLSTM.py", line 231, in <module>
    activation=config['LSTMactivation']
  File "/.../keras/2.1.3/lib/python3.6/site-packages/keras/models.py", line 492, in add
    output_tensor = layer(self.outputs[0])
  File "/.../keras/2.1.3/lib/python3.6/site-packages/keras/layers/recurrent.py", line 488, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/.../keras/2.1.3/lib/python3.6/site-packages/keras/engine/topology.py", line 573, in __call__
    self.assert_input_compatibility(inputs)
  File "/.../keras/2.1.3/lib/python3.6/site-packages/keras/engine/topology.py", line 472, in assert_input_compatibility
    str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2

The NumPy random seed [77] is set to a constant of 4 for reproducibility and for a fair comparison of different parameters [5]. Later it was discovered that this is not enough for making reproducible prediction results from a Keras model [4]. It is hard to keep track of which library is using which seed, e.g. TensorFlow has its own random seed function, tf.random.set_random_seed.
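A sketch of how the seeds can be set at the top of the training script; the TensorFlow call below is the graph-level seed of the TensorFlow 1.x API and is shown here only as one plausible way to wire it in, since the NumPy seed alone was found not to be sufficient.

import numpy as np
import tensorflow as tf

np.random.seed(4)        # constant NumPy seed used for the parameter comparison
tf.set_random_seed(4)    # TensorFlow graph-level seed (tf.random.set_random_seed in later 1.x releases)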

The time which is spent on training depends much on the batch size. The bigger the batch size is, the quicker the model training is, since weights are only updated once for each batch (see Section 2.5.11). Since the history attribute described in Section 4.1.2 is not saved per batch, it is challenging to gather the time spent on training each model. However, the verbose level passed to the fit function reports the seconds spent on each epoch, which is written to the standard output (stdout) log. The log is parsed and searched by a regular expression to gather information on the time spent on training the models. Please consider that this can be biased since the logs are parsed without considering whether the job finished without raising an exception or not. Figure 4.5 shows a boxplot representing the time spent in training an epoch over all experiments. Statistics for that data are the following:

• the total number of epochs for training is 11,563

• the mean is 38.99 seconds

• the standard deviation (sd) is 47.63 seconds

• the minimum value is 0 seconds

• the maximum value is 459 seconds

• the median value is 6 seconds

Figure 4.5: Boxplot of training time per epoch


5. Results and Evaluation

This chapter discusses the acquired results from the two methods described in Chapter 3 and their evaluation. Two metrics are used for the evaluation of results, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Both metrics are negatively oriented, i.e. lower values are better. Between RMSE and MAE values, results are evaluated as better for a lower RMSE value. The two metrics have been compared [6]: RMSE can be more appropriate for measuring model performance than MAE [85], whereas MAE is a more natural measure of average error. Both are presented to consider both the aspect of average error and the model performance.
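For reference, the two metrics can be computed as in the following sketch; the toy values are illustrative only.

import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error: penalizes large errors more strongly
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    # Mean Absolute Error: a natural measure of the average error
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# toy example: both metrics are negatively oriented, lower is better
print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))   # approximately 0.19
print(mae([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))    # approximately 0.17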

This chapter has three sections. Section 5.1 covers a short recap of the applied ratio method and its results and evaluation for each power station. Section 5.2 covers the LSTM method, its experiments, hyperparameters, results, and evaluation. Section 5.3 discusses and compares the experiments, the results, and the two metrics. The problem which is solved by these experiments can be typed as a regression problem and both methods rely on historical data for their development.

5.1. Ratio Method

The ratio method is about using a calculated correction ratio from historical data (see Section 3.3), where all missing data is replaced by a front fill method (see Section 3.2). The correction ratio is r(t) = f(t)/g(t), where r(t) is the correction ratio, f(t) is the accumulated sum of grid input, and g(t) is the accumulated sum of electrical generation, at hour t. Two constants are calculated for each power station, where the first constant has t set to the last hour in the year 2016 (which is the last hour of the training dataset) and the second constant is the yearly average of the ratio in the year 2016. The correction ratio is assumed to be constant for the whole test period, i.e. the year 2017.

Two experiments are applied to each power station. Table 5.1 presents the error results for the test period, i.e. showing the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) for the ratio method where the grid input values are predicted from the generation values. As these errors depend on the absolute value, the average (avg) and standard deviation (sd) of the grid input for the three power stations in the test period are shown as well. The table shows a lower RMSE and a lower MAE for Ljósafoss by the yearly average constant, compared to the last hour constant. For Hrauneyjafoss the RMSE and the MAE are lower by using the last hour constant, compared to the yearly average constant. For Fljótsdalur, the RMSE and the MAE are lower by using the yearly average of 2016, compared to the last hour constant.

Table 5.1: Results for the ratio method and statistics

Station         Constant                  RMSE     MAE      avg       sd
Ljósafoss       Last hour in 2016         0.084    0.057    12.914    1.176
Ljósafoss       Yearly average of 2016    0.067    0.033
Hrauneyjafoss   Last hour in 2016         6.294    1.090    155.377   29.420
Hrauneyjafoss   Yearly average of 2016    33.954   33.145
Fljótsdalur     Last hour in 2016         2.682    0.715    575.735   27.642
Fljótsdalur     Yearly average of 2016    2.678    0.543

5.2. LSTM Method

The LSTM method has many configuration combinations of hyperparameters that can affect a model and the performance of that model. In the LSTM parameter search experiments in this thesis, a few selected parameters are explored.

The training period is the first two years of data, 2015–2016. The test period is the year 2017. All missing data is replaced before training by using a front fill method (see Section 3.2).

Training the neural networks is done on the technology pilot HPC computer system JURON [13]. It is located at the Jülich Supercomputing Centre (JSC) of the Forschungszentrum Jülich research centre, Germany. It is one of several systems that are a part of the Human Brain Project [22]. It consists of 18 IBM S822LC nodes where each node is equipped with four NVIDIA Tesla P100 GPUs and each GPU is equipped with 16 GByte High Bandwidth Memory (HBM). Only one shared node is used for training. The training is done by implementing and training (i.e. fitting to a dataset) a Keras model (see Section 2.5.11 and Section 3.4) on JURON.

All recorded LSTM parameter search experiments are available in Appendix A. In the following three sections, the ten experiments which have the lowest RMSE for each of the three power stations are discussed. These ten experiments span 22.7% of all the recorded LSTM parameter search experiments.

5.2.1. Ljósafoss Power Station

Figure 5.1 shows the ten best results for Ljósafoss power station which were acquired from the LSTM parameter search experiments (see Section 3.4), with the lowest RMSE score on the far left of the figure. The orange bar represents the calculated RMSE and the blue bar represents the calculated MAE. In the figure the RMSE and MAE seem to correlate, hence it can be argued that large errors [6] are likely not present in the top 10 results. The parameters leading to these results are discussed in the following paragraphs.

Figure 5.1: Keras session batches for Ljósafoss power station

Table 5.2 gives an overview of the hyperparameters that were unchanged in the top 10 runs that led to the lowest RMSE. First dense units is set to 1, the activation function is set to linear, and the optimizer is set to Adam for all the top 10 experiments.

Parameters that were varied in the top 10 experiments are shown in Table 5.3. The experiments in the table are in the same order as in Figure 5.1. The difference between rows in the table is as follows:


Table 5.2: Unchanged hyperparameters in the top 10 session batches for Ljósafoss power station

Feature             Representation/Value
First dense units   1
Activation          linear
Optimizer           Adam

• between rows 1 and 2 the batch size is increased from 6 to 24

• between rows 2 and 3 the number of units is increased from 2 to 8

• between rows 3 and 4 the number of units is increased from 8 to 64

Table 5.3: Keras session batches with configured hyperparameters for Ljósafoss power station

Row   id   MAE     RMSE    Hyperparameters
1     11   0.031   0.066   epochs: 150, batch size: 6, shuffle: True, units: 2, loss: MSE
2     23   0.032   0.066   epochs: 150, batch size: 24, shuffle: True, units: 2, loss: MSE
3     39   0.032   0.067   epochs: 150, batch size: 24, shuffle: True, units: 8, loss: MSE
4     17   0.033   0.067   epochs: 150, batch size: 24, shuffle: True, units: 64, loss: MSE
5     24   0.033   0.067   epochs: 150, batch size: 48, shuffle: True, units: 64, loss: MSE
6     5    0.035   0.069   epochs: 150, batch size: 48, shuffle: True, units: 2, loss: MSE
7     1    0.036   0.069   epochs: 150, batch size: 12, shuffle: True, units: 2, loss: MSE
8     27   0.036   0.069   epochs: 150, batch size: 12, shuffle: True, units: 2, loss: logcosh
9     29   0.039   0.075   epochs: 25, batch size: 1, shuffle: False, units: 128, second dense units: 1, loss: MSE
10    8    0.043   0.076   epochs: 200, batch size: 1, shuffle: False, units: 1, loss: MSE


• between rows 4 and 5 the batch size is increased from 24 to 48

• between rows 5 and 6 the number of units is decreased from 64 to 2

• between rows 6 and 7 the batch size is decreased from 48 to 12

• between rows 7 and 8 the loss function is changed from MSE to logcosh

• between rows 8 and 9 the number of epochs is decreased from 150 to 25, the batch size is decreased from 12 to 1, shuffle is inverted, the number of units is increased from 2 to 128, the loss function is changed from logcosh to MSE, and row 9 has a second dense layer with 1 unit

• between rows 9 and 10 the number of epochs is increased from 25 to 200, the number of units is decreased from 128 to 1, and row 10 has no dense layer

For the top 10 results, one can argue that there is an indication of less impact by both the number of units and the batch size, since those are the most common changes between the experiments with the lowest calculated RMSE value. If one would emphasize achieving a lower MAE, rows 1 to 5 in Table 5.3 would still be evaluated with the same rank since they also have the lowest MAE of all experiments (see Table A.1 in Appendix A for details).

The best acquired results of the experiments for Ljósafoss were achieved in the experiment with id 11 out of all forty-four experiments made for Ljósafoss. The experiment with id 11 is configured with the hyperparameters of 150 epochs, a batch size of 6, applying shuffle on the training dataset after each epoch, 2 LSTM units, the activation function linear, and the loss function Mean Squared Error (MSE).

5.2.2. Hrauneyjafoss Power Station

Figure 5.2 shows the top ten results for Hrauneyjafoss power station which were gathered from the LSTM parameter search experiments (see Section 3.4), where the experiment on the left of that figure has the lowest calculated RMSE value. The orange bar is the RMSE metric and the blue bar is the MAE metric. In the diagram, the first eight experiments (ids 31, 26, 21, 33, 3, 28, 18, and 43) form a cluster and the last two (ids 11 and 1) form another cluster in terms of RMSE and MAE. The first cluster has a common hyperparameter configuration, i.e. batch size set to 1 and shuffle disabled, and the second cluster has epochs set to 150, units set to 2, and shuffle enabled in common.

Figure 5.2: Keras session batches for Hrauneyjafoss power station

Table 5.4 shows the unchanged hyperparameters for these ten experiments. First dense units is set to 1 and the optimizer is set to Adam for all the top 10 experiments.

Table 5.4: Unchanged hyperparameters in the top 10 session batches for Hrauneyjafoss power station

Feature             Representation/Value
First dense units   1
Optimizer           Adam

Variable parameters are shown in Table 5.5, sorted by RMSE in ascending order (i.e. in the same order as in Figure 5.2). The difference from one row to the next one is:

• between rows 1 and 2 the loss function is changed from logcosh to MSE

• between rows 2 and 3 the activation function is changed from linear to tanh and the loss function is changed from MSE to logcosh

• between rows 3 and 4 the activation function is changed from tanh to linear, the loss function is changed from logcosh to MSE, and row 4 has a second dense layer


Table 5.5: Keras session batches with configured hyperparameters for Hrauneyjafoss power station

Row   id   MAE     RMSE    Hyperparameters
1     31   0.421   1.651   epochs: 25, batch size: 1, shuffle: False, units: 256, first activation: linear, loss: logcosh
2     26   0.492   1.664   epochs: 25, batch size: 1, shuffle: False, units: 256, first activation: linear, loss: MSE
3     21   0.468   1.862   epochs: 25, batch size: 1, shuffle: False, units: 256, first activation: tanh, loss: logcosh
4     33   0.473   2.067   epochs: 25, batch size: 1, shuffle: False, units: 256, second dense units: 1, first activation: linear, second activation: linear, loss: MSE
5     3    0.731   2.087   epochs: 25, batch size: 1, shuffle: False, units: 512, second dense units: 1, first activation: linear, second activation: linear, loss: MSE
6     28   0.751   2.224   epochs: 10, batch size: 1, shuffle: False, units: 256, first activation: linear, loss: logcosh
7     18   0.581   2.346   epochs: 25, batch size: 1, shuffle: False, units: 256, second dense units: 1, first activation: linear, second activation: tanh, loss: MAE
8     43   0.641   2.369   epochs: 10, batch size: 1, shuffle: False, units: 128, first activation: tanh, loss: logcosh
9     11   1.317   5.978   epochs: 150, batch size: 6, shuffle: True, units: 2, first activation: linear, loss: MSE
10    1    1.422   5.986   epochs: 150, batch size: 12, shuffle: True, units: 2, first activation: linear, loss: MSE

• between rows 4 and 5 the number of units is increased from 256 to 512

• between rows 5 and 6 the number of epochs is decreased from 25 to 10, the number of units is decreased from 512 to 256, and the loss function is changed from MSE to logcosh

• between rows 6 and 7 the number of epochs is increased from 10 to 25, the loss function is changed from logcosh to MAE, and row 7 has a second dense layer

• between rows 7 and 8 the number of epochs is decreased from 25 to 10, the number of units is decreased from 256 to 128, the activation function is changed from linear to tanh, and the loss function is changed from MAE to logcosh


• between rows 8 and 9 the number of epochs is increased significantly from 10 to 150, the batch size is increased from 1 to 6, shuffle is inverted, the number of units is decreased significantly from 128 to 2, the activation function is changed from tanh to linear, and the loss function is changed from logcosh to MSE

• between rows 9 and 10 the batch size is increased from 6 to 12

One could draw the conclusion that the loss function and the activation function have less impact on getting better predictions from a model. Another hyperparameter that one could argue to have less impact, according to the data in Table 5.5, is the number of units rather than the number of epochs, since rows 9 and 10 have the highest RMSE of the top ten results. Row 9 has exactly the same hyperparameters that are evaluated as the best hyperparameters for Ljósafoss (see Section 5.2.1).

The best model evaluated has 25 epochs, a batch size of 1, no shuffling of the training dataset between epochs, the number of LSTM units set to 256, the activation function set to linear, and the loss function set to logcosh. It has an RMSE value of 1.651 and an MAE value of 0.421 (see Table 5.5). Considering the lowest MAE of all the experiments, row 1 would still be considered the best since it has both the lowest RMSE and MAE. Rows 2 and 4 would trade places and the second best would be row 3.

5.2.3. Fljótsdalur Power Station

The ten best results from the LSTM parameter search experiments (see Section 3.4) for Fljótsdalur power station are shown in Figure 5.3, where the lowest RMSE value is on the far left and the highest value is on the far right in the figure. The orange bar is the RMSE metric and the blue bar is the MAE metric. The figure shows that in these top 10 results, the MAE varies more than the RMSE, i.e. the average error over the whole test period differs between the experiments and a few large errors [6] are likely to be present.

Table 5.6: Unchanged hyperparameters in the top 10 session batches for Fljótsdalur power station

Feature     Representation/Value
Optimizer   Adam

Table 5.6 shows the unchanged hyperparameters for the top ten experiments. Only one parameter is unchanged for all the top 10 experiments, i.e. the optimizer is set to Adam. Other parameters vary between the top 10 experiments.


Figure 5.3: Keras session batches for Fljótsdalur power station

Variable parameters are shown in Table 5.7, sorted by RMSE in ascending order, in the same order as in Figure 5.3. The difference from one row to the next one is:

• between rows 1 and 2 the batch size is increased from 12 to 24

• between rows 2 and 3 shuffle is inverted and the number of units is decreased from 64 to 2

• between rows 3 and 4 shuffle is inverted and the number of units is increased from 2 to 8

• between rows 4 and 5 the number of units is decreased from 8 to 2

• between rows 5 and 6 the batch size is decreased from 24 to 6

• between rows 6 and 7 the batch size is increased from 6 to 12

• between rows 7 and 8 the loss function is changed from MSE to logcosh

• between rows 8 and 9 the batch size is increased from 12 to 48 and the loss function is changed from logcosh to MSE

• between rows 9 and 10 the number of epochs is increased from 150 to 200, the batch size is decreased from 48 to 1, shuffle is inverted, the number of units is decreased from 2 to 1, the activation function applied in the LSTM is changed from tanh to linear, and there is no Dense layer in row 10

Table 5.7: Keras session batches with configured hyperparameters for Fljótsdalur power station

Row   id   MAE     RMSE    Hyperparameters
1     24   0.785   2.667   epochs: 150, batch size: 12, shuffle: True, units: 64, first dense units: 1, first activation: linear, loss: MSE
2     17   0.605   2.683   epochs: 150, batch size: 24, shuffle: True, units: 64, first dense units: 1, first activation: linear, loss: MSE
3     14   0.662   2.685   epochs: 150, batch size: 24, shuffle: False, units: 2, first dense units: 1, first activation: linear, loss: MSE
4     39   0.593   2.686   epochs: 150, batch size: 24, shuffle: True, units: 8, first dense units: 1, first activation: linear, loss: MSE
5     23   0.605   2.691   epochs: 150, batch size: 24, shuffle: True, units: 2, first dense units: 1, first activation: linear, loss: MSE
6     11   0.608   2.692   epochs: 150, batch size: 6, shuffle: True, units: 2, first dense units: 1, first activation: linear, loss: MSE
7     1    0.552   2.708   epochs: 150, batch size: 12, shuffle: True, units: 2, first dense units: 1, first activation: linear, loss: MSE
8     27   0.552   2.708   epochs: 150, batch size: 12, shuffle: True, units: 2, first dense units: 1, first activation: linear, loss: logcosh
9     5    0.654   2.717   epochs: 150, batch size: 48, shuffle: True, units: 2, first dense units: 1, first activation: linear, loss: MSE
10    16   1.170   2.759   epochs: 200, batch size: 1, shuffle: False, units: 1, LSTM activation: linear, loss: MSE

Rows 1 to 9 have similar hyperparameters with only a few changes between each row. Row 10 has a different model structure as it has no dense layer, whereas some of the experiments in Section 5.2.2 and Section 5.2.1 have two dense layers. Row 6 has the same hyperparameters as the best results in Section 5.2.1.

When considering a lower MAE better than a lower RMSE, row 7 would be at the top in first place and row 8 would be in second place, row 1 would be in ninth place and row 2 would be in fourth place.


5.3. Discussion and Comparison

In this chapter, forty-six experiments (i.e. two for the ratio method and forty-four distinctly different experiments in the LSTM method) are presented which are applied to each of the three power stations. For all the experiments, the two metrics RMSE and MAE are calculated for the three power stations.

The results presented here need to be considered in light of the different accuracy of the grid input meters in each power station. Meter accuracy should be put into context for the evaluation by multiplying it by the available electricity generation capacity (the accuracy error should not be able to exceed that number):

• The worst value of the grid input meter accuracy for Ljósafoss is 0.5%, i.e. 0.077 MW if that is multiplied by the capacity of the station, 15.3 MW (see Table 2.2 in Section 2.1.1)

• The worst value of the grid input meter accuracy for Hrauneyjafoss is 0.2%, i.e. 0.420 MW if that is multiplied by the capacity of the station, 210 MW (see Table 2.4 in Section 2.1.2)

• The worst value of the grid input meter accuracy for Fljótsdalur is 0.2%, i.e. 1.380 MW if the maximum value of accuracy is multiplied by the capacity of the station, 690 MW (see Table 2.6 in Section 2.1.3)

For Ljósafoss, using the last hour in the year 2016, the RMSE calculated by the ratio method is 0.084, which ranks in eighteenth place out of the forty-six experiments. That performance is better than approximately 60.87% of all experiments presented here for Ljósafoss. Evaluating by the lowest MAE, calculated by the ratio method, the value of 0.057 ranks in twenty-fourth place out of the forty-six experiments made for Ljósafoss, i.e. better than 47.83% of all the experiments. Using the yearly average of the year 2016, the RMSE is 0.067, which ranks in third place out of the forty-six experiments. That performance is better than 93.48% of all experiments presented here for Ljósafoss. Evaluating by the lowest MAE, calculated by the ratio method, the value of 0.033 ranks in sixth place out of the forty-six experiments made for Ljósafoss, i.e. better than 86.96% of all the experiments.

The RMSE calculated by the ratio method for Hrauneyjafoss, using the last hour in the year 2016, is 6.294, which is ranked in thirteenth place out of the forty-six experiments. This evaluates as a better performance than 71.74% of all forty-six experiments made for Hrauneyjafoss. The calculated MAE by the ratio method is 1.090, which ranks in eighth place over the forty-six, which evaluates better than 82.61% of all performed experiments for Hrauneyjafoss. Using the yearly average of the year 2016, the RMSE for Hrauneyjafoss is 33.954, which is ranked in thirty-eighth place out of the forty-six experiments. This evaluates as a better performance than 17.39% of all forty-six experiments made for Hrauneyjafoss. The calculated MAE by the ratio method is 33.145, which also ranks in thirty-eighth place over all the forty-six experiments.

Calculating the RMSE by the ratio method for Fljótsdalur, using the last hour in the year 2016, resulted in 2.682, which ranks in second place out of the forty-six experiments. The performance evaluates as better than 95.65% of all the forty-six experiments made for Fljótsdalur. Ranking by MAE, the calculated MAE by the ratio method resulted in 0.715, which ranks in eighth place overall, which is better than 82.61% of all performances in the experiments. Using the yearly average of the year 2016, the RMSE resulted in 2.678, which ranks in second place out of the forty-six experiments. The performance evaluates as better than 95.65% of all the forty-six experiments made for Fljótsdalur. Ranking by MAE, the calculated MAE using the yearly average resulted in 0.543, which ranks in first place overall, which is better than 97.83% of all performances in the experiments.

The LSTM method applies forty-four experiments for each of the three power stations, each with unique hyperparameters. A couple of them involve adding or removing a layer from the model (see Section 4.1.2).

The lowest calculated RMSE by the LSTM method for Ljósafoss is 0.066, which ranks in first place. This performance evaluates as better than 97.83% of all the applied experiments. Evaluating by the lowest calculated MAE by the LSTM method, Ljósafoss has a value of 0.031, which ranks in first place. The same experiment has the lowest RMSE and the lowest MAE. The performance evaluates better than all other forty-four experiments made for Ljósafoss.

The lowest calculated RMSE by the LSTM method for Hrauneyjafoss is 1.651, which ranks in first place out of the forty-six experiments. This performance evaluates as better than 97.83% of all the applied experiments. Evaluating by the lowest calculated MAE by the LSTM method, the power station has a value of 0.421, which ranks in first place. The same experiment has the lowest RMSE and the lowest MAE. The performance evaluates better than all other forty-four experiments made for Hrauneyjafoss.

The lowest calculated RMSE by the LSTM method for Fljótsdalur is 2.667, which ranks in first place. This performance evaluates as better than 97.83% of all the applied experiments. Evaluating by the lowest calculated MAE by the LSTM method, Fljótsdalur has a value of 0.552, which ranks in first place. The performance evaluates better than all other forty-four experiments made for Fljótsdalur.

The recommended method is the LSTM method, since it presents a lower RMSE and MAE overall, for all three power stations, at the cost of both human resources and hardware resources, i.e. for training and hyperparameter exploration. The ratio method should be easier to implement for power stations other than the three presented in this thesis, as the method used for providing a prediction is a simple formula (see Section 3.3). The ratio method only requires two experiments with the simple approach of calculating the correction ratio, which is then assumed to be constant for the whole test period.

Note that a couple of LSTM parameter search experiments have also been made that were executed with the Rectified Linear Unit (ReLU) activation function (see Section 3.4 for function descriptions and results). Two experiments pass the activation function to the LSTM layer, where the other thirteen experiments pass the function to the Dense layer. The results for these experiments can be reviewed in Appendix A. The results acquired with this activation function ranked in fourteenth place for both Ljósafoss power station and Hrauneyjafoss power station, and nineteenth place for Fljótsdalur power station, and therefore did not qualify for the top 10 results.

Although the results present low error values, more historical data is believed to be useful for providing better results with the LSTM method. That data could be beneficial in both the training dataset and the test dataset.

All predictions are long-term archived and can be found via https://doi.org/10.5281/zenodo.3836817


6. Related Work

The criterion used for deciding what related work to cover is time series predictionsusing LSTM. Three articles are discussed, one article in each section of this chapter.

6.1. Hydrological Time Series Predictions

Qin et al. [69] conducted research in 2019 which presents two models for predicting hydrological time series on a daily time resolution, with values in m3/s. Both models are developed in Python 3 with TensorFlow integration. One model is defined as an autoregressive model while the other model is an LSTM model. The historical data of flow in the Hanjiang River Basin spans from July 2015 to July 2016. The autoregressive model uses July 2015 to June 2016 as the training dataset and the LSTM model uses January 2016 to June 2016 as the training dataset. Both methods use July 2016 as the test dataset. There are no missing values mentioned in the article. The LSTM model's hyperparameters are not explored according to the paper but are configured to one hundred and twenty-eight LSTM units, and the optimizer is set to Adam. Both methods are used for predicting and simulating thirty time steps. Two metrics are used, RMSE and MAE, for evaluating the predictions. The autoregressive model is the better of the two, i.e. both RMSE and MAE are smaller, although the models are not trained on the same period since the training dataset is smaller for the LSTM model. The implementation is on a lower level than a Keras model, i.e. Keras is built on top of a backend engine, e.g. TensorFlow. Keras is used as the interface for deep learning in this thesis, whereas TensorFlow is used directly in the article. This thesis and the article both use the same error metrics for prediction evaluation, RMSE and MAE. The article addresses the historical data as being non-stationary, which can be argued to be related to the seasonality seen in this thesis.


6.2. Solar Irradiance Time Series Predictions

An article by Qing and Niu [70], published in the year 2018, addresses using an LSTM model (as well as a Linear Regression (LR) algorithm and a backpropagation algorithm), implemented with Keras, for predicting a time series of solar irradiance at hourly resolution. The historical data for irradiance is collected over 30 months from a solar plant in Cape Verde. The data used within each day is for the hours from 8:00 AM to 6:00 PM and spans 875 days of hourly data without any missing values. The weather data acquired corresponds to the data for irradiance. The weather data described in the article makes use of nine features of weather forecast data at a specific hour t. The input data is as follows:

• month

• day of the month

• hour of the day

• temperature (◦C) at hour t

• dew point (◦C) at hour t

• humidity at hour t

• visibility (kilometers) at hour t

• wind speed (m/s) at hour t

• weather type at hour t, i.e. a descriptive weather summary where there are thirteen types of weather descriptions

They describe an experiment in which the historical data is used to predict future hourly solar irradiance. The whole dataset is split into a training dataset and a test dataset, where the training dataset is March 2011 to June 2013 and the test dataset is July 2013 to December 2013. The LSTM network algorithm is described with three layers: an input layer, an LSTM layer and a Dense layer. The input layer is defined for the nine features and eleven timesteps. The LSTM layer is configured to thirty units and to return sequences (i.e. return the full output sequence rather than only the last output [38]). The last layer in the model is a Dense layer of one unit with the linear activation function. Epochs are set to fifty and the batch size is set to fifty as well. Before training, the data is scaled for normalization on the closed interval [-1,1]. The article also presents the persistence algorithm, where the hourly irradiance values of the previous day are assumed to be the day-ahead prediction. A metric of RMSE is used for evaluating the prediction performance.

In this thesis, the values being predicted are electricity generation, whereas the article predicts solar irradiance. The model structures are very similar when compared, considering the layers in the Sequential model, and the normalization of values is on the same interval, i.e. [-1,1]. In the article the number of features is nine, whereas in this thesis it is set to two. The article describes a simple method, the persistence algorithm, so it can be argued that this thesis and the article both present a simple method for comparison with LSTM. The LSTM results in the article achieve the lowest error of all the models in the article. The historical dataset of hourly resolution used in the article is slightly smaller than in this thesis, and the test dataset in the article is 6 months whereas in this thesis it is 12 months.
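As an illustration of the layer structure summarized above, a minimal Keras sketch of a comparable model could look as follows; the loss function, optimizer, and variable names are assumptions, not details taken from [70]:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    # Nine weather/time features over eleven hourly timesteps, as described above.
    model = Sequential()
    model.add(LSTM(30, return_sequences=True, input_shape=(11, 9)))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')

    # Hypothetical training call with the stated epochs and batch size:
    # model.fit(X_train, y_train, epochs=50, batch_size=50)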

6.3. Financial Time Series Predictions

In 2018, Siami-Namini and Namin [79] implemented two models to compare predictions of stock index and economics time series, evaluating each model with respect to lower forecast errors and higher accuracy of the predictions. The first model described is an Autoregressive Integrated Moving Average (ARIMA) model; ARIMA models have been used for time series predictions (i.e. forecasting) for a long time. The second model described is an LSTM model implemented in Python using Keras with Theano as a backend, where version numbers are not included in the article. The datasets consist of six financial stock index time series and six economics time series. The six financial stock index time series are acquired from Yahoo Finance for the period of January 1985 to August 2018. They are as follows:

• Nikkei 225 index (N225) on a monthly resolution

• NASDAQ composite index (IXIC) on a monthly resolution

• Hang Seng Index (HIS) on a monthly resolution

• S&P 500 commodity price index (GSPC) on a monthly resolution

• Dow Jones industrial average index (DJ) as two time series, where one is on a monthly resolution and the other is on a weekly resolution

The six economics time series, which have variable periods, are acquired from the Federal Reserve Bank of St. Louis, and the International Monetary Fund. They are as follows:

• Medical care commodities for all urban consumers index (MC) for the period of January 1967 to July 2017

• Housing for all urban consumers index (HO) for the period of January 1967 to July 2017

• Trade-weighted US dollar index (EX) for the period of August 1967 to July 2017

• Food and Beverages for all urban consumers (FB) for the period of January 1967 to July 2017

• M1 Money Stock in billions of dollars (MS) for the period of January 1959 to July 2017

• Transportation for all urban consumers index (TR) for the period of January 1947 to July 2017

One feature is chosen (i.e. univariate) from the six variables in the original dataset. The chosen dataset is split so that 70% is used as the training dataset and 30% as the test dataset. The presented LSTM algorithm starts by setting the random seed to a constant of seven for replication purposes. The model in the algorithm is a Sequential model consisting of one LSTM layer, which is set to stateful, a loss function which is set to MSE, and an optimizer which is set to Adam. Predictions are defined as rolling, i.e. one-step ahead: one input value returns one output value, which is the value subsequent to the input. A metric of RMSE is used for evaluating the prediction performance. The results conclude that the LSTM models outperform the ARIMA models. The authors experimented with the number of epochs from one to one hundred for each dataset, concluded that no improvements were achieved by changing the number of epochs for the datasets they used, and therefore kept that hyperparameter set to one. In this thesis it can be argued that the number of epochs matters, and the Sequential model is not set to stateful in any runs. The predictions are one-step ahead, i.e. one value is predicted against one input. The historical data in the article is on a monthly resolution over many years, whereas this thesis uses hourly resolution over three years.
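A minimal Keras sketch of such a stateful, one-step-ahead setup is given below; only the stateful flag, the MSE loss, the Adam optimizer, and the seed of seven follow the description above, while the number of units, the batch shape, and all names are assumptions:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    np.random.seed(7)  # constant seed for replication

    # A stateful LSTM requires a fixed batch shape: (batch, timesteps, features).
    model = Sequential()
    model.add(LSTM(4, batch_input_shape=(1, 1, 1), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')

    # Rolling one-step-ahead use: each input value predicts the subsequent value.
    # y_next = model.predict(np.array(x_last).reshape(1, 1, 1))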

7. Summary and Outlook

7.1. Summary

Time series are widely used and they are a common form of representation for keeping rows of values associated with a timestamp.

Three different power stations with different characteristics (e.g. age, capacity) are investigated, all of which are operated by Landsvirkjun, a power company in Iceland. Two variables are considered, the grid input and the power generation at each power station. The grid input is only available to the power station operator on a monthly basis, and by predicting this data it becomes available (as predicted values) in near real-time. The generation data is used to predict the grid input for a power station. All of the meters presented have a defined accuracy which is taken into consideration. The data from the grid input meters are accumulated into one sum, forming a time series of grid input on hourly resolution. The data from the generation meters are likewise accumulated into one time series of generation on hourly resolution. The dataset is considered small, consisting of three years, i.e. the years 2015–2017.
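As an illustration of this accumulation step, a minimal pandas sketch with hypothetical meter readings (column names and values are placeholders):

    import pandas as pd

    # Hypothetical hourly readings from two generation meters at one power station.
    idx = pd.date_range("2015-01-01", periods=4, freq="H")
    meters = pd.DataFrame({"meter_1": [10.0, 11.0, 9.5, 10.2],
                           "meter_2": [5.0, 5.1, 4.9, 5.0]}, index=idx)

    # Accumulate all meters into a single hourly generation time series.
    generation = meters.sum(axis=1)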

Two methods were used with the goal of providing time series predictions with the lowest error. The ratio method is a simple method consisting of calculating a correction ratio, i.e. dividing total grid input by total generation. This correction ratio is assumed to be constant and is then used to predict the grid input from the generation data, which is a related time series that is available before the grid input and is measured for the same hour. The second method, based on Long Short-Term Memory (LSTM) neural networks, is more complex in comparison to the ratio method and has many possible hyperparameter combinations. The LSTM method is implemented with Keras and used to train a deep learning model. Just like each power station has its own correction ratio, each power station needs to have its own model trained on that power station's data. Results obtained by using a 2/3 training and 1/3 testing split show that both the lowest RMSE and the lowest MAE are achieved with the LSTM method. In some cases the ratio method also shows a low RMSE and MAE (e.g. ranking in second place for Fljótsdalur power station out of the forty-six experiments).
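To make the ratio method concrete, a minimal NumPy sketch is given below; the function and variable names are placeholders and only illustrate the calculation described above:

    import numpy as np

    def fit_correction_ratio(grid_input_train, generation_train):
        # Correction ratio: total grid input divided by total generation.
        return np.sum(grid_input_train) / np.sum(generation_train)

    def predict_grid_input(generation, ratio):
        # Predicted grid input for each hour from the measured generation.
        return ratio * np.asarray(generation)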

7.2. Outlook

The thesis presents methods which are applied to time series data for the three power stations. Initially the idea was to implement methods for all of Landsvirkjun's power stations. Future work will be to implement models for the other eighteen power stations and the two wind turbines. For implementing the ratio method no hyperparameters are needed, only historical data, which is also needed for the LSTM method. For the LSTM method, the most appropriate hyperparameters need to be discovered for each power station.

Both of the presented methods can be further improved. The approach in the ratio method is very naive, considering that the ratio in Figure 3.1, shown in Section 3.3, is not stationary [17]. The approach could be expanded by studying the trend in the data, e.g. adjusting the correction ratio for every hour in the day, using a specific correction ratio for each month or week within a year, or using the calculated ratio of the last month.
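For example, a month-specific correction ratio could be derived with pandas along the following lines; the DataFrame layout (hourly 'grid_input' and 'generation' columns on a DatetimeIndex) is an assumption for illustration:

    import pandas as pd

    def monthly_ratios(df):
        # One correction ratio per calendar month instead of a single constant.
        grouped = df.groupby(df.index.month)
        return grouped["grid_input"].sum() / grouped["generation"].sum()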

Possible improvements for the LSTM method are a better selection of the hyperparameter combinations, such as the number of epochs, activation functions or other layers; an even lower RMSE and MAE is believed to be possible without overfitting the models (see Section 2.4). Many other possibilities can be studied further, such as:

• Adjustments to gradient clipping or gradient scaling to avoid exploding gradients [17] (see the sketch after this list)

• Skipping hours with missing values instead of imputing with ffill before training and validating

• More historical data could be used in both the training dataset and the test dataset

• Transfer learning, i.e. whether any knowledge from other Keras models can be used as the initial model state ahead of training new models for other power stations [16]

• Using more than one timestep

• Using sequentially stacked LSTM layers with return_sequences, i.e. returning the full output sequence [38], for all LSTM layers except the last one in the sequence (see the sketch after this list)

• Implementing a custom model structure instead of the Sequential model, e.g. using layer sharing
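A minimal Keras sketch combining two of these ideas, stacked LSTM layers with return_sequences and gradient clipping through the optimizer, is given below; the layer sizes, input shape, and clipping value are assumptions:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense
    from keras.optimizers import Adam

    model = Sequential()
    # All LSTM layers except the last one return the full output sequence.
    model.add(LSTM(8, return_sequences=True, input_shape=(24, 2)))
    model.add(LSTM(8))
    model.add(Dense(1, activation='linear'))

    # clipnorm limits the gradient norm to mitigate exploding gradients.
    model.compile(loss='mean_squared_error', optimizer=Adam(clipnorm=1.0))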

Furthermore, the methods could be combined into a third method which would make use of Keras models to predict the correction ratio, which is then multiplied by the total generation values as a last step for acquiring predictions. In practice it would be convenient to be able to swap inputs for outputs and vice versa, i.e. predict total generation values if grid input values are provided.
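A rough sketch of this combined idea is shown below, assuming a trained Keras model that predicts the hourly correction ratio; all names are hypothetical:

    import numpy as np

    def combined_prediction(ratio_model, features, generation):
        # Predict the correction ratio with a Keras model, then multiply it
        # by the measured generation to obtain the predicted grid input.
        predicted_ratio = ratio_model.predict(features).flatten()
        return predicted_ratio * np.asarray(generation)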

The author believes that future work is worth the effort, e.g. making it easier to add other power stations and also achieving an even lower RMSE and MAE.

Bibliography

[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] R. Begamudre. Extra high voltage AC transmission engineering. New Age International, 2006.

[3] J. Brownlee. Gentle introduction to the adam optimization algorithm for deep learning. https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/. Accessed on 2020-02-17.

[4] J. Brownlee. How to get reproducible results with keras. https://machinelearningmastery.com/reproducible-results-neural-networks-keras/. Accessed on 2020-05-08.

[5] J. Brownlee. Time series prediction with lstm recurrent neural networks in python with keras. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/. Accessed on 2020-02-17.

[6] T. Chai and R. Draxler. Root mean square error (rmse) or mean absolute error (mae)? GMDD, 7(1):1525–1534, 2014.

[7] C. Chatfield. Time-series forecasting. CRC press, 2000.

[8] D3.js. D3.js - data-driven documents. https://d3js.org/. Accessed on 2018-04-02.

[9] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. 2004.

[10] Efla. Kröflulína 3 – 220 kV háspennulína | mau | efla.is. https://www.efla.is/mat-a-umhverfisahrifum/kroflulina-3. Accessed on 2020-02-28.

[11] L. Einarsson. Personal communication, 2020.

[12] Enmax. Substations. https://www.enmax.com/generation-wires/transmission-and-distribution/our-system/substations. Accessed on 2020-02-25.

[13] Forschungszentrum Jülich. Juron (ibm-nvidia pilot) | high performance analytics & computing platform – guidebook. https://hbp-hpc-platform.fz-juelich.de/?page_id=1073. Accessed on 2020-04-27.

[14] The Python Software Foundation. 32.10. py_compile — compile python source files — python 3.6.10 documentation. https://docs.python.org/3.6/library/py_compile.html#py_compile.compile. Accessed on 2020-04-27.

[15] The Python Software Foundation. Welcome to python.org. https://www.python.org/. Accessed on 2019-11-20.

[16] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[17] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[18] A. Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, 2017.

[19] E. Hewitt. Cassandra: the definitive guide. O'Reilly Media, 2010.

[20] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, et al. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.

[21] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[22] Human brain project home. https://www.humanbrainproject.eu/en/. Accessed on 2020-04-30.

[23] J. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.

[24] Hydropower, types of hydropower. https://www.hydropower.org/types-of-hydropower. Accessed on 2018-03-22.

[25] The Khronos Group Inc. The khronos group inc. https://www.khronos.org/webgl/. Accessed on 2018-04-02.

[26] T. Jayalakshmi and A. Santhakumaran. Statistical normalization and backpropagation for classification. International Journal of Computer Theory and Engineering, 3(1):1793–8201, 2011.

[27] Json. https://www.json.org/json-en.html. Accessed on 2020-05-06.

[28] Jupyter.org. Project jupyter. http://jupyter.org/. Accessed on 2018-04-01.

[29] Kairosdb. https://kairosdb.github.io/. Accessed on 2019-11-20.

[30] Kairosdb. https://kairosdb.github.io/docs/build/html/restapi/Aggregators.html. Accessed on 2020-05-19.

[31] keras.io. Callbacks - keras documentation. https://keras.io/callbacks/#history. Accessed on 2020-05-06.

[32] keras.io. Faq - keras documentation. https://keras.io/getting-started/faq/#what-does-sample-batch-epoch-mean. Accessed on 2020-02-14.

[33] keras.io. Guide to the sequential model - keras documentation. https://keras.io/getting-started/sequential-model-guide/. Accessed on 2018-09-08.

[34] keras.io. Home - keras documentation. https://keras.io. Accessed on 2020-02-11.

[35] keras.io. Activations - keras documentation. https://keras.io/activations/. Accessed on 2020-04-27.

[36] keras.io. Losses - keras documentation. https://keras.io/losses/. Accessed on 2020-02-11.

[37] keras.io. Optimizers - keras documentation. https://keras.io/optimizers/#adam. Accessed on 2018-09-08.

[38] keras.io. Recurrent layers - keras documentation. https://keras.io/layers/recurrent/. Accessed on 2020-02-15.

[39] keras.io. Sequential - keras documentation. https://keras.io/models/sequential/. Accessed on 2018-09-09.

[40] keras.io. Utils - keras documentation. https://keras.io/utils/. Accessed on 2020-05-05.

[41] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[42] Landsnet. About the grid. https://www.landsnet.is/english/about-us/about-the-grid/. Accessed on 2020-02-24.

[43] Landsnet. Raforkuspá 2019-2050. https://www.stjornarradid.is/raduneyti/nefndir/nanar-um-nefnd/?itemid=2d0c4de5-44dc-11e9-9436-005056bc4d74. Accessed on 2020-04-08.

[44] Landsnet. Raforkuspá 2019-2050. https://orkustofnun.is/gogn/Skyrslur/OS-2019/OS-2019-13.pdf. Accessed on 2020-04-08.

[45] Landsnet. Ársreikningur Landsnets 31.12.19.pdf. https://www.landsnet.is/library/Skrar/%C3%81rsreikningur%20Landsnets%2031.12.19.pdf. Accessed on 2020-04-08.

[46] Landsvirkjun. About us - the national power company of iceland. https://www.landsvirkjun.com/company/. Accessed on 2020-05-20.

[47] Landsvirkjun. Customers - the national power company of iceland. https://www.landsvirkjun.com/productsservices/customers. Accessed on 2020-02-24.

[48] Landsvirkjun. Fljótsdalur power station - the national power company of iceland. https://www.landsvirkjun.com/Company/PowerStations/FljotsdalurPowerStation/. Accessed on 2020-02-27.

[49] Landsvirkjun. Hrauneyjafoss power station - the national power company of iceland. https://www.landsvirkjun.com/Company/PowerStations/HrauneyjafossPowerStation/. Accessed on 2020-02-26.

[50] Landsvirkjun. Ljósafossstöð - orka úr 100% endurnýjanlegum orkugjöfum. https://www.landsvirkjun.is/Fyrirtaekid/Aflstodvar/Ljosafossstod. Accessed on 2020-03-15.

[51] Landsvirkjun. Power stations - the national power company of iceland. https://www.landsvirkjun.com/company/powerstations. Accessed on 2018-03-22.

[52] Landsvirkjun. Steingrímsstöð power station - the national power company of iceland. https://www.landsvirkjun.is/Fyrirtaekid/Aflstodvar/Steingrimsstod. Accessed on 2020-02-26.

[53] Landsvirkjun. Submarine cable to europe - the national power company of iceland. https://www.landsvirkjun.com/researchdevelopment/submarinecabletoeurope. Accessed on 2020-02-25.

[54] D. Lewine. POSIX programmers guide. O'Reilly Media, Inc., 1991.

[55] W. McKinney. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, 2012.

[56] Microsoft. The microsoft cognitive toolkit - cognitive toolkit - cntk. https://docs.microsoft.com/en-us/cognitive-toolkit/. Accessed on 2020-02-11.

[57] Microsoft. SQL server 2017 on windows and linux | microsoft. https://www.microsoft.com/en-us/sql-server/sql-server-2017. Accessed on 2020-05-01.

[58] PostgreSQL. PostgreSQL: The world's most advanced open source relational database. https://www.postgresql.org/. Accessed on 2020-05-01.

[59] K. Muthanna, A. Sarkar, K. Das, and K. Waldner. Transformer insulation life assessment. IEEE Transactions on Power Delivery, 21(1):150–156, 2005.

[60] Netorka. Virkjanir | netorka. https://netorka.is/raforkukerfid/virkjanir/. Accessed on 2020-02-26.

[61] Numpy. https://numpy.org/. Accessed on 2020-05-21.

[62] Alþingi National Parliament of Iceland. 65/2003: Raforkulög | lög | alþingi. https://www.althingi.is/lagas/nuna/2003065.html. Accessed on 2020-02-24.

[63] C. Olah. Understanding lstm networks – colah's blog. https://colah.github.io/posts/2015-08-Understanding-LSTMs/. Accessed on 2020-04-20.

[64] Pandas data analysis library. https://pandas.pydata.org/. Accessed on 2018-03-31.

[65] pandas.pydata.org. Pandas.dataframe.fillna – pandas 0.23.4 documentation. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html. Accessed on 2018-09-06.

[66] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[67] F. Pérez and B. Granger. IPython: a system for interactive scientific computing. Computing in Science and Engineering, 9(3):21–29, May 2007.

[68] plotly. Plotly for python - plotly. https://plot.ly/d3-js-for-python-and-pandas-charts/. Accessed on 2018-04-02.

[69] J. Qin, J. Liang, T. Chen, X. Lei, and A. Kang. Simulating and predicting of hydrological time series based on tensorflow deep learning. Polish Journal of Environmental Studies, 28(2), 2019.

[70] X. Qing and Y. Niu. Hourly day-ahead solar irradiance prediction using weather forecasts by lstm. Energy, 148:461–468, 2018.

[71] RARIK. Rarik - rarik - iceland state electricity. https://www.rarik.is/english. Accessed on 2020-02-27.

[72] T. Rinne and T. Ylonen. scp(1) - openbsd manual pages. https://man.openbsd.org/scp. Accessed on 2020-02-18.

[73] A. Salimi, O. Erdem, and M. Rafighi. Applying a multi sensor system to predict and simulate the tool wear using of artificial neural networks. Scientia Iranica, 24, August 2017.

[74] J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.

[75] scikit-learn.org. sklearn.preprocessing.minmaxscaler – scikit-learn 0.20.0 documentation. http://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.MinMaxScaler.html. Accessed on 2018-10-08.

[76] numpy.ndarray.shape — numpy v1.15 manual. https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.ndarray.shape.html. Accessed on 2020-03-05.

[77] numpy.random.seed — numpy v1.15 manual. https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html. Accessed on 2018-03-31.

[78] C. Shearer. The CRISP-DM model: the new blueprint for data mining. Journal of Data Warehousing, 5(4):13–22, 2000.

[79] S. Siami-Namini and A. Namin. Forecasting economics and financial time series: Arima vs. lstm. arXiv preprint arXiv:1803.06386, 2018.

[80] Theano Development Team. Welcome — theano 1.0.0 documentation. http://www.deeplearning.net/software/theano/. Accessed on 2020-02-11.

[81] R. van Loon. Machine learning explained: Understanding supervised, unsupervised, an. https://datafloq.com/read/machine-learning-explained-understanding-learning/4478. Accessed on 2020-04-10.

[82] Fileformat svef/xx. http://doc.energy.vitec.net/ManualData/ve/en/aiolos/index.html#!fileformatsvefxx.htm. Accessed on 2019-11-05.

[83] V. Vigneswaran. Deep learning is just a marketing term | misconceptions and truths. https://medium.com/datadriveninvestor/deep-learning-is-just-a-marketing-term-misconceptions-and-truths-9c7cd9e7ffec. Accessed on 2020-04-10.

[84] R. Walling and N. Miller. Distributed generation islanding - implications on power system dynamic performance. In IEEE Power Engineering Society Summer Meeting, volume 1, pages 92–96. IEEE, 2002.

[85] C. Willmott and K. Matsuura. Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance. Climate Research, 30(1):79–82, 2005.

[86] J. Wirfs-Brock. Lost in transmission: How much electricity disappears between a power plant and your plug? | inside energy. http://insideenergy.org/2015/11/06/lost-in-transmission-how-much-electricity-disappears-between-a-power-plant-and-your-plug/. Accessed on 2020-04-01.

A. LSTM Parameter Search Experiments

The following appendix presents the results of the forty-four experiments performed with the LSTM method for each of the three power stations.
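For orientation, a minimal Keras sketch of the top-ranked configuration in Table A.1 (row 1, id 11) is given below; the input shape of one timestep and two features follows the setup described in the thesis, while the variable names and the commented training call are placeholders:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(2, input_shape=(1, 2)))     # units: 2
    model.add(Dense(1, activation='linear'))   # firstdenseunits: 1, firstactivation: linear
    model.compile(loss='mean_squared_error', optimizer='adam')  # optimizer: Adam

    # model.fit(X_train, y_train, epochs=150, batch_size=6, shuffle=True)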

Row id MAE RMSE Hyperparameters

1 11 0.031058 0.066448 epochs: 150, batchsize: 6, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

2 23 0.031595 0.066642 epochs: 150, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

3 39 0.031843 0.067089 epochs: 150, batchsize: 24, shuffle: True,units: 8, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

4 17 0.032599 0.067127 epochs: 150, batchsize: 24, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

5 24 0.032742 0.067153 epochs: 150, batchsize: 48, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

6 5 0.035351 0.068712 epochs: 150, batchsize: 48, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

7 1 0.035702 0.068954 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

8 27 0.035721 0.068962 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:linear, loss: logcosh, optimizer: Adam

9 29 0.039278 0.074858 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

10 8 0.042773 0.075910 epochs: 200, batchsize: 1, shuffle: False,units: 1, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

11 37 0.044519 0.076561 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

12 31 0.063596 0.076936 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

13 15 0.041001 0.078965 epochs: 9, batchsize: 12, shuffle: True, units:2, firstdenseunits: 1, firstactivation: linear,loss: logcosh, optimizer: Adam

14 42 0.048700 0.079151 epochs: 25, batchsize: 1, shuffle: False, units:128, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

15 9 0.050861 0.079724 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

16 28 0.066459 0.080190 epochs: 10, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

17 22 0.052246 0.081409 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

18 14 0.054288 0.081494 epochs: 150, batchsize: 24, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

19 3 0.062328 0.084708 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

20 6 0.033980 0.085299 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: relu, loss: logcosh, opti-mizer: Adam

21 26 0.073750 0.088201 epochs: 25, batchsize: 1, shuffle: False,units: 256, activation: linear, loss:mean_squared_error, optimizer: Adam

22 16 0.070999 0.095290 epochs: 200, batchsize: 1, shuffle: False,units: 1, LSTMactivation: linear, loss:mean_squared_error, optimizer: Adam

23 7 0.034277 0.098513 epochs: 100, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

24 34 0.036419 0.099092 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

25 36 0.038158 0.100425 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

26 38 0.062880 0.100883 epochs: 10, batchsize: 24, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

27 0 0.042677 0.103052 epochs: 10, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

28 30 0.046906 0.104761 epochs: 25, batchsize: 1, shuffle: False,units: 512, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

29 12 0.046634 0.105429 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

30 13 0.049364 0.106897 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

31 18 0.090826 0.109888 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, seconddenseu-nits: 1, firstactivation: linear, secondactiva-tion: tanh, loss: mean_absolute_error, op-timizer: Adam

32 2 0.062695 0.110368 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: tanh, secondactivation: sig-moid, loss: logcosh, optimizer: Adam

33 40 0.067648 0.115850 epochs: 50, batchsize: 1, shuffle: False,units: 1, LSTMactivation: relu, loss:mean_squared_error, optimizer: Adam

34 21 0.088277 0.121960 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: tanh, loss: logcosh, opti-mizer: Adam

35 25 0.078203 0.125296 epochs: 10, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

36 33 0.102158 0.130471 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

37 10 0.111842 0.135064 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

38 20 0.124206 0.146303 epochs: 1, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

39 43 0.228021 0.263771 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: tanh, loss: logcosh, opti-mizer: Adam

40 41 0.361933 0.483504 epochs: 10, batchsize: 365, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

41 32 0.764660 1.301588 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

42 4 0.530493 2.001571 epochs: 1, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

43 19 1.893581 2.230929 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: softmax, loss: logcosh, opti-mizer: Adam

44 35 5.520626 5.641009 epochs: 300, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

Table A.1: LJO Keras LSTM parameter search experiments, sorted by ascending RMSE

Row id MAE RMSE Hyperparameters

1 31 0.420635 1.651410 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

2 26 0.492133 1.663832 epochs: 25, batchsize: 1, shuffle: False,units: 256, activation: linear, loss:mean_squared_error, optimizer: Adam

3 21 0.468390 1.862421 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: tanh, loss: logcosh, opti-mizer: Adam

4 33 0.473382 2.067328 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

5 3 0.730868 2.087265 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

6 28 0.751464 2.223607 epochs: 10, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

7 18 0.581193 2.346150 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, seconddenseu-nits: 1, firstactivation: linear, secondactiva-tion: tanh, loss: mean_absolute_error, op-timizer: Adam

8 43 0.641122 2.368930 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: tanh, loss: logcosh, opti-mizer: Adam

9 11 1.316635 5.977618 epochs: 150, batchsize: 6, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

10 1 1.421835 5.986155 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

11 27 1.368900 5.992238 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:linear, loss: logcosh, optimizer: Adam

12 17 1.371943 5.997670 epochs: 150, batchsize: 24, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

13 23 1.326950 5.998016 epochs: 150, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

14 14 1.276686 6.006522 epochs: 150, batchsize: 24, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

15 22 1.167527 6.014548 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

16 24 1.346804 6.022217 epochs: 150, batchsize: 48, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

17 5 1.377823 6.039778 epochs: 150, batchsize: 48, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

18 39 2.021732 6.065105 epochs: 150, batchsize: 24, shuffle: True,units: 8, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

19 32 1.213060 6.066315 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

20 29 1.187524 6.067082 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

21 8 1.161658 6.067620 epochs: 200, batchsize: 1, shuffle: False,units: 1, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

22 37 1.143735 6.070046 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

23 42 1.172865 6.095465 epochs: 25, batchsize: 1, shuffle: False, units:128, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

24 10 1.172690 6.097973 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

25 15 1.894677 6.129632 epochs: 9, batchsize: 12, shuffle: True, units:2, firstdenseunits: 1, firstactivation: linear,loss: logcosh, optimizer: Adam

26 20 1.246189 6.133287 epochs: 1, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

27 9 1.228207 6.156736 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

28 16 1.373364 6.206150 epochs: 200, batchsize: 1, shuffle: False,units: 1, LSTMactivation: linear, loss:mean_squared_error, optimizer: Adam

29 38 1.867731 6.268174 epochs: 10, batchsize: 24, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

30 2 2.133040 6.288692 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: tanh, secondactivation: sig-moid, loss: logcosh, optimizer: Adam

31 41 2.321250 6.327148 epochs: 10, batchsize: 365, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

32 6 2.548590 6.348500 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: relu, loss: logcosh, opti-mizer: Adam

33 35 3.031911 8.367361 epochs: 300, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

34 7 3.207127 8.370221 epochs: 100, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

35 34 2.651385 8.380322 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

36 40 3.542567 8.588399 epochs: 50, batchsize: 1, shuffle: False,units: 1, LSTMactivation: relu, loss:mean_squared_error, optimizer: Adam

37 4 4.144086 20.933792 epochs: 1, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

38 25 39.186405 46.120302 epochs: 10, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

39 30 39.186405 46.120302 epochs: 25, batchsize: 1, shuffle: False,units: 512, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

40 13 39.186405 46.120302 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

41 12 39.186405 46.120302 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

42 36 39.186405 46.120302 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

43 0 39.186405 46.120302 epochs: 10, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

44 19 57.373775 64.306206 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: softmax, loss: logcosh, opti-mizer: Adam

Table A.2: HRA Keras LSTM parameter search experiments, sorted by ascending RMSE

Row id MAE RMSE Hyperparameters

1 24 0.785121 2.667186 epochs: 150, batchsize: 48, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

2 17 0.604542 2.683032 epochs: 150, batchsize: 24, shuffle: True,units: 64, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

3 14 0.662247 2.685020 epochs: 150, batchsize: 24, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

4 39 0.592666 2.686128 epochs: 150, batchsize: 24, shuffle: True,units: 8, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

5 23 0.604605 2.691383 epochs: 150, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

6 11 0.607984 2.692199 epochs: 150, batchsize: 6, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

7 1 0.552011 2.707797 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

8 27 0.552056 2.708260 epochs: 150, batchsize: 12, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:linear, loss: logcosh, optimizer: Adam

9 5 0.654123 2.717374 epochs: 150, batchsize: 48, shuffle: True,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

10 16 1.170461 2.759480 epochs: 200, batchsize: 1, shuffle: False,units: 1, LSTMactivation: linear, loss:mean_squared_error, optimizer: Adam

11 8 0.946512 2.781725 epochs: 200, batchsize: 1, shuffle: False,units: 1, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

12 37 1.404279 2.883925 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactiva-tion: linear, loss: mean_squared_error, op-timizer: Adam

13 15 0.901289 4.180523 epochs: 9, batchsize: 12, shuffle: True, units:2, firstdenseunits: 1, firstactivation: linear,loss: logcosh, optimizer: Adam

14 40 1.017894 5.224220 epochs: 50, batchsize: 1, shuffle: False,units: 1, LSTMactivation: relu, loss:mean_squared_error, optimizer: Adam

15 9 1.360432 9.604777 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

16 35 1.463697 9.614680 epochs: 300, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

17 29 1.340134 9.629594 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

18 26 1.225417 9.655201 epochs: 25, batchsize: 1, shuffle: False,units: 256, activation: linear, loss:mean_squared_error, optimizer: Adam

19 22 1.236389 9.666354 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

20 3 1.410296 9.943058 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

21 34 1.623676 10.223691 epochs: 100, batchsize: 1, shuffle: False,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

22 33 1.506023 10.482851 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

23 31 1.565719 11.252187 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

24 7 2.122972 11.943007 epochs: 100, batchsize: 24, shuffle: True,units: 2, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

25 10 2.247424 11.985276 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

26 28 2.745437 13.948799 epochs: 10, batchsize: 1, shuffle: False, units:256, activation: linear, loss: logcosh, opti-mizer: Adam

27 43 4.406537 14.712089 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: tanh, loss: logcosh, opti-mizer: Adam

28 18 3.871440 15.322457 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, seconddenseu-nits: 1, firstactivation: linear, secondactiva-tion: tanh, loss: mean_absolute_error, op-timizer: Adam

29 21 4.309050 15.963064 epochs: 25, batchsize: 1, shuffle: False, units:256, activation: tanh, loss: logcosh, opti-mizer: Adam

30 2 4.990944 20.849043 epochs: 25, batchsize: 1, shuffle: False, units:256, firstdenseunits: 1, seconddenseunits: 1,firstactivation: tanh, secondactivation: sig-moid, loss: logcosh, optimizer: Adam

31 42 20.784307 27.640145 epochs: 25, batchsize: 1, shuffle: False, units:128, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

32 32 20.784307 27.640145 epochs: 25, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: relu, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

33 38 7.769746 42.036785 epochs: 10, batchsize: 24, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

34 30 7.839225 42.039660 epochs: 25, batchsize: 1, shuffle: False,units: 512, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

35 13 7.856265 42.041671 epochs: 25, batchsize: 1, shuffle: False,units: 256, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

36 25 7.915883 42.043426 epochs: 10, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

37 36 8.040632 42.045532 epochs: 25, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

38 12 8.203131 42.046180 epochs: 25, batchsize: 1, shuffle: False,units: 128, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

39 0 8.269551 42.048594 epochs: 10, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, firstactivation:relu, loss: mean_squared_error, optimizer:Adam

40 6 8.346835 42.054064 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: relu, loss: logcosh, opti-mizer: Adam

41 41 17.604446 43.968917 epochs: 10, batchsize: 365, shuffle: False,units: 128, activation: relu, loss: logcosh,optimizer: Adam

42 20 14.541536 49.435391 epochs: 1, batchsize: 1, shuffle: False,units: 64, firstdenseunits: 1, second-denseunits: 1, firstactivation: linear, loss:mean_squared_error, optimizer: Adam

43 19 70.248291 75.481424 epochs: 10, batchsize: 1, shuffle: False, units:128, activation: softmax, loss: logcosh, opti-mizer: Adam

44 4 18.441334 91.388059 epochs: 1, batchsize: 1, shuffle: False, units:512, firstdenseunits: 1, seconddenseunits: 1,firstactivation: linear, secondactivation: lin-ear, loss: mean_squared_error, optimizer:Adam

Table A.3: FLJ Keras LSTM parameter search experiments, sorted by ascending RMSE
