Master's Degree Thesis, Mechanical Engineering (Thermo-technical Plants). Eng. Andrea Cretì, [email protected]

Università del Salento

Facoltà di Ingegneria

Corso di Laurea Magistrale in Ingegneria Meccanica

Thesis in: Impianti termotecnici (Thermo-technical Plants)

COMPARISONS BETWEEN DIFFERENT HYBRID STATISTICAL MODELS FOR ACCURATE FORECASTING OF PHOTOVOLTAIC SYSTEMS POWER

Supervisors:
Prof. Ing. Paolo M. Congedo
Prof.ssa Ing. M.G. De Giorgi

Co-supervisor:
Ing. Maria Malvoni

Candidate:
Andrea Cretì

Autumn Session, Academic Year 2012/2013

To my Family

To God

Contents

Abstract
1 Introduction
2 Photovoltaic system description
  2.1 Places description
  2.2 Climate data analysis
  2.3 General specification of the PV plant
  2.4 Mechanical dimensioning of the PV plant
  2.5 Electrical dimensioning of the PV plant
  2.6 Data acquisition system
  2.7 Electrical Power Production Data
3 Electrical time series forecasting
  3.1 State of the Art
  3.2 What is a Learning Machine
  3.3 Created Models
  3.4 Training and Test Datasets
  3.5 Model Performance Evaluation methods
4 Artificial Neural Networks (ANNs)
  4.1 Elman Back-propagation Neural Network
  4.2 Forecasting with Model I - Input Vector I and II
5 Support Vector Machines (SVMs)
  5.1 Introduction to Support Vector Machines
  5.2 SVM for Regression Models - SVR
  5.3 Loss Functions
  5.4 Nonlinear SVR using kernels
  5.5 Least Square Support Vector Machine for Regression
  5.6 LsSVM Matlab Toolbox
  5.7 Forecasting with Model II - Input Vector I
  5.8 Forecasting with Model II - Input Vector II
6 LSSVM with Wavelet Transform
  6.1 Fourier transform and short-term Fourier transform
  6.2 Continuous Wavelet transform
  6.3 Discrete Wavelet transform
  6.4 Daubechies type 4 Discrete Wavelet transform
  6.5 Matlab Wavelet Toolbox and Wavelet transforming algorithm
  6.6 Forecasting results with Input Vector I
  6.7 Forecasting results with Input Vector II
7 Multistep forecasting
8 Comparisons between Model I, II and III
9 Conclusions
  9.1 Conclusions
  9.2 Future work recommendations
Acknowledgements
  Acknowledgements
  Connecting the Dots...
Terms, definitions, abbreviations and symbols
Bibliography

Abstract

The very high penetration of photovoltaic energy into the free electricity market requires efficient PV power forecasting systems. This study focuses on forecasting the power productivity of a photovoltaic system located in Apulia, in the south-east of Italy, using different hybrid statistical models and comparing their performances. The statistical models created and analyzed in this thesis are based on: Artificial Neural Networks (ANNs), Least Square Support Vector Machines (LS-SVMs) and a hybrid model based on LS-SVMs with the Wavelet Decomposition of the input dataset.

In the first part of the thesis, a description of photovoltaic system technology is given, together with a description of the PV park located in Monteroni di Lecce, Puglia, Italy, including an accurate analysis of the climate data and of the productivity of the plant. In the second part, different models for electric power forecasting are proposed. Learning Machine theory is explained, and forecasting simulations using Mathworks Matlab software are carried out for different forecasting horizons (+1h, +3h, +6h, +12h, +24h). The forecasting errors obtained with the different models are investigated, and an accurate error distribution analysis is performed in order to identify the model that reaches the best performance. In the final chapter, the Multistep technique applied to the LS-SVM models is briefly discussed and the corresponding forecasting simulations are presented. It was found that hybrid methods based on LS-SVM and WD outperform the other methods in the majority of cases.

Chapter 1

Introduction

According to the statistical analysis of photovoltaic (PV) systems in Italy performed by the Italian Energy Service (Gestore Servizi Energetici, GSE S.p.A.), at the end of 2011 about 330,200 plants were operative in Italy, with a total installed power of 12,780 MW. In September 2012 the number of plants had increased to 440,387 and the total power was equal to 15,482.8 MW. By mid-September 2013 the installed PV plants had increased to 549,918, with a total power of 17,439 MW, and the Apulia Region, in the south-east of Italy, is the first region in Italy for installed PV power (2,493 MW), with about 128 kW/km2. One of the Apulian PV plants was installed in the "Ecotekne Campus" of the University of Salento in Monteroni di Lecce (LE), which promotes the use of renewable energy and participates in international research projects in this field.

This study is part of the funded research project "Building Energy Advanced Management Systems (BEAMS)". BEAMS is an EU Research and Development project funded by the EC in the context of the 7th Framework Programme. Its strategic goal is the development of an advanced, integrated management system which enables energy efficiency in buildings and special infrastructures from a holistic perspective. The project is developing an open interoperability gateway that will allow the management of diverse, heterogeneous sources and loads, some of them typically present nowadays in spaces of public use (e.g. public lighting, ventilation, air conditioning), some others emergent and expected to become widespread over the next years (e.g. electric vehicles).

BEAMS is a user-driven, demonstration-oriented project, where evidence of the energy and CO2 savings achieved by the project's technologies will be collected. By means of a decentralized architecture, BEAMS will enable new mechanisms to extend current building management systems and achieve higher degrees of efficiency.

Figure 1.1: Italian Solar Map

The solution proposed will not only support the human operator of the building or facility in achieving higher efficiency in the use of energy, but it will also open new opportunities to third parties, such as Energy Service Companies (ESCOs), utilities and grid operators, needing and willing to interact with the BEAMS management system through the interoperability gateway in order to improve the quality and efficiency of the service, both inside and outside the perimeter of the facility.

Figure 1.2: BEAMS Project logo

The purpose of this thesis is to design different hybrid statistical models for photovoltaic power forecasting, applied to the PV Park located in Monteroni di

Lecce (LE), and also to evaluate the performances of these different forecasting models. In the first part of this thesis, the author presents a detailed description of the photovoltaic power plant, including a detailed climate data analysis and the data acquisition system, based on past studies by P.M. Congedo et al. [1]. Secondly, a productivity analysis is proposed, relying on the original project of the plant made by the Italian company ESPE srl. A very accurate literature search on the state of the art of electrical time series forecasting was carried out, underlining the evolution of time series forecasting, starting from traditional statistical methods such as multi-linear regression models, Box-Jenkins methods, Kalman filtering-based methods and ARMA models. It is noted that electric load time series are usually non-linear functions of exogenous variables [13], so, to incorporate non-linearity, many researchers started using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and hybrid models based on the Wavelet Transform of the signals. It is also underlined that no one had previously used a Least Square Support Vector Machine (LSSVM) based model, nor a hybrid model combining the Wavelet Transform with LSSVMs, in order to evaluate the performance of a photovoltaic power plant. The innovative character of this thesis is therefore the use of these hybrid statistical models in order to reach better performances in PV power forecasting.

In order to store the data acquired from the PV power plant in an efficient way, a software tool called "Solar Data Extractor" was developed, which provides real-time acquisition from the PV plant design company's web site to a local MS Access database. A Java routine for MS Access to MySQL database conversion was also developed.

In the second part of the thesis, a description of the mathematical theory of Artificial Neural Networks, Least Square Support Vector Machines and the Wavelet Transform is proposed, underlining the strengths and the weaknesses of each algorithm. After that, forecasting simulations are carried out using Mathworks Matlab R2012b software with each forecasting model, using two different input vectors. All the results are analyzed and an accurate performance evaluation based on a statistical approach is proposed.

The performances of the created models are compared in order to identify the best forecasting models for particular applications. A final discussion about the forecasting techniques used is proposed, underlining positive and negative aspects of each model, together with some future work recommendations to improve the performance of the created models.

Chapter 2

Photovoltaic system description

2.1 Places description

The PV Park under study is located in the campus of the University of Salento, in Monteroni di Lecce (LE), Puglia (40°19'32.16"N, 18°5'52.44"E); the maps in Figures 2.1 and 2.2 indicate its geographical location.

Figure 2.1: Geographical location of the Campus "Ecotekne" (source: Google Maps)

The PV panels of the PV Park are installed on shelters used as car parking, as shown in Figures 2.3 and 2.4.

Figure 2.2: Map of the Campus Ecotekne (source: Google Maps)

Figure 2.3: Example 1 of PV modules installed on shelters in the Ecotekne Campus

2.2 Climate data analysis

The site of the PV Park is characterized by a warm Mediterranean climate with a dry summer [1]. The Cartesian and polar solar maps of the site are shown in Figures 2.6 and 2.7. The climate data of the site, reported on www.meteo-am.it and www.ilmeteo.it, were analyzed in terms of temperature, humidity and wind speed during three temporal periods: from 1961 to 1990, from 1991 to 2000 and from 2001 to 2011.

Figure 2.4: Example 2 of PV modules installed on shelters in the Ecotekne Campus

Figure 2.5: Means of maximum and minimum temperatures during three periods: (a) 1961-1990, (b) 1991-2000, (c) 2001-2011

Table 2.1 shows the values of average ambient temperature and PV module temperature for each range of solar irradiance for the target period (2012). The average ambient temperature ranges from 18.9 °C at 0-100 W/m2 to 27.5 °C at 1100-1200 W/m2. Moreover, the average PV module temperature ranges from 16.0 °C at 0-100 W/m2 to 48.6 °C at 1100-1200 W/m2. In the range between 1100 and 1200 W/m2 the PV module temperature reaches its highest increment over the ambient temperature (about 21 °C).

Figure 2.6: Height of the sun over twelve months for the site latitude - Cartesian diagram (source: ENEA)

Figure 2.7: Height of the sun over twelve months for the site latitude - Polar diagram (source: ENEA)

Solar Irradiance   Ambient Temperature   PV Module Temperature
(W/m2)             (°C)                  (°C)
0-100              18.9                  16.0
100-200            22.2                  22.8
200-300            23.5                  26.4
300-400            24.3                  29.7
400-500            24.9                  32.7
500-600            25.8                  36.2
600-700            25.8                  38.6
700-800            26.8                  41.9
800-900            27.1                  43.9
900-1000           27.5                  45.3
1000-1100          25.5                  45.7
1100-1200          27.5                  48.6
1200-1300          17.9                  28.1

Table 2.1: Average ambient temperature and PV module temperature for each range of solar irradiance

            Max Solar Irradiance PV1   Max Solar Irradiance PV2
            (W/m2)                     (W/m2)
March       984.2                      1073.4
April       1253.1                     1344.4
May         1245.0                     1309.0
June        1249.9                     1265.0
July        1094.7                     1124.3
August      1101.4                     1122.1
September   1220.2                     1249.5
October     924.8                      1103.5

Table 2.2: Monthly variation ranges of solar irradiance

Table 2.2 shows the maximum values of solar irradiance recorded for each month, while the maximum and minimum values of ambient temperature and module temperature are shown in Table 2.3. The maximum difference between Tc

Month       Ambient temperature (°C)   Module temperature (°C)
            Min      Max               Min      Max
March       4.4      25.1              -1.1     50.4
April       4.6      29.4              -0.1     57.2
May         8.7      33.7              4.82     60.2
June        13.6     43.1              9.1      73.6
July        18.1     44.4              13.7     70.5
August      15.7     41.2              11.3     69.6
September   10.1     37.1              4.5      62.5
October     6.8      33.4              0.8      55.9

Table 2.3: Monthly variation ranges of PV module temperature and ambient temperature

and Ta of about 40.3 °C is also noted. The maximum PV module temperature is 73.6 °C, reached when the solar irradiance and the ambient temperature are 915.5 W/m2 and 33.4 °C respectively.

2.3 General specification of the PV plant

In the University Campus "Ecotekne", located in Monteroni di Lecce (LE), Apulia, Italy, a photovoltaic plant is installed for the production of electrical energy by direct conversion of solar radiation, i.e. by the photovoltaic effect. It is composed primarily of a set of PV modules, one or more conversion groups from direct current to alternating current, and other minor electrical components. The specifications of the single PV module are shown in Tab. 2.4. The modules installed in the PV plant are produced by the company "SUNPOWER Corporation"; they have a declared efficiency of 19.6%, a reduced voltage-temperature coefficient and an anti-reflective glass. The solar cells used for this module are produced by "Maxeon Corp." with patented "back-contact" technology.

In the site under study there are the following 4 PV sub-plants with different nominal power:

FV1: 960 kWp

FV2.1: 990.72 kWp

FV2.2: 979.20 kWp

FV3: 84.436 kWp

PV module                       Specification
Type                            Mono-crystalline silicon
Nominal power (Pn)              320 Wp
Maximum power voltage (Vpm)     54.70 V
Maximum power current (Ipm)     5.86 A
Open circuit voltage (Voc)      64.80 V
Short circuit current (Isc)     6.24 A
Weight                          18.6 kg
Net [gross] module surface      1.57 m2 [1.63 m2]

Table 2.4: Specifications of the PV module

This study focuses on the sub-plant FV1, whose specifications are shown in Tab. 2.5. The FV1 sub-plant is composed of two different module groups: the first group with a nominal power of 606.7 kWp and a module tilt angle of 15°, and a second group with a nominal power of 353.3 kWp and a module tilt angle of 3°.

2.4 Mechanical dimensioning of the PV plant

The support structures of the shelters are metallic structures that ensure the anchorage of the PV modules to the ground. They also ensure the correct design angle of the PV modules. All the mechanical structures were designed and built by the company ESPE srl, in compliance with the Italian legislation (Leggi 1086/71, 64/74, D.M. 14 January 2008). The shelters were dimensioned to resist the following loads:

Permanent loads

1. Structure weight

2. Ballast weight

3. Module weight

Overloads

1. Snow loads

2. Wind loads

3. Thermal variations

4. Seismic effects

The final checks of the structure were made under the most unfavorable load conditions, applying a safety factor equal to 1.5 for the tipping checks. For the resistance checks, allowable stresses equal to 1.125 σamm and 1.125 τamm were applied.

PV module                        Specification
Type                             Mono-crystalline silicon
Nominal power of PV system       960 kWp
Total number of modules          3000
Total number of inverters        3
Total number of strings          250
Number of modules per string     12
Net [gross] modules' surface     4710 m2 [4892 m2]

PV1 sub-system
Nominal power of PV system       353.3 kWp
Azimuth                          -10°
Tilt                             3°
Total number of modules          1104
Net [gross] modules' surface     1733.3 m2 [1799.5 m2]

PV2 sub-system
Nominal power of PV system       606.7 kWp
Azimuth                          -10°
Tilt                             15°
Total number of modules          1896
Net [gross] modules' surface     2976.7 m2 [3090.5 m2]

Table 2.5: Specifications of the PV system FV1

2.5 Electrical dimensioning of the PV plant

The FV1 PV plant was partitioned into two sub-plants with peak power equal to 606.72 kWp and 353.28 kWp. For the electrical dimensioning of the FV1 plant a productivity study was carried out, as shown in Tab. 2.6 for the PV1 modules group and in Tab. 2.7 for the PV2 modules group.

606.72 kWp PV sub-plant productivity
Fixed system: Tilt = 15°, Orientation = -10°

Month       Ed     Em     Hd     Hm
January     1.90   59.0   2.39   74.1
February    2.36   66.1   3.00   84.0
March       3.40   105    4.43   137
April       4.36   131    5.79   174
May         4.82   149    6.57   204
June        5.11   153    7.13   214
July        5.17   160    7.26   225
August      4.83   150    6.81   211
September   4.05   121    5.52   166
October     3.14   97.3   4.18   130
November    2.16   64.9   2.79   83.8
December    1.70   52.6   2.15   66.5
Year Mean   3.59   109    4.48   147
Year Total         1310          1770

Table 2.6: Productivity of the 606.72 kWp PV sub-plant

Where:

Ed: average daily electrical productivity (kWh/kWp per day);

Em: average monthly electrical productivity (kWh/kWp per month);

Hd: average daily solar radiation per square meter (kWh/m2);

Hm: average monthly solar radiation per square meter (kWh/m2).

353.28 kWp PV sub-plant productivity
Fixed system: Tilt = 3°, Orientation = -10°

Month       Ed     Em     Hd     Hm
January     1.52   47.2   1.96   60.7
February    2.04   57.1   2.60   72.9
March       3.11   96.3   4.02   125
April       4.19   126    5.52   166
May         4.81   149    6.51   202
June        5.19   156    7.20   216
July        5.20   161    7.25   225
August      4.71   146    6.58   204
September   3.74   112    5.06   152
October     2.71   84.1   3.62   112
November    1.75   52.5   2.30   68.9
December    1.33   41.2   1.73   53.6
Year Mean   3.37   102    4.54   138
Year Total         1230          1660

Table 2.7: Productivity of the 353.28 kWp PV sub-plant

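As a quick consistency check of Table 2.7, the "Year Total" row can be recomputed from the twelve monthly Em values. The sketch below uses plain Python rather than the Matlab employed in the thesis; the numbers are transcribed from the table:

```python
# Monthly electrical productivity Em (kWh/kWp per month) for the
# 353.28 kWp sub-plant, transcribed from Table 2.7 (January..December).
em_monthly = [47.2, 57.1, 96.3, 126, 149, 156, 161, 146, 112, 84.1, 52.5, 41.2]

# Summing the twelve monthly values gives the annual specific yield,
# which matches the rounded "Year Total" of about 1230 kWh/kWp.
annual_yield = sum(em_monthly)
print(round(annual_yield, 1))  # 1228.4
```

The same check applied to Table 2.6 reproduces its "Year Total" of about 1310 kWh/kWp.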

2.6 Data acquisition system

The company ESAPRO, which designed the PV plant, installed a data acquisition system in order to monitor the main parameters of the system. The solar irradiation is monitored by LP-PYRA02 sensors with a resolution of 1 W/m2. PT100-type temperature sensors are used to measure the PV module temperature and the ambient temperature. The data acquisition system consists of the three inverters, the solar irradiation sensors and the PV module/ambient temperature sensors. The data from inverters and sensors are transmitted via Modbus or Profibus protocols, clean contacts or digital inputs, and are collected by a Siemens PLC with WinCC SCADA for processing and storage. Another WinCC SCADA station is used to extract and duplicate the local data. All the acquired data can be downloaded from the web site of the company ESAPRO, "http://supervisione.espe.it/fotovoltaicoWeb/index.htm", which designed and installed the PV plant.

DATA TYPE                    Units
Total Energy Production      kWh
Active Power                 kW
Radiation                    W/m2
Daily Energy Production      kWh
Monthly Energy Production    kWh
Annual Energy Production     kWh
Ambient Temperature          °C
Module Temperature           °C
Integral Radiation           W/m2

Table 2.8: Acquired data types from the ESAPRO web site

The ESAPRO web site allows the user to download data in XLS, CSV and PDF digital formats through a web form for selecting the desired period of time. This procedure for data extraction is static: it does not allow real-time acquisition, and manual downloading is necessary. In addition, XLS and CSV files become too large for long acquisition periods and are difficult to manage. In order to obtain a real-time acquisition system and a historical local database with the data from the day the PV plant was put into service, a software tool has been created. The developed software, called "SOLAR DATA EXTRACTOR", queries the ESAPRO web site every 10 minutes and extracts the data types shown in Tab. 2.8. The extracted data are inserted into an MS ACCESS 2007 database. By querying the Access database it is possible to obtain real-time information about the main parameters of the PV plant. Fig. 2.8 shows the main screen of the software: it is possible to select the MANUAL or AUTOMATIC acquisition mode. In order to implement a Web application and to have fast data interrogation from the Matlab software, a data conversion from the ACCESS DB to a MySQL database was implemented by means of a purpose-built Java tool, as shown in Fig. 2.10. A general scheme of the data acquisition system developed is shown in Fig. 2.11.
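The acquisition chain described above (poll the plant web site every 10 minutes, append each reading to a local historical database) can be sketched as follows. This is an illustrative Python sketch, not the actual "Solar Data Extractor" code: the field names and the stubbed fetch_snapshot function are hypothetical, and an SQLite database stands in for MS Access:

```python
import sqlite3
import time

POLL_SECONDS = 600  # "Solar Data Extractor" queries the web site every 10 minutes

def fetch_snapshot():
    """Stand-in for scraping the plant web pages (hypothetical field names).

    The real software would download the current values of Tab. 2.8
    (power, radiation, temperatures, energy counters) from the ESAPRO site.
    """
    return {"active_power_kw": 512.3, "radiation_wm2": 731.0,
            "t_ambient_c": 24.1, "t_module_c": 41.8}

def store(conn, reading):
    """Append one timestamped reading to the local historical database."""
    conn.execute(
        "INSERT INTO acquisitions VALUES (?, ?, ?, ?, ?)",
        (time.time(), reading["active_power_kw"], reading["radiation_wm2"],
         reading["t_ambient_c"], reading["t_module_c"]))
    conn.commit()

conn = sqlite3.connect(":memory:")  # SQLite stands in for MS Access here
conn.execute("CREATE TABLE acquisitions (ts REAL, active_power_kw REAL,"
             " radiation_wm2 REAL, t_ambient_c REAL, t_module_c REAL)")

store(conn, fetch_snapshot())  # one polling iteration; the AUTOMATIC mode
                               # would repeat this every POLL_SECONDS
rows = conn.execute("SELECT COUNT(*) FROM acquisitions").fetchone()[0]
print(rows)  # 1
```

A local relational store of this kind is what makes the later Matlab-side interrogation fast, since queries run against indexed tables instead of large exported XLS/CSV files.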


    Figure 2.8: Software SOLAR DATA EXTRACTOR main screen

    Figure 2.9: Software SOLAR DATA EXTRACTOR export screen

2.7 Electrical Power Production Data

The relation between power production, time and meteorological conditions is vital for an accurate prediction of solar power production. These factors should be taken into account when determining the data that will be used in power production forecasting. In this study, the data sets used in the forecasting process consist of past peak electrical power production values, ambient temperature values, PV module temperature values and solar irradiation values. A plot of the data sets used in this study is shown in Figure 2.13. For each hour i, considered as the beginning time of the forecast, the input vector was given by:

- The average value of the power produced by the PV plant in the 60 minutes preceding the hour i, given by:

Pm(i) = (1/6) Σ_{t=i-50min}^{i} P(t),   i = 1, ..., 6297   (2.1)


    Figure 2.10: Software for ACCESS DB to MySQL DB conversion

- The hourly average values of the module temperature (°C), the ambient temperature (°C), the irradiance on the plane inclined at a tilt angle of 3° and the irradiance on the plane at a tilt angle of 15° (W/m2).

The target used to evaluate the model prediction is given by Pt(i, l), the sum of the average hourly powers Pm(r) over the forecast time horizon l, defined as:

Pt(i, l) = (1/6) Σ_{r=i+1}^{i+l} Pm(r),   i = 1, ..., 6297   (2.2)
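Read literally, Eqs. (2.1)-(2.2) build the hourly average from six 10-minute samples and then aggregate it over the forecast horizon. A minimal Python sketch of the two definitions (the thesis works in Matlab; the synthetic sample series p below is hypothetical, and the 1/6 factor in Eq. (2.2) is kept as it appears in the text):

```python
import math

# Hypothetical 10-minute power samples P(t) in kW: 6 samples per hour,
# two days of data (the real dataset covers i = 1, ..., 6297 hours).
p = [max(0.0, 800 * math.sin(math.pi * (t % 144) / 144)) for t in range(288)]

def pm(i):
    """Eq. (2.1): average power over the 60 minutes preceding hour i."""
    s = 6 * i                      # sample index corresponding to hour i
    return sum(p[s - 6:s]) / 6.0   # mean of samples at i-50min, ..., i

def pt(i, l):
    """Eq. (2.2): forecast target over a horizon of l hours from hour i."""
    return sum(pm(r) for r in range(i + 1, i + l + 1)) / 6.0
```

For l = 1 the target reduces to Pm(i+1)/6 under this reading, so the two definitions can be cross-checked directly against each other.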

Figure 2.14 shows the correlation between the solar radiation for PV modules at 3° and the output power, the solar radiation for PV modules at 15° and the output power, the ambient temperature and the output power, and the module temperature and the output power, on the basis of one year of collected data. The Pearson-Bravais correlation coefficient (R2) is used to evaluate the data correlation: it is evident that the parameter most correlated with the PV power is the irradiation, but in view of the R2 values obtained, all parameters have been taken into consideration to implement the forecasting models in this study.

The very high correlation between solar irradiation and PV power is also shown in Fig. 2.12: the solar irradiation curve is almost the same as the PV power curve, following the same trend at every time instant.
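The Pearson-Bravais coefficient used for this screening is straightforward to compute from the raw series. A small self-contained Python sketch (the daily irradiance and power profiles below are invented toy numbers, not plant data):

```python
import math

def pearson_r(x, y):
    """Pearson-Bravais correlation coefficient between two series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy hourly profiles: power roughly proportional to irradiance,
# mimicking the strong irradiance/power coupling seen in Fig. 2.12.
irr = [0, 120, 340, 560, 720, 810, 790, 640, 430, 210, 60, 0]
pw  = [0.0, 95, 270, 450, 585, 660, 640, 515, 345, 165, 45, 0.0]

r = pearson_r(irr, pw)
print(round(r ** 2, 3))  # squared coefficient close to 1 for such series
```

Applied to the four input variables in turn, this is the computation behind the R2 values that justify keeping temperature as well as irradiance in the input vector.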


    Figure 2.11: General PV Dataset Management System

    Figure 2.12: Correlation between Solar Radiation and Output Power


(a) Output Power; (b) Ambient Temperature; (c) Module Temperature; (d) Solar Radiation at 3°; (e) Solar Radiation at 15°

Figure 2.13: Input dataset plots

(a) Ambient Temperature - Output Power; (b) Module Temperature - Output Power; (c) Irradiation at 3° - Output Power; (d) Irradiation at 15° - Output Power

Figure 2.14: Correlation between input variables and Output Power

Chapter 3

Electrical time series forecasting

3.1 State of the Art

Figure 3.1: Italian renewable energy production

Load and productivity forecasting has always been a key instrument in power system operation. Many operational decisions in power systems, such as unit commitment, economic dispatch, automatic generation control, security assessment, maintenance scheduling and energy commercialization, depend on the future behavior of loads and productivity. In particular, with the rise of deregulation and free competition in the electric power industry all around the world, load and productivity forecasting has become more important than ever before [12]. Since renewable energy power plants such as PV systems and wind farms came into use, productivity forecasting for the national energy system has become difficult due to the

high variability of the electricity production of these new systems.

In recent years, the accuracy of electricity productivity forecasting has become very important on the regional and national scale. In terms of precision, electricity suppliers are interested in various horizons in order to estimate the fossil fuel savings and to manage and dispatch the installed power plants [14]. The uncertainty of power from the sun is a limitation of PV systems, influencing the quality of the electrical grid to which they are connected. Therefore, the possibility of predicting the PV power (up to 24 h ahead or even more) can become very important for an efficient planning of grid-connected photovoltaic systems [2]. PV power production forecasting mainly includes two kinds of methods (Fig. 3.2): physical model-based and historical data-based methods [33]. Physical models are based on numerical weather predictions (NWP) to predict solar radiation and other meteorological data; they do well in medium-term and long-term predictions. The historical data-based methods require only past power or climate data and do better in short-term prediction. All the different wind and solar power prediction studies, as well as electric load prediction studies, underline the need to implement forecasting models using physical and statistical models, or to combine short-term and medium-term models, to improve the forecasting performance. In this study only historical data-based models for short-term PV power forecasting are used.

Figure 3.2: Renewable Energy forecasting methods

In the literature, different historical data-based forecasting methods have been developed to evaluate the performance of PV systems. Statistical models include moving average and exponential smoothing methods, multi-linear regression models, stochastic processes, data mining approaches, autoregressive moving average (ARMA) models, Box-Jenkins methods, and Kalman filtering-based methods [12].

  • 3.1 State of the Art

    However, electric load time series are usually nonlinear functions of exogenous vari-

    ables. Therefore, to incorporate non-linearity, Articial Neural Networks (ANNs)

    have received much more attention in solving problems of electricity load or pro-

    ductivity forecasting [13]. In [15] the power forecasting of a PV system has been

    presented by calculating the solar radiation, collecting data from weather forecast-

    ing, and using Elman neural network to forecast by using data from PV system.

    In [16] a MLP network for 24 h forecasting ahead of solar irradiation was devel-

    oped. The proposed model used as input parameters the mean daily irradiation

    and the mean daily air temperature. De Giorgi et al [18] compared ARMA models,

    which perform a linear mapping between inputs and outputs with Articial Neu-

    ral Network (ANNs) and Adaptive Neuro-Fuzzy Inference Systems (ANFIS), which

    perform a non-linear mapping, underlining that ANNs presents an higher accuracy

    at long time horizon in wind power forecasting. The higher forecasting accuracy for

    long time horizon given by non-linear models as the ANN is also shown in [3842]

    for PV power prediction and in [4349] for wind power signals. However, one major

    risk in using ANN models is the possibility of excessive training data approxima-

    tion, i.e., over-tting, which usually increases the out-of-sample forecasting errors

    [16].

    Recently, new methods for time series forecasting based on Learning Machines were developed, using Support Vector Machines (SVMs). A Support Vector Machine for Regression problems is based on the Vapnik statistical learning theory [4, 5] and has been used for electrical productivity forecasting (PV systems, wind parks, etc.). Several studies demonstrate that SVMs are more resistant to the over-fitting problem, achieving high generalization performance in forecasting various time series. In addition, SVMs can model complex problems with datasets made up of several variables and a reduced training dataset. In [35] the lower computational time of the SVM, compared to ANN models using back-propagation algorithms, is highlighted. Ruidong Xu et al. [36] showed that PV power forecasting using SVM models is more efficient and more practicable than that of the ANN, and forecasting models based on Support Vector Machine Regression are also proposed in [50] and [51]. In [37], too, the SVM model outperforms the ANN.

    A variant of the standard SVM is the Least Square Support Vector Machine (LS-SVM) [10], which uses a simplified linear model, simpler and computationally cheaper but with the same advantages of the ANN and SVM models. LS-SVM models have already been applied to wind power forecasting, such as in [52] and [53], where it is shown that these models outperform standard SVM Regression, in particular for



    very-short forecasting horizons.

    The improvement of the prediction performance is noticeable in particular for hybrid methods based on Wavelet Decomposition (WD) [13], [19] and [33]. The non-stationary nature of PV power (like wind power) makes WD an interesting tool for studying this kind of signal: the time series can be decomposed into approximately stationary components, allowing the model to analyze those components separately. Xiyun Yang et al. [7] used the wavelet transform to process data for PV power prediction with an SVM model, and the Wavelet Transform was also used to improve the performance of Neural Network forecasting models for wind power and national electric loads [33], [34]. In [54] a hybrid approach based on WD, ANNs and an evolutionary algorithm was successfully proposed. To date, no LS-SVM model or hybrid LS-SVM model has been used for PV power forecasting, so the innovation introduced in this thesis concerns the use of an LS-SVM model for PV power forecasting, together with a hybrid model based on the LS-SVM. A final evaluation of the performance of ANNs, LS-SVMs and LS-SVMs with Wavelet Transform is proposed in order to identify the best performance for every forecasting horizon.

    3.2 What is a Learning Machine

    A learning system is a system that provides an adaptive answer to external stimuli. A learning process requires a feedback from the external environment that informs the system about the quality of the answer associated with each stimulus. Learning machines are based on three types of learning processes (Fig. 3.3):

    Figure 3.3: Learning Machines



    Supervised Learning: the feedback provided to the system is implemented by an error function that evaluates the deviation of the system response from the optimal response. The target of the learning process is to minimize the error function in order to obtain an optimal response;

    Non-Supervised Learning: an optimal response to the input stimulus does not exist. The learning machine must be able to extract similarity information from the input data (with no associated desired output) in order to perform a categorization;

    Reinforcement Learning: a programming philosophy that allows the realization of algorithms able to learn and adapt themselves to environmental changes. This programming technique is based on the possibility of receiving external stimuli depending on the algorithm's choices. A correct choice will result in a reward, while an incorrect choice will result in a penalization of the system. The target of the system is to reach the highest reward and thus the best result.

    It isn't always possible to specify a deterministic relation between an input dataset (stimuli) and an associated output dataset (responses). The solution to these problems is to learn from some examples the functional relation (target function) that maps the input space into the output space. The approximation of the target function extrapolated from the input examples is called the Learning Problem Solution. It is important to select, from a set of functions, the one with the best input-output mapping performance.

    An AGENT is a system designed by taking inspiration from a human or animal model. An Agent is made up of a Sensor System, integrated with the external environment in order to acquire data; a Decisional System, in order to take decisions based on the acquired data; and an Actuation System, for operating on the environment according to the decisions of the Decisional System.

    In a system for reinforcement learning:

    The Agent receives sensations from the environment using its sensors;

    The Agent decides the actions on the external environment;

    According to the results of the actions, the Agent can be rewarded.



    Figure 3.4: Reinforcement Learning

    In order to use an automatic learning method, general suppositions about the environment properties are made; in particular, the environment is described by a Markov Decision Process (MDP), formally defined by:

    A finite action set A;

    A finite state set S;

    A transition function T (T : S × A → Π(S)) that assigns to every state-action couple a probability distribution on S;

    A reinforcement function (reward function) R (R : S × A → ℝ) that assigns to every state-action couple an immediate reward.
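As an illustrative sketch (not from the thesis), an MDP of this kind can be stored as plain Python dictionaries; the state and action names below are invented:

```python
# Hypothetical two-state, two-action MDP: names are invented for illustration.
STATES = ["low_irradiance", "high_irradiance"]
ACTIONS = ["store", "sell"]

# T[s][a] is a probability distribution over successor states (T : S x A -> Pi(S)).
T = {
    "low_irradiance":  {"store": {"low_irradiance": 0.7, "high_irradiance": 0.3},
                        "sell":  {"low_irradiance": 0.6, "high_irradiance": 0.4}},
    "high_irradiance": {"store": {"low_irradiance": 0.2, "high_irradiance": 0.8},
                        "sell":  {"low_irradiance": 0.5, "high_irradiance": 0.5}},
}

# R[s][a] is the immediate reward for taking action a in state s (R : S x A -> R).
R = {
    "low_irradiance":  {"store": 0.0, "sell": 1.0},
    "high_irradiance": {"store": 0.5, "sell": 2.0},
}

def is_valid_mdp(states, actions, transitions):
    """Check that every (state, action) couple maps to a distribution on S."""
    return all(
        abs(sum(transitions[s][a].values()) - 1.0) < 1e-9
        for s in states for a in actions
    )
```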


    Input Vector   Description
    I              Output Power
    II             Output Power, Irradiation at 3, Irradiation at 15,
                   Module Temperature, Ambient Temperature

    Table 3.2: Input Vectors used with every Model

    Before applying one of the forecasting Models, some operations on the Datasets are necessary in order to obtain better performance and to allow the simulation software (Mathworks Matlab) to carry out the forecasting procedure successfully, as shown in Fig. 3.5. After the acquisition, the data were adjusted to suit the Matlab routine, normalized into the range [-1; 1] and uploaded into a Matlab database; after the forecasting procedure the dataset was denormalized and compared with the real power data, evaluating the performances of the model. The data normalization was made by a Min-Max Normalization procedure, using Eq. 3.1.

    v' = (v - min_A) / (max_A - min_A) · (Nmax_A - Nmin_A) + Nmin_A        (3.1)

    Where:

    v = value to normalize;

    v' = normalized value;

    min_A = minimum value of the original range;

    max_A = maximum value of the original range;

    Nmax_A = maximum value of the new range;

    Nmin_A = minimum value of the new range;
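The normalization of Eq. 3.1 and its inverse (needed after forecasting) can be sketched in Python as follows; the sample power values are hypothetical:

```python
def minmax_normalize(values, new_min=-1.0, new_max=1.0):
    """Min-Max normalization (Eq. 3.1): rescale values into [new_min, new_max]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [(v - v_min) / span * (new_max - new_min) + new_min for v in values]

def minmax_denormalize(norm_values, v_min, v_max, new_min=-1.0, new_max=1.0):
    """Invert Eq. 3.1 to recover the original scale after forecasting."""
    return [(n - new_min) / (new_max - new_min) * (v_max - v_min) + v_min
            for n in norm_values]

power = [0.0, 200.0, 400.0, 800.0]      # hypothetical hourly PV power [kW]
norm = minmax_normalize(power)          # values now lie in [-1, 1]
restored = minmax_denormalize(norm, min(power), max(power))
```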

    3.4 Training and Test Datasets

    All the collected time series data (365 days / 6297 hourly records) were divided into two sets: the training and the testing data set. The training data set included 65% of the time series data, the testing data set the remaining 35% (Figure 3.6), in the same way as done by De Giorgi et al. [2]. Figures 3.7 and 3.8 show the trend of some samples from the training and test datasets.
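A chronological 65%/35% split of this kind can be sketched as follows (the data here are a stand-in for the 6297 hourly records):

```python
def chronological_split(series, train_fraction=0.65):
    """Split a time series into training and test sets, preserving the time
    order (65% / 35% as used in this thesis)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

hourly_power = list(range(6297))        # stand-in for the 6297 hourly records
train, test = chronological_split(hourly_power)
```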



    Figure 3.5: Acquired dataset adjustment

    Figure 3.6: Training and Test Dataset division

    3.5 Model Performances Evaluations methods

    The Mean Absolute Error (MAE), Normalized Mean Absolute Percentage Error

    (NMAPE) and the Standard Deviation (Std) of MAE [33] can be used to measure

    the prediction performance of the created models:

    MAE = (1/n) Σ_{i=1}^{n} |P_i - T_i|        (3.2)

    NMAPE = (1/n) Σ_{i=1}^{n} (|P_i - T_i| / C) · 100        (3.3)



    Figure 3.7: Example of used Training data

    Figure 3.8: Example of used Test data

    Std = sqrt( (1/(n-1)) Σ_{i=1}^{n} (|P_i - T_i| - MAE)^2 )        (3.4)

    Where:

    i: generic time instant;

    n: number of observations;

    Pi: forecasted PV power at time instant i;

    32


    Ti: real PV power at time instant i;

    C: maximum of the real PV power.
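Eqs. 3.2-3.4 can be sketched in Python as follows; the forecast and measurement values are hypothetical:

```python
def forecast_metrics(P, T, C):
    """MAE, NMAPE [%] and Std of the absolute error (Eqs. 3.2-3.4).
    P: forecasted power, T: real power, C: maximum real power."""
    n = len(P)
    abs_err = [abs(p - t) for p, t in zip(P, T)]
    mae = sum(abs_err) / n
    nmape = sum(e / C for e in abs_err) / n * 100.0
    std = (sum((e - mae) ** 2 for e in abs_err) / (n - 1)) ** 0.5
    return mae, nmape, std

# Hypothetical forecasted / measured PV power [kW]
P = [0.0, 150.0, 380.0, 220.0]
T = [0.0, 200.0, 400.0, 200.0]
mae, nmape, std = forecast_metrics(P, T, C=max(T))
```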

    De Giorgi et al. [2] chose the NMAPE as the best parameter in order to evaluate the quality of the forecasts in a correct way. The normalization of the differences between predicted power and real power prevents very different errors from having the same weight in the performance evaluation. For example, without normalization, a difference between P_i = 2 kW and T_i = 4 kW would have the same importance as a difference between P_i = 200 kW and T_i = 400 kW. The NMAPE parameter will be used for evaluating the performances of every Model used in this thesis. Analyzing the output data obtained from the created forecasting Models, it was noticed that, in many cases, the output is a negative value. The negative output data will be replaced with a zero value. Even though the NMAPE is a valid indicator of the performances of a forecasting Model, a better statistical analysis of the output data has been proposed: the probability of obtaining an NMAPE value within different ranges of values has been studied, as shown in Tab. 3.3.

    Forecasting Model   Evaluated Probability Ranges
    Model I             1%, 5%, 10%, 20%
    Model II            1%, 5%, 10%, 20%
    Model III           1%, 5%, 10%, 20%

    Table 3.3: Probability ranges for the NMAPE
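A sketch of this range analysis, computing the fraction of samples whose normalized absolute percentage error falls below each threshold (hypothetical data):

```python
def error_range_probability(P, T, C, thresholds=(1, 5, 10, 20)):
    """Fraction of samples [%] whose normalized absolute percentage error
    |P_i - T_i| / C * 100 stays within each threshold (cf. Tab. 3.3)."""
    n = len(P)
    napes = [abs(p - t) / C * 100.0 for p, t in zip(P, T)]
    return {thr: 100.0 * sum(e <= thr for e in napes) / n for thr in thresholds}

# Hypothetical forecast vs. measurement, with C = 400 kW
P = [10.0, 150.0, 390.0, 210.0]
T = [0.0, 200.0, 400.0, 200.0]
prob = error_range_probability(P, T, C=400.0)
```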


  Chapter 4

    Artificial Neural Networks (ANNs)

    Neural networks are composed of simple elements operating in parallel and inspired by the biological nervous system. The complexity of real neurons is highly abstracted when modeling artificial neurons. These basically consist of inputs (like synapses), which are multiplied by weights (the strengths of the respective signals), and then computed by a mathematical function which determines the activation of the neuron. Another function (which may be the identity) computes the output of the artificial neuron (sometimes in dependence on a certain threshold). ANNs combine artificial neurons in order to process information. By adjusting the weights of an artificial neuron we can obtain the desired output for specific inputs, and this process of adjusting the weights is called learning or training.

    Figure 4.1: Artificial Neuron

    4.1 Elman Back-propagation Neural Network

    A first prediction model based on the Elman Neural Network has already been used by P.M. Congedo et al. in order to forecast the productivity of the PV system of the Campus Ecotekne [1]. This kind of Neural Network has a feedback from the output of the first layer to the input of the same layer - as shown in Fig. 4.2



    - thus enabling the detection and generation of time-varying patterns [55]. This characteristic is of great importance as the time-length of the prediction increases. The used scheme consists of three layers of neurons. The number of neurons in each layer is defined in Tab. 4.1. In the first layer the hyperbolic tangent sigmoid transfer function (TANSIG) [56] was applied and in the second layer the linear transfer function (PURELIN) [57] was used. The "gradient descent weight and bias" was used as learning function (LEARNGD) [58] to determine how to adjust the neuron weights to maximize performance.

    Figure 4.2: Typical architecture of an Elman Back Propagation ANN

    The implementation of the Elman Neural Network was done using the Matlab Neural Network Toolbox, Version R2011b. The input data were processed in order to delete wrong acquisitions (negative values and out-of-scale values) and also normalized into the interval [-1; +1]. The parameters used for designing the Elman Neural Network are shown in Tab. 4.1. The Model based on the Elman ANN was called MODEL I, as already shown in Tab. 3.1; INPUT VECTOR I and INPUT VECTOR II are used as inputs for the created Model I.

    4.2 Forecasting with Model I - Input Vector I and II

    In this section, experiments were carried out to evaluate the performance of Model I, and the simulation results are shown in Tab. 4.2. Firstly, the forecasting performance is evaluated with the NMAPE for Model I and Input Vectors I and II.



    Parameter Value

    Training function TRAINGDX

    Adapt learning function LEARNGD

    Number of layers 3

    Neurons (layer 1) 5

    Neurons (layer 2) 5

    Neurons (layer 3) 1

    Activation function hidden layer TANSIG

    Activation function output layer PURELIN

    Epochs 500

    Table 4.1: Training parameters for the Elman ANN
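A minimal sketch (not the thesis implementation, which uses the Matlab toolbox) of a single forward step of an Elman-style network with a TANSIG hidden layer and a PURELIN (linear) output; the random weights stand in for a trained network:

```python
import math
import random

def tansig(x):
    """Hyperbolic tangent sigmoid transfer function (TANSIG)."""
    return [math.tanh(v) for v in x]

def elman_step(x, h_prev, W_in, W_rec, b1, W_out, b2):
    """One forward step of a minimal Elman network:
    h_t = tansig(W_in x + W_rec h_{t-1} + b1),  y = W_out h_t + b2 (PURELIN)."""
    n_hidden = len(b1)
    pre = [sum(W_in[i][j] * x[j] for j in range(len(x)))
           + sum(W_rec[i][k] * h_prev[k] for k in range(n_hidden)) + b1[i]
           for i in range(n_hidden)]
    h = tansig(pre)                                   # hidden state, fed back
    y = sum(W_out[k] * h[k] for k in range(n_hidden)) + b2
    return h, y

random.seed(0)
n_in, n_hidden = 5, 5        # Input Vector II has 5 features; 5 hidden neurons
W_in = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
W_rec = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
W_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b1, b2 = [0.0] * n_hidden, 0.0
h, y = elman_step([0.1] * n_in, [0.0] * n_hidden, W_in, W_rec, b1, W_out, b2)
```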

                        NMAPE      NMAPE      NMAPE      NMAPE      NMAPE
                        Horizon    Horizon    Horizon    Horizon    Horizon
                        +1         +3         +6         +12        +24
                        [%]        [%]        [%]        [%]        [%]
    Input Vector I      9.40       15.11      20.18      21.12      18.54
    Input Vector II     6.49       10.37      13.46      14.22      19.60

    Table 4.2: Productivity forecasting results for Model I

    The NMAPE rises as the forecasting horizon increases, except for the forecasting horizon +24 with Input Vector I. The lower NMAPE for horizon +24 with Input Vector I is due to the low correlation of the dataset at a high forecasting horizon. As expected, better forecasting performances can be obtained by using Input Vector II instead of Input Vector I. Figures 4.3 and 4.4 show histograms with the data in Table 4.2, while Figure 4.5 shows a line chart comparison between the NMAPE values obtained with Input Vectors I and II on Model I: using Input Vector II, the NMAPE decrease is larger for high forecasting horizons.

    Figures from 4.6 to 4.9 show a graphical comparison between the real PV power production and the forecasted PV power production for some time samples, and also



    Figure 4.3: NMAPE for Model I with Input Vector I

    Figure 4.4: NMAPE for Model I with Input Vector II

    Figure 4.5: Comparisons between NMAPE for Model I - Input Vectors I and II

    a plot of the corresponding error. Graphs show that using Input Vector I there is

    always a bias error between the forecasted and real PV power (visible when the real



    PV power has zero value). It is also shown that, using Input Vector I, the forecasting model is not very capable of following abrupt changes of the real PV power signal, such as in the case of an unexpected passage of clouds over the PV plant. Using Input Vector II the performance increases: the forecasted power better follows abrupt changes of the real power, and the bias error is also removed. For both Input Vectors and for every forecasting horizon, the forecasted PV power signal presents a delay on the rising edge of the real power signal and an advance on the falling edge of the real power signal. This behavior of the model is also visible in the error signal, which has a sinusoidal shape at real power peaks.

    Tables 4.3 and 4.4 show the probability distribution of the NMAPE for all the forecasting horizons. Figure 4.10 shows the error distribution for Model I with Input Vectors I and II and for all the forecasting horizons. These graphs allow to evaluate the tendency of the Model to underestimate or overestimate the real PV power: generally, using Input Vector I there is an underestimation, while using Input Vector II the error distribution has an average value closer to zero. It can also be inferred from the graphs that the error distributions for long forecasting horizons have higher standard deviation values, especially for the horizon +24 h.

    NMAPE     Probab.    Probab.    Probab.    Probab.    Probab.
    range     (+1 h)     (+3 h)     (+6 h)     (+12 h)    (+24 h)
              [%]        [%]        [%]        [%]        [%]
    1 %       2          2          1          2          3
    5 %       16         9          6          8          17
    10 %      72         56         12         17         34
    20 %      87         76         67         37         61

    Table 4.3: Probability analysis results for Model I with Input Vector I

    Fig. 4.11 shows the error distribution comparison between Input Vectors I and II for Model I: the better performance of Input Vector II, in terms of the mean of the Gaussian distribution, is clearly perceptible. A graphical comparison between the probability distributions of the NMAPE for Model I is also provided in Figures 4.12 and 4.13. As expected, the probability of having a low NMAPE is not very good using Input Vector I, while it rises using Input Vector II. Figure 4.12 shows that, using Input Vector I, a forecasting horizon increase is not always related to a Probability decrease, due to a low correlation between the input and the output data, while Figure 4.13 shows that, using Input Vector II, a forecasting



    Figure 4.6: Comparisons between actual and forecasted PV power and Error for

    Model I with Input Vector I - forecasting horizon +1 hour

    Figure 4.7: Comparisons between actual and forecasted PV power for Model I with

    Input Vector II - forecasting horizon +1 hour



    Figure 4.8: Comparisons between actual and forecasted PV power for Model I with

    Input Vector I - forecasting horizon +6 hour

    Figure 4.9: Comparisons between actual and forecasted PV power for Model I with

    Input Vector II - forecasting horizon +6 hour



    horizon increase is always related to a Probability decrease, for every NMAPE range. In conclusion, Input Vector II achieves the best global performance: a low NMAPE for every forecasting horizon and an error distribution without high overestimation or underestimation of the real PV power. Input Vector I may be used just in case a lower computational time is desired.

    NMAPE     Probab.    Probab.    Probab.    Probab.    Probab.
    range     (+1 h)     (+3 h)     (+6 h)     (+12 h)    (+24 h)
              [%]        [%]        [%]        [%]        [%]
    1 %       29         16         11         7          4
    5 %       64         41         27         21         15
    10 %      78         65         53         44         31
    20 %      91         82         78         78         57

    Table 4.4: Probability analysis results for Model I with Input Vector II



    Figure 4.10: Error distributions for Model I, Input Vectors I and II, for all forecasting horizons (+1 h, +3 h, +6 h, +12 h, +24 h)



    Figure 4.11: Error distribution comparisons between Input Vectors I and II of Model I for all forecasting horizons (+1 h, +3 h, +6 h, +12 h, +24 h)



    Figure 4.12: Absolute error distributions for Model I - Input Vector I (a) and Input Vector II (b)

    Figure 4.13: Absolute error distributions for Model I - Input Vector I (a) and Input Vector II (b)


  Chapter 5

    Support Vector Machines (SVMs)

    5.1 Introduction to Support Vector Machines

    Support Vector Machines (SVMs) - also called Support Vector Networks or Kernel Machines - were designed and developed by the Soviet statistician and mathematician Vladimir Vapnik in the 1990s at the AT&T Bell Laboratories. The algorithm on which they are based falls within the Vapnik-Chervonenkis theory, or Statistical Learning Theory, a supervised learning framework that allows to generalize and classify new elements starting from a base of elements learned in the past. The first industrial applications of SVMs were:

    Optical Character Recognition (OCR)

    Text Classifying

    Objects Recognition

    In order to illustrate the theory of SVMs, a data set of l observations was considered, where every observation consists of a couple (x_i, y_i), with x_i a real input vector and y_i its class label.




    Figure 5.2: Different separating hyperplanes: only H2 is the optimal hyperplane

    so that every point x_i ∈ A is contained in one half-space and every point x_j ∈ B is contained in the other half-space: there exists a vector w and a scalar b defining such a hyperplane. It is possible to rescale these relations, without loss of generality, obtaining:

    w^T x_i + b ≥ 1,   ∀ x_i ∈ A
    w^T x_j + b ≤ -1,  ∀ x_j ∈ B        (5.6)

    Some definitions, lemmas and propositions are now introduced to allow a better comprehension of the concept of separating hyperplane; in particular, the definition of Separation Margin will be introduced.

    Definition 5.1. Let H be a separating hyperplane. The Separation Margin of H is the minimum distance between the points in A ∪ B and the given hyperplane:

    ρ(w, b) = min_{x_i ∈ A ∪ B} { |w^T x_i + b| / ||w|| }        (5.7)



    Definition 5.2. An Optimal Hyperplane H*(w*, b*) is the separating hyperplane having the maximum Separation Margin. The optimal hyperplane is found by solving the optimization problem:

    (w*, b*) = arg max_{w, b} ρ(w, b)


    Proposition 5.1.4. It is demonstrated that if (w*, b*) is the solution of the optimization problem, then (w*, b*) is the only solution for that problem.


    4. y_i (x_i · w + b) - 1 ≥ 0,  i = 1, ..., l

    Where:

    5. L_p(w, b, α) = (1/2)||w||^2 - Σ_{i=1}^{l} α_i y_i (x_i · w + b) + Σ_{i=1}^{l} α_i  is the Lagrangian of the problem

    Through mathematical steps, not shown in this thesis to avoid burdening the discussion, it is possible to determine the Lagrangian (dual) form of the optimization problem:

    min Φ(α) = (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} y_i y_j (x_i^T x_j) α_i α_j - Σ_{i=1}^{l} α_i

    s.t.  Σ_{i=1}^{l} α_i y_i = 0
          α_i ≥ 0,  i = 1, ..., l        (5.15)

    The training vectors x_i are also called Support Vectors. Support Vectors have non-null corresponding multipliers α_i*.

    From Eq. 5.15 it is possible to obtain the following classification function, which allows to classify the x vectors after a learning phase:

    f(x) = sign((w*)^T x + b*) = sign( Σ_{i=1}^{l} α_i* y_i (x_i^T x) + b* )        (5.16)

    Where x_i are the support vectors, α_i* are the related Lagrange coefficients and b* is a constant.

    5.2 SVM for Regression Models - SVR

    Suppose that we want to approximate a linear relation between a set of input data (x vectors) and output observations (y vector), using as linear estimator a function f:

    f(x) = w^T x + b        (5.17)


    ε > 0 is the precision used to approximate the function; ε is called the tube size. The model estimation is correct if:

    |y_i - w^T x_i - b| ≤ ε        (5.18)

    The ε-insensitive Loss Function is introduced:

    |y - f(x; w, b)|_ε = max{ 0, |y - f(x; w, b)| - ε }        (5.19)

    and the Training Error is also defined:

    E = Σ_{i=1}^{l} |y_i - f(x_i; w, b)|_ε        (5.20)
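The ε-insensitive loss (Eq. 5.19) and the training error (Eq. 5.20) can be sketched directly; the residual values are illustrative:

```python
def eps_insensitive_loss(y, f_x, eps=0.1):
    """epsilon-insensitive loss (Eq. 5.19): zero inside the eps-tube,
    linear outside it."""
    return max(0.0, abs(y - f_x) - eps)

def training_error(ys, f_xs, eps=0.1):
    """Training error (Eq. 5.20): sum of the losses over all samples."""
    return sum(eps_insensitive_loss(y, f, eps) for y, f in zip(ys, f_xs))

# A residual of 0.05 inside the tube costs nothing; 0.3 costs 0.3 - 0.1 = 0.2
inside = eps_insensitive_loss(1.0, 1.05)
outside = eps_insensitive_loss(1.0, 1.3)
```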

    The training error is zero only if the following system of inequalities is satisfied:

    w^T x_i + b - y_i ≤ ε
    y_i - w^T x_i - b ≤ ε        i = 1, ..., l        (5.21)

    The artificial (slack) variables ξ_i, ξ_i* are introduced in 5.22, with i = 1, ..., l:

    w^T x_i + b - y_i ≤ ε + ξ_i
    y_i - w^T x_i - b ≤ ε + ξ_i*
    ξ_i, ξ_i* ≥ 0        (5.22)

    It is noticed that the term:

    Σ_{i=1}^{l} (ξ_i + ξ_i*)        (5.23)

    is an upper bound for the training error. As for the linear SVM for classification problems, the following problem is studied:



    min_{w, b, ξ, ξ*}  (1/2)||w||^2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)

    s.t.  w^T x_i + b - y_i ≤ ε + ξ_i
          y_i - w^T x_i - b ≤ ε + ξ_i*
          ξ_i, ξ_i* ≥ 0,  i = 1, ..., l        (5.24)

    In SVR, the parameters w and b are estimated by minimizing the regularized risk function:

    min_{w, b, ξ, ξ*}  (1/2)||w||^2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)        (5.25)

    Where the first term (1/2)||w||^2 represents the regularization term (or complexity penalizer) and C Σ_{i=1}^{l} (ξ_i + ξ_i*) is the empirical risk, with the error calculated using the ε-insensitive loss function given by 5.19. The regularization constant C determines the trade-off between the model complexity and the training error. ε, known as the tube size, controls the deviation of f(x) from y. Both C and ε are parameters specified by the user. This regularized risk function is the key in balancing the needs between learning accuracy and capacity for learning. It turns out that the optimization in Eq. 5.24 can be solved more easily in its dual formulation. This can be done with the introduction of the Lagrange multipliers α, α*, μ, μ*. It can be shown [24-26] that this function has a saddle point with respect to the primal and dual variables at the optimal solution: it is minimum at the saddle point with respect to the primal variables and maximum with respect to the dual variables. The dual of problem 5.24 is the following problem:



    max L(w, b, ξ, ξ*, α, α*, μ, μ*) = (1/2)||w||^2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
        - Σ_{i=1}^{l} (μ_i ξ_i + μ_i* ξ_i*)
        + Σ_{i=1}^{l} α_i [ w^T x_i + b - y_i - ε - ξ_i ]
        + Σ_{i=1}^{l} α_i* [ y_i - w^T x_i - b - ε - ξ_i* ]

    s.t.  ∇_w L = 0
          ∂L/∂b = 0
          ∂L/∂ξ_i = 0,  i = 1, ..., l
          ∂L/∂ξ_i* = 0,  i = 1, ..., l
          α, α* ≥ 0,  μ, μ* ≥ 0        (5.26)

    Or rather, collecting the terms:

    max L(w, b, ξ, ξ*, α, α*, μ, μ*) = (1/2)||w||^2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
        - Σ_{i=1}^{l} ((α_i + μ_i) ξ_i + (α_i* + μ_i*) ξ_i*)
        - ε Σ_{i=1}^{l} (α_i + α_i*)
        + Σ_{i=1}^{l} (α_i - α_i*)(w^T x_i + b - y_i)

    s.t.  ∇_w L = 0
          ∂L/∂b = 0
          ∂L/∂ξ_i = 0,  i = 1, ..., l
          ∂L/∂ξ_i* = 0,  i = 1, ..., l
          α, α* ≥ 0,  μ, μ* ≥ 0        (5.27)



    The problem above can be written in the following form:

    min W(α, α*) = (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i - α_i*)(α_j - α_j*)(x_i^T x_j)
        - Σ_{i=1}^{l} (α_i - α_i*) y_i + ε Σ_{i=1}^{l} (α_i + α_i*)

    s.t.  Σ_{i=1}^{l} (α_i - α_i*) = 0
          0 ≤ α_i ≤ C,  i = 1, ..., l
          0 ≤ α_i* ≤ C,  i = 1, ..., l        (5.28)

    The linear estimator in Eq. 5.17 can be expressed as:

    f(x) = Σ_{i=1}^{l} (α_i - α_i*)(x_i^T x) + b        (5.29)

    It is demonstrated that the Lagrange multipliers can only be non-zero when |f(x) - y| ≥ ε. This means that (α_i, α_i*) will be zero for data lying inside the ε-tube; hence the data points falling outside the tube, which have non-zero (α_i, α_i*) and give a sparse representation, are known as the Support Vectors. However, obtaining a sparser representation by increasing the tube size ε degrades the accuracy of the approximation. Therefore, ε presents a trade-off between sparseness of the representation and accuracy.

    5.3 Loss Functions

    Loss functions are used as a measure of the error between estimated and actual values. The choice of the loss function depends very much on the problem at hand. A discussion on general loss functions in SVMs can be found in [27]. Fig. 5.3 shows the ε-insensitive loss function used in regression. It is proved that the optimal choice of the loss function in regression is actually related to the noise density distribution in the data [4]. It is also proved [28] that the ε-insensitive loss function is the best choice for additive Gaussian noise whose variance and mean are random.

    5.4 Nonlinear SVR using kernel

    The capability of SVM can be further extended to enable the learning of non-

    linear functions. The input data can be mapped from the input space to a higher



    Figure 5.3: ε-insensitive loss function

    dimensional feature space using a mapping function φ(x). The linear hyperplane estimator can be written as:

    y = f(x) = w^T φ(x) + b        (5.30)

    Using this approach, computation and generalization problems are found:

    The over-fitting of the data, due to the increase of the number of features used in mapping the data into a higher dimension, can cause a degradation of the generalization performance.

    The increase of the features used in the mapping also results in an increase of the computational resources needed to evaluate the features.

    The problem of the generalization performance can be solved, from the perspective of statistical learning theory [4], by limiting the capacity for learning from a small data set in a rich feature space. This can be done with the introduction of a capacity control term ||w||^2, leading to the regularized risk functional in Eq. 5.25 for our case here. This means that the SVM can generalize well regardless of the dimension of the feature space used. The second problem (computational resources) can be solved by using kernel functions. One important property of linear learning machines is that they can be expressed in a dual representation (Eq. 5.28). This dual function is expressed by the dot product of the training data points, and the decision of this function is found by evaluating dot products between training and test data points. The Duality Theory and the use of kernel functions allow to generalize the discussion to non-linear regression models, similarly to what was done for the classification problems. In particular, the training problem for an SVM for non-linear regression is defined as follows:



    min W(α, α*) = (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i - α_i*)(α_j - α_j*) k(x_i, x_j)
        - Σ_{i=1}^{l} (α_i - α_i*) y_i + ε Σ_{i=1}^{l} (α_i + α_i*)

    s.t.  Σ_{j=1}^{l} (α_j - α_j*) = 0
          0 ≤ α_i ≤ C,  i = 1, ..., l
          0 ≤ α_i* ≤ C,  i = 1, ..., l        (5.31)

    Where k(x, z) is a kernel function. The formulated problem is a CQP and the solution (α, α*) allows to define the regression function in the following form:

    f(x) = Σ_{i=1}^{l} (α_i - α_i*) k(x, x_i) + b        (5.32)

    where b can be determined by using the complementarity conditions. Any kernel satisfying Mercer's conditions [30] can be used as a kernel function. Some of the commonly used kernels are:

    Linear kernel:

    K(x, y) = (x · y)        (5.33)

    Polynomial kernel:

    K(x, y) = (1 + x · y)^d        (5.34)

    Radial basis function (RBF):

    K(x, y) = exp(-γ ||x - y||^2)        (5.35)

    Multi-Layer Perceptron:

    K(x, y) = tanh(b (x · y) - c)        (5.36)

    Gaussian radial basis function:

    K(x, y) = exp( -||x - y||^2 / (2σ^2) )        (5.37)
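The first three kernels can be sketched in Python as follows (the default values of d and γ are arbitrary):

```python
import math

def linear_kernel(x, y):
    """K(x, y) = x . y  (Eq. 5.33)"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, d=2):
    """K(x, y) = (1 + x . y)^d  (Eq. 5.34)"""
    return (1.0 + linear_kernel(x, y)) ** d

def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)  (Eq. 5.35)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 0.0], [0.0, 1.0]
```

Note that the RBF kernel of any point with itself is 1 and decays towards 0 as the points move apart, which is what produces the closed decision surfaces discussed below.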



    It is evident that the implicit mapping of a kernel offers a cheap way for the SVM to construct a variety of non-linear functions. Not only does a kernel characterize the features of the input data, but having a well defined kernel can also greatly improve the performance of the SVM. Therefore, studies on kernels constitute an important area of research in SVMs, and the training of an SVR is more complex than the training of an SVM for classification. It has been found that RBF kernels are the most powerful. Unlike the polynomial kernels, RBF kernels can produce very closed decision surfaces, which is a useful property in case of classifying a dataset with one class completely encapsulated in the data of another class. The only drawback is the higher computational time. To use a Support Vector Machine it is necessary to define:

    the Kernel Type;

    the Kernel Parameters;

    the value of C.

    No theoretical criteria are available to define all these parameters. The typical procedure involves a validation on a validation dataset using Cross-Validation algorithms.

    5.5 Least Square Support Vector Machine for Regression

    In [4] and [10] a modified form of the SVM algorithm was proposed, called Least Square Support Vector Machine (LS-SVM), in order to reduce the computing time of the SVMs. The training of the LS-SVM is simpler because it requires the solution of a set of linear equations (linear KKT systems). LS-SVMs are closely related to regularization networks and Gaussian processes, but additionally emphasize and exploit primal-dual interpretations. For the productivity forecasting of this thesis, the Radial Basis Function (RBF) kernel is used. In the literature, many tests and comparisons showed great performances of LS-SVMs on several benchmark data set problems and were very encouraging for further research in this promising direction [10].

    A model in the primal weight space is considered:


    Figure 5.4: LS-SVM: an interdisciplinary topic

y(x) = wᵀ φ(x) + b    (5.38)

where x ∈ ℝⁿ is the input and φ(·) is the mapping to the high-dimensional feature space.


∂L/∂w = 0   →   w = Σ_{k=1}^{N} α_k φ(x_k)
∂L/∂b = 0   →   Σ_{k=1}^{N} α_k = 0
∂L/∂e_k = 0   →   α_k = γ e_k,   k = 1, ..., N
∂L/∂α_k = 0   →   wᵀ φ(x_k) + b + e_k − y_k = 0,   k = 1, ..., N

(5.41)

After elimination of the variables w and e and application of the kernel trick, the resulting LS-SVM model for function estimation becomes:

y(x) = Σ_{k=1}^{N} α_k K(x, x_k) + b    (5.42)

Note that in the case of RBF kernels one has only two additional tuning parameters (γ, σ²), which is less than for standard SVMs. Fig. 5.5 shows a time series prediction on the Santa Fe chaotic laser data set [31] using an LS-SVM with RBF kernel.

    Figure 5.5: Time series prediction by LS-SVM with RBF kernel [9]
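Since LS-SVM training reduces to the linear KKT system above, the dual solution and the predictor of Eq. (5.42) can be sketched compactly. The following NumPy sketch is an illustrative re-implementation, not the thesis code (which uses the LS-SVMlab Matlab toolbox); the helper names and the toy values of gam and sig2 are assumptions:

```python
import numpy as np

def lssvm_train(X, y, gam=10.0, sig2=0.5):
    """Solve the LS-SVM dual linear system for (alpha, b).
    gam (regularisation) and sig2 (RBF width) are illustrative values."""
    N = X.shape[0]
    # Kernel (Gram) matrix with an RBF kernel: K_ij = exp(-||xi - xj||^2 / (2*sig2))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sig2))
    # Linear KKT system (Eq. 5.41 after elimination of w and e):
    # [[0, 1^T], [1, K + I/gam]] [b; alpha] = [0; y]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / gam
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xtr, alpha, b, Xte, sig2=0.5):
    # Eq. (5.42): y(x) = sum_k alpha_k K(x, x_k) + b
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sig2)) @ alpha + b
```

For large gam the ridge term I/gam vanishes and the fitted values approach the training targets, which is why gam trades off smoothness against fit, exactly as C does in the standard SVM.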

    5.6 LsSVM Matlab Toolbox

The commercial software Mathworks Matlab R2011b includes many functions to implement an SVM model for classification and regression, while a Least Square SVM cannot be implemented without installing a specific external toolbox. In particular, the LS-SVMlab Toolbox Version 1.8, created by K. De Brabanter et al. [9] at the ESAT-SISTA research division of the Electrical Engineering department of the Katholieke Universiteit Leuven, was chosen. The LS-SVMlab Toolbox is


compiled and tested for different computer architectures, including Linux and Windows. Most functions can handle datasets of up to 20,000 data points or more. The LS-SVMlab interface for Matlab consists of a basic version for beginners as well as a more advanced version with programs for multiclass encoding techniques and a Bayesian framework. This section shows how to obtain an LS-SVM model for classification or regression [9] (Fig. 5.6), while the complete Matlab source code of the implemented LS-SVM model is shown in Appendix A:

Choose between the functional or object-oriented interface (initlssvm);

Search for suitable tuning parameters (tunelssvm);

Train the model given the previously determined tuning parameters (trainlssvm);

Simulate the model, e.g. on test data (simlssvm);

Visualize the results when possible (plotlssvm).

    Figure 5.6: List of commands for obtaining an LS-SVM model

    5.7 Forecasting with Model II - Input Vector I

In this section, experiments were carried out to evaluate the performance of Model II using the Matlab R2011b software, with a comparison against the results obtained by Model I on the same data sets. Firstly, the performance of the created Model is evaluated with the Normalized Mean Absolute Percentage Error (NMAPE) (see Section 3.5) for each forecasting horizon. The Model that uses the LS-SVM is called MODEL II (see Table 3.1). The LS-SVM training was carried out with several repetitions of the training procedure, in order to obtain a well-performing pair of γ and σ² parameters (i.e., to obtain lower NMAPE values); the final values of the LS-SVM parameters are listed in Table 5.1: these parameters proved to be


optimal for every forecasting horizon and for every Input Vector. The results of the NMAPE evaluation are shown in Table 5.2 and Figure 5.7.
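The NMAPE metric is defined earlier in the thesis; a commonly used formulation, assumed for this sketch, normalizes the mean absolute forecast error by a fixed reference power. The normalization constant p_rated and the sample values below are illustrative:

```python
def nmape(p_forecast, p_real, p_rated):
    """Normalized Mean Absolute Percentage Error [%].
    Assumed form: mean(|forecast - real|) / p_rated * 100,
    with a fixed rated (or maximum) PV power as normalization,
    so that night hours with zero output do not blow up the metric."""
    n = len(p_real)
    return 100.0 * sum(abs(f - r) for f, r in zip(p_forecast, p_real)) / (n * p_rated)

# Example: four hourly samples of a hypothetical plant rated at 960 kW
print(nmape([100, 250, 400, 0], [120, 240, 380, 0], p_rated=960.0))
```

Normalizing by a constant (rather than by the instantaneous real power, as in the plain MAPE) is what keeps the error bounded when the real PV power is zero.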

γ        σ²
22.8     2600

Table 5.1: Parameters of the LS-SVM based Model

                  NMAPE       NMAPE       NMAPE       NMAPE        NMAPE
                  Horizon +1  Horizon +3  Horizon +6  Horizon +12  Horizon +24
                  [%]         [%]         [%]         [%]          [%]
Input Vector I    7.53        13.62       18.22       21.11        18.52

Table 5.2: Productivity forecasting results for Model II with Input Vector I

    Figure 5.7: NMAPE values for Model II with Input Vector I

Simulation results show that, using Input Vector I, a longer forecasting horizon does not always correspond to an NMAPE increase. For example, for the forecasting horizon +24 hours, the NMAPE value is lower than the value for the


+12 hours horizon; this may depend on the low data correlation at long forecasting horizons. This model behavior is the same as that obtained with Model I and Input Vector I. Fig. 5.8 shows a comparison between the NMAPE values obtained with Models I and II using Input Vector I. Model II is better for very short forecasting horizons, while for the +12 and +24 hours horizons the performance of Model II is almost the same as that of Model I.

    Figure 5.8: Comparison of the NMAPE values between Model I and II with Input

    Vector I

Figures 5.9 and 5.10 show a graphical comparison between the real power production and the forecasted power production for some time samples, together with a plot of the corresponding error. The graphs show that using Input Vector I there is always a bias error between the forecasted and real PV power (visible when the real PV power has zero value), but the error is smaller than that evaluated using Model I, especially for short forecasting horizons, where the bias error is almost zero. It is also shown that using Input Vector I the forecasting model is not well able to follow abrupt changes of the real PV power signal, such as in the case of an unexpected passage of clouds over the PV plant. In addition, for all the forecasting horizons, the forecasted PV power signal presents a delay on the rising edge of the real power signal and an advance on the falling edge. This behavior of the model is also visible in the error signal, which has a sinusoidal shape around real power peaks, as also shown for Model I.

Table 5.3 shows the probability distribution of the NMAPE for all the forecasting horizons. Figure 5.11 shows the error distribution for Model II with Input


    Figure 5.9: Comparisons between actual and forecasted PV power and Error for

    Model II with Input Vector I - forecasting horizon +1 hour

    Figure 5.10: Comparisons between actual and forecasted PV power for Model II

    with Input Vector I - forecasting horizon +6 hour


NMAPE      Probab.   Probab.   Probab.   Probab.    Probab.
range      (+1 h)    (+3 h)    (+6 h)    (+12 h)    (+24 h)
           [%]       [%]       [%]       [%]        [%]
≤ 1 %      45        2         1         2          3
≤ 5 %      57        10        5         8          17
≤ 10 %     70        61        12        17         34
≤ 20 %     88        77        70        37         61

Table 5.3: Probability analysis results for Model II with Input Vector I
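Cumulative probability tables of this kind can be reproduced by counting, for each band, the share of samples whose percentage error lies within it. A sketch with made-up error values (the function name and the data are illustrative, not thesis results):

```python
def probability_analysis(errors_pct, thresholds=(1, 5, 10, 20)):
    """Share of samples [%] whose absolute percentage error falls inside
    each +/- threshold band (cumulative, as in Table 5.3)."""
    n = len(errors_pct)
    return {t: 100.0 * sum(1 for e in errors_pct if abs(e) <= t) / n
            for t in thresholds}

# Hypothetical hourly percentage errors for one forecasting horizon
errs = [0.5, -3.0, 7.0, -12.0, 0.8, 18.0, -0.2, 4.5, 9.0, -25.0]
print(probability_analysis(errs))
```

Because the bands are nested, the counts are cumulative by construction: the value for ±20% can never be lower than the value for ±10% at the same horizon.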

    (a) Horizon +1h (b) Horizon +3h

    (c) Horizon +6h (d) Horizon +12h

    (e) Horizon +24h

    Figure 5.11: Error distribution for Model II - Input Vector I - all Horizons

Vector I and for all the forecasting horizons. These graphs allow evaluating the tendency of the Model to underestimate or overestimate the real PV power: generally, using Input Vector I there is an underestimation, so the error distribution behaves like a Gaussian with its mean value shifted to the left of zero.

It can also be inferred from the graphs that the error distributions for long forecasting horizons have a higher standard deviation, especially for the +24 h horizon. Fig. 5.12 shows the error distribution comparison between Models I and II with Input Vector I: the better performance of Model II in terms of the mean of the Gaussian distribution is perceptible; using Model II it is possible to reduce the model underestimation, but for long forecasting horizons the results are very similar to those obtained with Model I, so there is a large shift of the distribution mean. The +24 h forecasting horizon is always the critical one because of the very high standard deviation of the error distribution.

    (a) Horizon +1h (b) Horizon +3h

    (c) Horizon +6h (d) Horizon +12h

    (e) Horizon +24h

    Figure 5.12: Error distribution for Model I and II- Input Vector I - all Horizons

    A graphical comparison between the probability distributions of the NMAPE for


Model II with Input Vector I is also provided in Figs. 5.13 and 5.14. Fig. 5.13 shows that, using Input Vector I, an increase of the forecasting horizon is not always related to a decrease in probability; however, for a fixed forecasting horizon the probability always increases with the probability range, since cumulative probability values are considered. It is clear that the results for very short forecasting horizons are very good: for the [-20%; +20%] range the probability reaches 88% for the +1h horizon, 77% for +3h and 70% for +6h, while for long forecasting horizons the results are quite poor and not acceptable for a good forecasting system.

In conclusion, Input Vector I with Model II achieves the best performance only for short forecasting horizons, even though its global performance is better than that reached with the same Input Vector using Model I.

    Figure 5.13: Comparisons between NMAPE probability for Model II - Input Vector

    I

    5.8 Forecasting with Model II - Input Vector II

This section shows the results of the simulations using forecasting Model II and Input Vector II (see Tables 3.1 and 3.2) for every forecasting horizon. Firstly, the performance of the created Model is evaluated with the Normalized Mean Absolute Percentage Error (NMAPE) (see Section 3.5) for each forecasting horizon. The training parameters γ and σ² are the same as those used for Input Vector I and are listed in Tab. 5.1. The results of the NMAPE evaluation are shown in Tab. 5.4 and Fig. 5.15.

Using Model II and Input Vector II, an increase of the forecasting horizon


    Figure 5.14: Comparisons between NMAPE probability for Model II - Input Vector

    I

                   NMAPE       NMAPE       NMAPE       NMAPE        NMAPE
                   Horizon +1  Horizon +3  Horizon +6  Horizon +12  Horizon +24
                   [%]         [%]         [%]         [%]          [%]
Input Vector I     7.53        13.62       18.22       21.11        18.52
Input Vector II    6.40        10.18       13.49       14.53        19.50

Table 5.4: Productivity forecasting results for Model II with Input Vector II

is always related to an increase of the NMAPE value, similarly to Model I with Input Vector II. Fig. 5.16 shows a comparison of the NMAPE performance using Model II with Input Vectors I and II: the performance is better for the +1, +3, +6 and +12 hours forecasting horizons using Input Vector II, while for the +24 h horizon the comparison cannot be made, due to the untrusted NMAPE value obtained with Input Vector I. Fig. 5.17 shows a comparison between the NMAPE values obtained using Models I and II with Input Vector II: the performance of the two Models is practically the same.

Figures 5.18 and 5.19 show a graphical comparison between the real PV power production and the forecasted PV power production for some time samples, together with a plot of the corresponding error. The graphs show that using Input Vector II there is no longer the bias error between the forecasted and real PV power observed with Input Vector I (when the real PV power has zero value). It is also shown that


    Figure 5.15: NMAPE values for Model II with Input Vector II

    Figure 5.16: Comparison between NMAPE values of Model II with Input Vector I

    and II

using Input Vector II the forecasting model is more capable than with Input Vector I of following abrupt changes of the real PV power signal, such as in the case of an unexpected passage of clouds over the PV plant. In addition, for all the forecasting horizons, the forecasted PV power signal presents a delay on the rising edge of the real power signal and an advance on the falling edge. This behavior of the model


    Figure 5.17: Comparison between NMAPE values of Model I and II with Input

    Vector II

is also visible in the error signal, which has a sinusoidal shape around real power peaks, as also shown for Model I (Sect. 4.2).

NMAPE      Probab.   Probab.   Probab.   Probab.    Probab.
range      (+1 h)    (+3 h)    (+6 h)    (+12 h)    (+24 h)
           [%]       [%]       [%]       [%]        [%]
≤ 1 %      38        22        17        10         4
≤ 5 %      62        48        41        25         15
≤ 10 %     77        65        58        44         31
≤ 20 %     91        82        77        75         57

Table 5.5: Probability analysis results for Model II with Input Vector II

Table 5.5 shows the probability distribution of the NMAPE for all the forecasting horizons. Figure 5.20 shows the error distribution for Model II with Input Vector II and for all the forecasting horizons. These graphs allow evaluating the tendency of the Model to underestimate or overestimate the real PV power: generally, using Input Vector II there is a slight overestimation for the +1h horizon and a slight underestimation for the other horizons. It can also be inferred from the graphs that the error distributions for long forecasting horizons have


    Figure 5.18: Comparisons between actual and forecasted PV power and Error for

    Model II with Input Vector II - forecasting horizon +1 hour

    Figure 5.19: Comparisons between actual and forecasted PV power for Model II

    with Input Vector II - forecasting horizon +6 hour


    (a) Horizon +1h (b) Horizon +3h

    (c) Horizon +6h (d) Horizon +12h

    (e) Horizon +24h

    Figure 5.20: Error distribution for Model II - Input Vector II - all Horizons

a higher standard deviation, especially for the +24 h horizon. Fig. 5.21 shows the error distribution comparison between Models I and II with Input Vector II: in this case the better performance of Model II in terms of the mean of the Gaussian distribution is perceptible for the +1h, +3h and +6h horizons but barely perceptible for the +12h and +24h horizons: using Model II it is possible to slightly reduce the model underestimation. The +24 h forecasting horizon is always the critical one because of the very high standard deviation of the error distribution, and no perceptible improvement is obtained using the LS-SVM regression model.

A comparison between the probability distributions of the NMAPE for Model II is shown in Figs. 5.22 and 5.23. The probability of obtaining a low NMAPE using Input Vector II is very good for every forecasting horizon; in particular, the probability of an NMAPE lower than 1% is almost 40% for the forecasting


    (a) Horizon +1h (b) Horizon +3h

    (c) Horizon +6h (d) Horizon +12h

    (e) Horizon +24h

Figure 5.21: Error distribution for Model I and II - Input Vector II - all Horizons

horizon +1, and in the [-20%; +20%] range it is 91%. Fig. 5.23 shows that, using Input Vector II, an increase of the forecasting horizon is always related to a decrease in probability, as also noted for Model I with Input Vector II. In conclusion, the performance is clearly better than that obtained with Input Vector I and slightly better than that reached using Model I with Input Vector II.


    Figure 5.22: Probability analysis results for Model II with Input Vector II

Figure 5.23: Probability analysis results for Model II with Input Vector II


Chapter 6

LS-SVM with Wavelet Transform

6.1 Fourier transform and short-term Fourier transform

    Figure 6.1: Fourier transform: from time domain to frequency domain

The Fourier Transform makes it possible to move a signal representation from the time domain to the frequency domain. The new frequency-domain representation is useful for signal analysis, even though time information is lost: it is no longer possible to determine "when" a particular event happened. The Direct and Inverse Fourier Transforms have the mathematical representation shown in Eq. 6.1.

F(ω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt

f(t) = (1/2π) ∫_{−∞}^{+∞} F(ω) e^{jωt} dω    (6.1)

The Fourier Transform evaluates the weight of the different frequencies in a signal. Even though a signal is not stationary, but only stationary for short time intervals, the


spectrum of this signal can be calculated by "moving" a "stationary signal window" over consecutive signal segments, in order to realize a Short Term Fourier Transform (STFT). It is advisable to overlap the windows during their movement, in order to obtain a better interpolated representation of the signal.

    Figure 6.2: Short Term Fourier Transform (STFT)

The Short Term Fourier Transform is a compromise between time and frequency, but its precision depends on the window amplitude, and the amplitude cannot be varied: it is constant for every frequency. Therefore, a window able to adapt its scale to the requirements in the time and frequency domains is necessary.
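The sliding-window idea can be sketched as follows; the window length, the hop size (50% overlap, as suggested above) and the toy two-tone signal are illustrative choices, not taken from the thesis:

```python
import numpy as np

def stft(signal, win_len=64, hop=32):
    """Short Term Fourier Transform: FFT of overlapping windowed segments.
    win_len is constant for all frequencies - the STFT limitation noted above."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = signal[start:start + win_len] * window
        frames.append(np.fft.rfft(segment))
    return np.array(frames)          # shape: (n_frames, win_len // 2 + 1)

# A toy non-stationary signal: 5 Hz for 1 s, then 20 Hz for 1 s (fs = 128 Hz)
fs = 128
t = np.arange(2 * fs) / fs
sig = np.where(t < 1, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))
S = stft(sig)
```

Each row of |S| shows which frequency dominates in the corresponding time window, recovering the "when" information that the plain Fourier Transform discards.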

    6.2 Continuous Wavelet transform

The Wavelet Transform uses adaptive windows in order to improve on the results obtainable with the STFT. Adaptive windows enclose long time intervals to analyze low frequencies and short time intervals to analyze high frequencies. A signal is expressed as a combination of children wavelets, obtained by shifting and scaling a mother wavelet.

C(scale, shift) = ∫_{−∞}^{+∞} s(t) ψ(scale, shift, t) dt    (6.2)

From a generic wavelet ψ(a, b, t), where a and b are the scaling and shifting factors, the Continuous Wavelet Transform (CWT) is defined as the integral of the signal s(t) multiplied by the scaled wavelet [21]:

W(a, b) = (1/√a) ∫_{−∞}^{+∞} s(t) ψ*((t − b)/a) dt    (6.3)

where ψ* corresponds to the complex conjugate of the wavelet function ψ. Similarly to the Fourier Transform, the Wavelet Transform also approximates a given signal s(t) by


using a basis function, except that the basis function here is a small wave instead of a continuous sinusoidal function.

    Figure 6.3: Wavelet Transform (source: Mathworks website)

Wavelet scaling consists of stretching and compressing the mother wavelet: the smaller the scale factor, the more compressed the wavelet. Wavelet shifting consists of delaying or anticipating the mother wavelet: if ψ(t) is the original wavelet, ψ(t − k) is its k-delayed version.

    Figure 6.4: Continuous Wavelet Transform Process

The following presents a simplified way to generate a CWT:

Take an arbitrary wavelet function and compare it with the signal s(t);

Calculate the similarity coefficient C, which evaluates the similarity between the signal window and the wavelet;

Shift the wavelet function by b, then compare and calculate the similarity coefficient again, until the end of the signal;

Proceed to the next scale by stretching the wavelet function by a;

Repeat the last four points for every scale.
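The steps above translate almost directly into code. The sketch below uses a real-valued Mexican-hat (Ricker) mother wavelet as an illustrative choice; the thesis does not prescribe one at this point:

```python
import numpy as np

def ricker(t):
    # Mexican-hat mother wavelet (illustrative choice)
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def cwt(signal, scales):
    """Naive CWT: for every scale a, slide the scaled wavelet over the
    signal (shift b) and store the similarity coefficient W(a, b)."""
    n = len(signal)
    coeffs = np.empty((len(scales), n))
    t = np.arange(n)
    for i, a in enumerate(scales):
        for b in range(n):                          # shift
            psi = ricker((t - b) / a) / np.sqrt(a)  # Eq. (6.3) kernel, real-valued
            coeffs[i, b] = np.sum(signal * psi)     # similarity coefficient
    return coeffs
```

Rows of the result correspond to scales (the ordinate of Fig. 6.5) and columns to time shifts (the abscissa); a large |W(a, b)| marks where the signal resembles the wavelet at that scale.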

The CWT is the sum of signal windows multiplied by scaled and shifted versions of the wavelet. The wavelet coefficients are the result of a regression operation on the original signal. The graphical representation of the coefficients (Fig. 6.5) is obtained by plotting time on the abscissa axis and the scale of every coefficient on the ordinate axis.


Differently colored pixels are used to represent the position and the modulus of each coefficient. High scales correspond to more stretched wavelets, and the greater the stretching, the longer the signal window compared with the wavelet.

    Figure 6.5: Wavelet Coecients 2D graphical representation

    Figure 6.6: Continuous Wavelet Transform (source: Mathworks website)

    6.3 Discrete Wavelet transform

In the continuous wavelet transform, the wavelet function is stretched and shifted along the signal in a continuous manner. This represents an enormous amount of


work, and there is some redundancy in the data, in the sense that there is more than enough information to reconstruct the original signal. It turns out that if the scales and shifts are discretised based on powers of two (so-called dyadic scales and positions), the computation of the transform becomes more efficient without any loss in accuracy. The Discrete Wavelet Transform decomposes the original signal into DETAILS (high-frequency components) and APPROXIMATIONS (low-frequency components).

    Figure 6.7: Details and Approximations decomposition with subsampling

Even if two signals are obtained from the original signal (Details and Approximations), a sub-sampling operation is performed in order to keep only one sample out of every two processed samples (Figures 6.7 and 6.8). The sub-sampling operation is done by doubling the sampling period. If the scales and shifts are discretised based on powers of two, the following Discrete Wavelet Transform is obtained:

s(t) = Σ_{j=−∞}^{+∞} Σ_k c_{j,k} 2^{j/2} ψ(2^j t − k)    (6.4)

where the wavelet functions ψ(2^j t − k) are 2^j-scaled and k-translated versions of the original wavelet ψ(t). With dyadic scaling, every decomposition corresponds to a halving of the data (sub-sampling).
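One decomposition level with sub-sampling can be sketched with the simplest filter pair, the Haar averaging/differencing filters, used here only for illustration (the thesis proceeds with the Daubechies db4 filters in the next section):

```python
import math

def haar_level1(f):
    """One decomposition level: approximations (low-pass averages) and
    details (high-pass differences), each half the input length - the
    sub-sampling keeps one output sample per input pair."""
    assert len(f) % 2 == 0
    approx = [(f[2 * i] + f[2 * i + 1]) / math.sqrt(2) for i in range(len(f) // 2)]
    detail = [(f[2 * i] - f[2 * i + 1]) / math.sqrt(2) for i in range(len(f) // 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    # Perfect reconstruction: invert the averaging/differencing
    f = []
    for a, d in zip(approx, detail):
        f.append((a + d) / math.sqrt(2))
        f.append((a - d) / math.sqrt(2))
    return f
```

Applying haar_level1 again to the approximations gives the next level, halving the data each time, which is exactly the dyadic cascade of Fig. 6.8.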

    6.4 Daubechies type 4 Discrete Wavelet transform

The Wavelet Transform studied in this thesis is the Daubechies Wavelet Transform of type 4 (db4) [20], very similar to the Haar Wavelet Transform. For a signal f with N ≥ 2 values, the Level 1 Daubechies (D1) type 4 Wavelet Transform is defined as:


    Figure 6.8: Details and Approximations decomposition scheme

f  →(D1)→  (t¹ | d¹)    (6.5)

where:

t_i = (f, V¹_i)
d_i = (f, W¹_i)    (6.6)

and so on for subsequent levels. The differences between the Daubechies Transform and the Haar Transform lie in the way V¹_i and W¹_i are defined. We define:

α₁ = (1 + √3) / (4√2)
α₂ = (3 + √3) / (4√2)
α₃ = (3 − √3) / (4√2)
α₄ = (1 − √3) / (4√2)

(6.7)
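As a quick numerical check of Eq. (6.7): the Daub4 scaling coefficients sum to √2, have unit energy, and their shift-by-two copies are mutually orthogonal, which is what makes the V¹_i an orthonormal family. A short sketch (the helper V1 with wrap-around indexing is illustrative):

```python
import math

s3 = math.sqrt(3)
den = 4 * math.sqrt(2)
# Eq. (6.7): Daub4 scaling coefficients
alpha = [(1 + s3) / den, (3 + s3) / den, (3 - s3) / den, (1 - s3) / den]

def V1(i, N):
    """First-level scaling signal: the four alphas placed with a shift of
    two positions per index i, wrapping around at the end of the signal."""
    v = [0.0] * N
    for j, a in enumerate(alpha):
        v[(2 * i + j) % N] = a
    return v
```

The first-level value t_i = (f, V¹_i) is then just the inner product of the signal with the i-th scaling signal, exactly as in Eq. (6.6).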

Terms V¹_i (1st level Daub4 Transform) are constructed as follows:


V¹₁ = (α₁, α₂, α₃, α₄, 0, ..., 0)

V¹₂ = (0, 0, α₁, α₂, α₃, α₄, 0, ..., 0)

V¹₃ = (0, 0, 0, 0, α₁, α₂, α₃, α₄, 0, .