
Automatic Power Quality Monitoring with Recurrent Neural Network

by

Dong Chan Lee

A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science and Engineering
Graduate Department of Electrical & Computer Engineering

University of Toronto

© Copyright 2016 by Dong Chan Lee


Abstract

Automatic Power Quality Monitoring with Recurrent Neural Network

Dong Chan Lee

Master of Applied Science and Engineering

Graduate Department of Electrical & Computer Engineering

University of Toronto

2016

The electric power grid constantly experiences disturbances that hinder the efficiency and reliability of the grid. This thesis is concerned with the development of an automatic power quality monitoring system that classifies power quality disturbances based on the voltage waveform. The classification process involves generating training data, extracting features, and classifying the data at every time step with a neural network. The feature extraction is implemented based on the short-time Fourier transform, the wavelet transform, and the S transform, and we present comparisons of their performance for this application. The extracted features are used as the inputs to the neural network, and the outputs are the classes that the waveform belongs to. We introduce the recurrent neural network as the classifier for the first time in this application. A recurrent neural network can memorize information in time-sequence data by passing information through its hidden units. We show that the recurrent neural network achieves better performance than a conventional feedforward neural network.


Acknowledgements

I would like to thank my adviser, Professor Deepa Kundur, for her encouragement and feedback throughout my graduate studies. I would also like to thank my family and friends, and especially my parents, for their support and dedication.


Contents

1 Introduction
   1 Motivation
   2 Contributions
   3 Overview

2 Power Quality Disturbance Data Generation
   1 Overview of Power Quality
      1.1 Standard Measurements
      1.2 Classification of Power Quality Disturbances
   2 Characterization of Power Quality Disturbances
      2.1 RMS and Peak Measurement
   3 Monte Carlo Simulation for Data Generation
   4 Literature Survey
   5 Real-time Classification

3 Transformation and Feature Extraction
   1 Background
      1.1 Fourier Transform
   2 Short-Time Fourier Transform
      2.1 Discrete Short-Time Fourier Transform and its Implementation
   3 Wavelet Transform
      3.1 Discrete Wavelet Transform and its Implementation
   4 S Transform
      4.1 Discrete S Transform and its Implementation
   5 Comparison of Transforms
      5.1 Common Limitations of the Features
   6 Summary

4 Classification with Long Short-Term Memory
   1 Background
   2 Feedforward Neural Network
      2.1 Decision Making with Softmax Function
      2.2 Training the Neural Network
      2.3 Windowed Feedforward Neural Network
   3 Recurrent Neural Network
   4 Long Short-Term Memory
      4.1 Advantages of the Recurrent Neural Network
   5 Summary

5 Results and Case Studies
   1 Data Generation and Training
   2 Results
      2.1 Comparisons of the Transformations
      2.2 Effect of the Size of Window
      2.3 Distribution of Misclassification
      2.4 Effect of Noise
   3 Case Studies
   4 Limitations of the Proposed Power Quality Monitor
   5 Summary

6 Conclusion

Appendices

A Parameters for Monte Carlo Simulation

B Results

Bibliography


List of Tables

2.1 Classification of power quality disturbances and their characterization [1, 2]

3.1 Comparison of computational complexity of feature extraction algorithms

5.1 Comparison of accuracy of FNN and LSTM in percentage
5.2 Accuracy of LSTM with features from DWT
5.3 Comparison of LSTM and FNN on data with noise

A.1 Parameters for Monte Carlo simulation in power quality data generation
A.2 Ratio of disturbance classes in the generated data

B.1 Confusion matrix for FNN from DWT features
B.2 Confusion matrix for FNN from STFT features
B.3 Confusion matrix for LSTM from STFT features
B.4 Confusion matrix for FNN from ST features
B.5 Confusion matrix for LSTM from ST features
B.6 Confusion matrix for LSTM from DWT features with noise
B.7 Confusion matrix for FNN from DWT features with noise


List of Figures

2.1 An example waveform of impulsive transient
2.2 An example waveform of oscillatory transient
2.3 An example waveform of interruption
2.4 An example waveform of voltage sag
2.5 An example waveform of voltage swell
2.6 An example waveform of DC offset
2.7 An example waveform of harmonics
2.8 An example waveform of notching
2.9 An example waveform of noise
2.10 An example waveform of voltage fluctuation
2.11 An example waveform of frequency variation
2.12 RMS and peak voltage measurement of each class of power quality disturbances
2.13 An example of generated data for the training
2.14 (a) Classification process of existing techniques [3] (b) Classification process of the proposed technique

3.1 Overall process of building an automatic power quality monitor
3.2 Contour diagram of STFT coefficients
3.3 Two-band analysis bank
3.4 Reconstructed signal using discrete wavelet transform at different levels
3.5 Contour diagram of ST coefficients
3.6 Relationships between the wavelet transform, S transform, and Fourier transform [4]
3.7 An example of waveform and extracted features based on different transforms

4.1 Feedforward neural network architecture
4.2 (a) A simplified single-node recurrent neural network (b) Unrolled version of the network through time
4.3 Conventional approach for power quality disturbance classifier [3]
4.4 (a) Feedforward neural network architecture (b) Windowed feedforward neural network architecture (c) Long short-term memory architecture

5.1 Cross entropy of the training and testing data as the training progresses
5.2 Output of the automatic power quality monitoring system
5.3 Performance of LSTM with various window sizes of DWT
5.4 Performance of LSTM with various window sizes of STFT
5.5 Performance of wFNN with various window sizes
5.6 Performance of LSTM with various sampling frequencies
5.7 Performance of LSTM with various output frequencies
5.8 Overall distribution of misclassification
5.9 Distribution of misclassification for individual power quality disturbances
5.10 Case study of interruption
5.11 Case study of oscillatory transient
5.12 Case study of voltage sag
5.13 Case study of harmonics

6.1 Future work


Chapter 1

Introduction

1 Motivation

The electric power grid provides a convenient and affordable way to deliver energy to sustain our society. Technologies ranging from personal electronics to manufacturing plants use electrical energy delivered by the electric grid. While maintaining the reliability of the grid, engineers began to realize that connecting multiple generators and consumers mitigates the uncertainties in supply and demand. Fortunately, the invention of transformers enabled a high-voltage transmission system that significantly reduced the power loss over long transmission lines. Interconnections between regions expanded with the high-voltage transmission system, and the electric power grid today is the largest human-made machine. The conventional power system structure provides most of the electricity from large power plants, such as nuclear or gas-fired generators, that are often distant from the customers for safety reasons. This system is centrally monitored and controlled by the transmission system operator.

Today there is a strong demand for innovating the conventional design of the electric grid. Greenhouse gas emissions from fossil fuel power plants are one of the major causes of climate change. Fossil fuel power plants are being replaced by alternative renewable energy sources. Wind and solar energy are two promising energy sources that can be widely and safely deployed. With the rapid advancement of technology in wind turbines and solar cells, their costs are expected to be competitive with conventional energy sources in the near future. Wind turbines and solar cells are considered distributed generators because they have a smaller capacity than conventional synchronous generators. The integration of distributed generators is leading major shifts in the structure and operation of the electric power grid. Since these distributed generators have a small capacity, they are scattered near the consumers in a low-voltage system to avoid long-distance delivery. Both generators and consumers need to be managed in a smaller, local grid, and the low- to medium-voltage grid such as the microgrid is one of the most researched topics today [5].

Distributed generators are often connected to the grid through an electronic interface with nonlinear control, such as the MPPT algorithm in photovoltaics. The electronic components introduce disturbances to the grid, which result in power quality issues. In addition, the intermittency of renewables increases variation and disturbance in the supply and operating conditions. The increased uncertainty and disturbances manifest as impurities in the sinusoidal waveform of the voltage and current.

These impurities in the waveform are referred to as power quality disturbances. Concerns with power quality are not new; such issues have always existed in power systems, since the system is always subject to disturbances. The increasing penetration of renewables is closely related to power quality issues, and their importance is continuously growing [6, 7].

The operation of the grid has to accommodate the increasing number of distributed generators. Measurements from distributed generators introduce a very large amount of data, and assessment by a human operator becomes very expensive and unreliable. While the changes in power system structure and operation create new challenges, recent advances in smart grid technology offer promising solutions to mitigate the rising issues with its metering and communication technology.

2 Contributions

This thesis proposes a power quality monitor equipped with machine learning techniques to assist the operator's observability. The automated power quality monitor classifies the type of phenomenon recorded, so the system operator can easily detect and analyze the issues that the grid faces. Traditionally, operators provided only diagnostic monitoring of power quality: technicians are dispatched only when customers complain continuously or after the damage. The automated monitoring technology enables preventive monitoring of power quality, since the data can be analyzed before customers report problems. Reliability and efficiency can be greatly enhanced with a proactive system rather than a reactive one.

Specifically, the goal of this thesis is to increase the accuracy of the power quality monitor, and we make several modifications to existing techniques. The specific contributions of this thesis are as follows.

1. The technique presented in this thesis eliminates the need for a pre-segmentation algorithm, which is required in existing techniques. Current techniques assume the monitor is given a nicely segmented, fixed-size window of disturbance in the waveform. We eliminate this assumption and apply the classification at every time step, giving a more accurate localization of the disturbance.

2. Feature extraction algorithms based on different transforms are studied under a standard classification algorithm. This thesis describes the algorithms and presents an experimental evaluation and comparison of the transforms.

3. The effectiveness of the recurrent neural network is studied and compared with the conventional feedforward neural network. We demonstrate that we can achieve a lower error rate and better localization of the event by passing information through the hidden units.

3 Overview

Chapter 2 provides background on current practice for power quality monitoring as well as state-of-the-art techniques. We describe the data generation process used to create training data for the neural network.

In Chapter 3, we go over the existing transforms and the feature extraction based on the short-time Fourier transform, the wavelet transform, and the S transform. In addition, the transforms are compared and their relationships to each other are discussed.

Chapter 4 presents the classification of power quality disturbances with the recurrent neural network, which is implemented with Long Short-Term Memory. This chapter describes how the classifier is implemented and how it can be trained. This is a standard method in machine learning, but some of the core techniques, such as the softmax output layer and the training methods, had not previously been introduced for power quality disturbance classification.

In Chapter 5, we present our results and case studies. The accuracy of the Long Short-Term Memory network is compared with that of the feedforward neural network. In addition, we examine different window sizes of the transform to find the optimal parameter settings. We present the limitations of the technique due to its over-fitting to the generated data.

We conclude the thesis in Chapter 6 with a summary of the contents as well as future directions for this research.


Chapter 2

Power Quality Disturbance Data Generation

In this chapter, we provide an overview of power quality disturbances with their mathematical descriptions. The definition of each power quality disturbance is given along with its causes and an example waveform. This chapter defines the target classes and describes how each class of disturbance can be generated based on its known magnitude and spectral content. Later we will see how the generated data can be used to build a classification system that uses machine learning to automatically recognize patterns in the data.

1 Overview of Power Quality

The term power quality is defined in [1] as "any power problem manifested in voltage, current, or frequency deviations that results in failure or misoperation of customer equipment." Power quality encompasses a broad range of concerns, and it is difficult to develop a cohesive solution in general. The IEEE Recommended Practice for Monitoring Electric Power Quality (IEEE Std 1159-2009 [2]) defines the terminology and definitions of the phenomena adopted in this thesis. While the importance of power quality has increased throughout the past decades, power quality analysts struggle with processing a massive volume of measurements [11]. The current practice in the industry commonly relies on root mean square (RMS) and total harmonic distortion (THD) measurements of the voltage and current waveforms.


1.1 Standard Measurements

Root mean square voltage/current

The root mean square of a voltage or current waveform, v(t), is defined as

Vrms(t) = sqrt( (1/T) ∫_{τ=t−T}^{t} [v(τ)]² dτ )  (2.1)

where T = 1/f_f is the period of the waveform's fundamental frequency, f_f, which is 60 Hz for North American power systems.

Peak voltage/current

The peak voltage and current identify the maximum and minimum of the waveform over the period and are defined as

Vmax(t) = max_{τ∈[t−T, t]} v(τ),  Vmin(t) = min_{τ∈[t−T, t]} v(τ)  (2.2)

The RMS and peak voltage of the waveform are related: for a pure sinusoidal wave, Vmax(t) = −Vmin(t) = √2 Vrms(t) for every t. Having both the maximum and minimum of the waveform can be useful for detecting a dc component in the waveform.
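As a concrete illustration, the one-cycle sliding RMS and peak measurements of Equations 2.1 and 2.2 can be sketched in Python for a sampled waveform. This is a minimal sketch, not the thesis implementation; the function names and the 60 Hz / 10 kHz test signal are our own choices.

```python
import numpy as np

def sliding_rms(v, samples_per_cycle):
    """RMS over a one-cycle sliding window (discretized Equation 2.1)."""
    out = np.empty(len(v) - samples_per_cycle + 1)
    for i in range(len(out)):
        window = v[i:i + samples_per_cycle]
        out[i] = np.sqrt(np.mean(window ** 2))
    return out

def sliding_peak(v, samples_per_cycle):
    """Max and min over a one-cycle sliding window (Equation 2.2)."""
    vmax = np.empty(len(v) - samples_per_cycle + 1)
    vmin = np.empty_like(vmax)
    for i in range(len(vmax)):
        window = v[i:i + samples_per_cycle]
        vmax[i], vmin[i] = window.max(), window.min()
    return vmax, vmin

# For a pure 60 Hz sinusoid sampled at 10 kHz, we expect
# Vmax ≈ -Vmin ≈ sqrt(2) * Vrms at every time step.
fs, f = 10_000, 60
t = np.arange(0, 0.1, 1 / fs)
v = np.sin(2 * np.pi * f * t)
n = fs // f  # 166 samples, roughly one fundamental cycle
rms = sliding_rms(v, n)
vmax, vmin = sliding_peak(v, n)
```

In a real monitor these measurements would be updated at every sample, which is exactly the per-time-step view the classifier in this thesis operates on.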

Total harmonic distortion

Total harmonic distortion (THD) estimates the overall distortion of the wave from the fundamental and is defined as

THD = sqrt( Σ_{n=2}^{N} Vn² ) / V1  (2.3)

where Vn is the magnitude of the nth harmonic and V1 is the magnitude at the fundamental frequency. The maximum harmonic, N, is 7 to 13 depending on the application.
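A minimal sketch of computing THD from an FFT taken over an integer number of fundamental cycles, so that the harmonics fall exactly on FFT bins. It uses the standard square-root-of-sum-of-squares form of THD; the function name and defaults are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def thd(v, fs, f_fund=60, n_max=13):
    """THD = sqrt(sum of squared harmonic magnitudes) / fundamental magnitude.

    v must span an integer number of fundamental cycles so the
    fundamental and its harmonics land exactly on FFT bins.
    """
    spectrum = np.abs(np.fft.rfft(v)) / len(v)
    k1 = round(len(v) * f_fund / fs)  # FFT bin index of the fundamental
    v1 = spectrum[k1]
    harmonics = [spectrum[n * k1] for n in range(2, n_max + 1)]
    return np.sqrt(sum(h ** 2 for h in harmonics)) / v1

# A waveform with a 10% third harmonic should give THD close to 0.1.
fs, f = 10_000, 60
t = np.arange(0, 6 / f, 1 / fs)  # six fundamental cycles
v = np.sin(2 * np.pi * f * t) + 0.1 * np.sin(2 * np.pi * 3 * f * t)
distortion = thd(v, fs)
```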

The IEEE Recommended Practice and Requirements for Harmonic Control in Electric Power Systems [12] specifies the current practice on how this measurement is utilized. Although total harmonic distortion has been the most widely used measurement for detecting waveform distortions, it has a clear limitation in characterizing different phenomena. Vn is the short-time Fourier transform coefficient at the nth harmonic frequency, and total harmonic distortion sums up the coefficients at all non-fundamental frequencies. Total harmonic distortion simply reduces the dimension of the coefficients so that waveform disturbances are easy to detect, but it does not have the capacity to sufficiently represent and characterize the signal content. We can generalize the function in Equation 2.3 and use the raw information, such as V1, ..., VN, to extract much more information using machine learning techniques.

Moreover, there is a limitation in the time localization of the phenomena, because the Fourier transform gives only the frequency representation of the signal. This point will be elaborated in the next chapter.

1.2 Classification of power quality disturbances

Classification of power quality disturbances is based on IEEE Std 1159-2009 [2]. Table 2.1 shows the categories of power quality disturbances, which can be found in both [1, 2]. Having consistent definitions of classes is important for preserving and extending our knowledge of the phenomena. Classification of power quality disturbances directs engineers to the solution of the underlying issue, since each class of disturbance often has common causes as well as common solutions.

Power quality disturbances can be broadly classified into steady-state and transient disturbances. The steady-state disturbances include waveform distortions and voltage imbalances. The transient disturbances include impulsive and oscillatory transients as well as short-duration voltage magnitude variations. While our power quality monitor has the capability to distinguish any type of disturbance, the classes of disturbance can be selected as a subset of the disturbances defined in the standard [2]. For example, engineers may be interested only in steady-state disturbances so that some control scheme can be implemented. By including various phenomena such as transient disturbances, the proposed monitor can reduce the false-positive identification of the classes selected for control. If standard measurements such as total harmonic distortion were used, the controller might react to temporary transient disturbances, reducing the efficiency of the controller.

Each phenomenon has a common characterization in terms of its spectral content and magnitude, and we currently have the knowledge to recreate them. This gives us the ability to create example data of power quality disturbances. With a large amount of data, we can employ state-of-the-art machine learning techniques to build an automatic classifier.

Table 2.1: Classification of power quality disturbances and their characterization [1, 2]

Categories                       Spectral Content      Typical Duration     Typical Magnitude
Transients
  Impulsive                      5 ns - 0.1 ms rise    1 ns - 1 ms plus
  Oscillatory                    5 kHz - 0.5 MHz       5 µs - 50 ms         0 - 8 pu
Short Duration Variations
  Interruptions                                        0.5 cycle - 1 min    < 0.1 pu
  Sags                                                 0.5 cycle - 1 min    0.1 - 0.9 pu
  Swells                                               0.5 cycle - 1 min    1.1 - 1.8 pu
Long Duration Variations
  Interruptions                                        > 1 min              < 0.1 pu
  Under-Voltages                                       > 1 min              0.8 - 0.9 pu
  Over-Voltages                                        > 1 min              1.1 - 1.8 pu
Voltage Imbalances                                     steady state         0.5 - 2 %
Waveform Distortions
  DC offset                                            steady state         0 - 0.1 %
  Harmonics                      0 - 9 kHz             steady state         0 - 20 %
  Interharmonics                 0 - 9 kHz             steady state         0 - 2 %
  Notching                                             steady state
  Noise                                                steady state         0 - 1 %
Voltage Fluctuations             < 25 Hz               intermittent         0.1 - 7 %
Power Frequency Variations                             < 10 s               ± 0.10 Hz

2 Characterization of Power Quality Disturbances

In this section, we present the complete list of power quality disturbances that are subject to classification in this thesis. We explain the phenomena and present their numerical models along with an example of each disturbance waveform. Although we give brief descriptions of the causes of the phenomena, readers should consult references such as [1, 13] to gain a deeper understanding of the subject. References such as [14, 15] also contain information on how synthetic disturbance data can be generated.

After presenting the examples of power quality disturbances, we show the limitations of the standard measurements in the classification of the disturbances. Throughout this thesis, we focus on the voltage waveform, since the voltage is the variable that is more strictly monitored and regulated: grid-connected equipment is generally designed for a range of current but a fixed value of voltage. However, the approach can be generalized to current by simply removing the classes that are only applicable to voltage, such as voltage sag and swell.

Before we present the disturbances, we first define the normal sinusoidal voltage as

vnormal(t) = sin(2πft) + µ(t)  (2.4)

where f is the fundamental operating frequency, µ(t) ∼ N(0, σ²), and σ ∈ [0, 0.01]. The fundamental frequency used throughout the thesis is 60 Hz, which is standard in North America, and the voltage is in per unit. The noise term µ(t) is added as a regularization term to avoid over-fitting towards a perfect sinusoidal waveform. Since the power grid may not have a perfect sinusoidal waveform, adding the noise acts as a generalization towards realistic voltage waveforms. The example waveforms are generated with a sampling frequency of 10 kHz.
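The normal waveform of Equation 2.4 can be sketched as follows. This is a hypothetical helper of our own; the thesis draws σ from [0, 0.01] per example, whereas here it is simply a parameter.

```python
import numpy as np

def v_normal(duration, fs=10_000, f=60, sigma=0.005, rng=None):
    """Normal sinusoidal voltage with Gaussian noise (Equation 2.4).

    Voltage is in per unit; f is the fundamental frequency in Hz;
    sigma is the standard deviation of the noise term mu(t).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(0, duration, 1 / fs)
    return t, np.sin(2 * np.pi * f * t) + rng.normal(0, sigma, len(t))

# 0.1 s (six fundamental cycles) of a noisy 60 Hz waveform at 10 kHz.
t, v = v_normal(0.1)
```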


Impulsive Transients

Impulsive transients are momentary, instantaneous changes in the state that do not change the fundamental frequency. An impulsive transient is unidirectional (either positive or negative) and can be characterized by its rise and decay times and its peak value. The most common cause of impulsive transients is lightning, and an impulsive transient can result in an oscillatory transient if it excites the natural frequency of the circuit [2]. An impulsive transient can be synthesized by the following equation:

v(t) = vnormal(t) + β exp( −c (t − tstart)/(tend − tstart) ),  t ∈ [tstart, tend]  (2.5)

where β ∈ ±[0.1, 0.8] and c = −log(ε/β) are the peak value and fall time constant, respectively. ε is the threshold below which the disturbance can be neglected, and it is set to 0.001 in this thesis. Since the rise time is between 5 ns and 0.1 ms, the rise delay is essentially negligible for sampling frequencies up to 10 kHz. An example of an impulsive transient is shown in Figure 2.1 with a 0.2 pu peak and 1 ms duration. The impulsive transient is repeated only for illustration.

Figure 2.1: An example waveform of impulsive transient
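Equation 2.5 can be sketched in Python as follows. The baseline here is the clean sinusoid of Equation 2.4 without noise; the function name and the default parameters (β = 0.2, a 1 ms event starting at 20 ms) are our own illustrative choices.

```python
import numpy as np

def impulsive_transient(t, v, beta=0.2, t_start=0.02, t_end=0.021, eps=1e-3):
    """Add an impulsive transient (Equation 2.5) to a voltage waveform.

    beta is the peak value (drawn from +/-[0.1, 0.8] in the thesis);
    c = -log(eps / |beta|) makes the decay fall below eps at t_end.
    """
    v = v.copy()
    c = -np.log(eps / abs(beta))
    mask = (t >= t_start) & (t <= t_end)
    v[mask] += beta * np.exp(-c * (t[mask] - t_start) / (t_end - t_start))
    return v

fs, f = 10_000, 60
t = np.arange(0, 0.1, 1 / fs)
vbase = np.sin(2 * np.pi * f * t)
v = impulsive_transient(t, vbase)  # 0.2 pu peak, 1 ms duration
```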

Oscillatory Transients

An oscillatory transient rapidly changes polarity and can be described by its spectral content, duration, and magnitude. Back-to-back capacitor energization results in oscillatory current transients. Power electronic devices can produce voltage transients due to commutation and RLC snubber circuits, and cable switching can also result in oscillatory voltage transients [2]. This phenomenon can be synthesized with the following equation,

v(t) = vnormal(t) + β exp(−c (t − tstart)/(tend − tstart)) sin(2πfht), t ∈ [tstart, tend] (2.6)


where β ∈ ±[0.1, 0.8], c = − log(ε/β), and fh ∈ [500, 5000] are the peak magnitude, fall time constant, and frequency of the transient component of the waveform respectively.

Figure 2.2: An example waveform of oscillatory transient

Interruptions

When an interruption occurs, the supply voltage or load current decreases to less than 0.1 p.u. for a period of time. Common causes of interruptions are power system faults, equipment failures, and control malfunctions. An interruption due to a fault may be restored by instantaneous reclosure [2]; if the reclosure fails, the interruption could be permanent. Figure 2.3 shows a momentary interruption, and the waveform can be generated by

v(t) = αvnormal(t), t ∈ [tstart, tend] (2.7)

where α ∈ [0, 0.1] is the magnitude of the waveform. Intuitively, the RMS or peak measurement in Equation 2.1 or 2.2 would be an ideal feature for classification.

Figure 2.3: An example waveform of interruption


Voltage Sag

Voltage sag is a decrease in RMS voltage or current to between 0.1 and 0.9 p.u., with durations from 0.5 cycle to 1 minute. It is also referred to as a voltage dip. The most common cause of voltage sags is system faults, but the energization of heavy loads or the starting of large motors can also cause them [2]. Similar to the interruption, the RMS voltage is an obvious indicator for classifying voltage sag. The voltage sag waveform can be generated by simply changing the voltage magnitude,

v(t) = αvnormal(t), t ∈ [tstart, tend] (2.8)

where α ∈ [0.1, 0.9] changes the voltage magnitude.

Figure 2.4: An example waveform of voltage sag

If a voltage sag lasts more than a minute, it is classified as an under-voltage. A common cause of under-voltage is a load switching on or a capacitor bank switching off [2]. In this thesis, voltage sag and under-voltage are placed in one class because they exhibit the same characteristics. They can be classified further, if needed, with an additional post-processing step that measures the duration of the event.

Voltage Swell

Voltage swell is an increase in the voltage magnitude to between 1.1 and 1.8 p.u. Similar to voltage sags, swells are caused by system fault conditions. They can also be caused by switching off a large load or energizing a large capacitor bank [2]. The voltage swell waveform can be generated by

v(t) = αvnormal(t), t ∈ [tstart, tend] (2.9)


where α ∈ [1.1, 1.8] changes the magnitude of the waveform. If a voltage swell lasts longer than a minute, it is classified as an over-voltage. Common causes are load switching, such as switching off a large load, and incorrect transformer settings.

Figure 2.5: An example waveform of voltage swell
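The three magnitude-related disturbances above (interruption, sag, and swell) differ only in the range of the scaling factor α, so one helper covers equations 2.7 through 2.9. A hedged NumPy sketch follows; function and variable names are illustrative, and vnormal is again assumed to be a 1 p.u., 60 Hz sinusoid.

```python
import numpy as np

def scale_event(t, t_start, t_end, alpha):
    """Scale v_normal by alpha during [t_start, t_end] (Eqs. 2.7-2.9)."""
    v = np.sin(2 * np.pi * 60 * t)
    mask = (t >= t_start) & (t <= t_end)
    v[mask] *= alpha
    return v

fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
interruption = scale_event(t, 0.05, 0.15, alpha=0.05)  # alpha in [0, 0.1]
sag = scale_event(t, 0.05, 0.15, alpha=0.5)            # alpha in [0.1, 0.9]
swell = scale_event(t, 0.05, 0.15, alpha=1.3)          # alpha in [1.1, 1.8]
```

Only α distinguishes the three classes, which is why the RMS and peak measurements are such natural features for them.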

DC offset

DC offset occurs when there is a dc voltage or current in the system, caused by a geomagnetic disturbance or the effect of half-wave rectification. Direct current can cause electrolytic erosion of grounding electrodes and other connectors [2]. A DC offset waveform can be generated by simply adding a bias to the normal waveform,

v(t) = vnormal(t) + γ(t), t ∈ [tstart, tend] (2.10)

where γ(t) ∈ ±[0.001, 0.01]. Intuitively, tracking both the minimum and maximum voltage in equation 2.2 is a good feature for identifying this disturbance.

Figure 2.6: An example waveform of DC offset

Harmonics/Interharmonics

Harmonics are sinusoidal voltages or currents with frequencies that are integer multiples of the fundamental frequency (60 Hz). Interharmonics are voltages or currents with frequencies that are non-integer multiples of the operating frequency. Harmonic distortion usually originates from the nonlinear characteristics of devices and loads [2], and it distorts the waveform away from the fundamental frequency. Harmonics and interharmonics can be synthesized by

v(t) = vnormal(t) + β sin(2πfht), t ∈ [tstart, tend] (2.11)

Figure 2.7: An example waveform of harmonics

where β ∈ [0.1, 0.2] and fh ∈ [180, 900] are the magnitude and frequency of the harmonic respectively. Since the phenomenon is periodic with a frequency greater than the operating frequency, it is hard to detect harmonics with the RMS voltage. Harmonics and interharmonics are combined into one class in the automatic classification because distinguishing them is difficult for the classifier: the set of integer-multiple frequencies is discrete and highly non-convex, so the features required by the classifier would have to span a large range of frequencies. Further classification can be made by an additional layer of classification if needed.
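The point about the RMS voltage missing harmonic distortion, while the spectrum exposes it, can be sketched as follows. This is a hedged example: the 5th-harmonic choice and the amplitudes are illustrative values within the ranges above, and the same additive form with a low frequency (10 to 25 Hz) and smaller amplitude reproduces voltage flicker.

```python
import numpy as np

fs = 10_000
t = np.arange(0, 0.5, 1 / fs)                 # 30 full cycles of 60 Hz
beta, f_h = 0.15, 300.0                       # 5th harmonic of 60 Hz
v = np.sin(2 * np.pi * 60 * t) + beta * np.sin(2 * np.pi * f_h * t)

# Single-sided amplitude spectrum: the harmonic shows up clearly ...
freqs = np.fft.rfftfreq(len(v), 1 / fs)
amp = np.abs(np.fft.rfft(v)) / (len(v) / 2)

# ... while the RMS of the record barely moves from the clean value.
rms = np.sqrt(np.mean(v**2))                  # sqrt(0.5 + beta**2 / 2)
```

Here the RMS rises by only about 1 percent over the clean waveform, while the spectrum shows a distinct component of amplitude β at fh.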

Notching

Notching is a periodic voltage disturbance caused by the operation of power electronic devices. When current is commutated from one phase to another, there is a momentary short circuit between the two phases, resulting in notching [2]. The notching waveform can be generated by

v(t) = vnormal(t) + ∑i β exp(−c (t − tstart,i)/(tend,i − tstart,i)), t ∈ [tstart, tend] (2.12)


where β ∈ [0.25, 0.5], c = − log(ε/β), and tstart,i+1 − tstart,i is constant for all i, making the notching periodic.

Figure 2.8: An example waveform of notching

Noise

Noise is an unwanted electrical signal with spectral content below 200 kHz. Power electronic devices, control circuits, arcing equipment, and switching power supplies are common sources of noise [2]. Noise can be added to the waveform by

v(t) = vnormal(t) + µ(t), t ∈ [tstart, tend] (2.13)

where µ(t) ∼ N(0, σ²) and σ ∈ [0.05, 0.1].

Figure 2.9: An example waveform of noise

Voltage Fluctuations

Voltage fluctuation is a variation in the voltage magnitude and is also referred to as voltage flicker. An arc furnace is one of the most common causes of flicker [2]. Solar panels can also produce flicker when the irradiation conditions change and the voltage is adjusted by Maximum Power Point Tracking (MPPT) algorithms. Flicker can be reproduced by introducing a low-frequency waveform,


v(t) = vnormal(t) + β sin(2πff t), t ∈ [tstart, tend] (2.14)

where β ∈ [0.05, 0.1] and ff ∈ [10, 25] are the magnitude and frequency of the flicker. Flicker appears as a fluctuation in the RMS voltage, but in instantaneous measurements it may appear as continuous switching between voltage sag and swell.

Figure 2.10: An example waveform of voltage fluctuation

Power Frequency Variations

Frequency variation occurs when the fundamental frequency of the power system deviates significantly from its nominal value. It normally occurs due to faults on the bulk power transmission system, or when a large block of load or generation goes offline [2]. The power frequency variation can be synthesized by

v(t) = sin(2π(60 + ∆ff)t), t ∈ [tstart, tend] (2.15)

where ∆ff ∈ ±[0.05, 0.1].

Figure 2.11: An example waveform of frequency variation


Sag/Swell and Harmonic

The power grid can also experience combinations of the disturbances listed above. The two combinations considered in this thesis are voltage sag with harmonics and voltage swell with harmonics. These disturbances can be synthesized by

v(t) = αvnormal(t) + β sin(2πfht), t ∈ [tstart, tend] (2.16)

where α ∈ [0.1, 0.2], fh ∈ [180, 900], and β ∈ [0.1, 0.9] are the voltage magnitude, harmonic frequency, and harmonic magnitude respectively. Combinations of disturbances are rarer than individual disturbances, and thus only these two were considered. However, if a system regularly experiences certain combinations, those classes can be added to the list. The machine learning approach to classification makes this expansion easy because the modification of the monitoring system is automated.

Voltage imbalance was not considered because the system is designed for single-phase input. Building a classifier with three-phase input is more complex than having three separate single-phase classifiers: the feature vector for three-phase input is three times the size of the single-phase one, and the neural network would require a much larger capacity to process the tripled input features. For a three-phase system, the dq0 frame could be effective in classifying severely unbalanced disturbances.

Although this section gave the complete list of power quality disturbances in the IEEE standard [2], it may not cover all phenomena that can occur in the grid. One of the advantages of establishing an automatic data generation process is that we can easily expand the definitions by adding a class with its mathematical description. This approach allows us to preserve definitions and expand the records and understanding of power quality disturbances.

2.1 RMS and Peak Measurement

In this section, we show the RMS and peak voltage measurements that were presented in section 1.1. Figure 2.12 shows the RMS and peak measurement of each example waveform. While the RMS measurement is generally good for classifying events related to the voltage magnitude, such as voltage sag and swell, it is unable to retrieve spectral information. Since only limited information is retrieved from the waveform, we need an additional layer of feature extraction to extract more information about the waveform. This layer is presented in the next chapter.

Figure 2.12: RMS and peak voltage measurement of each class of power quality disturbances
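As a sketch of how such measurements are computed (assuming equation 2.1 is an RMS over a sliding one-cycle window and equation 2.2 tracks the minimum and maximum over the same window; the function and variable names are ours):

```python
import numpy as np

def rms_and_peak(v, fs, f0=60):
    """One-cycle sliding RMS and min/max peak tracking of a waveform."""
    q = int(fs / f0)                              # samples per cycle
    rms = np.sqrt(np.convolve(v**2, np.ones(q) / q, mode="valid"))
    vmin = np.array([v[m:m + q].min() for m in range(len(v) - q + 1)])
    vmax = np.array([v[m:m + q].max() for m in range(len(v) - q + 1)])
    return rms, vmin, vmax

fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
v[1000:1500] *= 0.5                               # a 50 % sag from 0.1 s to 0.15 s
rms, vmin, vmax = rms_and_peak(v, fs)
```

The RMS trace drops clearly during the sag; for spectral disturbances such as harmonics it changes much less, which is exactly the limitation noted above.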


3 Monte Carlo simulation for data generation

The characterization developed in the previous sections will be used to generate data for training the neural network. The data are generated by Monte Carlo simulation from uniform distributions with the ranges given in Table A.1. A general form of the waveform can be described by the following equation,

v(t) = α sin(2π(60 + ∆ff)t) + ∑i βi exp(−c (t − tstart)/(tend − tstart)) cos(2πfh(t − tstart)) + µ(t) + γ(t) (2.17)

for t ∈ [tstart, tend], µ(t) ∼ N(0, σ²), and c = − log(ε/β) where ε = 0.01. Since we have a full characterization of the power quality disturbances, no manual labeling process is needed. The pseudocode for the data generation is given in Algorithm 1, with some notation adopted from MATLAB. We let N be the length of the data we want to generate. The function randi(a, b) draws a random integer between a and b. We denote the type of disturbance by i and insert a normal waveform (i = 1) between disturbances, since the probability of a disturbance occurring right after another disturbance is very low. An example of data generated by the proposed method is presented in Figure 2.13.

Algorithm 1 Data generation with Monte Carlo simulation
1: tstart ← 0
2: i ← 1
3: while tstart < N do
4:   if i == 1 then
5:     i ← randi(2, 14)
6:   else
7:     i ← 1
8:   end if
9:   α, ∆f, β, fh, σ, γ, duration ← sampled with the ranges given in Table A.1 for event i
10:  tend ← tstart + duration
11:  v(tstart : tend) ← equation 2.17
12:  label(tstart : tend) ← i
13:  tstart ← tend
14: end while
15: Output v
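A condensed Python sketch of the loop above follows. This is hedged: only two of the fourteen disturbance classes are shown, the durations and parameter ranges are illustrative stand-ins for Table A.1, and names such as `generate` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000

def generate(n_segments=6):
    """Alternate normal waveform (class 1) with randomly drawn disturbances."""
    segments, labels = [], []
    i = 1
    for _ in range(n_segments):
        dur = rng.uniform(0.05, 0.1)             # segment duration [s]
        t = np.arange(0, dur, 1 / fs)
        v = np.sin(2 * np.pi * 60 * t)           # v_normal
        if i == 1:
            next_i = int(rng.integers(2, 4))     # draw the next disturbance
        else:
            if i == 2:                           # voltage sag
                v = v * rng.uniform(0.1, 0.9)
            else:                                # i == 3: harmonic
                v = v + rng.uniform(0.1, 0.2) * np.sin(
                    2 * np.pi * rng.uniform(180, 900) * t)
            next_i = 1                           # return to normal
        segments.append(v)
        labels.append(np.full(len(t), i))
        i = next_i
    return np.concatenate(segments), np.concatenate(labels)

v, label = generate()
```

Because every sample carries a label, the output can directly train a classifier that emits a decision at every time step, with no manual labeling.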

Since the duration of each disturbance differs, there is an issue of fairness in the generated data between the disturbances. For example, if the majority of the data were voltage swell, then


Figure 2.13: An example of generated data for the training

the bias of the neural network towards voltage swell would be very high. Therefore, all of the disturbances were sampled with approximately equal probability in terms of their total duration. The total duration was chosen to be balanced between classes because the training of the neural network minimizes an objective function distributed equally over time. The numbers of occurrences of the impulsive and oscillatory transients were higher than those of the other classes by about 15 and 5 times respectively, which results in an almost equal ratio of total duration for each disturbance except the normal waveform. The normal waveform is about 10 times the rest of the data, to weakly represent the realistic grid, which is usually in the normal condition. Table A.2 shows both the ratio of classes in terms of total duration and the number of occurrences. In the next section, we present a literature survey collecting the history of work that has attempted automatic monitoring of power quality.

4 Literature Survey

The goal of automatic classification of power quality disturbances has a long history, and some of the work dates back to the mid-to-late 1990s. Automatic classification involves two major steps: feature extraction and classification. Feature extraction algorithms come from signal processing techniques such as the short-time Fourier transform, wavelet transform, and S transform. Classification algorithms come from machine learning techniques such as neural networks and support vector machines. In the literature, we will see that many papers are different combinations of a feature extraction and a classification algorithm.

One of the earliest papers to address this problem with a neural network was by Ghosh et al. [16]. The wavelet transform was observed to be effective in detecting disturbances [17], and the first classification system combining the wavelet transform and a neural network followed [18]. Subsequently, Gaouda et al. [19] found that, by Parseval's theorem, the energy of the discrete wavelet transform coefficients is a much better feature for classification. The wavelet transform is still one of the dominant feature extraction algorithms in this application [20, 21, 22, 23]; the variation has been in the classification algorithm, with multiple neural networks combined by a voting scheme [20], a neural structure [21], a probabilistic neural network [22], and a self-organizing learning array [23]. While these classification algorithms often show great results, all of the existing techniques avoid the issue of windowing the sampled waveform by adding a segmentation algorithm. The segmentation algorithm divides the data sequence into normal and disturbance parts by adding a layer such as a triggering method [3]. Figure 2.14 shows the comparison between the existing and proposed algorithms, where we remove the segmentation layer and produce output as a function of time. The decision-making layer is also eliminated by the softmax output layer, which is part of the neural network.

Figure 2.14: (a) Classification process of existing techniques [3] (b) Classification process of the proposed technique

The S transform is another transform used for feature extraction [24, 25, 26, 27, 28, 14]. The classification algorithms considered with it include the feedforward neural network [24, 27], probabilistic neural network [24, 25], modular neural network [26], and decision tree [14]. Ray et al. [28] studied a system with distributed generation and renewables and compared the discrete wavelet transform and the S transform.


Other approaches, such as the Hilbert transform [29], neural tree [29], and combinations of multiple algorithms [30], have been studied as well. Existing literature is summarized in thorough literature surveys in [31, 3]. Recent papers [15, 32, 33] propose similar ideas for power quality classification. With recent advancements in neural networks [34, 35], the automatic power quality monitoring system can be greatly improved and simplified by removing many layers of the process. We propose a new recurrent neural network based on deep neural network architectures to address the implementation of power quality disturbance classification at every time step.

5 Real-time Classification

Removing the segmentation layer of the existing techniques enables real-time implementation of power quality disturbance classification. The real-time monitoring system can be implemented without an extensive computing hardware upgrade: the computing requirements of the proposed monitoring system are not significantly greater than existing capabilities. Computing total harmonic distortion already requires the magnitudes of integer multiples of the fundamental frequency via the Fourier transform, and in the real-time implementation those magnitudes are the coefficients of the short-time Fourier transform. The only computation left is the forward propagation through time in the Long Short-Term Memory network, which takes much less time than the short-time Fourier transform. The most expensive step in online classification is the feature extraction, which uses a signal transform. Since the existing infrastructure can handle the signal transforms, the additional classification layer can be implemented without much upgrade. In the future, more primitive signal transforms such as the dq0 transformation could be considered to significantly reduce the computational requirements of the monitoring system.

Currently, standard measurements such as RMS voltage and THD are constantly monitored to detect any abnormality in the grid. These values are often used to initiate control actions when the measurements exceed thresholds defined by standards [2, 12]. The proposed power quality monitoring technique extends this capability to report multiple classes of disturbances. The identified class can be used to initiate different control actions, such as dispatching a Static Synchronous Compensator (STATCOM), reconfiguring the distribution network, and determining the status of the capacitor banks deployed throughout the grid. There are a number of potential applications in increasing the fault tolerance of the grid, setting control parameters for STATCOMs, and operating microgrids [8, 10, 28].

In the next chapter, we first explore the options for feature extraction and give an overview and comparison of the short-time Fourier transform, wavelet transform, and S transform.


Chapter 3

Transformation and Feature Extraction

1 Background

In order to assess power quality issues, engineers have to carefully and often tediously observe the system states. The most commonly monitored variables are the voltage magnitude, frequency, and total harmonic distortion, as mentioned in the previous chapter. These can be considered features extracted from the voltage and current waveforms. While the standard measurements are usually quite effective in detecting whether the system is in a normal or abnormal condition, they are not sufficient to distinguish different disturbances. In order to extract more information from the waveforms, this chapter investigates the short-time Fourier transform, the wavelet transform, and the S transform.

The overall process of classification is presented in Figure 3.1. The first step is to generate labeled data based on Monte Carlo simulation, to which we apply the transform and feature extraction. The extracted feature data are used to train the neural network that classifies the input features; the next chapter discusses the classifier. In Figure 3.1, the upper part of the graph is the process done offline before the monitor is deployed. Training the classifier is the most time-consuming step, but it has to be done only once to set the parameters of the classifier. The bottom part of the graph shows the implemented system, which only requires the feature extraction of the data and classification.

Figure 3.1: Overall process of building an automatic power quality monitor

In this chapter, we discuss the feature extraction process using signal transforms. We first go over the Fourier transform, which extracts the spectral information of the waveform.

1.1 Fourier Transform

The Fourier transform breaks a signal down into sinusoids at different frequencies, transforming our view of the signal from the time domain to the frequency domain. It uses the complex exponential as its basis function. The formal definition of the Fourier transform of a continuous signal v(t) is

V(f) = ∫_{−∞}^{+∞} v(t) e^{−j2πft} dt. (3.1)

The discrete Fourier transform of a discrete signal v[n] is defined as

V[k] = ∑_{n=0}^{N−1} v[n] e^{−j(2π/N)kn} (3.2)

where N is the size of the signal. For implementation, the fast Fourier transform (FFT) computes this efficiently using a divide-and-conquer method: the naive implementation of the Fourier transform has algorithmic complexity O(N²), while the FFT is O(N log N). The limitation of the Fourier transform in power quality monitoring is its inability to localize events. The sense of time is completely lost in the Fourier transform, and therefore it must be modified to be applicable for localizing power quality events.
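As a quick sanity check on equation 3.2 and the FFT claim, the sum can be evaluated directly and compared against NumPy's FFT. This is a hedged example; `naive_dft` is our illustrative O(N²) implementation.

```python
import numpy as np

def naive_dft(v):
    """Direct evaluation of Eq. 3.2 in O(N^2)."""
    N = len(v)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return (v * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)

v = np.sin(2 * np.pi * 8 * np.arange(256) / 256)   # 8 cycles in 256 samples
V = naive_dft(v)
```

`np.fft.fft` computes the same sum in O(N log N); for a pure sinusoid with an integer number of cycles, the energy lands in exactly two bins, which also illustrates the loss of time localization: the transform says nothing about when anything happened.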


2 Short-Time Fourier Transform

In order to overcome this limitation of the Fourier transform, a window is applied to the signal for localization. The short-time Fourier transform (STFT) applies the Fourier transform to a fixed section of the signal at a time. By taking the Fourier transform over a window in a specified period, time becomes part of the representation through the location of the window. The STFT can be defined formally as

STFTx(τ, f) = ∫_{−∞}^{+∞} v(t) w(t − τ) e^{−j2πft} dt (3.3)

where w(t) is the windowing function, such as a rectangular, Hann, or Hamming window.

2.1 Discrete Short-Time Fourier Transform and its implementation

Suppose the sampling frequency of the waveform is fi and the required sampling frequency of the output is fo. We define the input-to-output sampling ratio as g = fi/fo, and we will only consider integer ratios, g ∈ Z. Since a window function of size Q has the property w[n] = 0 for every n ∈ (−∞, 0) ∪ [Q, ∞), the discrete STFT can be written as:

STFTx[m, k] = ∑_{n=m}^{m+Q−1} v[n] w[n − m] e^{−j(2π/N)kn}, (3.4)

with the Hanning window

w[n] = 0.5(1 − cos(2πn/(Q − 1))). (3.5)

In Figure 3.2, we present the short-time Fourier transform of the example signals given in the previous chapter. A window size of one cycle was used at every time step. The figure demonstrates the presence of other harmonics, as shown in the oscillatory transient and harmonic disturbances. However, the fixed window size of the STFT limits its ability to distinguish high-frequency short-term events such as impulsive transients and notches, while a shorter time window would suffer from higher uncertainty in the coefficients. This illustrates the limitation of the short-time Fourier transform due to its fixed-size window.


Figure 3.2: Contour diagram of STFT coefficients

The magnitudes of the STFT coefficients are selected as features. We take the spectrogram definition of the short-time Fourier transform, which is the squared magnitude of the short-time Fourier transform coefficients. In addition, we concatenate the peak voltage presented in equation 2.2 to complete the feature vector x,

x = [|STFTx|², Vmin, Vmax]. (3.6)

The complete algorithm for implementing feature extraction with the short-time Fourier transform is given in Algorithm 2. In addition to the concatenation, we normalize the features: normalization sets the mean and variance of each feature to 0 and 1 respectively. This step helps the convergence of neural network training, and the normalization constants are obtained during the offline training step. When the classifier is deployed after training the neural network, the same constants used in training are loaded to ensure the computation of the feature data is consistent between training and testing.

Algorithm 2 Feature extraction from short-time Fourier transform
1: for m from 1 to L/g do
2:   vw ← 0
3:   for n from m to m + Q − 1 do
4:     w[n] ← 0.5(1 − cos((2πn)/(Q − 1)))
5:     vw[n] ← v[n] · w[n − m]
6:   end for
7:   STFT[m, :] ← FFT(vw)
8:   Vmax[m], Vmin[m] ← equation 2.2
9:   x[m, :] ← [|STFT[m, :]|², Vmax[m], Vmin[m]]
10:  if online then
11:    load(xmean, xvar)
12:    x[m, :] ← (x[m, :] − xmean)/√xvar
13:  end if
14: end for
15: if offline then
16:   for k from 1 to end do
17:     xmean[k] ← mean(x[:, k])
18:     xvar[k] ← var(x[:, k])
19:   end for
20:   x[m, :] ← (x[m, :] − xmean)/√xvar ∀m
21:   save(xmean, xvar)
22: end if
23: Output x
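The core of Algorithm 2 can be sketched in NumPy as follows. This is hedged: the window size, output rate, and variable names are illustrative, and the online path that reloads saved constants is omitted for brevity.

```python
import numpy as np

fs, f0, g = 10_000, 60, 10                    # sampling rate, fundamental, ratio
Q = fs // f0                                  # one-cycle window
w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(Q) / (Q - 1)))   # Hanning, Eq. 3.5

t = np.arange(0, 0.2, 1 / fs)
v = np.sin(2 * np.pi * f0 * t)

rows = []
for m in range(0, len(v) - Q, g):             # one feature row per output step
    seg = v[m:m + Q]
    spec = np.abs(np.fft.rfft(seg * w))**2    # spectrogram coefficients
    rows.append(np.concatenate([spec, [seg.min(), seg.max()]]))
x = np.array(rows)

# Offline normalization: zero mean, unit variance per feature; the constants
# would be saved and reloaded when the classifier runs online.
x_mean, x_var = x.mean(axis=0), x.var(axis=0)
x_norm = (x - x_mean) / np.sqrt(x_var + 1e-12)
```

Each row of `x` is one time step of the feature vector of equation 3.6, so the classifier can emit a decision at the output rate fs/g.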

While the short-time Fourier transform gives a spectral analysis of the waveform, it is limited by its fixed-size window: there is a trade-off in choosing the window size between classifying high-frequency and low-frequency disturbances. For example, a small window is advantageous for detecting short-duration events such as impulsive transients, but it is at a disadvantage for detecting flickers.

3 Wavelet Transform

In order to overcome this limitation of the short-time Fourier transform, the wavelet transform has been developed. The wavelet transform introduces the scale s, which changes the size of the window. The continuous wavelet transform is defined as follows:

CWT_v^ψ(τ, s) = ∫_{−∞}^{+∞} v(t) ψ*_{τ,s}(t) dt = (1/√|s|) ∫_{−∞}^{+∞} v(t) ψ*((t − τ)/s) dt, (3.7)

where τ and s are the translation and scale parameters, and ψτ,s(t) = (1/√|s|) ψ((t − τ)/s) with ψ the mother wavelet.

3.1 Discrete Wavelet Transform and its implementation

A detailed derivation of the discrete wavelet transform will not be given in this thesis, since it can be found in many standard books such as [36]. The discrete wavelet transform samples the scale and position on a dyadic grid. The wavelet system satisfies the multiresolution conditions, whereby wavelets at higher resolution span the wavelets at lower resolution. Moreover, the lower-resolution coefficients can be computed efficiently from higher-resolution coefficients by a filter bank. For implementation of the discrete wavelet transform, the following coefficients result from a single-level wavelet analysis,

cj(k) = ∑m h(m − 2k) cj+1(m), (3.8)

dj(k) = ∑m g(m − 2k) cj+1(m), (3.9)

where cj are called the approximation coefficients and dj the detail coefficients. Equations 3.8 and 3.9 can be implemented by FIR filtering followed by down-sampling by two, as shown in Figure 3.3. In the filter bank, h(−n) is a lowpass filter producing the approximation coefficients cj, and g(−n) is a highpass filter producing the detail coefficients dj.

Figure 3.3: Two-band analysis bank
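One level of the analysis bank of equations 3.8 and 3.9 can be sketched with the two-tap Haar filters. This is a hedged simplification: the thesis uses db5, Haar merely keeps the example short, and the convolution indexing here is one common convention.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)     # lowpass  -> approximation c_j
g = np.array([1.0, -1.0]) / np.sqrt(2)    # highpass -> detail d_j

def analysis(c_next):
    """One level of Eqs. 3.8/3.9: filter with h(-n), g(-n), downsample by 2."""
    c = np.convolve(c_next, h[::-1])[1::2]
    d = np.convolve(c_next, g[::-1])[1::2]
    return c, d

v = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
c1, d1 = analysis(v)
```

For orthogonal wavelets the bank preserves energy (Parseval): the summed squares of `c1` and `d1` equal the summed squares of `v`, which is exactly why the coefficient energy used as a feature below is a faithful summary of the signal.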

The decomposed detail coefficients are used to reconstruct the signal, which can be computed by

fj(t) = ∑k dj(k) 2^{j/2} ψ(2^j t − k) (3.10)

where ψ is the wavelet and j is the reconstruction level. In this thesis, the Daubechies wavelets (db5) are used to decompose the signal into 5 levels; the Daubechies system is one of the most widely used wavelet systems. Figure 3.4 shows the reconstructed details of the power quality disturbances at 5 levels using Equation 3.10. Unlike the short-time Fourier transform, where frequencies are sampled, the signal is decomposed into different levels. In order to make the features classifiable, the energy of a windowed signal is used as the feature. We present the algorithm for obtaining features based on the wavelet decomposition in Algorithm 3. In the algorithm, H is an initial buffer needed to obtain both the peak voltage and the energy of the decomposed waveform. As in Algorithm 2, we normalize the data to make gradient descent work better when training the classifier.

Figure 3.4: Reconstructed signal using discrete wavelet transform at different levels


Algorithm 3 Discrete Wavelet Transform
1: c5 ← v
2: for j from 4 to 1 do
3:   c_j, d_j ← Equations 3.8, 3.9 with c_{j+1}
4:   f[:, j] ← Equation 3.10 with d_j
5: end for
6: H ← max(Q, f_s/60)
7: for m from H + 1 to L do
8:   g[m, :] ← sqrt( Σ_{n=m−Q}^{m} f[n, :]² )
9:   V_max[m], V_min[m] ← Equation 2.2
10:  x[m, :] ← [g[m, :], V_max[m], V_min[m]]
11:  if online then
12:    load(x_mean, x_var)
13:    x[m, :] ← (x[m, :] − x_mean)/sqrt(x_var)
14:  end if
15: end for
16: if offline then
17:   for k from 1 to end do
18:     x_mean[k] ← mean(x[:, k])
19:     x_var[k] ← var(x[:, k])
20:   end for
21:   x[m, :] ← (x[m, :] − x_mean)/sqrt(x_var) ∀m
22:   save(x_mean, x_var)
23: end if
24: Output x
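The energy feature in step 8 of Algorithm 3 is a sliding-window root-sum-of-squares of each reconstructed detail signal. A minimal single-channel sketch of that step (illustrative only):

```python
import numpy as np

def window_energy(f, Q):
    """Sliding-window energy (Algorithm 3, step 8):
    g[m] = sqrt(sum of f[n]^2 for n in [m-Q, m])."""
    g = np.zeros(len(f))
    for m in range(Q, len(f)):
        g[m] = np.sqrt(np.sum(f[m - Q:m + 1] ** 2))
    return g

g = window_energy(np.ones(10), Q=3)
print(g[3:])   # each window holds four unit samples, so the energy is 2.0
```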

4 S Transform

The S transform was first proposed by Stockwell in [37]. The S transform has fixed modulating sinusoids with respect to the time axis, while a Gaussian window is dilated and translated as in the wavelet transform. It maintains a relationship with both the wavelet transform and the short-time Fourier transform. The fact that the S transform retains a direct relationship with the Fourier transform gives it a good characterization. The continuous S transform is defined as follows:

$$ST_x(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt. \qquad (3.11)$$

The continuous S transform can also be written as

$$ST_x(\tau, f) = \int_{-\infty}^{+\infty} X(\alpha + f)\, e^{-\frac{2\pi^2 \alpha^2}{f^2}}\, e^{j2\pi\alpha\tau}\, d\alpha, \qquad f \neq 0, \qquad (3.12)$$

where X(f) is the Fourier transform of x(t). We can show that Equation 3.12 is equivalent to Equation 3.11:


$$
\begin{aligned}
ST_x(\tau, f) &= \int_{-\infty}^{+\infty} X(\alpha + f)\, e^{-\frac{2\pi^2\alpha^2}{f^2}}\, e^{j2\pi\alpha\tau}\, d\alpha \\
&= \int_{-\infty}^{+\infty} \left[ \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi(\alpha + f)t}\, dt \right] e^{-\frac{2\pi^2\alpha^2}{f^2}}\, e^{j2\pi\alpha\tau}\, d\alpha \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi(\alpha + f)t}\, e^{-\frac{2\pi^2\alpha^2}{f^2}}\, e^{j2\pi\alpha\tau}\, dt\, d\alpha \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi f t}\, e^{-\frac{2\pi^2\alpha^2}{f^2}}\, e^{j2\pi\alpha(\tau - t)}\, d\alpha\, dt \\
&= \int_{-\infty}^{+\infty} \left[ \int_{-\infty}^{+\infty} e^{-\frac{2\pi^2\alpha^2}{f^2} + j2\pi\alpha(\tau - t)}\, d\alpha \right] x(t)\, e^{-j2\pi f t}\, dt. \qquad (3.13)
\end{aligned}
$$

The integral of a Gaussian function in general form can be written as

$$\int_{-\infty}^{+\infty} e^{-ax^2 + bx + c}\, dx = \sqrt{\frac{\pi}{a}}\, e^{\frac{b^2}{4a} + c}, \qquad (3.14)$$

which can be used for the integration inside Equation 3.13. Hence

$$
\begin{aligned}
ST_x(\tau, f) &= \int_{-\infty}^{+\infty} \left[ \sqrt{\frac{\pi f^2}{2\pi^2}}\, e^{\frac{-4\pi^2(\tau - t)^2 f^2}{8\pi^2}} \right] x(t)\, e^{-j2\pi f t}\, dt \\
&= \int_{-\infty}^{+\infty} \left[ \sqrt{\frac{f^2}{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}} \right] x(t)\, e^{-j2\pi f t}\, dt \\
&= \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt, \qquad (3.15)
\end{aligned}
$$

which is equivalent to Equation 3.11.
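The Gaussian integral identity (3.14) that drives this derivation is easy to sanity-check numerically; the constants a, b, c below are arbitrary illustrative choices:

```python
import numpy as np

a, b, c = 1.0, 0.5, 0.2
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
lhs = np.sum(np.exp(-a * x**2 + b * x + c)) * dx    # Riemann sum of the integral
rhs = np.sqrt(np.pi / a) * np.exp(b**2 / (4 * a) + c)
print(abs(lhs - rhs) < 1e-6)   # True
```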

4.1 Discrete S transform and its implementation

Using Equation 3.12, we can utilize the fast Fourier transform to increase computational efficiency. The S transform can be written in discrete time as

$$ST_x[m, k] = \sum_{p=0}^{N-1} X[p + k]\, e^{-\frac{2\pi^2 p^2}{k^2}}\, e^{\frac{j2\pi p m}{N}}. \qquad (3.16)$$

Part of the implementation can utilize the fast Fourier transform and the inverse fast Fourier transform. However, the multiplication by the Gaussian window $e^{-2\pi^2 p^2 / k^2}$ has to be done for every p and k, and thus the algorithmic complexity is O(N²). Contour plots of the S transform are shown in Figure 3.5. When this is compared to the contour plot of the short-time Fourier transform in


Figure 3.5: Contour diagram of ST coefficients


Figure 3.2, we can see a clearer and more localized representation of the signal. In particular, high-frequency disturbances such as impulsive transients and notches are accurately captured.

Algorithm 4 shows the implementation of the S transform and feature extraction. Since the number of discrete frequency samples is large, we down-sample them by a factor of d. In the implementation, the window

size, Q, was set to 12 cycles of the fundamental frequency, and the frequency sampling was

done at every 240 Hz.

Algorithm 4 Implementation of feature extraction with the discrete-time S transform
1: for i from 1 to 2N/Q do
2:   v̄ ← v[i·Q/2 + 1 : i·Q/2 + Q]
3:   V ← FFT(v̄)
4:   for k from 1 to l + 1 do
5:     for p from 1 to Q do
6:       B[p] ← V[p + k] e^{−2π²p²/k²}
7:     end for
8:     D[:, k] ← IFFT(B)
9:   end for
10:  ST_x[i·Q/2 + Q/4 + 1 : i·Q/2 + 3Q/4] ← D[Q/4 + 1 : 3Q/4]
11: end for
12: x ← |ST_x[:, 1 : d : K]|
13: if online then
14:   load(x_mean, x_var)
15:   x[m, :] ← (x[m, :] − x_mean)/sqrt(x_var) ∀m
16: end if
17: if offline then
18:   for k from 1 to end do
19:     x_mean[k] ← mean(x[:, k])
20:     x_var[k] ← var(x[:, k])
21:   end for
22:   x[m, :] ← (x[m, :] − x_mean)/sqrt(x_var) ∀m
23:   save(x_mean, x_var)
24: end if
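A compact NumPy sketch of the frequency-domain implementation of Equation 3.16 follows. This is one common FFT-based realization, not the thesis code: the Gaussian window is built with symmetric frequency offsets, and the per-voice loop is what makes the overall cost O(N²).

```python
import numpy as np

def s_transform(x):
    """FFT-based discrete S transform: for each voice k, circularly shift the
    spectrum by k, apply the Gaussian localizing window, and inverse-FFT."""
    N = len(x)
    X = np.fft.fft(x)
    m = np.arange(N)
    m[m > N // 2] -= N                       # symmetric frequency offsets
    ST = np.zeros((N // 2, N), dtype=complex)
    for k in range(1, N // 2):               # k = 0 is excluded since f != 0
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / k**2)
        ST[k] = np.fft.ifft(np.roll(X, -k) * gauss)
    return ST

# A pure tone at bin 8 concentrates its energy in voice k = 8
N = 128
x = np.cos(2 * np.pi * 8 * np.arange(N) / N)
ST = s_transform(x)
print(int(np.argmax(np.abs(ST).mean(axis=1))))      # 8
print(np.allclose(ST[8].sum(), np.fft.fft(x)[8]))   # True
```

The second check illustrates the recovery property of the S transform discussed in the next section: summing a voice over all time positions returns the corresponding Fourier coefficient exactly, because the time-sum of an inverse FFT selects the zero-shift term, where the Gaussian window equals one.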

5 Comparison of Transforms

In this section, we give both theoretical and experimental comparisons of the transforms.

At this stage, the extracted features are too ambiguous to compare their performances. Their

performance will be evaluated after applying a standard classifier in Chapter 5. The theoretical


comparison was given by Ventosa et al. [4], and this section briefly goes over it. The comparison is done on the continuous versions of the transforms to establish straightforward mathematical relationships.

First, we compare the S transform and the Fourier transform. The Fourier transform can be recovered from the S transform by

$$\int_{-\infty}^{+\infty} ST_x(\tau, f)\, d\tau = X(f). \qquad (3.17)$$

This can be seen by direct substitution, using Equation 3.14:

$$
\begin{aligned}
\int_{-\infty}^{+\infty} ST_x(\tau, f)\, d\tau &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt\, d\tau \\
&= \int_{-\infty}^{+\infty} \left[ \int_{-\infty}^{+\infty} e^{-\frac{(\tau - t)^2 f^2}{2}}\, d\tau \right] x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-j2\pi f t}\, dt \\
&= \int_{-\infty}^{+\infty} \left[ \sqrt{\frac{2\pi}{f^2}} \right] x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-j2\pi f t}\, dt \\
&= \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi f t}\, dt = X(f). \qquad (3.18)
\end{aligned}
$$

Therefore, we can see that summing the S transform over time gives the Fourier representation of the signal. The short-time Fourier transform is a windowed version of the Fourier transform, and within the window the above relation holds. Next, the relationship between the continuous wavelet transform and the S transform is shown:

$$
\begin{aligned}
ST_x(\tau, f) &= \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt \\
&= e^{-j2\pi f\tau} \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{j2\pi f(\tau - t)}\, dt. \qquad (3.19)
\end{aligned}
$$

We give the definition of the Morlet wavelet,

$$\psi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^2}\, e^{j2\pi t}, \qquad (3.20)$$

and substitute s = 1/f to replace the frequency with scale. Then,


$$
\begin{aligned}
ST_x(\tau, f) &= e^{-j2\pi\tau/s} \int_{-\infty}^{+\infty} x(t)\, \frac{1}{|s|\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t - \tau}{s}\right)^2}\, e^{-j2\pi\left(\frac{t - \tau}{s}\right)}\, dt \\
&= \frac{e^{-j2\pi\tau/s}}{\sqrt{|s|}}\, \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} x(t)\, \psi^*\!\left(\frac{t - \tau}{s}\right) dt \\
&= \frac{e^{-j2\pi\tau/s}}{\sqrt{|s|}}\, CWT_x^{\psi}(\tau, s). \qquad (3.21)
\end{aligned}
$$

Now we can relate the S transform to the continuous wavelet transform through the phase factor $e^{-j2\pi\tau/s}/\sqrt{|s|}$. Figure 3.6 summarizes the theoretical comparison. Although the continuous transforms show direct relationships between one another, the discrete implementation gives the wavelet transform an advantage over the short-time Fourier transform and the S transform due to its efficiency.

Figure 3.6: Relationships between wavelet transform, S transform and Fourier transform [4]

Although the S transform may appear to yield good features, the S transform and the short-time Fourier transform share a disadvantage compared to the discrete wavelet transform. Both introduce a redundant representation if there are too many frequency sampling points [38]. A redundant representation means a larger feature input for the classifier and more computation. If there are too few sampling points, the features may not convey sufficient information for classification. While the S transform shows more favorable characteristics, as seen in Figure 3.5, we will see in Chapter 5 that the classification accuracy is not as good as expected. This issue may stem from insufficient frequency sampling points. Currently, there is no consensus on how to effectively select a limited set of frequency sampling points, and this needs further investigation.

In Figure 3.7, we show example features obtained with Algorithms 2, 3 and


4. These are the features that will be sent to the classifier for both offline training and online

classification.

Figure 3.7: An example of waveform and extracted features based on different transforms

The algorithmic complexities of the proposed transforms are also important for implementation, and they are given in Table 3.1. While the short-time Fourier transform computes the features in O(Q log Q) per window of size Q using the fast Fourier transform, the S transform requires O(Q²) per window. The discrete wavelet transform is the most efficient, with the implementation described in the previous section.

While the transforms have different advantages and disadvantages in terms of implementation efficiency and characterization, there is a fundamental limitation in extracting features.

The features from the transforms are based on the frequencies of the signal, and they are subject

to the Heisenberg uncertainty principle.


Table 3.1: Comparison of computational complexity of feature extraction algorithms

Transform                        Algorithmic complexity
short-time Fourier transform     O(NQ log Q)
discrete wavelet transform       O(N)
S transform                      O(NQ²)

5.1 Common limitations of the feature

The Heisenberg uncertainty principle states the following. For a unit-energy signal x(t) with Fourier transform X(f), define

$$m_t = \int_{-\infty}^{\infty} t\,|x(t)|^2\,dt, \qquad \sigma_t = \left[\int_{-\infty}^{\infty} (t - m_t)^2\,|x(t)|^2\,dt\right]^{1/2},$$

$$m_f = \int_{-\infty}^{\infty} f\,|X(f)|^2\,df, \qquad \sigma_f = \left[\int_{-\infty}^{\infty} (f - m_f)^2\,|X(f)|^2\,df\right]^{1/2}, \qquad (3.22)$$

where m_t and σ_t are the average and uncertainty in time, and m_f and σ_f are the average and uncertainty in frequency. The Heisenberg uncertainty principle states that

$$\sigma_t\,\sigma_f \geq \frac{1}{4\pi}. \qquad (3.23)$$

This means there is a fundamental trade-off between the localization of a feature and its uncertainty. As we try to extract a more accurate frequency feature, localization is lost; and as we obtain more accurate localization, there is more uncertainty in the frequency feature. The wavelet transform and the S transform try to address this issue by using windows of various sizes depending on the frequency being captured. While we might expect the short-time Fourier transform to perform significantly worse than the other transforms, we will see that if the classifier is powerful enough, the impact is not as significant as expected. The implication of the Heisenberg uncertainty principle is most important in the transition state, the moment the disturbance happens. Features extracted during a transition carry high uncertainty. As a result, we expect the majority of misclassifications to occur in the transition state, and we will see this in Chapter 5.
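With the frequency variable f in hertz, the uncertainty bound is σ_t σ_f ≥ 1/(4π), with equality attained by a Gaussian pulse. A quick numerical check of this limiting case (grid sizes are arbitrary illustrative choices):

```python
import numpy as np

n, dt = 2**14, 0.01
t = (np.arange(n) - n // 2) * dt
s = 1.0
x = (np.pi * s**2) ** -0.25 * np.exp(-t**2 / (2 * s**2))   # unit-energy Gaussian

sigma_t = np.sqrt(np.sum(t**2 * np.abs(x)**2) * dt)        # time spread (mean is zero)

f = np.fft.fftfreq(n, dt)
X = np.fft.fft(x) * dt                                      # approximates the continuous FT
sigma_f = np.sqrt(np.sum(f**2 * np.abs(X)**2) * (f[1] - f[0]))

print(round(sigma_t * sigma_f * 4 * np.pi, 4))              # 1.0: the bound is attained
```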


6 Summary

In this chapter, we went over the short-time Fourier transform, wavelet transform and

S transform to extract features that will be used for the classifier in the next chapter. We

presented the relationships and comparisons between the transforms and discussed pros and

cons of each transform based on its characterization and efficiency in implementation.


Chapter 4

Classification with Long Short-Term Memory

1 Background

The previous chapter discussed how different transforms can refine the waveform into features appropriate for classification. In this chapter, we finally describe the classifier

that will output the types of power quality disturbance. In general, classifiers are associated

with parameters that implicitly or explicitly define thresholds. The explicit parameters are

determined by engineers and can be used when the disturbance is directly related to

the parameter. For example, we could define a voltage swell as a time period where the voltage magnitude is greater than 1.1 p.u. The voltage magnitude would then be an explicit parameter classifying voltage sags and swells with thresholds. However, classification with explicit

parameters usually involves a tree-structured classification. The tree structure becomes very

complex and difficult to tune as the number of classes increases. In addition, the modification

of the classification becomes very challenging and may require much time for engineers to

reconfigure the tree structure.

The alternative is to use implicit parameters, and an example of this approach is the neural network. The neural network is one of the most successful techniques for training a classification system to find patterns in training data. We will use supervised learning to determine the



implicit parameters of the classifier, which are the weights and biases of the neural network. The

implicit parameters are determined systematically, and there is no need for manual work in

order to modify the classifier and to reconfigure the parameters.

The neural network was invented in the 1950s, inspired by how the brain works. A neural network distributes pattern matching across many nodes, which correspond to the neurons in the brain. Recently, it has achieved great success in numerous fields including speech recognition, object recognition, and natural language processing [34, 39, 40]. While there are other approaches such as the support vector machine and the neural tree, the deep neural network has shown the most recent success across many fields. In the following sections, different architectures of the neural network will be reviewed.

2 Feedforward Neural Network

The feedforward neural network (FNN) is a basic neural network architecture that has multiple hidden layers, with edges directed from each layer to the layer above. This structure takes the input at the bottom layer and computes layer by layer from bottom to top, where the top layer is the output. An example of a 3-layer feedforward neural network is shown in Figure 4.1.

Figure 4.1: Feedforward neural network architecture

The neural network is a relation between the input x ∈ R^{M×T} and the output ŷ ∈ R^{N×T}. The true label will be denoted by y ∈ R^{N×T}. M is the number of features obtained from the feature extraction algorithm, N is the number of classes, and T is the number of data points, i.e., the number of time steps. An individual training example, x^{(t)}, is the t-th column of x. The label entry y_i^{(t)} ∈ {0, 1} indicates which class the data belongs to, and it satisfies Σ_i y_i^{(t)} = 1.


An FNN with l hidden layers has as parameters l + 1 weight matrices W = (W_0, ..., W_l) and biases b = (b_1, ..., b_{l+1}). We denote the parameters by θ = [W, b]. Given that the k-th layer has m_k units, W_k ∈ R^{m_k × m_{k−1}}, b_k ∈ R^{m_k}, and m_0 = M. The output ŷ = [ŷ^{(1)} ... ŷ^{(T)}] can be computed from the input x using forward propagation with the following algorithm:

Algorithm 5 Forward Propagation for FNN
1: z_0 ← x^{(t)}
2: for i from 1 to l do
3:   g_i ← W_{i−1} z_{i−1} + b_i
4:   z_i ← σ(g_i)
5: end for
6: Output ŷ^{(t)} ← P(y^{(t)} | z_l)

The function σ is an activation function, and P(y^{(t)}|z) is the softmax function, which will be explained later. The activation could be the sigmoid function, the hyperbolic tangent function, or the rectified linear function; the sigmoid function is used in this thesis. The last

hidden layer is connected to the output by the softmax function, which produces a probability

distribution.
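Algorithm 5 can be sketched in a few lines of NumPy: sigmoid hidden layers followed by a softmax output layer. This is a minimal illustration, not the thesis implementation; the layer sizes are arbitrary.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def softmax(o):
    e = np.exp(o - o.max())          # subtract max for numerical stability
    return e / e.sum()

def forward(x, weights, biases):
    """Forward propagation (Algorithm 5): sigmoid hidden layers, then a
    softmax output layer producing a distribution over the classes."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = sigmoid(W @ z + b)
    return softmax(weights[-1] @ z + biases[-1])

rng = np.random.default_rng(0)
sizes = [6, 8, 8, 4]   # M = 6 input features, two hidden layers, N = 4 classes
Ws = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
y_hat = forward(rng.standard_normal(6), Ws, bs)
print(y_hat.shape, round(float(y_hat.sum()), 6))   # (4,) 1.0
```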

The architecture of the neural network also allows an efficient algorithm for computing the derivative of an objective function with respect to the parameters. Using the chain rule, the computation proceeds in the reverse direction of forward propagation, so it is called backward propagation. Backward propagation requires the output z_i of each node, which is obtained by running forward propagation first. The following algorithm shows the implementation of backward propagation.

Algorithm 6 Backward Propagation for FNN
1: dz_l ← dL(ŷ, y)/dz_l
2: for i from l to 1 do
3:   dg_i ← σ′(g_i) · dz_i
4:   dz_{i−1} ← W_{i−1}^T dg_i
5:   db_i ← dg_i
6:   dW_{i−1} ← dg_i z_{i−1}^T
7: end for
8: Output ∇_θ L ← [dW_0, ..., dW_l, db_1, ..., db_{l+1}]

We define the objective function, or loss function, L(ŷ, y) in a later section. The backward propagation will be utilized to set the parameters θ.


2.1 Decision making with Softmax Function

Existing algorithms for power quality monitoring were often equipped with an additional layer of algorithms to interpret the output from the neural network [3]. In this thesis, we propose the softmax function, currently the most widely used technique for multi-class classification. It only changes the output layer of the neural network and is much easier and simpler to implement than an additional layer of algorithms. The softmax function is defined as follows:

$$P(y = j \mid z) = \frac{\exp(w_j^T z)}{\sum_k \exp(w_k^T z)}. \qquad (4.1)$$

Since P(y = j | z) ∈ (0, 1) and Σ_j P(y = j | z) = 1, it satisfies the conditions for a probability distribution. The compact notation P(y | z) = [P(y = 1 | z) ... P(y = 14 | z)]^T is used, where N = 14 is the number of disturbance classes in our definition. To determine which class the signal belongs to, the one with maximum probability is chosen. We can construct the output prediction as

$$\hat{y}_i = \begin{cases} 1 & \text{if } i = \arg\max_j P(y = j \mid x) \\ 0 & \text{otherwise} \end{cases} \qquad (4.2)$$

where we assign the class with the maximum probability. Based on the output prediction, the error rate can be defined as

$$E = 1 - \frac{1}{T} \sum_{t=1}^{T} \hat{y}^{(t)} \cdot y^{(t)}, \qquad (4.3)$$

where ŷ^{(t)} · y^{(t)} = 1 if the prediction matches the actual label and 0 otherwise. The matches are summed over all the training cases and divided by the total number of training data.
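The softmax output, argmax decision, and error rate can be sketched as follows (illustrative NumPy only; the logits and labels are made up for the example):

```python
import numpy as np

def softmax(logits):
    """Softmax over classes (Eq. 4.1); subtracting the per-column max is a
    standard numerical-stability trick that does not change the result."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def error_rate(probs, labels):
    """Fraction of time steps whose argmax prediction misses the true class."""
    return float(np.mean(probs.argmax(axis=0) != labels.argmax(axis=0)))

logits = np.array([[2.0, 0.1], [0.5, 3.0], [0.1, 0.2]])  # 3 classes, 2 time steps
labels = np.array([[1, 0], [0, 0], [0, 1]])              # step 2 will be misclassified
p = softmax(logits)
print(np.allclose(p.sum(axis=0), 1.0))   # True: valid distributions
print(error_rate(p, labels))             # 0.5
```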

2.2 Training Neural Network

If we consider the FNN as an input-to-output mapping, then

$$\hat{y}^{(t)} = f(x^{(t)}, \theta) \qquad (4.4)$$

is our prediction based on the input x, with fixed hyper-parameters such as the number of hidden layers and the number of nodes at each layer. Our goal is to get our prediction ŷ as close as possible


to its label y by adjusting the parameters θ = [W, b]. We can formulate this as an optimization problem, where we first define the negative log probability of the target class,

$$L(\hat{y}, y) = -\frac{1}{T} \sum_t y^{(t)} \cdot \log(\hat{y}^{(t)}), \qquad (4.5)$$

where only the log probability of the correct class is selected and summed. This is the cross entropy between the actual label and the prediction. Then θ is the argument that minimizes the cross entropy,

$$\theta = \arg\min_\theta L(\hat{y}, y) = \arg\min_\theta L(f(x, \theta), y). \qquad (4.6)$$

In order to find this θ, we use the gradient descent method with learning rate α_k,

$$\theta_k \leftarrow \theta_{k-1} - \alpha_k \nabla_\theta L, \qquad (4.7)$$

where ∇_θ L is the gradient of the loss function L with respect to θ, and α_k is the learning rate at step k. The gradient ∇_θ L can be calculated efficiently with the backpropagation presented in Algorithm 6. In addition, we decay the learning rate to yield better performance,

$$\alpha_k = \alpha_0 \cdot \eta^{(k/K)}, \qquad (4.8)$$

where α_0 is the initial learning rate, η is the decay rate, and K is the number of decay steps. In this thesis, α_0 = 0.01, η = 0.9 and K = 2000 were used. While the gradient descent method is straightforward, there are much better methods for this optimization. Adagrad [41], RMSProp [42] and ADAM [43] are some of the popular algorithms for training neural networks. For recurrent neural networks, there is also the Hessian-free Newton's method [44]. In this thesis, the ADAM optimizer was used, and the training was done with mini-batches of size 128.
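The decayed gradient-descent update of Equations 4.7 and 4.8 can be sketched directly; here it is applied to a toy one-dimensional loss L(θ) = θ², whose gradient is 2θ (the thesis settings α₀ = 0.01, η = 0.9, K = 2000 are used as defaults):

```python
import numpy as np

def decayed_lr(k, alpha0=0.01, eta=0.9, K=2000):
    """Learning-rate schedule of Eq. (4.8)."""
    return alpha0 * eta ** (k / K)

def sgd_step(theta, grad, k):
    """Plain gradient-descent update (Eq. 4.7): move against the gradient."""
    return theta - decayed_lr(k) * grad

# Minimize L(theta) = theta^2; the gradient is 2 * theta
theta = np.array([5.0])
for k in range(5000):
    theta = sgd_step(theta, 2 * theta, k)
print(abs(theta[0]) < 1e-3)   # True: converged near the minimum
```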


2.3 Windowed Feedforward Neural Network

While the feedforward neural network is a basic architecture, it has a limitation for time-series data such as power quality disturbance classification. In voltage and current waveforms, each data point is part of a sequence. The FNN has access to only the instantaneous data and is unable to retrieve information from neighbouring data in the sequence, while the Heisenberg uncertainty principle places a fundamental limit on how certain the localization and characterization of those instantaneous features can be.

We introduce the windowed feedforward neural network (wFNN) in this thesis as an attempt to address the problem of the FNN having access to only instantaneous features. The feedforward neural network architecture in Equation 4.4 can be reformulated to include past data,

$$\hat{y}^{(t)} = f(x^{(t-w)}, \ldots, x^{(t)}, \theta), \qquad (4.9)$$

where w + 1 is the size of the window. The improvement with the windowed feedforward neural network is that it has access to up to w previous data points in the sequence. However, there are still disadvantages with this approach.

1. Windowed neural network does not have any way to access the data prior to the window.

2. The length of the input data increases in proportion to the size of the window, which may require more capacity and training time for the neural network.

3. If the length of the window is too large, it will require more memory and computational

power for the monitoring system.

The choice of window size must balance the benefit of accessing more information against the complexity and dilution of the features. In Chapter 5, we will see that this approach is not very effective in improving the accuracy of the FNN: increasing the window size increases the size of the input features, and the benefit of the window is not realized with a fixed-capacity FNN. In the next section, we introduce the recurrent neural network, which elegantly addresses the problems stated above with the feedforward neural network and the windowed feedforward neural network.
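Constructing the windowed input of Equation 4.9 amounts to stacking each feature vector with its w predecessors. A minimal sketch (zero-padding the start of the sequence is one arbitrary choice for the first w steps):

```python
import numpy as np

def windowed_inputs(x, w):
    """Stack each column of x (one feature vector per time step) with its
    w predecessors, so the input size grows from M to M * (w + 1)."""
    M, T = x.shape
    padded = np.concatenate([np.zeros((M, w)), x], axis=1)
    cols = [padded[:, t:t + w + 1].ravel() for t in range(T)]
    return np.stack(cols, axis=1)

x = np.arange(10, dtype=float).reshape(2, 5)   # M = 2 features, T = 5 steps
xw = windowed_inputs(x, w=2)
print(xw.shape)   # (6, 5)
```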


3 Recurrent Neural Network

A recurrent neural network (RNN) is a neural network with recurrence in its structure, i.e., a directed cycle within the network. Figure 4.2 (a) shows a simple single-node neural network with a directed cycle at the node h. This cycle allows information to flow back into the node, enabling the neural network to maintain information within the network.

Figure 4.2: (a) A simplified single node recurrent neural network (b) Unrolled version of thenetwork through time

The recurrent neural network shares similarities with the dynamical systems studied in many engineering applications, including power systems. Unrolling the recurrent neural network as in Figure 4.2 makes the similarity more apparent: Figure 4.2 (b) is exactly the representation of a linear dynamical system when the node activation functions are linear. Non-steady-state power system analysis is already very familiar with this type of model. Both a discrete time-invariant dynamical system and a single-neuron RNN can be written as

$$h[t] = e(h[t-1], x[t], \theta),$$
$$y[t] = g(h[t]), \qquad (4.10)$$

where h, x, and θ are the state, input, and system parameters, respectively. The function e is the activation function of the RNN, and g is the output function. The computation of the recurrent neural network goes forward in time, so it handles continuously sampled data very elegantly. Forward propagation through time, in Algorithm 7, describes how the computation is carried out in this architecture.

Algorithm 7 Forward Propagation Through Time for RNN
1: for t from 1 to T do
2:   u^{(t)} ← W_hx x^{(t)} + W_hh h^{(t−1)} + b_h
3:   h^{(t)} ← e(u^{(t)})
4:   o^{(t)} ← W_oh h^{(t)} + b_o
5:   z^{(t)} ← g(o^{(t)})
6:   ŷ^{(t)} ← P(y^{(t)} | z^{(t)})
7: end for
8: Output ŷ

The pre-activation value u^{(t)} is a linear combination of the input unit and the hidden state from the previous time step t − 1, plus the bias b_h. The pre-activation value for the output, o^{(t)}, is a linear function of the hidden state h^{(t)}. Similar to the feedforward neural network, the derivative of the network with respect to the parameters θ = [W, b] can be implemented

efficiently using the chain rule. The backpropagation-through-time algorithm treats the unrolled recurrent neural network as one large neural network and goes backward in time to compute the gradient. The implementation of the algorithm can be found in [45, 44]. Training the recurrent neural network proceeds in the same way as training the feedforward neural network, with the gradient computation replaced by backpropagation through time. However, training a general recurrent neural network is much more difficult than training an FNN due to the vanishing gradient problem [46]. To overcome this issue, we use gating of the recurrent unit with Long Short-Term Memory.
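The forward pass of Algorithm 7 can be sketched in a few lines; this is a minimal single-layer tanh RNN with illustrative dimensions, returning the pre-softmax outputs at every time step:

```python
import numpy as np

def rnn_forward(xs, Whx, Whh, bh, Woh, bo):
    """Forward propagation through time (Algorithm 7) for a single-layer RNN
    with tanh activation; returns the output pre-activations o^(t)."""
    h = np.zeros(Whh.shape[0])
    outs = []
    for x in xs:                              # one feature vector per time step
        h = np.tanh(Whx @ x + Whh @ h + bh)   # hidden-state update
        outs.append(Woh @ h + bo)             # output pre-activation
    return np.array(outs)

rng = np.random.default_rng(1)
M, H, Ncls, T = 4, 6, 3, 10                   # arbitrary illustrative sizes
out = rnn_forward(rng.standard_normal((T, M)),
                  0.1 * rng.standard_normal((H, M)),
                  0.1 * rng.standard_normal((H, H)),
                  np.zeros(H),
                  0.1 * rng.standard_normal((Ncls, H)),
                  np.zeros(Ncls))
print(out.shape)   # (10, 3)
```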

4 Long Short-Term Memory

Long Short-Term Memory (LSTM) is a special type of RNN architecture that avoids the vanishing gradient problem by utilizing memory units. LSTM was first proposed by Hochreiter and Schmidhuber in 1997 [47]. Gating units control the flow of information through time, which gives the LSTM its ability to memorize important features. LSTM has been applied successfully to speech and handwritten text recognition tasks and to partially observable Markov decision processes [39, 40, 48], which resemble the power quality classification problem in that they require recognizing characteristics of a continuous signal. Figure 4.3 shows the architecture of an LSTM unit.


Figure 4.3: Architecture of an LSTM unit

At the classification time step t, LSTM computes the following:

$$
\begin{aligned}
a_t &= \tanh(W_{hh} h_{t-1} + W_{hx} x_t + b_a) &\quad (4.11a)\\
i_t &= \mathrm{sigmoid}(W_{ih} h_{t-1} + W_{ix} x_t + b_i) &\quad (4.11b)\\
f_t &= \mathrm{sigmoid}(W_{fh} h_{t-1} + W_{fx} x_t + b_f) &\quad (4.11c)\\
o_t &= \mathrm{sigmoid}(W_{oh} h_{t-1} + W_{ox} x_t + b_o) &\quad (4.11d)\\
c_t &= f_t \odot c_{t-1} + i_t \odot a_t &\quad (4.11e)\\
h_t &= o_t \odot \tanh(c_t) &\quad (4.11f)\\
y_t &= h_t &\quad (4.11g)
\end{aligned}
$$

where ⊙ denotes elementwise multiplication; i_t, f_t and o_t denote the input, forget, and output gates respectively; and c_t is the memory unit.
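One LSTM step per Equations 4.11a-4.11f can be sketched compactly by stacking the four affine maps into a single weight matrix acting on [h_{t−1}; x_t]. This is a minimal illustration with arbitrary dimensions, not the thesis implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step (Eqs. 4.11a-4.11f). W maps [h_prev; x] to the stacked
    pre-activations of the candidate a and the gates i, f, o."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.size
    a = np.tanh(z[:H])              # candidate cell input  (4.11a)
    i = sigmoid(z[H:2 * H])         # input gate            (4.11b)
    f = sigmoid(z[2 * H:3 * H])     # forget gate           (4.11c)
    o = sigmoid(z[3 * H:])          # output gate           (4.11d)
    c = f * c_prev + i * a          # memory update         (4.11e)
    h = o * np.tanh(c)              # hidden state / output (4.11f)
    return h, c

rng = np.random.default_rng(2)
H, M = 5, 3                          # arbitrary hidden and input sizes
W = 0.1 * rng.standard_normal((4 * H, H + M))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((20, M)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, bool(np.all(np.abs(h) < 1.0)))   # (5,) True
```

Because the output gate and tanh are both bounded, the hidden state stays in (−1, 1), while the memory cell c can accumulate information across many steps.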

4.1 Advantages of the Recurrent Neural Network

As mentioned among the limitations of the feedforward neural network, the recurrent neural network addresses the issue of limited information sharing along the sequence. To illustrate this point, Figure 4.4 shows the configurations of the feedforward neural network, the windowed feedforward neural network, and the LSTM. The figure shows that the feedforward neural network has no communication along the sequence, and the windowed feedforward neural network has access to only a fixed set of neighbours. The long short-term memory is the most elegant form: the system has access to information from all previous time steps.


Figure 4.4: (a) Feedforward neural network architecture (b) Windowed feedforward neuralnetwork architecture (c) Long short-term memory architecture

For power quality monitoring purposes, this gives an advantage in classification because many disturbances last for a long time. For example, if there were harmonics in the waveform at the previous step, the chance that they are continuing is high. The LSTM incorporates this information when making the classification decision. The temporal information allows the algorithm to build confidence over time and to hold on to information such as the frequencies present in the signal. The forget gate also reduces the false positive rate by rapidly draining the confidence in a classification once the disturbance is no longer observed.

5 Summary

In this chapter, we proposed the recurrent neural network as the new classifier for power quality disturbance classification. We presented the limitations of the feedforward neural network and discussed how the recurrent neural network passes information through time. In particular, we use the Long Short-Term Memory, which employs gating of the input, output and memory cell to prevent the gradient from vanishing or exploding during training. In the next chapter, we will evaluate its performance compared to the feedforward neural network and its windowed version.


Chapter 5

Results and Case Studies

In this chapter, we synthesize feature extraction algorithms and classifiers to build the

automatic power quality monitoring system. We test the performance of the system with

feature extractions based on different transforms. We also compare the performance of LSTM

and traditional FNN.

1 Data Generation and Training

Power quality disturbance data was generated with a sampling frequency of 3840 Hz, and the length of the data was 10 million time steps, which corresponds to approximately 43 minutes. The effect of changing the sampling frequency will be presented in a later section.

Data generation and feature extraction processes were implemented with MATLAB, and the

classifier was implemented with Tensorflow developed by Google [49]. Tensorflow is an open

source software library that is designed for research in machine learning. The descriptions of

algorithms and parameters were done according to Chapter 4. Since the training data could be

generated as much as needed, the neural network is free from the issue of over-fitting within the

data. However, later in the case studies, we will observe that the monitoring system over-fits to

the generated data, and there could be cases where the classification does not work very well.

This is a fundamental issue with neural network approach based on generated data for power

quality monitoring.

Figure 5.1 shows an example of how the cross entropy drops as the training progresses. The cross entropies of the training and testing data were very close to each other, indicating that the data set was large enough to avoid over-fitting within the generated data.

Figure 5.1: Cross entropy of the training and testing data as the training progresses

2 Results

Figure 5.2 shows an example of output from the classifier. LSTM was used with features from the different transforms. The shaded area shows the output of the softmax layer, which can be interpreted as the probability of the colored disturbance. The square boxes are the true labels, where the output is either one or zero for each class. While the cross entropy was the objective of the minimization, we report only the accuracy throughout this chapter: the cross entropy reflects the confidence of the result, but the accuracy is what matters in the end and gives a more straightforward interpretation. The accuracy of the monitoring system is defined as (1 − E), where E is the error rate defined in Equation 4.3.
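The two quantities discussed above can be computed from raw classifier outputs as in the following NumPy sketch (illustrative helper names, not the thesis implementation, which used TensorFlow's built-in operations):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class axis
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-probability assigned to the true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(probs, labels):
    # 1 - E: fraction of time steps whose argmax matches the true label
    return np.mean(probs.argmax(axis=1) == labels)
```

Cross entropy penalizes low confidence in the correct class even when the argmax is right, which is why the two metrics can disagree.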

2.1 Comparisons of the transformations

We first compare the accuracy of the different combinations of feature extraction and classifier in Table 5.1. The feedforward neural network was built with 3 hidden layers of 8 hidden units each. The result shows that LSTM increases the performance

by 2.95%, 3.40%, and 3.86% for STFT, DWT, and ST respectively. The discrete wavelet transform works best with LSTM in the overall result, though the short-time Fourier transform is quite comparable. The performance of the S transform was not as good as expected, since the frequency variation class was not classified at all. This is likely due to insufficient sampling in frequency, and it will require further investigation.

Figure 5.2: Output of the automatic power quality monitoring system

Table 5.1: Comparison of accuracy of FNN and LSTM in percentage

Feature STFT DWT ST

Classifier FNN LSTM FNN LSTM FNN LSTM

(i) normal 94.44 95.15 91.88 95.01 91.02 92.33

(ii) interruption 88.23 88.86 89.00 90.22 88.24 89.33

(iii) sag 88.31 89.54 91.52 89.78 87.04 87.60

(iv) swell 87.78 85.97 90.20 88.9 84.68 81.28

(v) impulsive 20.66 62.87 82.29 86.54 17.63 50.90

(vi) oscillatory 74.14 79.65 76.59 82.68 74.49 85.55

(vii) dc offset 90.78 90.94 93.54 91.7 83.16 84.38

(viii) harmonics 82.08 81.37 90.67 93.82 89.54 80.65

(ix) notch 75.79 83.43 82.07 83.37 59.49 85.12

(x) flicker 86.61 89.04 74.49 87.06 68.58 70.26

(xi) noise 89.98 93.53 94.04 96.26 88.48 98.09

(xii) frequency variation 82.58 91.33 67.71 84.1 0.00 0.00

(xiii) sag and harmonics 86.89 82.39 86.09 80.63 78.83 86.21

(xiv) swell and harmonics 81.8 82.91 86.10 86.5 85.95 83.70

overall 87.04 89.61 88.05 91.04 79.66 82.74

Since the short-time Fourier transform and the S transform have disadvantages in computational complexity and frequency sampling resolution, we will focus on features based on the discrete wavelet transform for testing the effect of the sampling frequency and output frequency. Table 5.2 is the confusion matrix for the discrete wavelet transform with the recurrent neural network; the values are in percentage. The highest confusion, 12.54%, was between sag & harmonics and plain harmonics, followed by 6.96% between swell & harmonics and harmonics. The impulsive transient was confused with the notch 4.47% of the time. These are, in fact, the cases that can be difficult even for a human to distinguish, especially when the sag or swell is not significant.
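A row-normalized confusion matrix like Table 5.2 can be computed from per-time-step labels as follows (a sketch; class indices 0..13 would correspond to (i)..(xiv)):

```python
import numpy as np

def confusion_matrix(true, pred, n_classes):
    """Row-normalized confusion matrix in percent.

    Entry [r, c] is the percentage of time steps whose true class is r
    that were classified as class c, matching the convention of Table 5.2.
    """
    M = np.zeros((n_classes, n_classes))
    for t, p in zip(true, pred):
        M[t, p] += 1
    row_sums = M.sum(axis=1, keepdims=True)
    return 100.0 * M / np.where(row_sums == 0, 1, row_sums)
```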


Table 5.2: Confusion matrix of LSTM with features from DWT
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 94.07 0.06 0.05 0.16 1.51 0.5 0.88 0.18 0.39 0.45 0.39 1.23 0.03 0.1

(ii) 5.41 89.9 2.15 0 2.03 0.03 0.01 0.01 0.25 0 0.03 0.07 0.12 0

(iii) 6.59 1.33 90.65 0 0.57 0.08 0.01 0.01 0.49 0 0.03 0.18 0.06 0

(iv) 5.95 0 0 90.54 0.69 0.05 0 0 0.59 2.07 0.01 0.02 0 0.09

(v) 6.13 0 0 0.01 89 0.79 0.01 0 3.76 0.09 0.15 0 0 0.06

(vi) 12.84 0.08 0.22 0.25 0.45 81.45 0.41 1.21 0.54 0.57 1.29 0.17 0.11 0.42

(vii) 7.34 0 0 0 0 0 92.66 0 0 0 0 0 0 0

(viii) 5.45 0 0 0 0 0.12 0 93.57 0.01 0 0 0 0.75 0.09

(ix) 16.03 0 0 0.07 1.36 0.13 0.03 0.01 81.93 0.03 0.3 0.11 0 0

(x) 8.21 0 0 4.26 2.2 0.12 0 0.01 0.29 84.63 0.01 0.02 0 0.25

(xi) 1.09 0 0 0 0.21 1.08 0 0 1.25 0 96.36 0 0 0

(xii) 17.63 0.02 0.29 0 2.11 0.42 0.01 0.25 0.11 0 0.04 79.05 0.08 0

(xiii) 5.59 0.32 0 0 0.86 0 0 12.51 0.28 0 0.01 0 80.42 0

(xiv) 5.11 0 0 0.03 0.77 0.09 0 6.28 0.51 0.14 0.01 0 0 87.07

2.2 Effect of the size of window

In this section, we adjust the window sizes of the short-time Fourier transform and the discrete wavelet transform to find the optimal window size. The default sampling frequency was 3840 Hz, and the window sizes for STFT and DWT were 1 cycle and 0.5 cycles of the fundamental frequency, respectively.
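For reference, a sliding-window magnitude-spectrum feature of the kind used for the STFT can be sketched as follows. This is illustrative only, assuming a Hann window and a per-sample hop; the thesis's exact window shape and hop are not reproduced here.

```python
import numpy as np

def stft_features(v, fs, window_cycles=1.0, f0=60.0, hop=1):
    """Magnitude spectrum of a sliding Hann window.

    Produces one feature vector per hop; the window length is given
    in cycles of the fundamental frequency f0.
    """
    win_len = int(round(window_cycles * fs / f0))
    window = np.hanning(win_len)
    feats = []
    for start in range(0, len(v) - win_len + 1, hop):
        seg = v[start:start + win_len] * window
        feats.append(np.abs(np.fft.rfft(seg)))
    return np.array(feats)
```

With a one-cycle window at 3840 Hz, the frequency bins are spaced exactly 60 Hz apart, so the fundamental lands in the first nonzero bin; shorter windows trade this frequency resolution for time localization.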

Window in Discrete Wavelet Transform

Figure 5.3 shows the effect of varying the size of the energy window on features from the discrete wavelet transform. Some notable classes are highlighted in the legend. If the window is too small or too large, the accuracy of the impulsive transient class suffers the most. Frequency variation and notch also suffer from a small window, because the uncertainty in determining periodicity increases. The ideal window size was determined to be about half a cycle of the fundamental frequency.

Figure 5.3: Performance of LSTM with various window sizes of DWT

Window in Short-Time Fourier Transform

The window size of the short-time Fourier transform was varied in this section, as shown in Figure 5.4. Impulsive transients are difficult to distinguish when the window is too small: with a small window, the fundamental and other frequency components can be significantly distorted by an impulsive transient. The performance improves slightly with a window size of 3 cycles, but drops significantly again when the window is too large, since a large window loses the localization of the impulsive transient. The accuracy for voltage sag and swell also decreased with increasing window size. This clearly illustrates the limitation of the short-time Fourier transform, which has a single fixed window size.

Figure 5.4: Performance of LSTM with various window sizes of STFT

Window in windowed Feedforward Neural Network

In this section, the window size of the wFNN was varied. Although a larger window was expected to increase the accuracy, the results did not show this to be true. The neural network used for the wFNN was fixed at 3 layers with 8 hidden units in each layer. Increasing the window size of the wFNN increased the number of input units, and this instead degraded the performance of the classifier.

Figure 5.5: Performance of wFNN with various window sizes

Sampling frequency

In this section, the sampling frequency was varied on a logarithmic scale, as shown in Figure 5.6. Due to aliasing, we expect the accuracy to drop as the sampling frequency decreases. As the figure shows, high-frequency disturbances such as impulsive transients, oscillatory transients, and notches were significantly affected when the sampling frequency was too low.

Figure 5.6: Performance of LSTM with various sampling frequencies


Output frequency

The required output frequency may be lower than the sampling frequency. The output frequency was varied in this section to see whether varying it has a similar effect to varying the sampling frequency. The data was obtained by down-sampling the output so that the same data could be used in testing. The input frequency was fixed at 3840 Hz, and the input-output frequency ratio was varied. The result shows that the performance of the recurrent neural network is nearly constant across different input-output frequency ratios. The consistent performance in this case study indicates that the training set was large enough to avoid over-fitting, and that the learning rate was decreased slowly enough over a sufficient duration.

Figure 5.7: Performance of LSTM with various output frequencies

2.3 Distribution of Misclassification

In this section, we investigate when the misclassifications occur. We plot the distribution of the number of time steps away from the transition time for every misclassification. The transition time is the time at which a disturbance starts or ends. From the Heisenberg uncertainty principle, we expect most of the misclassifications to occur near the transition time. In Figure 5.8, the distribution of all misclassifications is plotted against their distance from the transition time. The mode of the distribution is at the transition time, and 81.61% of the misclassifications occur within one cycle of the transition time.
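The distance-to-transition statistic underlying Figures 5.8 and 5.9 can be computed as in the sketch below (a hypothetical helper; it assumes integer class labels per time step):

```python
import numpy as np

def steps_from_transition(labels, pred):
    """For each misclassified time step, the distance (in time steps)
    to the nearest label transition (a disturbance start or end)."""
    labels = np.asarray(labels)
    pred = np.asarray(pred)
    # indices where the true label changes value
    transitions = np.flatnonzero(np.diff(labels) != 0) + 1
    wrong = np.flatnonzero(labels != pred)
    if len(transitions) == 0:
        return np.full(len(wrong), np.inf)
    # distance from each error to its nearest transition
    return np.min(np.abs(wrong[:, None] - transitions[None, :]), axis=1)
```

A histogram of the returned distances gives the distributions shown in the figures.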


Figure 5.8: Overall distribution of misclassification

Figure 5.9: Distribution of misclassification for individual power quality disturbances


In Figure 5.9, we show the distribution for each individual disturbance. DC offset, notching, flicker, and sag/swell with harmonics were the disturbances with misclassifications away from the transition time. This section shows that the majority of the misclassifications are associated with a fundamental limitation of power quality disturbance classification.

2.4 Effect of Noise

In this section, we present the effect of noise on classification accuracy. We test both the feedforward neural network and LSTM and show the results in Table 5.3. The signal-to-noise ratio (SNR) of the waveform was 37.1 dB. The feature extraction was based on the discrete wavelet transform with a half-cycle window and a 3840 Hz sampling frequency. The results show that the classification accuracy for dc offset, flicker, and sag and harmonics drops significantly. The SNR with respect to those disturbances is low, so they become difficult to classify in general. The results also show that the performance of LSTM remains consistently higher than that of the feedforward neural network under noise.
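The noisy test data can be produced by scaling white Gaussian noise to a target SNR; a minimal sketch:

```python
import numpy as np

def add_noise(v, snr_db, rng=None):
    """Add white Gaussian noise to v at a target SNR in dB."""
    rng = np.random.default_rng(rng)
    p_signal = np.mean(v ** 2)
    # noise power chosen so that 10*log10(p_signal / p_noise) = snr_db
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return v + rng.normal(0.0, np.sqrt(p_noise), size=v.shape)
```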

Table 5.3: Comparison of LSTM and FNN in data with noise

Classifier FNN LSTM

Condition Without Noise With Noise Without Noise With Noise

(i) normal 91.88 91.41 95.01 93.31

(ii) interruption 89.00 90.15 90.22 89.28

(iii) sag 91.52 87.24 89.78 90.50

(iv) swell 90.20 91.08 88.90 91.26

(v) impulsive 82.29 82.86 86.54 88.25

(vi) oscillatory 76.59 74.24 82.68 79.75

(vii) dc offset 93.54 19.68 91.70 19.96

(viii) harmonics 90.67 93.32 93.82 92.82

(ix) notch 82.07 82.31 83.37 81.75

(x) flicker 74.49 28.46 87.06 39.49

(xi) noise 94.04 92.60 96.26 95.29

(xii) frequency variation 67.71 56.08 84.10 54.35

(xiii) sag and harmonics 86.09 0.00 80.63 3.43

(xiv) swell and harmonics 86.10 90.19 86.50 84.10

overall 88.05 78.77 91.04 80.08


3 Case Studies

In this section, we show case studies with data from electrical simulations in SimPowerSystems/MATLAB. These data are a more realistic representation of power quality disturbances; we evaluate the performance and discuss the limitations of the proposed automatic monitoring system.

Interruption

The interruption was generated by opening the breaker of the system at 0.04 seconds and reclosing it at 0.12 seconds. This was one of the more straightforward classification tasks. There is a high misclassification rate in the first cycle after the transition time, due to the definition of the peak voltage in Equation 2.2: obtaining the peak voltage requires waiting until a new peak is reached.
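Equation 2.2 is not reproduced in this chapter. Assuming the peak voltage is tracked as a running maximum of |v| over the most recent fundamental cycle, the one-cycle detection lag can be sketched as:

```python
import numpy as np

def peak_voltage(v, fs, f0=60.0):
    """Running peak of |v| over the most recent fundamental cycle.

    The one-cycle memory is why a new, lower peak is only observed
    up to a cycle after an interruption begins.
    """
    n = int(round(fs / f0))  # samples per fundamental cycle
    out = np.empty(len(v))
    for k in range(len(v)):
        out[k] = np.max(np.abs(v[max(0, k - n + 1):k + 1]))
    return out
```

After the voltage collapses, the reported peak stays at the pre-fault value until the old samples leave the one-cycle window, matching the misclassification burst seen in the first cycle.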


Figure 5.10: Case study of interruption


Oscillatory Transient

In the second case study, the oscillatory transient was generated by a short disconnection of an RL load. Figure 5.11 shows the waveform as well as the outputs from the RNN and FNN. As the oscillation fades away, the FNN loses confidence in the oscillatory transient, whereas the RNN classifier remains confident that it is an oscillatory transient until the end.

Figure 5.11: Case study of oscillatory transient


Voltage Sag

The voltage sag was generated by a three-phase fault on a 230 kV line connected to a synchronous machine. The fault occurs at 0.3 seconds and is cleared at 0.5 seconds. The system goes through a voltage sag during the fault and a voltage swell after the clearance. The classification result identifies the voltage sag as well as the subsequent voltage swell.

Figure 5.12: Case study of voltage sag


Harmonics

Harmonics were generated by a three-phase AC system connected to a rectifier with a constant DC load. Although the proposed method performed well in the previous studies, this case study shows a limitation of the proposed power quality monitoring system. The harmonics occur between 0.4 and 0.6 seconds, with multiple harmonic orders combined. The output of the classifier reports the state as normal. This indicates that the classifier over-fits to the generated data, and there can be cases where it fails to generalize to real power quality events.

Figure 5.13: Case study of harmonics

4 Limitations of the Proposed Power Quality Monitor

As we saw in the case study with harmonics, the proposed training approach of generating data from mathematical equations has limitations in representing real power quality disturbances. Although we propose adding Gaussian noise to the normal waveform as a generalization technique, there will be cases where the system fails to generalize correctly to realistic data. The limitations observed throughout this thesis are the following.

1. The data generation process described in Chapter 2 is prone to over-fitting towards the synthesized data.

2. The performance of the classifier drops significantly with increased noise for features extracted with the discrete wavelet transform.

3. The short-time Fourier transform and S transform require an efficient way to compute and reduce their redundant representations.

The first point can be resolved with more data from real examples that could be used for training the neural network, though this may require considerable manual work to label the data. While the S transform shows more promising results against noise [24], efficient computation and the identification of frequency samples remain challenging. For this, we may want to use unsupervised learning methods such as principal component analysis.

5 Summary

In this chapter, we combined the data generation, feature extraction, and neural network classifiers from the previous chapters to build a working automatic classifier of power quality disturbances. We tested the system with different transforms and variables. In the case studies, we found some limitations of the technique and provided future research directions.


Chapter 6

Conclusion

In this thesis, we proposed the recurrent neural network for power quality disturbance classification. We compared the short-time Fourier transform, wavelet transform, and S transform for this process. Our technique modifies existing techniques by freeing the process from a segmentation algorithm. With this approach, the class is labeled at each time step, allowing finer localization of the event. We proposed classification with the Long Short-Term Memory to improve the accuracy by employing hidden units that pass information through time within the classifier.

We examined the performance of LSTM in various settings by adjusting variables such

as sampling frequency, output frequency and parameter settings for the feature extraction

algorithms. At present, the discrete wavelet transform is the most favourable candidate due

to its efficiency in implementation. We tested the system under noise and showed some case

studies with data generated by full electrical simulation using SimPowerSystems. We showed

some of the limitations and discussed potential solutions for them as well.

In addition to improving the power quality monitor, a future direction of this research is identifying appropriate placements for the monitors. There are works such as [50] that attempt to solve this problem, but further investigation is needed to identify the buses that maximize the benefit of the monitoring system while reducing the cost of deployment.

In addition, the idea of automatic operation of the power grid and the benefit of having the automatic monitoring system can be investigated further. Figure 6.1 shows the causal graph of disturbance, control, and phenomena. This thesis has focused on inferring the phenomena from the voltage and current waveforms. With further investigation, the source of a disturbance could be characterized and added to the automatic classification scheme, and with the source identified, the control action for mitigating the issue could be automated.

Figure 6.1: Future work

The current shifts in power system structure and operation place great demands on operators. With recent advancements and achievements in artificial intelligence, it may become a practical aid for operating the future grid reliably and efficiently.


Appendices


Appendix A

Parameters for Monte Carlo Simulation

In this section, we present the parameters used for the Monte Carlo simulation of power quality disturbance generation. As presented earlier, the equation for the data generation can be generalized as in Equation 2.17:

v(t) = α sin(2π(60 + ∆f)t) + ∑_i β_i exp(−c (t − t_start)/(t_end − t_start)) cos(2πf_h(t − t_start)) + µ(t) + γ(t)    (A.1)

for t ∈ [t_start, t_end], with µ(t) ∼ N(0, σ²) and c = −log(ε/β), where ε = 0.01. The parameters α, ∆f, β, f_h, σ, γ, and the duration (t_end − t_start) are sampled from uniform distributions; the ranges are given in Table A.1. Note that the notch was periodically repeated 3 to 6 times in a cycle. The data was generated continuously, with a normal waveform inserted between disturbances, because the probability of one disturbance occurring immediately after another is very small. The length of the waveform was N = 10^7, which gives about 43 minutes of waveform data at a sampling frequency of 3840 Hz. Table A.2 shows the ratio of the classes in the generated data. The duration ratio is the ratio of the total duration of each disturbance in the generated data, and the occurrence ratio is the ratio of the number of occurrences of each disturbance.
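A minimal sketch of this generation model for a single segment follows. It is illustrative only, assuming a single transient term and the parameter names of the appendix; the actual generator draws these parameters from the uniform ranges of Table A.1.

```python
import numpy as np

def generate_segment(t, alpha=1.0, dfreq=0.0, beta=0.0, fh=0.0,
                     sigma=0.0, gamma=0.0, eps=0.01, rng=None):
    """One voltage segment per the generalized model of Eq. A.1.

    t is the time vector (seconds) covering [t_start, t_end].
    """
    rng = np.random.default_rng(rng)
    t0, t1 = t[0], t[-1]
    # fundamental with frequency deviation
    v = alpha * np.sin(2 * np.pi * (60.0 + dfreq) * t)
    if beta != 0.0:
        # decay constant chosen so the envelope ends at eps
        c = -np.log(eps / abs(beta))
        v += beta * np.exp(-c * (t - t0) / (t1 - t0)) * np.cos(
            2 * np.pi * fh * (t - t0))
    v += rng.normal(0.0, sigma, size=t.shape)  # mu(t): Gaussian noise
    return v + gamma                           # gamma: dc offset
```

Segments generated this way, interleaved with normal waveforms, form the continuous training sequence described above.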


Table A.1: Parameters for Monte Carlo simulation in power quality data generation

[Table content garbled in extraction: for each disturbance class (i)-(xiv), the table lists the minimum and maximum of the uniform sampling ranges for α, ∆f, β, f_h, σ, γ, and the duration (t_end − t_start); the individual values are not reliably recoverable from this copy.]


Table A.2: Ratio of disturbance classes in the generated data

Class duration ratio (%) occurrence ratio (%)

(i) normal 44.99 50.00

(ii) interruption 3.97 1.49

(iii) sag 4.31 1.61

(iv) swell 4.65 1.72

(v) impulsive 3.24 24.34

(vi) oscillatory 4.31 8.01

(vii) dc offset 4.49 1.67

(viii) harmonics 4.34 1.61

(ix) notch 4.12 1.56

(x) flicker 4.23 1.56

(xi) noise 4.24 1.57

(xii) frequency variation 4.39 1.63

(xiii) sag and harmonics 4.39 1.62

(xiv) swell and harmonics 4.32 1.61


Appendix B

Results

In this appendix, we present the confusion matrices. The Roman numerals indicate the power quality disturbances as follows: (i) normal, (ii) interruption, (iii) sag, (iv) swell, (v) impulsive, (vi) oscillatory, (vii) dc offset, (viii) harmonics, (ix) notch, (x) flicker, (xi) noise, (xii) frequency variation, (xiii) sag and harmonics, (xiv) swell and harmonics. The entries are percentages indicating how much of the disturbance in each row was classified as the disturbance in each column.

Detailed results for Table 5.1

Table B.1: Confusion matrix for FNN from DWT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 91.88 0.09 0.5 0.18 1.34 0.41 1.01 0.22 0.85 0.14 0.33 2.81 0.04 0.18

(ii) 5.58 89 2.63 0 1.83 0.05 0 0.01 0.58 0 0.03 0.03 0.26 0

(iii) 5.3 1.29 91.52 0 0.36 0.06 0 0.01 1.19 0 0.01 0.2 0.05 0

(iv) 6.06 0.08 0.01 90.2 0.64 0.02 0 0 1.12 1.57 0 0.03 0 0.26

(v) 7.21 0.01 0 0 82.29 0.27 0 0 10.17 0 0 0 0 0.04

(vi) 14.34 0.02 0.36 0.36 1.01 76.59 0.7 1.54 1.88 0.48 2.04 0.15 0.08 0.46

(vii) 6.4 0 0 0 0 0 93.54 0 0 0 0 0.06 0 0

(viii) 4.87 0 0.06 0 0 0.01 0.01 90.67 0 0.01 0 0 4.29 0.07

(ix) 15.62 0.03 0 0.04 1.99 0 0.02 0 82.07 0 0.17 0.06 0 0

(x) 14.19 0.02 0.01 8.04 1.94 0.11 0 0.03 0.45 74.49 0 0.03 0 0.67

(xi) 1.11 0.02 0.01 0 0 1.8 0 0 3.02 0 94.04 0 0 0

(xii) 29.4 0.14 0.12 0 1.6 0.2 0.01 0.35 0.37 0 0.04 67.71 0.07 0

(xiii) 4.96 0.18 0.2 0 0.58 0.02 0 7.25 0.67 0 0.01 0.03 86.09 0

(xiv) 4.65 0.05 0.03 0 0.83 0.12 0 7.47 0.7 0.04 0 0 0 86.1


Table B.2: Confusion matrix for FNN from STFT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 94.44 0.1 0.07 0.04 0.24 1.22 0.2 0.05 0.94 0.36 1.38 0.81 0.13 0.03

(ii) 10.28 88.23 1.08 0 0 0.03 0 0 0.02 0 0.22 0.1 0.03 0

(iii) 9.07 1.66 88.31 0 0.1 0.08 0.03 0 0 0 0.13 0.55 0.06 0

(iv) 9.85 0 0 87.78 0.24 0.4 0 0.01 0.06 1.21 0.02 0.43 0 0

(v) 43.41 0 0 0 20.66 1.59 3.37 0 0.27 2.57 0 28.12 0 0

(vi) 10.01 0 0.3 0.16 5.73 74.14 4.03 0.1 0 0.8 0 4.56 0.13 0.04

(vii) 1.28 0 0.01 0 1.5 1.78 90.78 0 0 0 0 4.64 0 0

(viii) 9.1 0 0 0 0.06 0.86 0 82.08 0 0.07 0 0.42 7.41 0

(ix) 17.59 0 0 0 0.47 0.68 0.05 0 75.79 1.1 2.6 1.72 0 0

(x) 10.12 0 0 0.61 0.9 0.84 0 0 0.76 86.61 0.04 0.1 0.02 0

(xi) 4.65 0 0 0 0.06 0.45 0 0 4.27 0.02 89.98 0.56 0 0

(xii) 7.29 0 0 0 1.84 0.04 0.83 0 0.66 0 6.75 82.58 0.02 0

(xiii) 8.97 0 0 0 0.02 0.14 0 3.68 0.03 0 0.02 0.25 86.89 0

(xiv) 10.12 0 0 0.06 0.16 0.29 0 6.61 0.07 0.32 0.1 0.45 0.03 81.8

Table B.3: Confusion matrix for LSTM from STFT features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 95.15 0.12 0.07 0.08 0.4 0.68 0.24 0.4 0.9 0.23 0.97 0.43 0.19 0.14

(ii) 10.91 88.86 0.01 0 0.01 0 0 0 0.01 0 0.01 0.15 0.03 0

(iii) 9.87 0.04 89.54 0 0.18 0.08 0 0 0.06 0 0 0.13 0.1 0

(iv) 10.37 0 0 85.97 0.29 0.36 0 0 0.11 2.78 0 0.08 0.03 0

(v) 23.14 0 0 0.02 62.87 1.99 1.09 0 0.74 1.3 0 8.85 0 0

(vi) 8.59 0 0.05 0.04 6.67 79.65 1.63 0.47 0.03 1.12 0.03 1.71 0.01 0

(vii) 3.91 0 0 0 2.37 1.38 90.94 0.03 0 0 0 1.37 0 0

(viii) 7.67 0 0.02 0.07 0.08 0.27 0 81.37 0.02 0 0.01 0.12 10.39 0

(ix) 13.93 0 0 0 1.77 0.36 0.03 0 83.43 0.33 0.06 0.08 0 0

(x) 9.93 0 0 0.21 0.27 0.08 0 0 0.17 89.04 0.02 0.19 0.04 0.04

(xi) 4.63 0 0 0 0.07 0.4 0.01 0 0.84 0 93.53 0.5 0 0.02

(xii) 3.28 0.01 0 0.01 0.22 0.33 0.09 0 2.84 0.01 1.88 91.33 0.01 0

(xiii) 8.09 0.02 0 0 0.04 0.11 0 9.2 0.04 0 0.02 0.09 82.39 0

(xiv) 9.43 0 0 0.12 0.14 0.12 0 6.8 0.17 0.06 0.03 0.19 0.04 82.91

Table B.4: Confusion matrix for FNN from ST features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 91.02 0.09 0.31 1.6 0.7 0.08 1.55 1.94 1.29 0.39 0.43 0 0.2 0.39

(ii) 8.16 88.24 1.09 0 0.02 0 0.09 0.47 0.3 0 0.07 0 1.56 0

(iii) 9.69 0.59 87.04 0 0 0 0.39 0.14 0.59 0 0.48 0 1.08 0

(iv) 11.15 0 0 84.68 1.17 0 0 0.04 0.92 0.02 0.4 0 0.03 1.59

(v) 70.84 0 0 0 17.63 0.03 0 8.11 0 0.43 0 0 0.34 2.62

(vi) 18.39 0.1 0.62 0.95 1.28 74.49 0.73 0.7 1.91 0 0.81 0 0.01 0.01

(vii) 16.84 0 0 0 0 0 83.16 0 0 0 0 0 0 0

(viii) 6.16 0 0 0 0 0 0 89.54 0.58 0 0.02 0 3.7 0

(ix) 29.67 0 0 0 0 0 0 0 59.49 0 10.84 0 0 0

(x) 11.29 0 0 17 1.26 0 0 0.36 0.42 68.58 0.04 0 0 1.05

(xi) 0.93 0.19 0 0 0 0.52 0 0 9.89 0 88.48 0 0 0

(xii) 96.94 0 0.3 0 0.16 0.01 0.61 0.87 0.93 0 0.08 0 0.11 0

(xiii) 5.93 0.29 0.17 0 0 0 0 13.92 0.77 0 0.09 0 78.83 0

(xiv) 4.36 0 0 0.01 1.4 0 0 7.69 0.48 0.04 0.04 0 0.01 85.95

Table B.5: Confusion matrix for LSTM from ST features
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 92.33 0.07 0.36 1.55 2 0.19 1.42 0.27 0.69 0.74 0.15 0 0.09 0.14

(ii) 8.21 89.33 2.02 0 0.27 0.01 0.06 0 0.02 0 0 0 0.07 0

(iii) 10.85 0.48 87.6 0 0.08 0 0.36 0 0.15 0 0.03 0 0.45 0

(iv) 12.8 0 0 81.28 0.6 0 0 0 0.52 4.45 0.05 0 0.03 0.26

(v) 44.12 0 0 0 50.9 1.41 0 1.35 0.02 0.58 0 0 0.63 1

(vi) 11.35 0.01 0.35 0.75 0.49 85.55 0.44 0.75 0.04 0.05 0.11 0 0.08 0.03

(vii) 15.62 0 0 0 0 0 84.38 0 0 0 0 0 0 0

(viii) 7.25 0 0 0 0.34 1.09 0 80.65 0.02 0 0.02 0 10.64 0

(ix) 14.77 0 0 0.11 0 0 0 0 85.12 0 0 0 0 0

(x) 10.14 0 0 17.65 1.15 0.25 0 0.15 0.08 70.26 0 0 0 0.31

(xi) 0.73 0 0.02 0.05 0 0.3 0.03 0.04 0.69 0.03 98.09 0 0.02 0

(xii) 97.29 0 0.11 0 0.93 0.02 0.48 0.11 0.68 0 0.07 0 0.31 0

(xiii) 4.77 0.02 0.72 0 0.57 0.04 0 7.49 0.18 0 0 0 86.21 0

(xiv) 5.5 0 0 0.59 1.36 0.2 0 7.25 0.23 1.05 0 0 0.12 83.7


Detailed results for Table 5.3

37.1 dB

Table B.6: Confusion matrix for LSTM from DWT features with noise
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 93.31 0.11 0.14 0.2 1.32 0.42 0.91 0.41 0.38 0.48 0.3 1.74 0.02 0.2

(ii) 5.58 89.28 3.15 0 1.37 0.02 0 0.11 0.3 0.02 0.03 0.14 0 0

(iii) 5.81 1.76 90.5 0 0.19 0.06 0 0.01 0.72 0.33 0.04 0.59 0 0

(iv) 7.34 0.03 0.04 91.26 0.16 0.07 0 0 0.6 0.19 0 0.06 0.01 0.23

(v) 6.99 0 0.04 0 88.25 0.08 0 0.08 4.23 0 0.2 0.01 0.01 0.11

(vi) 14.05 0.08 0.27 0.42 0.17 79.75 0.22 1.41 0.6 0.64 1.52 0.33 0.09 0.44

(vii) 78.96 0 0 0 0 0.01 19.96 0 0 1.07 0 0 0 0

(viii) 6.53 0 0 0.01 0 0.23 0 92.82 0 0 0 0.02 0.01 0.38

(ix) 15 0 0.13 0.29 1.16 0.04 0.19 0 81.75 0.02 1.35 0.07 0 0

(x) 54.74 0 0.2 0.71 0 0.04 4.53 0 0 39.49 0 0.29 0 0

(xi) 1.4 0 0.01 0 0.09 0.97 0 0 2.18 0 95.29 0.06 0 0

(xii) 43.32 0.04 0.07 0 1.13 0.07 0.05 0.37 0.31 0.2 0.05 54.35 0 0.02

(xiii) 5.27 0 0.01 0.02 0.21 0.09 0 5.52 0.34 0 0 0.01 3.43 85.11

(xiv) 4.94 0 0.02 0.03 0.11 0.02 0 7.71 0.39 0 0 0.01 2.67 84.1

Table B.7: Confusion matrix for FNN from DWT features with noise
(i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv)

(i) 91.41 0.12 0.11 0.44 1.53 0.42 0.71 0.7 0.62 0.46 0.43 2.72 0 0.33

(ii) 4.63 90.15 1.45 0 1.8 0.01 0 0.14 0.76 0.05 0.04 0.96 0 0

(iii) 5.49 4.58 87.24 0 0.52 0.05 0 0.08 0.7 0.16 0.01 1.18 0 0

(iv) 6.62 0 0 91.08 0.5 0.04 0 0 0.91 0.3 0 0.28 0 0.26

(v) 8.99 0.21 0 0 82.86 0.13 0 0.09 7.64 0 0.02 0.01 0 0.06

(vi) 17.23 0 0.16 0.44 0.53 74.24 0.28 2 0.64 1.23 2.28 0.46 0 0.52

(vii) 79.03 0 0 0 0 0 19.68 0 0 1.28 0 0.02 0 0

(viii) 5.81 0 0 0.02 0.03 0 0 93.32 0 0.04 0 0.25 0 0.53

(ix) 14.43 0 0 0.07 1.85 0.11 0 0 82.31 0.01 0.89 0.32 0 0.01

(x) 63.55 0 2.44 0.26 0 0 4.85 0 0 28.46 0 0.44 0 0

(xi) 1.13 0 0 0 0 2.09 0 0 4.18 0 92.6 0 0 0

(xii) 40.78 0.01 0.02 0 1.84 0.13 0.06 0.49 0.44 0.09 0.05 56.08 0 0

(xiii) 4.26 0 0 0.18 0.87 0.02 0 6.89 0.55 0 0.01 0.05 0 87.17

(xiv) 4.6 0 0 0.05 1.13 0.02 0 3.2 0.74 0 0 0.05 0 90.19


Bibliography

[1] R. C. Dugan, M. F. McGranaghan, and H. W. Beaty, Electrical Power Systems Quality. New York, NY: McGraw-Hill, 1996.

[2] “IEEE recommended practice for monitoring electric power quality,” IEEE Std 1159-2009

(Revision of IEEE Std 1159-1995), pp. c1–81, June 2009.

[3] O. P. Mahela, A. G. Shaik, and N. Gupta, “A critical review of detection and classification

of power quality events,” Renewable and Sustainable Energy Reviews, vol. 41, pp. 495–505,

2015.

[4] S. Ventosa, C. Simon, M. Schimmel, J. J. Danobeitia, and A. Manuel, “The S -transform

from a wavelet point of view,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp.

2771–2780, July 2008.

[5] F. Katiraei, R. Iravani, N. Hatziargyriou, and A. Dimeas, “Microgrids management,” IEEE

Power and Energy Magazine, vol. 6, no. 3, pp. 54–65, May 2008.

[6] C. Gonzalez, J. Geuns, S. Weckx, T. Wijnhoven, P. Vingerhoets, T. De Rybel, and J. Driesen, “LV distribution network feeders in Belgium and power quality issues due to increasing PV penetration levels,” in 2012 3rd IEEE PES International Conference and Exhibition on Innovative Smart Grid Technologies (ISGT Europe). IEEE, 2012, pp. 1–8.

[7] B. Rahmani, W. Li, and G. Liu, “An advanced universal power quality conditioning system

and MPPT method for grid integration of photovoltaic systems,” International Journal of

Electrical Power & Energy Systems, vol. 69, pp. 76 – 84, 2015.




[8] C. G. Hou, S. C. Lin, S. T. Su, and W. H. Chung, “Fault tolerant quickest detection

for power quality events in smart grid ami networks,” in 2015 International Symposium

on Intelligent Signal Processing and Communication Systems (ISPACS), Nov 2015, pp.

159–163.

[9] IEEE, “Electric signatures of power equipment failures,” IEEE, Tech. Rep.

[10] M. Sumner, A. Abusorrah, D. Thomas, and P. Zanchetta, “Real time parameter estimation for power quality control and intelligent protection of grid-connected power electronic converters,” IEEE Transactions on Smart Grid, vol. 5, no. 4, pp. 1602–1607, 2014.

[11] R. P. Bingham, “Recent advancements in monitoring the quality of the supply,” in 2001 Power Engineering Society Summer Meeting, vol. 2, July 2001, pp. 1106–1109.

[12] “IEEE recommended practice and requirements for harmonic control in electric power systems,” IEEE Std 519-2014 (Revision of IEEE Std 519-1992), pp. 1–29, June 2014.

[13] M. H. Bollen, Understanding Power Quality Problems. New York: IEEE Press, 2000, vol. 3.

[14] R. Kumar, B. Singh, D. T. Shahani, A. Chandra, and K. Al-Haddad, “Recognition of

power-quality disturbances using S-transform-based ANN classifier and rule-based decision

tree,” IEEE Transactions on Industry Applications, vol. 51, no. 2, pp. 1249–1258, March

2015.

[15] S. Alshahrani, M. Abbod, and B. Alamri, “Detection and classification of power quality

events based on wavelet transform and artificial neural networks for smart grids,” in 2015

Saudi Arabia Smart Grid (SASG), 2015, pp. 1–6.

[16] A. K. Ghosh and D. L. Lubkeman, “The classification of power system disturbance waveforms using a neural network approach,” IEEE Transactions on Power Delivery, vol. 10, no. 1, pp. 109–115, 1995.

[17] S. Santoso, E. J. Powers, W. M. Grady, and P. Hofmann, “Power quality assessment via wavelet transform analysis,” IEEE Transactions on Power Delivery, vol. 11, no. 2, pp. 924–930, 1996.



[18] B. Perunicic, M. Mallini, Z. Wang, and Y. Liu, “Power quality disturbance detection and classification using wavelets and artificial neural networks,” in 8th International Conference on Harmonics and Quality of Power, vol. 1, Oct. 1998, pp. 77–82.

[19] A. Gaouda, M. Salama, M. Sultan, A. Chikhani et al., “Power quality detection and

classification using wavelet-multiresolution signal decomposition,” IEEE Transactions on

Power Delivery, vol. 14, no. 4, pp. 1469–1476, 1999.

[20] S. Santoso, E. J. Powers, W. M. Grady, and A. C. Parsons, “Power quality disturbance waveform recognition using wavelet-based neural classifier. I. Theoretical foundation,” IEEE Transactions on Power Delivery, vol. 15, no. 1, pp. 222–228, 2000.

[21] D. Borras, M. Castilla, N. Moreno, and J. C. Montano, “Wavelet and neural structure:

a new tool for diagnostic of power system disturbances,” IEEE Transactions on Industry

Applications, vol. 37, no. 1, pp. 184–190, Jan 2001.

[22] Z.-L. Gaing, “Wavelet-based neural network for power disturbance recognition and classification,” IEEE Transactions on Power Delivery, vol. 19, no. 4, pp. 1560–1568, 2004.

[23] H. He and J. A. Starzyk, “A self-organizing learning array system for power quality classification based on wavelet transform,” IEEE Transactions on Power Delivery, vol. 21, no. 1, pp. 286–295, 2006.

[24] I. W. Lee and P. K. Dash, “S-transform-based intelligent system for classification of power quality disturbance signals,” IEEE Transactions on Industrial Electronics, vol. 50, no. 4, pp. 800–805, 2003.

[25] S. Mishra, C. Bhende, and B. Panigrahi, “Detection and classification of power quality disturbances using S-transform and probabilistic neural network,” IEEE Transactions on Power Delivery, vol. 23, no. 1, pp. 280–287, 2008.

[26] C. Bhende, S. Mishra, and B. Panigrahi, “Detection and classification of power quality

disturbances using S-transform and modular neural network,” Electric Power Systems

Research, vol. 78, no. 1, pp. 122 – 128, 2008.



[27] M. Uyar, S. Yildirim, and M. T. Gencoglu, “An expert system based on S-transform and

neural network for automatic classification of power quality disturbances,” Expert Systems

with Applications, vol. 36, no. 3, Part 2, pp. 5962 – 5975, 2009.

[28] P. K. Ray, N. Kishor, and S. R. Mohanty, “Islanding and power quality disturbance detection in grid-connected hybrid power system using wavelet and S-transform,” IEEE Transactions on Smart Grid, vol. 3, no. 3, pp. 1082–1094, 2012.

[29] B. Biswal, M. Biswal, S. Mishra, and R. Jalaja, “Automatic classification of power quality events using balanced neural tree,” IEEE Transactions on Industrial Electronics, vol. 61, no. 1, pp. 521–530, 2014.

[30] M. Valtierra-Rodriguez, R. de Jesus Romero-Troncoso, R. A. Osornio-Rios, and A. Garcia-Perez, “Detection and classification of single and combined power quality disturbances using neural networks,” IEEE Transactions on Industrial Electronics, vol. 61, no. 5, pp. 2473–2482, 2014.

[31] M. K. Saini and R. Kapoor, “Classification of power quality events–a review,” International

Journal of Electrical Power & Energy Systems, vol. 43, no. 1, pp. 11–19, 2012.

[32] P. Sebastian and P. A. Da, “A neural network based power quality signal classification system using wavelet energy distribution,” in 2015 International Conference on Advancements in Power and Energy (TAP Energy), June 2015, pp. 199–204.

[33] S. Upadhyaya and S. Mohanty, “Localization and classification of power quality disturbances using maximal overlap discrete wavelet transform and data mining based classifiers,” IFAC-PapersOnLine, vol. 49, no. 1, pp. 437–442, 2016, 4th IFAC Conference on Advances in Control and Optimization of Dynamical Systems (ACODS 2016), Tiruchirappalli, India, 1–5 February 2016.

[34] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp.

436–444, 2015.

[35] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016, book in preparation for MIT Press. [Online]. Available: http://goodfeli.github.io/dlbook/



[36] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to wavelets and wavelet transforms

: a primer. Upper Saddle River, N.J. : Prentice Hall, 1998.

[37] R. G. Stockwell, L. Mansinha, and R. P. Lowe, “Localization of the complex spectrum:

the S transform,” IEEE Transactions on Signal Processing, vol. 44, no. 4, pp. 998–1001,

Apr 1996.

[38] R. Stockwell, “A basis for efficient representation of the S-transform,” Digital Signal Pro-

cessing, vol. 17, no. 1, pp. 371 – 393, 2007.

[39] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, May 2009.

[40] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 6645–6649.

[41] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning

and stochastic optimization,” Journal of Machine Learning Research, vol. 12, no. Jul, pp.

2121–2159, 2011.

[42] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running

average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning,

vol. 4, no. 2, 2012.

[43] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint

arXiv:1412.6980, 2014.

[44] I. Sutskever, “Training recurrent neural networks,” Ph.D. dissertation, University of

Toronto, 2013.

[45] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings

of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.



[46] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural

networks,” arXiv preprint arXiv:1211.5063, 2012.

[47] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9,

no. 8, pp. 1735–1780, 1997.

[48] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 3104–3112. [Online]. Available: http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

[49] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: http://tensorflow.org/

[50] S. Ali, K. Weston, D. Marinakis, and K. Wu, “Intelligent meter placement for power quality estimation in smart grid,” in 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 2013, pp. 546–551.