arXiv:2011.01816v1 [eess.SY] 3 Nov 2020


IEEE TRANS. SUBMITTED FOR REVIEW 1

A Deep Learning based Detection Method for Combined Integrity-Availability Cyber Attacks in Power System

Wangkun Xu¹ and Fei Teng¹, Member, IEEE

Abstract: As one of the largest and most complex systems on earth, the power grid (PG) is now operated and controlled through a compound analysis of both physical and cyber layers, which makes it vulnerable to assaults from both economic and security perspectives. A new type of attack, the combined data integrity-availability attack, has recently been proposed, in which attackers simultaneously manipulate and blind some measurements in the SCADA system to mislead control operations while remaining stealthy. Compared with traditional FDIAs, this combined attack can further complicate and vitiate model-based detection mechanisms. To detect such attacks, this paper proposes a novel random denoising LSTM-AE (LSTM-RDAE) framework, in which the spatial-temporal correlations of measurements are explicitly captured and unavailable data is countered by a random dropout layer. The proposed algorithm is evaluated and its performance is verified on the standard IEEE 118-bus system under various unseen attack attempts.

Index Terms: Cyber-Physical System, FDI Attack, Availability Attack, Anomaly Detection, LSTM-Autoencoder, Denoising Autoencoder.

I. INTRODUCTION

A. Background

THE emerging application of information and communication techniques (ICT) to power system automation, monitoring, and control has reformed the modern power grid into a complex cyber-physical system (CPS) [1]. Many advanced and artificial solutions are proposed to embrace this new trend at all cyber, physical, network, and communication layers, allowing two-way communication between facilities and customers. However, this new opportunity also raises challenges for the secure and resilient operation of cyber-physical power systems under cyberattacks [2], [3]. As a result, the National Institute of Standards and Technology (NIST) issued the first-ever report guiding smart grid security awareness in 2014 [4]. Several cyber-physical attacks have been reported to cause severe economic losses and loss of human life, among which the incident in Ukraine may be the most recognizable. In December 2015, more than 200k customers were affected by a half-day grid blackout in Kiev. The follow-up investigation revealed that the Supervisory Control and Data Acquisition (SCADA) protocol had been intruded by hackers [5].

In principle, the control center retrieves the operational states of individual buses from the measurements observed by the Remote Terminal Units (RTUs) and/or Phasor Measurement Units (PMUs), which are collected by the SCADA system every few seconds to minutes. The estimated states are further used in energy management systems (EMS) for contingency analysis, automatic generation control (AGC), load forecasting, etc. [6]. Cyber-physical attacks can be classified according to their target and delivery methodologies [7], such as the denial-of-service (DoS) attack in the network and communication layers, where the information flow packets are either jammed or lost [8]. In particular, most recent research focuses on the False Data Injection Attack (FDIA), as it can be implemented on all four layers of cyber-physical power systems and may remain unnoticed while causing severe economic and stability deterioration. An FDIA is defined as a direct manipulation of the measurements with the purpose of deviating the estimated state, which can mislead the EMS's economic and stable operations [9], [10]. In the literature, the concept of stealth FDIA on DC state estimation was first verified in [11], where the attacker can bypass the model-based Bad Data Detection (BDD) with limited resources, even with protected RTUs. Later, FDIA on AC state estimation was proposed in [12] along with a vulnerability analysis. FDIAs for different attack purposes are reviewed in [3], where the corresponding impacts and detection frameworks are partially examined.

¹ The authors are with the Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, U.K.

Broadly speaking, three hierarchical approaches can be implemented to counter cyber-physical attacks, namely protection, detection, and mitigation [13]. Protection is an attack prevention mechanism that can reject common attack attempts. However, it is too hard and costly to protect all the RTUs in the grid, and complete attack rejection is unrealistic [14]. The second stage is attack detection, for which model-based and data-driven algorithms are the two prevalent methods [7]. The third approach is attack mitigation, i.e. the real operational state should be retrieved once an attack alarm is raised in the detection stage [15]. As the dynamic power system model is hard to track and construct, model-based detection may easily fail on unseen attack attempts and suffer from detection delay. Consequently, this paper focuses on the detection of cyber attacks using data-driven methods, and we refer the reader to [7] for discussions of model-based methods.

B. Data-Driven Attack Detection

CPS attack detection has been considered a subset of the anomaly or outlier detection problem, which can be traced back decades [16]. Meanwhile, as the number and resolution of grid measurements improve, data-driven methods, such as statistical, machine learning, and deep learning algorithms, are equipped to model the complex dynamics and uncertainty of the system behaviour with little knowledge of the model. Here, we summarize the up-to-date challenges involved in data-driven power system anomaly detection and their potential solutions in the literature [7], [17]:

(1) High dimensionality of measurements and features:

The dimension of the measured data increases with the topological complexity, and anomalies can become hidden and unnoticeable. For example, the IEEE 118-bus system may have more than 300 attributes per sample under DC state estimation (DCSE), while under AC state estimation this number can rise to 1k. Due to the static nature of power system operation and the sparse topology, directly changing the estimated states leads to only limited measurement variations. Consequently, many statistical and machine learning methods, such as Support Vector Machine (SVM) [18], Naive Bayesian Classifier (NBC) [19], Decision Trees (DT) [20], and k-NNs [21], usually fail to fit large systems with high dimensionality. See [16] for more details on these methods. As a result, order reduction or expert feature engineering is required in advance. However, there is no guarantee on the performance of the resulting features for anomaly detection [17].

(2) Absence of positive measurement samples:

The power system operates under optimal conditions most of the time, which leads to a lack of real measured anomalies during training. To obtain a balanced labelled dataset, positive measurements are generated artificially in supervised binary classification algorithms [15], [18]. However, in real time, attacks can be heterogeneous and there always exist uncertain attack scenarios on which the trained model fails. Due to the rarity of positive samples and the high cost of collecting a large number of labelled data, supervised learning is not applicable in practice [17], [22]. Generative models, unsupervised learning, and weakly supervised learning have thus been proposed. In the generative model [23], new attack signals are generated according to the probability distribution of the known attack samples, whereas in unsupervised and semi-supervised learning, the model is mainly trained to describe the normalities, and any deviation from the model can be considered an outlier. Ref. [13] implements the isolation forest (IF) for unsupervised detection of anomalous measurements. The normality is examined by semi-supervised Gaussian mixture models in [24], but the anomaly scores are determined by the known attack patterns, which weakens its generality. The semi-supervised Support Vector Machine (S3VM) is considered in [25] under the assumption that the difference between the numbers of normal and abnormal samples is not significant. The simulation results of these methods show, to some extent, a higher detection accuracy than common supervised techniques such as SVM and k-NNs. However, they usually suffer from a high false alarm rate [17] and a roundabout training target [22]. One possible sequential countermeasure is the so-called Moving Target Defence (MTD), where the grid topology and parameters are altered by the control center to defend against potential FDIAs [26]. However, recent research indicates that MTD has several limitations [27] and can be circumvented using data-driven methods [28].

(3) Mix of spatial and temporal correlations:

The RTU measurements can be considered a multivariate time series [29], and the anomalies involved can be point-wise, contextual, or collective [16]. In detail, point anomalies are isolated points that are distinguishable from the majority of the data, e.g. a large but stealthy FDI attack. Contextual anomalies can only be detected in a certain temporal context. An extreme example is the replay attack [30]. A sub-sequence of the data can be regarded as a collective anomaly if it is distinct from the other instances, e.g. small but stealthy oscillations in the measurements. Isolated detection on a single measurement mainly focuses on the spatial complexity and is therefore likely to fail on some occasions.

Apart from the spatial complexity, it has been reported that the temporal correlations of loads and distributed generators (DG) are reflected in the grid operation and influence the accuracy of state estimation [31]. As a result, prediction-aided detection is considered in [32] and [15], where the estimated measurement error is found from the predicted states. It should be noted that prediction-based detection can also benefit anomaly mitigation by replacing the contaminated values with the estimated ones. However, as attacks are usually sparse in the power grid, disregarding all the latest information may deteriorate the detection performance. Another research direction leads to density-based methodology. Small but successive attacks have been considered in [33], where data transformation and Kullback–Leibler divergence are used to capture the temporal variations of the measurements, while in [34] the slopes of adjacent measurements are classified by decision trees and spectral clustering. However, many of the aforementioned methods only consider the one-step dependence of the measurement dynamics, while few explicitly investigate the spatiotemporal complexity and report the detection accuracy under various unseen attack attempts.

(4) Robust detection against combined attacks:

To facilitate an FDIA, the attacker may simultaneously blind a certain part of the measurements by launching an availability attack, also called a DoS attack, which improves the stealthiness of the assault and reduces the attack effort [35]–[37]. From the resilience point of view, this complicates detection because the missing data can either be part of the combined integrity-availability attack or be caused by measurement malfunction. As a result, static detection algorithms cannot be generalized due to the destruction of the spatial relationships. The goal of combined attack detection is to give a robust estimation on the observed measurements under varying combined attack strengths. Although vulnerability analysis has been carried out for combined attacks, little attention has been paid to practical detection and mitigation solutions.

C. Deep Autoencoder-Based Anomaly Detection

In recent years, deep neural network (DNN) based anomaly detection has been introduced to numerous practical fields, such as fraud detection, health record surveillance, and network intrusion [22]. It also leads to superior detection performance over the aforementioned traditional methods in power systems, as in [15], [38], and [39]. Among all DNN applications, the deep autoencoder (AE) and its variations have been comprehensively used for feature extraction, dimension reduction, and network pretraining in many machine learning tasks due to their straightforward network structure, unlabelled training requirement, robustness, and high-dimensional nonlinear mapping properties [40]. As a result, AE-based anomaly detection has become a cornerstone of the field.

For attack detection in power systems, point-wise anomaly detection is considered in [41], while probabilistic inference is further added in [42]. The AE detection method explicitly learns the spatial relationships in the measurements but neglects the temporal correlations. To retrieve the temporal correlations across the normalities, recurrent neural networks (RNNs) with different realizations can be used to replace the dense AE layers. For instance, ref. [43] demonstrates the effectiveness of the RNN-AE in detecting mislabelling attacks in electronic health record systems. In addition, previous applications of deep autoencoder-based detection algorithms assume the full availability of measurement data, which does not hold in the case of combined integrity and availability attacks.

In this context, this paper proposes a new LSTM-RDAE based detection algorithm with the following contributions:

• A semi-supervised autoencoder is applied to learn only the normalities of the power system measurements (challenge 2), where the deep network structure can automatically fit high-dimensional and nonlinear measurements without prior model knowledge (challenge 1).

• We introduce the recurrent LSTM layer in the encoder-decoder network to explicitly extract the temporal correlations among successive measurements (challenge 3). To cope with availability attacks and improve robustness, completely random dropout is applied to each sample in every epoch (challenge 4).

• We simulate the model using real-time load profiles on the IEEE 118-bus system. Our method allows real-time anomaly detection with moderate computational burden. We also consider a broad range of unseen attack types by varying both the attacker's effort and the attack strength. The simulation results show that the proposed LSTM-RDAE can outperform state-of-the-art semi-supervised and unsupervised machine learning and deep learning methods.

The remainder of the paper is organized as follows. The power system model and combined integrity-availability attacks are formulated in Section II. Section III introduces the proposed LSTM-RDAE for combined attack detection. The simulation set-up and results are summarised in Section IV, and the article ends with a conclusion in Section V.

II. PROBLEM FORMULATION

A. State Estimation

The power grid can be roughly regarded as a graph. Generators and demands constitute the nodal buses, and electric power flows along the transmission lines (edges). A centralized control center monitors and controls grid operations by collecting nodal and line measurements from RTUs and PMUs. In a large power system, the main objective of the control center is to find the optimal real and reactive power of each generator such that all demands are met at minimal cost while the system constraints are not violated. This generation planning is called optimal power flow (OPF) [44]. In this article, only the economic dispatch is considered, and it is assumed that all the nodal and line measurements are available.

Static state estimation (SE) is the core function for maintaining normal and safe operations in the power system. Given redundant measurements, the voltage phasors are estimated through the system equations. Let the system state be $x \in \mathbb{R}^n$ and the measurement be $z \in \mathbb{R}^m$ with $m > n$; the system equations can be represented as [6]:

$$z = h(x) + e \tag{1}$$

where $e^T = [e_1, e_2, \ldots, e_m]$ is the vector of independent zero-mean measurement errors, and $\mathcal{I} = \{i \mid i = 1, 2, \cdots, m\}$ is defined as the measurement index set. The weighted least squares (WLS) algorithm is usually applied by minimising the following cost:

$$J(x) = [z - h(x)]^T E^{-1} [z - h(x)] \tag{2}$$

where $E \in \mathbb{R}^{m \times m}$ is the covariance matrix of the i.i.d. measurement noises and the minimiser $\hat{x}$ is the estimated state. Iterative optimization is required to solve (2), so a DC approximation is commonly implemented according to the linear system observation equation:

$$z = Hx + e \tag{3}$$

where $H \in \mathbb{R}^{m \times n}$ is the observation matrix. In DC-SE, the system state only consists of voltage phase angles: $x = [\theta_2, \theta_3, \cdots, \theta_n]$, where $\theta_1 = 0$ is the reference state. The analytic WLS solution of (2) under the DC assumption is:

$$\hat{x} = (H^T E^{-1} H)^{-1} H^T E^{-1} z \tag{4}$$
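As a rough illustration of (4), the closed-form DC estimate can be computed in a few lines of NumPy. The toy observation matrix, noise covariance, and state below are hypothetical values chosen only for this example and are not taken from the paper.

```python
import numpy as np

def dc_state_estimate(H, z, E):
    """WLS estimate x_hat = (H^T E^-1 H)^-1 H^T E^-1 z from (4)."""
    E_inv = np.linalg.inv(E)
    gain = H.T @ E_inv @ H              # n x n gain matrix
    return np.linalg.solve(gain, H.T @ E_inv @ z)

# Hypothetical toy system with m = 4 measurements and n = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
E = 0.01 * np.eye(4)                    # assumed i.i.d. noise covariance
x_true = np.array([0.10, -0.05])
z = H @ x_true                          # noise-free measurements for clarity
x_hat = dc_state_estimate(H, z, E)
```

With noise-free measurements the estimate should coincide with `x_true` up to floating-point error. Solving the normal equations with `np.linalg.solve` avoids forming the inverse of the gain matrix explicitly.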

B. Stealthy False Data Injection Attacks

One of the essential functions of SE is to detect, identify, and eliminate measurement errors. Traditionally, measurement errors are caused by sensor inaccuracy and malfunction. With the emergence of cyber-physical layers, SE and BDD are also expected to detect possible cyber attacks. First, the estimation residual is found as:

$$r(z) = \|z - H\hat{x}\| \tag{5}$$

If (5) is normally distributed over different $z$, a chi-square ($\chi^2$) test with $m - n$ degrees of freedom can be applied. More directly, a heuristic threshold $\tau_1$ can be determined from the normal measurements, and the BDD is given as:

$$D_1(z) = \begin{cases} 1 & \text{if } r(z) \geq \tau_1 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

where an attack alarm is indicated by 1.
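The residual test in (5) and (6) can be sketched as follows. This is a minimal example on a hypothetical 4-measurement system; the threshold value `tau1` and the size of the gross error are illustrative assumptions rather than values used in the paper.

```python
import numpy as np

def wls_estimate(H, z, E):
    """Closed-form WLS state estimate under the DC model."""
    E_inv = np.linalg.inv(E)
    return np.linalg.solve(H.T @ E_inv @ H, H.T @ E_inv @ z)

def bdd(H, z, E, tau1):
    """Return (alarm, residual): alarm is 1 when r(z) >= tau1, as in (6)."""
    r = np.linalg.norm(z - H @ wls_estimate(H, z, E))   # residual (5)
    return int(r >= tau1), r

# Hypothetical toy system: m = 4 measurements, n = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
E = 0.01 * np.eye(4)
tau1 = 0.1                               # assumed heuristic threshold

z_clean = H @ np.array([0.10, -0.05])
z_bad = z_clean.copy()
z_bad[2] += 0.5                          # gross error on a single meter

alarm_clean, r_clean = bdd(H, z_clean, E, tau1)
alarm_bad, r_bad = bdd(H, z_bad, E, tau1)
```

In this toy case the clean measurement gives a residual close to zero, while the single corrupted meter leaves a residual well above the assumed threshold, so only the second call raises an alarm.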

In this article, an FDIA is considered that directly injects a vector $c \in \mathbb{R}^n$ into the estimated state $\hat{x}$ [13]. To achieve the attack goal while remaining stealthy to the BDD, the injected measurement vector should satisfy [11]:

$$a = Hc \tag{7}$$

This perfect FDIA requires a strong condition: the attacker must know the exact system topology and line parameters. In practice, the attacker may also launch a series of assaults as long as the imperfect attack model

$$\|a - Hc\| \leq \tau_1 - \|z - H\hat{x}\| \tag{8}$$

holds [32], [45].
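To see why an attack of the form (7) is invisible to the residual test, the following sketch compares a perfect attack with a naive single-meter injection. It assumes equal meter weights (so WLS reduces to ordinary least squares) and uses hypothetical numbers throughout.

```python
import numpy as np

def estimate(H, z):
    """Least-squares DC estimate (equal meter weights assumed)."""
    return np.linalg.lstsq(H, z, rcond=None)[0]

def residual(H, z):
    """Estimation residual r(z) = ||z - H x_hat||."""
    return np.linalg.norm(z - H @ estimate(H, z))

# Hypothetical toy system and slightly noisy measurement.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
z = H @ np.array([0.10, -0.05]) + np.array([0.01, -0.01, 0.005, 0.0])

c = np.array([0.2, -0.1])                  # intended state deviation
a_stealth = H @ c                          # perfect attack, eq. (7)
a_naive = np.array([0.0, 0.0, 0.5, 0.0])   # injection on one meter only

r0 = residual(H, z)
r_stealth = residual(H, z + a_stealth)
r_naive = residual(H, z + a_naive)
shift = estimate(H, z + a_stealth) - estimate(H, z)
```

Because $a = Hc$ lies in the column space of $H$, the estimate moves by exactly `c` while the residual stays at `r0`; the naive injection instead inflates the residual and would be flagged by a suitably chosen threshold.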

C. Combined Integrity-Availability Attacks

Assume that a single attack attempt of magnitude $\mu$ has been imposed on the $i$th bus such that

$$c(k) = \begin{cases} \mu & \text{if } k = i \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

where $c(k)$ represents the $k$th element of vector $c$. According to the perfect attack principle (7), a possible attack vector can be formulated as:

$$a = \mu H(:, i) \tag{10}$$

where $H(:, i)$ represents the $i$th column of the Jacobian $H$. Let $\delta(i) = \|H(:, i)\|_0$ be the degree of bus $i$; it can be directly shown that the above attack attempt is a sparsest upper bound [12], [46] with respect to the $i$th state. The index set of the contaminated measurements becomes $\mathcal{I}_a = \{j \mid H(j, i) \neq 0\}$. Generally, the FDIA is intensive and costly, as the intruder has to know the system and impose the attack vector on the real measurements. In contrast, an availability attack is much cheaper, as sensor failure can easily be caused on both the physical and cyber sides. In [35]–[37], the attacker aims to inject false data into a particular measurement, and the remaining measurements in the sparsest critical tuple set are covered to minimize the attack effort. Let $d \in \{0, 1\}^m$ be the availability attack vector, where $d(k) = 1$ indicates that the $k$th measurement is not available, i.e. $\mathcal{I}_d = \{k \mid d(k) = 1\}$. The Jacobian matrix is reduced to $H_r = (I - \mathrm{diag}(d))H$, where $I$ is the identity matrix, implying that the $k$th row of $H$ becomes null. In this article, the availability attack strength is defined as $\gamma = \|d\|_0 / m$. Two constraints are considered when applying an availability attack to the predefined FDIAs:

1) The dropped $k$th measurement should not be critical, and

2) $\mathcal{I}_a$ and $\mathcal{I}_d$ are disjoint, i.e. $\mathcal{I}_a \cap \mathcal{I}_d = \emptyset$.

Condition 1) ensures that the reduced matrix $H_r$ is still observable [6]. Although pseudo-measurements can be imputed for the missing measurements, the deviation of the estimated states can grow if the availability attack continues. Unlike in [35] and [36], the desired state contamination cannot be produced at the control center if the measurements in $\mathcal{I}_a$ are hidden. As shown by condition 2), the low-cost availability attack is introduced to improve the stealthiness of the FDIAs and mislead the detection algorithm.
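The construction above can be sketched in NumPy as follows. The function name `combined_attack`, the toy matrix, and the use of a rank check as a proxy for condition 1) are assumptions made for illustration; the paper's own construction of the blinded set is described in its Section IV.

```python
import numpy as np

def combined_attack(H, i, mu, gamma, rng):
    """Build a single-state FDIA (10) plus a random availability mask d."""
    m, n = H.shape
    a = mu * H[:, i]                              # attack vector, eq. (10)
    I_a = np.flatnonzero(H[:, i])                 # contaminated measurements
    candidates = np.setdiff1d(np.arange(m), I_a)  # keeps I_a and I_d disjoint
    n_drop = min(int(round(gamma * m)), candidates.size)
    I_d = rng.choice(candidates, size=n_drop, replace=False)
    d = np.zeros(m, dtype=int)
    d[I_d] = 1
    H_r = (np.eye(m) - np.diag(d)) @ H            # dropped rows become zero
    observable = np.linalg.matrix_rank(H_r) == n  # proxy for condition 1)
    return a, d, H_r, observable

# Hypothetical sparse observation matrix with m = 6 and n = 3.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, -1.0]])
rng = np.random.default_rng(0)
a, d, H_r, observable = combined_attack(H, i=0, mu=0.1, gamma=1/3, rng=rng)
```

Here two of the six measurements are blinded, none of them carrying injected data, and the reduced matrix keeps full column rank so the remaining measurements still determine the state.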

The statistical concept of missing data is adopted to formulate the availability attacks. Two measurement blinding mechanisms are considered in particular, namely missing completely at random (MCAR) and missing at random (MAR) [47]. In MCAR, all the measurements have the same probability of being lost, and the missing portion is a random subset of the measurements $\mathcal{I} \setminus \mathcal{I}_a$ that depends on neither the property nor the value of the measurement. In contrast, MAR, or conditional missing, is defined by the observed property of each individual measurement. For instance, the adversaries may intend to cover certain measurements that are more closely related to $\mathcal{I}_a$ to stay stealthy. The detailed construction of $\mathcal{I}_d$ is described in Section IV. The detection algorithm is trained under the MCAR condition, whereas both MCAR and MAR are evaluated during the test stage. Furthermore, we assume that the attacker has full system knowledge, including meter measurement data, the Jacobian $H$ or system topology [3], and the BDD strategy. In addition, the attacked measurements are assumed to bypass the BDD test.

III. DETECTION METHODOLOGY

Load profiles change continuously in the power system due to varying customer behaviours, weather conditions, and dispatch policies. This dependency is reflected in the correlations of loads at different times, which also implies correlations in the states and measurements [48]. As a result, dynamic behaviour has been considered in [15], [32], [45], [49] to aid anomaly detection with predictive models. However, most of them only consider one-step correlation, and the knowledge of the current measurements can barely be applied to the prediction model. In general, dynamic anomaly detection at time $t$ considering the past $T - 1$ measurements can be formulated by a look-ahead standard:

$$r(z_t)' = \frac{1}{T} \sum_{i=t-T+1}^{t} \|z_i - \hat{z}_i\|_2^2 \tag{11}$$

where the dynamics of the measurements are governed by $\hat{z}_i = l(z_t, \ldots, z_{t-T+1})$ for all $i = t - T + 1, \ldots, t$. However, an explicit expression for $l(\cdot)$ is hard to find. As a result, a deep learning method is applied to capture the correlations in the measurements together with the anomaly detection on measurement $z_t$.

A. LSTM-Autoencoder

Given a multivariate dataset $\mathcal{Z} = \{z^{\langle 1 \rangle}, z^{\langle 2 \rangle}, \ldots, z^{\langle N \rangle}\}$, where $z^{\langle i \rangle} \in \mathbb{R}^m$ has $m$ attributes, and a decision space $\mathcal{H} \subset \mathbb{R}^{m_h}$ with $m_h < m$, a commonly used deep anomaly detection framework is to learn a nonlinear feature mapping $f(\cdot\,; \Theta) : \mathcal{Z} \mapsto \mathcal{H}$, from which an anomaly score can be derived based on the learnt knowledge in the feature space $\mathcal{H}$. The nonlinear mapping is achieved by stacked network layers, nonlinear activation functions, and the trainable weights and biases $\Theta = \{W, B\}$.

In a nutshell, an autoencoder is a neural network that is trained to reconstruct its input at the output layer. It consists of two dense networks. For each sample $z^{\langle i \rangle}$, an encoder $h^{\langle i \rangle} = f_e(z^{\langle i \rangle}; \Theta_e)$ codes the input features into a hidden layer $h^{\langle i \rangle} \in \mathbb{R}^{m_h}$ of lower dimension, while a decoder retrieves the inputs through the decoding function $\hat{z}^{\langle i \rangle} = f_d(h^{\langle i \rangle}; \Theta_d)$ [50]. As a result, the AE creates a bottleneck for the data through which only the significant structural information can pass and be reconstructed. The assumption is that the encoded features learn nonlinear correlations in the input dataset that are sufficient for separating the anomalies. One possible way to score anomalies on $z^{\langle i \rangle}$ is to attach new output layers to the pretrained encoder [51] or to use the low-dimensional features directly as training data for downstream classification algorithms [52]. However, since the encoded hidden feature is compact and compressed, only normal data can be successfully reconstructed by the decoder; thus, anomalies and outliers can be distinguished directly [53]:

$$\{\Theta_e^*, \Theta_d^*\} = \arg\min_{\Theta_e, \Theta_d} \|z - \hat{z}\|_2^2 = \arg\min_{\Theta_e, \Theta_d} \sum_{z \in \mathcal{Z}} \|z - f_d(f_e(z; \Theta_e); \Theta_d)\|_2^2 \tag{12a}$$

$$S(z) = \|z - f_d(f_e(z; \Theta_e^*); \Theta_d^*)\|_2^2 \tag{12b}$$

where $\Theta_e$ and $\Theta_d$ are the trainable parameters of the encoder and decoder networks and $\|\cdot\|_2$ is the Euclidean norm. The reconstruction error $S(z)$ is the anomaly score assigned to measurement $z$ once the network is trained. The sample index $i$ is omitted in (12) for simplicity. Similarly to the BDD (6), the decision threshold $\tau_2$ can be found from the distribution of the reconstruction error on the validation set:

$$D_2(z) = \begin{cases} 1 & \text{if } S(z) \geq \tau_2 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$
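The scoring and thresholding steps in (12b) and (13) can be sketched independently of the network itself. In the example below, `reconstruct` is a deliberately simple stand-in for a trained autoencoder (it keeps two coordinates and zeroes the rest), and the 0.99 quantile used to pick $\tau_2$ is an assumption; the text only states that the threshold comes from the validation error distribution.

```python
import numpy as np

def anomaly_score(z, reconstruct):
    """Squared reconstruction error S(z), as in (12b)."""
    return float(np.sum((z - reconstruct(z)) ** 2))

def fit_threshold(val_scores, q=0.99):
    """tau_2 as an upper quantile of validation scores (q is an assumption)."""
    return float(np.quantile(val_scores, q))

def detect(z, reconstruct, tau2):
    """Decision rule D_2(z) from (13)."""
    return int(anomaly_score(z, reconstruct) >= tau2)

def reconstruct(z):
    """Stand-in for a trained autoencoder: keep the first two coordinates."""
    z_hat = np.zeros_like(z)
    z_hat[:2] = z[:2]
    return z_hat

rng = np.random.default_rng(0)
# Hypothetical "normal" validation data lying close to the retained subspace.
Z_val = np.hstack([rng.normal(size=(200, 2)),
                   0.01 * rng.normal(size=(200, 2))])
tau2 = fit_threshold([anomaly_score(z, reconstruct) for z in Z_val])

z_attack = np.array([0.3, -0.2, 0.0, 1.0])   # large off-subspace component
alarm = detect(z_attack, reconstruct, tau2)
```

A quantile-based threshold trades false alarms against missed detections through `q`; other choices, such as a multiple of the validation standard deviation, would fit the same rule.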

While the autoencoder has proven useful for anomaly detection when only normal measurements are collected to train the network, it can only detect point outliers and overlooks the temporal correlations between samples, e.g. contextual and collective anomalies. To explicitly extract the temporal correlations, a recurrent neural network (RNN) can be used to replace the feedforward networks in both the encoder and the decoder. In an RNN, the impact of previous inputs is recorded and new features are learnt recursively within each training sample [50]. Perhaps the most effective recurrent model in practice is the long short-term memory (LSTM) network [54]. To deal with the long-term gradient vanishing and explosion problem, the LSTM automatically updates and forgets the state in each cell.

Consider a length-$T$ continuous subset of $\mathcal{Z}$: $\mathcal{Z}_i = \{z^{\langle t_i \rangle}, z^{\langle t_i+1 \rangle}, \cdots, z^{\langle t_i+T-1 \rangle}\}$ [29] with $t_i \leq N - T + 1$. An LSTM cell for input $z^{\langle t \rangle}$ can be calculated as in Fig. 1(a) and (14). In (14), $W_*$ are the kernels, $b_*$ represents the biases, and $\sigma$ and $\tanh$ are the commonly used sigmoid and hyperbolic tangent activation functions. In detail, at time $t$, the state $c^{\langle t \rangle}$ and the activation $a^{\langle t \rangle}$ are updated from the current input $z^{\langle t \rangle}$, the previous state $c^{\langle t-1 \rangle}$, and the previous activation $a^{\langle t-1 \rangle}$. The LSTM cells are initialised statelessly with $a^{\langle t_i-1 \rangle} = c^{\langle t_i-1 \rangle} = 0$. Following (14), useful information from the past is kept by the update gate and unimportant information is forgotten by the forget gate. In our problem, this suggests that the temporal correlations that are critical for normality representation are extracted through a window of length $T$.

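$$\begin{aligned}
\text{Candidate state:} \quad & \tilde{c}^{\langle t \rangle} = \tanh\left(W_c\left[a^{\langle t-1 \rangle}, z^{\langle t \rangle}\right] + b_c\right) \\
\text{Update gate:} \quad & \Gamma_u^{\langle t \rangle} = \sigma\left(W_u\left[a^{\langle t-1 \rangle}, z^{\langle t \rangle}\right] + b_u\right) \\
\text{Forget gate:} \quad & \Gamma_f^{\langle t \rangle} = \sigma\left(W_f\left[a^{\langle t-1 \rangle}, z^{\langle t \rangle}\right] + b_f\right) \\
\text{Output gate:} \quad & \Gamma_o^{\langle t \rangle} = \sigma\left(W_o\left[a^{\langle t-1 \rangle}, z^{\langle t \rangle}\right] + b_o\right) \\
\text{State update:} \quad & c^{\langle t \rangle} = \Gamma_u^{\langle t \rangle} \tilde{c}^{\langle t \rangle} + \Gamma_f^{\langle t \rangle} c^{\langle t-1 \rangle} \\
\text{Output:} \quad & a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \tanh\left(c^{\langle t \rangle}\right)
\end{aligned} \tag{14}$$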
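A single cell update following (14) can be written directly in NumPy. This is a minimal sketch that assumes the gate products in the state update and output are element-wise, as in standard LSTM formulations; the layer sizes and random kernels are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, a_prev, c_prev, p):
    """One LSTM cell update following (14); p holds kernels and biases."""
    concat = np.concatenate([a_prev, z_t])          # stacked [a_prev, z_t]
    c_cand = np.tanh(p["Wc"] @ concat + p["bc"])    # candidate state
    g_u = sigmoid(p["Wu"] @ concat + p["bu"])       # update gate
    g_f = sigmoid(p["Wf"] @ concat + p["bf"])       # forget gate
    g_o = sigmoid(p["Wo"] @ concat + p["bo"])       # output gate
    c_t = g_u * c_cand + g_f * c_prev               # state update
    a_t = g_o * np.tanh(c_t)                        # output activation
    return a_t, c_t

m, n_hidden, T = 3, 2, 4                 # hypothetical sizes
rng = np.random.default_rng(0)
params = {}
for gate in "cufo":
    params["W" + gate] = 0.1 * rng.normal(size=(n_hidden, n_hidden + m))
    params["b" + gate] = np.zeros(n_hidden)

Z_i = rng.normal(size=(T, m))            # one window of T measurements
a, c = np.zeros(n_hidden), np.zeros(n_hidden)   # stateless initialisation
for z_t in Z_i:
    a, c = lstm_step(z_t, a, c, params)
```

In practice a deep learning framework would supply the LSTM layer and its training; the loop above only makes the recursion over a window of length $T$ explicit.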

Figure 1: LSTM structure: (a) a single LSTM cell, and (b) LSTM-AE.

Fig.1(b) illustrates the structure of the LSTM autoencoder(LSTM-AE) where each layer represents an unfolding graph ofone LSTM cell for sample Zi and it is further stacked to havea deep structure. The LSTM-AE can be trained similarly to(12) where the anomaly score for Zi is the mean over samplelength T :


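$$
S(Z_i) = \frac{1}{T}\sum_{j=0}^{T-1}\left\| z^{\langle t_i+j\rangle} - l_d\left(l_e\left(z^{\langle t_i+j\rangle}; \Theta_e^*\right); \Theta_d^*\right)\right\|_2^2 \tag{15}
$$

where $l_e$ and $l_d$ represent the LSTM encoder and decoder mappings. Consequently, (15) gives a feasible representation of (11) where the measurement dynamics are embedded in $l(\,\cdot\,; \Theta)$ during training.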
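The score in (15) is the squared reconstruction error averaged over the window. A minimal sketch, assuming the window and its reconstruction are already available as arrays of shape `(T, m)` (the function name and the toy values are illustrative only):

```python
import numpy as np

def anomaly_score(Z, Z_hat):
    """Mean squared l2 reconstruction error over a window, as in (15).

    Z, Z_hat: arrays of shape (T, m) holding the window and its reconstruction.
    """
    Z = np.asarray(Z, dtype=float)
    Z_hat = np.asarray(Z_hat, dtype=float)
    per_step = np.sum((Z - Z_hat) ** 2, axis=1)   # squared l2 norm at each step
    return float(per_step.mean())                 # average over the T steps

Z = np.zeros((6, 3))
Z_hat = np.zeros((6, 3))
Z_hat[5] = [1.0, 2.0, 2.0]        # reconstruction error only at the last step
score = anomaly_score(Z, Z_hat)   # (1 + 4 + 4) / 6 = 1.5
```

The toy case also shows why a longer window dilutes an isolated anomalous point: the same single-step error of 9 contributes only 9/T to the score.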

B. Random Dropout Layer

The denoising autoencoder (DAE) has been used specifically for missing data imputation [43]. Conventionally, a DAE is used to train the network to learn more 'intelligently' on the hidden features instead of a trivial mapping. A noise layer, such as a Gaussian noise mask, dropout mask, or salt-and-pepper mask, is added directly on the input and a nonlinear mapping is trained to reconstruct the clean input signal:

$$
f_{DAE}: f_d\left(f_e\left(\tilde{z}; \Theta_e^*\right); \Theta_d^*\right) \mapsto z \tag{16}
$$

where $\tilde{z}$ is the corrupted version of $z$. In this article, the dropout mask is applied to model the missing mechanism. Although in practice the missing ratio is known to the control center whenever the current measurements arrive, its value $\gamma$ can vary continuously. Unavoidably, this would require the control center to hold one detection model per missing ratio, which is computationally unattractive, lacks redundancy, and may also cause overfitting. Inspired by DAE [55], variational inference approximation [56], and sparse-LSTM [57], the basic idea of the completely random dropout on the input layer is to use the Monte-Carlo method to model all possible data missing ratios in the range $[d_{min}, d_{max}]$ once and for all.

During each training epoch, a random dropout mask $D_i$ is applied on the $i$th sample to give the corrupted counterpart:

$$
\tilde{Z}_i = Z_i - Z_i \circ D_i \tag{17}
$$

where $\circ$ is the Hadamard product and $D_i \in \mathbb{R}^{m\times T}$ with $D_i(j,k) = 1$ indicating that the $j$th attribute in the $k$th measurement is unavailable during the current loop. The missing ratio and the missing attributes in $D_i(:,k)$ are random and independent across different $i$ and $k$ to simulate the MCAR condition. Consequently, after sufficient training epochs, the LSTM-RDAE can fit the normal measured sequence under various missing ratios.
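The masking in (17) can be sketched as follows. The text fixes only the range of the missing ratio; drawing it uniformly from $[d_{min}, d_{max}]$ for each time step is an assumption of this example, as are the function names and the toy data.

```python
import numpy as np

def random_dropout_mask(m, T, d_min, d_max, rng):
    """Binary mask D of shape (m, T); D[j, k] = 1 marks attribute j at step k as missing.

    Each column gets its own missing ratio (assumed uniform in [d_min, d_max])
    and its own uniformly chosen missing attributes, mimicking MCAR.
    """
    D = np.zeros((m, T))
    for k in range(T):
        ratio = rng.uniform(d_min, d_max)
        n_missing = int(round(ratio * m))
        missing = rng.choice(m, size=n_missing, replace=False)
        D[missing, k] = 1.0
    return D

def corrupt(Z, D):
    """Corrupted sample following (17): Z - Z o D (Hadamard product)."""
    return Z - Z * D

rng = np.random.default_rng(0)
Z = rng.random((304, 6)) + 0.1            # toy window, strictly positive entries
D = random_dropout_mask(304, 6, 0.0, 0.2, rng)
Z_tilde = corrupt(Z, D)
```

Because a fresh mask is drawn for every sample in every epoch, the network sees many different missing patterns of the same window over the course of training.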

To control the complexity in simulation, we assume that the previous missing measurements have been imputed or estimated for further operations such as dispatch organization and load forecasting. As a result, only the incoming measurement has missing attributes at detection time:

$$
\tilde{z}^{\langle t_i+T-1\rangle} = z^{\langle t_i+T-1\rangle} \circ (1 - d_i) \tag{18}
$$

where $d_i \in \{0,1\}^m$ and the missing strength is controlled by $d_{min} \le \|d_i\|_0/m \le d_{max}$. Fig. 2 illustrates the connection between the random input dropout and the conventional LSTM-AE.
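At detection time, (18) masks only the most recent column of the window. A small sketch under that reading, with a hypothetical helper `mask_incoming` and a window stored as an `(m, T)` array:

```python
import numpy as np

def mask_incoming(Z, gamma, rng):
    """Apply (18): zero a fraction gamma of the attributes in the last column only.

    Z: window of shape (m, T). Earlier columns are assumed already imputed.
    Returns the corrupted window and the binary missing vector d.
    """
    m = Z.shape[0]
    d = np.zeros(m)
    n_missing = int(round(gamma * m))
    d[rng.choice(m, size=n_missing, replace=False)] = 1.0
    Z_out = Z.copy()
    Z_out[:, -1] = Z[:, -1] * (1.0 - d)
    return Z_out, d

rng = np.random.default_rng(0)
Z = np.ones((304, 6))
Z_masked, d = mask_incoming(Z, gamma=0.10, rng=rng)   # 30 of 304 attributes missing
```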

Figure 2: LSTM-RDAE configured for single temporal sample.

Remark 1: We refer to the autoencoder-based anomaly detection as semi-supervised, as only the normal measurements are required to train the network. This is fundamentally different from unsupervised learning, where the exact positive and negative labels cannot be known in advance [17].

Remark 2: Though sharing certain similar ideas, our method, as a new DAE [55] application, differs from [56], where fixed forward and recurrent dropout ratios are used to approximate the variational inference. It also differs from the sparse LSTM-AE [57] and the ensemble RandNet [58], in which multiple networks with random and independent disconnections in the hidden layers are applied.

C. Discussions on LSTM-RDAE

In this section, we intuitively explain why applying sequential information can improve the performance of DAE for combined attack detection. Referring to the manifold assumption [59], high-dimensional data lies around certain low-dimensional manifolds (such as the solid black curve in Fig. 3). The autoencoder can be interpreted as regenerating a similar manifold (dashed black curve) to catch the projected sequential data. The encoder representation $l_e(\,\cdot\,; \Theta_e)$ can thus be viewed as a certain projection pattern in this lower-dimensional manifold space. When dropout is applied, the deviation between $Z_i$ and $\tilde{Z}_i$ is amplified in the lower-dimensional space, i.e. the distance to the manifold increases. As shown in Fig. 3, the previous uncorrupted measurements can help guide the reconstructed manifold up to the incoming corruption by learning their temporal correlations, even under various dropout ratios.

Figure 3: LSTM-RDAE formulated as a manifold reconstruction problem.

IV. EXPERIMENT SET-UPS AND RESULTS

In this section, we evaluate the performance of the LSTM-RDAE algorithm for the combined attack detection problem.


A. Data Acquisition, Refinement, and Analysis

We use some common settings in the literature to prepare the training dataset. To simulate the most realistic situation and obtain high-resolution real-time measurements and states, 11 distinct European regional load profiles in years 2015-2017 with a 15 min observation interval are collected from an open-source dataset [60], as also in [41]. Firstly, the missing data is roughly imputed by the previous day's load at the same time. To have a reasonable profile range, for each regional load pattern, a random maximum load consumption $l_{i,max}$ is applied [61]:

$$
L_i^{\langle k\rangle} = l_{i,max}\,\frac{l_i^{\langle k\rangle}}{\max_k (l_i)}, \quad i = 1, 2, \ldots, 11 \tag{19}
$$

where $l_i^{\langle k\rangle}$ is the load profile in region $i$ at time $k$, $l_{i,max} \in [0.25, 2.75]$ is the random maximum power for region $i$ in p.u., $\max_k (l_i)$ is the maximum power in region $i$, and $L_i^{\langle k\rangle}$ is the rescaled regional load profile.
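The rescaling in (19) normalises each regional profile by its own peak and then assigns it a random peak in p.u. A minimal sketch with synthetic data (the raw load values and the uniform draw of $l_{i,max}$ are assumptions for illustration):

```python
import numpy as np

def rescale_loads(l, l_max):
    """Rescale regional load profiles following (19).

    l:     raw loads, shape (n_regions, n_steps)
    l_max: target peak for each region in p.u., shape (n_regions,)
    """
    peak = l.max(axis=1, keepdims=True)       # max_k(l_i) for each region
    return l_max[:, None] * l / peak

rng = np.random.default_rng(0)
raw = rng.uniform(100.0, 500.0, size=(11, 96))    # 11 regions, one day at 15 min
l_max = rng.uniform(0.25, 2.75, size=11)          # random peaks in [0.25, 2.75] p.u.
L = rescale_loads(raw, l_max)
```

After rescaling, the peak of each region equals its assigned $l_{i,max}$ while the shape of the daily pattern is preserved.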

Figure 4: One-day load profile for the first 10 out of 99 loads in the bus-118 system.

The IEEE bus-118 test system [62] is used as the simulation platform, which contains 99 loads and 54 generators in total. To map the 11 distinct load profiles onto the 99 load buses and keep their individual patterns, a symmetric Dirichlet distribution [41] with a low distribution parameter $a = 0.2$ is applied. Furthermore, as in general the state estimation can be processed in less than 5 min, two more load points are interpolated within each successive 15 min interval with 2% variations added. Consequently, each bus has 288 load data points a day, accounting for 210528 temporally continuous load data points in total, e.g. in Fig. 4. After obtaining the loads $L \in \mathbb{R}^{99\times 210528}$, AC-OPF is run with the specified active power demand. The power factor is taken as the default value with 5% variations [33], [61]. To increase the randomness, stepped quadratic generator costs are applied at each node. The remaining parameters, system topology, and operational constraints are set to the defaults in the MATPOWER bus-118 description file. At each time $k$, the AC-OPF gives the solution for line power flows $P_f \in \mathbb{R}^{186}$ and power generations $P_g \in \mathbb{R}^{54}$. Then the modified measurement vector can be found by the DC approximation:

$$
z = \begin{bmatrix} C_g P_g - P_d - G_s - P_{bus,shift} \\ P_f - P_{f,shift} \end{bmatrix} \tag{20}
$$

where $C_g \in \mathbb{R}^{118\times 54}$ is the generator incidence matrix with $C_g(i,j) = 1$ if the $j$th generator is located at bus $i$, and $C_g(i,j) = 0$ otherwise. $P_d \in \mathbb{R}^{118}$ is the demand vector. $G_s$ is the conductance of the shunt elements, whereas $P_{bus,shift}$ and $P_{f,shift}$ are the admittance matrices for the shift transformers. The augmented measurement $z = [P_I; P_F] \in \mathbb{R}^{304\times 210528}$ with 1% Gaussian measurement noise [33], [45] added is used as the input to the neural network [13]. Moreover, min-max scaling is applied to the raw measurement data to map it into the range $[0, 1]$ across samples.
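To make the structure of (20) concrete, the sketch below assembles the measurement vector for a toy 3-bus, 2-generator, 2-line system. All numerical values and helper names are hypothetical; in the experiments the inputs come from the AC-OPF solution on the 118-bus case.

```python
import numpy as np

def generator_incidence(n_bus, gen_bus):
    """C_g with C_g[i, j] = 1 if generator j is located at bus i (0-based indices)."""
    Cg = np.zeros((n_bus, len(gen_bus)))
    Cg[gen_bus, np.arange(len(gen_bus))] = 1.0
    return Cg

def build_measurement(Cg, Pg, Pd, Gs, P_bus_shift, Pf, Pf_shift):
    """Stack bus injections and line flows as in (20)."""
    P_inj = Cg @ Pg - Pd - Gs - P_bus_shift
    P_flow = Pf - Pf_shift
    return np.concatenate([P_inj, P_flow])

# Toy example: generators at buses 0 and 2, no shunts or phase shifters.
Cg = generator_incidence(3, [0, 2])
z = build_measurement(
    Cg,
    Pg=np.array([1.0, 0.5]),
    Pd=np.array([0.2, 0.9, 0.4]),
    Gs=np.zeros(3),
    P_bus_shift=np.zeros(3),
    Pf=np.array([0.8, -0.1]),
    Pf_shift=np.zeros(2),
)
```

For the 118-bus system the same stacking yields 118 injections plus 186 flows, i.e. the 304 attributes used as network input.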

As discussed in Section II, different buses have different vulnerability degrees $\delta$ under FDIAs, which also reflects the attack effort. For the IEEE bus-118 system under the DC assumption, the degree varies from $\delta = 3$ at buses 10, 73, 87, 111, 112, 116, and 117 up to $\delta = 22$ at bus 49. If $\delta = 3$, the targeted bus is connected to one other bus, and changing the state will only affect the two buses' states and the line power between them, which makes the attack hard to detect in general.

B. Model Settings and Parameter Tuning

In general, it is not straightforward to tune the semi-supervised LSTM-RDAE algorithm before the detection accuracy is measured by applying the anomaly score (15), since the anomalies are not supposed to be known in advance. Thus, as one of the baseline methods, we pretrain a vanilla autoencoder with full measurement information. Three metrics are considered throughout the training and evaluation procedures: (a) the True Positive Rate (TPR), which is the ratio of the predicted positive samples to the actual positive samples, (b) the False Positive Rate (FPR), which is the ratio of the predicted positive samples to the actual negative samples (the false alarm), and (c) the F1 score, which is defined by:

(1). The precision Pre: the proportion of the correctly predicted anomalies in all the predicted anomalies:

Pre = TP/(TP + FP) (21)

(2). The recall Rec: the proportion of the correctly predicted anomalies in all the true anomalies:

Rec = TP/(TP + FN) (22)

(3). The F1-score: the harmonic mean of Pre and Rec:

F1 score = 2 × Pre × Rec/(Pre + Rec) (23)
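The metrics above can be computed directly from binary labels, with 1 denoting an attacked sample. The following is a small self-contained sketch; the function name and the toy labels are illustrative.

```python
def detection_metrics(y_true, y_pred):
    """Return TPR (recall), FPR, precision, and F1 from binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0          # recall, (22)
    fpr = fp / (fp + tn) if fp + tn else 0.0          # false alarm rate
    pre = tp / (tp + fp) if tp + fp else 0.0          # precision, (21)
    f1 = 2 * pre * tpr / (pre + tpr) if pre + tpr else 0.0   # (23)
    return {"TPR": tpr, "FPR": fpr, "Pre": pre, "F1": f1}

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)   # TP = 3, FN = 1, FP = 1, TN = 5
```

In this toy case the TPR and the precision are both 0.75, so the F1 score is also 0.75, while the FPR is 1/6.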

Note that for a competitive classifier, the F1-score should approach 1. The core hyperparameters of the AE network are the hidden feature dimension, the number of layers, and the activation functions. We test a wide range of these parameters, where nonlinear activations and compressed hidden features are designed in particular to avoid trivial feature extraction. We then give the vanilla AE the benefit of supervised tuning. In detail, the hyperparameters are tuned using grid search by evaluating the best F1 score. The optimal hyperparameters are then applied directly to the LSTM-RDAE.


Table I records the hyperparameters for LSTM-RDAE guided by the vanilla AE, where the number of training epochs is tuned to reach a loss similar to that of the AE. The sample length $T = 6$ is chosen since a successive attack is usually assumed to last for at most three steps [34]. Further enlarging the sample length would dilute the impact of an isolated anomalous point. Accordingly, the training samples in LSTM-RDAE are separated by a sliding window [63] of length 6 and step 1, so that a tensor $Z_{LSTM} \in \mathbb{R}^{304\times 6\times 210523}$ can be constructed.
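The sliding-window construction can be sketched as below. The helper name and the toy matrix are assumptions; the output layout `(m, T, N - T + 1)` follows the tensor dimensions stated in the text.

```python
import numpy as np

def sliding_windows(Z, T=6):
    """Cut an (m, N) measurement matrix into step-1 windows of length T.

    Returns an array of shape (m, T, N - T + 1).
    """
    m, N = Z.shape
    n_windows = N - T + 1
    return np.stack([Z[:, i:i + T] for i in range(n_windows)], axis=-1)

Z = np.arange(3 * 10, dtype=float).reshape(3, 10)   # toy data: m = 3, N = 10
W = sliding_windows(Z, T=6)                          # shape (3, 6, 5)
```

With $N = 210528$ and $T = 6$, the number of windows is $210528 - 6 + 1 = 210523$, which matches the last dimension of $Z_{LSTM}$.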

Table I: Hyper-parameters of the Proposed LSTM-RDAE

Parameter               Value
Layer structure         304-512-256-256-512-304
Batch size              400
Sample length           6
Number of epochs        1500
Dropout rate (input)    [0, 0.2]
Dropout rate (hidden)   0.005
Optimizer               Adam
Normalisation           min-max normalization
Learning rate           0.0001

The minimum random input dropout rate is set as $d_{min} = 0$ to simulate the condition where no availability attack is imposed. In a bus-14 example, 3 up to 12 out of the 55 total measurements can be dropped to hide an FDIA on a single measurement [36], and this number can rise to 36 if a coordinated attack is considered [37]. Without loss of generality, the maximum missing ratio is set as $d_{max} = 0.2$ in our case. Moreover, feedforward dropouts [56] are added in the encoder hidden layers to avoid overfitting in LSTM-RDAE. The training and validation process is run on the first year's data, which is randomly partitioned into $Z^{Train}_{LSTM} : Z^{Valid}_{LSTM} = 0.8 : 0.2$, and the trained network is evaluated solely on the second year's data for both normality and anomaly. To sum up, attacked measurements of different kinds are generated only on $Z^{Test}_{LSTM}$.

The LSTM-RDAE as well as the machine learning algorithms are trained and analyzed on Google Colab Pro with the P100 GPU and 25 GB RAM option. MATLAB R2017a with MATPOWER 7.0 embedded is run for simulating the power system model and preparing the training measurements on an Intel Core i5-10600KF CPU @ 4.10 GHz PC with 16 GB RAM. Networks are trained under TensorFlow 2.3.0 and the Keras framework, whereas the machine learning algorithms are trained with Scikit-Learn v0.22 built-in models.

C. Stealthy Combined Integrity-Availability Attacks With Limited Resources

In this section, we follow similar attack patterns to [13] and [33], where only a specific single estimated state on every bus except bus 69 (the slack) can be contaminated according to the stealth attack model (10), as we are aiming to find the most general detection algorithm. Each single state can vary in the set ±[3%, 5%, 7%, 10%, 15%, 20%, 30%]. For the one-shot (point) attacks, the averaged metric is calculated over all 117 attack cases under different FDIA strengths and availability attack ratios.

1) Detection Strategy and Model Evaluations: After training the LSTM-RDAE network, the reconstruction errors (15) on the normal validation set $T^{Valid}_{LSTM}$ are found under different availability attack ratios $\gamma$ and sorted in ascending order. Based on the error distributions, their $\alpha$th quantiles can be found individually. As shown in Fig. 5, except for the large deviations at $\alpha = 100\%$, the reconstruction errors are similar under different $\gamma$. Besides, the lowest anomaly score is achieved when there are no missing measurements ($\gamma = 0.00$). Fig. 6 illustrates the FPR on the normal test set $T^{Test}_{LSTM}$ according to the predefined thresholds in Fig. 5, where stable false alarms can be observed under different availability attacks. We can also observe that the FPR at the $\alpha$th quantile is complementary to $\alpha$, i.e. $\mathrm{FPR}(\alpha) + \alpha \approx 1$, regardless of the varying ratios, which suggests that the proposed LSTM-RDAE can overcome the overfitting problem and give a better representation of the unseen measurements.
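The thresholding step can be sketched as follows: the threshold is the $\alpha$th quantile of the anomaly scores on normal validation data, and a sample is flagged when its score exceeds it. The function names and the synthetic scores are assumptions for illustration; `np.quantile` uses its default linear interpolation.

```python
import numpy as np

def threshold_from_quantile(scores_valid, alpha):
    """Detection threshold: the alpha-th quantile of the normal validation scores."""
    return float(np.quantile(scores_valid, alpha))

def detect(scores, threshold):
    """Flag a window as anomalous when its score exceeds the threshold."""
    return np.asarray(scores) > threshold

# Synthetic normal scores 0, 1, ..., 999 (illustration only).
scores_valid = np.arange(1000, dtype=float)
thr = threshold_from_quantile(scores_valid, alpha=0.95)
fpr = detect(scores_valid, thr).mean()     # fraction of normal samples flagged
```

On the data used to set the threshold, FPR + α is 1 by construction; the observation in the text is that the same relation approximately holds on unseen normal data.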

Figure 5: Reconstruction error on the normal measurements in $T^{Valid}_{LSTM}$ with different missing ratios $\gamma$.

Figure 6: FPRs on the normal measurements in $T^{Test}_{LSTM}$ with different missing ratios $\gamma$.

In real-time anomaly detection, $\gamma$ is likely to vary continuously, which is different from the five cases considered in Fig. 5 and 6. To further test the robustness, five availability attack ranges are defined accordingly in Table II. For example, if the current measurement has 20 unavailable attributes out of 304 (a ratio of about 0.066), then the detection standard defined at $\gamma = 0.05$ is applied.

Table II: Ranges and Detection Standards for Different Missing Ratios

Availability Attack Range   Detection Standard
[0, 0.025)                  γ = 0.00
[0.025, 0.075)              γ = 0.05
[0.075, 0.125)              γ = 0.10
[0.125, 0.175)              γ = 0.15
[0.175, ∞)                  γ = 0.20
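The lookup in Table II amounts to binning the observed missing ratio. A minimal sketch, where the function name and the default of 304 attributes are illustrative:

```python
# (exclusive upper bound of the range, detection standard gamma) from Table II
BINS = [(0.025, 0.00), (0.075, 0.05), (0.125, 0.10), (0.175, 0.15)]

def detection_standard(n_missing, m=304):
    """Map the number of unavailable attributes to the detection standard in Table II."""
    gamma = n_missing / m
    for upper, standard in BINS:
        if gamma < upper:
            return standard
    return 0.20   # range [0.175, inf)

standard = detection_standard(20)   # 20 / 304 is about 0.066, so gamma = 0.05
```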

The differences between the LSTM-AE with and without the proposed random dropout layer are compared in Fig. 7 and 8. In detail, Fig. 7 compares the averaged F1 scores with a 10% attack on each bus. The LSTM-DAEs are trained on the exact five $\gamma$ values in Table II, and both the DAEs and the RDAE follow the detection scheme in Table II. To test the robustness, the two detection algorithms are tested at the boundaries of the availability attack ranges, which are the hardest possible detection cases, i.e. at $\gamma \in \Gamma_b = [0.025, 0.075, 0.125, 0.175]$. In general, the detection performance deteriorates as $\gamma$ increases. This observation agrees with the assumption that imposing an availability attack on top of the FDIA increases the stealthiness of the assault. Furthermore, the F1 score with the random dropout layer (LSTM-RDAE) is higher than that with the exact missing ratio preference (LSTM-DAE) in all cases, indicating that the proposed method is more accurate and robust.

Figure 7: Sensitivity of quantiles: averaged F1 scores under 10% FDIAs and different availability attack ratios $\gamma$. The LSTM-DAE is trained at $\gamma$, and its detection threshold is found according to Table II.

Referring to Fig. 6 and 7, the quantile $\alpha$ is another parameter influencing the detection metrics. Decreasing the quantile can improve the TPR at the cost of misclassifying more normal measurements. As suggested by [7] and [41], FPR = 5% is a common choice to balance the trade-off in the F1 score. In the following discussions, all the comparisons are made by controlling 5% false alarms on each method unless otherwise specified.

Fig. 8 investigates the effect of different interval lengths of the availability attack ratio ranges in Table II. The optimal LSTM-DAE is simulated, where each network is trained and its detection strategy is determined specifically for its individual availability attack ratio. As a result, it can be considered to have the best detection performance. As in Fig. 7, the red dotted curve in Fig. 8 follows the detection strategy in Table II with LSTM-RDAE. The pink circles highlight that the worst detection rates occur at the boundaries. If the detection principles are further determined explicitly at $\gamma \in \Gamma_b$, the detection rates can be significantly improved (green curve in Fig. 8). In general, we can always achieve a comparable detection performance using LSTM-RDAE by retrieving the exact detection principle, except when $\gamma = 0$. Unlike LSTM-DAE, finding a new detection standard for a certain $\gamma$ does not require retraining the network, and the computational time is less than 0.5 s on average.

Figure 8: Sensitivity of availability attack detection range: averaged detection rates under 10% FDIAs and different availability attack ratios $\gamma$. Both the model and the detection threshold of LSTM-DAE are found at $\gamma$.

2) Model Verification: In this section, we evaluate the performance of the LSTM-RDAE compared with other state-of-the-art deep learning methods. In particular, we investigate how the temporal information can support the detection process with and without availability attacks.

Table III compares LSTM-RDAE with two semi-supervised deep learning algorithms, DAE and LSTM-AE, and two unsupervised machine learning techniques, One-Class Support Vector Machine (OC-SVM) and Isolation Forest (IF), assuming that full measurements are given. To have a fair comparison, we also add a random input dropout mask on DAE during training. The objective of OC-SVM [64] is to find a hyperplane that has the maximum margin separating the high-dimensional data from the origin. The RBF kernel is used in OC-SVM to capture the nonlinearity. Successive random hyper-separations are applied in IF [13], [65], where abnormal data can be automatically picked out in fewer steps if they are rare and distinctive. Ensemble isolation trees (iTrees) are constructed to find the average anomaly score. The two ML models are trained solely on the normal data in the semi-supervised manner, and the contamination ratios (similar to the quantile defined in AE-based methods) are tuned to have 5% FPR on the validation set. Principal component analysis (PCA) is applied in advance on the scaled training data as a feature engineering method to reduce the data dimension, which can benefit the ML performance and computational speed. In our example, we tune the ML models with supervised awareness, which suggests that the first 26 out of 304 components cover 99% of the variance and thus give the best performance. In the simulation, it is assumed that the missing attributes have been imputed, i.e. $\gamma = 0.00$. The suggested hyperparameters for the ML models are recorded in Table IV.

Table III: Averaged detection rates for different detection algorithms under varying FDIA ratios %µ and full measurements γ = 0.00.

Algorithms   %µ = 0.03   0.05     0.07     0.10     0.15     0.20     0.30
DAE          0.8011      0.8629   0.9108   0.9306   0.9506   0.9602   0.9726
LSTM-AE      0.8334      0.9019   0.9270   0.9463   0.9627   0.9718   0.9788
LSTM-RDAE    0.8144      0.8802   0.9161   0.9374   0.9550   0.9661   0.9756
OC-SVM       0.5532      0.5935   0.6217   0.6547   0.7019   0.7443   0.8058
IF           0.5475      0.5925   0.6236   0.6657   0.7188   0.7624   0.8144

Table IV: Hyper-parameters of the MLs

OC-SVM                        IF
Kernel method         RBF     No. of iTrees        200
Kernel coefficient    0.1     Samples per iTrees   256
Contamination ratio   0.02    Contamination ratio  0.04

As shown in Table III, the three deep learning methods significantly outperform the ML methods. Even under intense attack strength, both the OC-SVM's and the IF's detection rates are relatively low. Firstly, as real-time load profiles are considered, different loads have different ranges and tendencies, so the obtained measurements are more complex and temporally correlated (Fig. 4), which may exceed the ML methods' capability. Meanwhile, the first 26 components are retained through PCA, which is still a high dimension for machine learning algorithms. Secondly, IF is set to be semi-supervised, which weakens its separation effect on the unseen attack samples.

The TPR of the proposed method is slightly lower than that without the input dropout layer, which is the cost of considering all availability attack possibilities during training. It suggests that if there is no availability attack, LSTM-AE can be safely implemented for detection. As the FDIA becomes more intense, the assaults become less stealthy to the control center and the difference between these two methods becomes negligible.

Table V evaluates the TPRs of LSTM-AE, DAE, and LSTM-RDAE under varying FDIAs and availability attack ratios. Firstly, the detection strategy of LSTM-AE follows the method in Section IV-C. As it is trained without specifying the missing input condition, LSTM-AE is penalized by a large false alarm rate, causing the lowest detection rate of the three methods. In all attack scenarios in Tables III and V, the proposed algorithm outperforms the DAE, especially when the FDIAs are small and the $\gamma$s are large (highlighted in red), which verifies that the temporal correlations extracted by the recursive LSTM cells can give more confidence on the unavailable measurements, as discussed in Fig. 3.

Table V: Averaged detection rates for different detection algorithms under different FDIA and availability attack ratios. The improvement is calculated between LSTM-RDAE and DAE.

%µ = 5%
Algorithms    γ = 0.05   0.10     0.15     0.20
LSTM-AE       0.8034     0.7592   0.7151   0.6702
DAE           0.8311     0.8023   0.7746   0.7433
LSTM-RDAE     0.8565     0.8356   0.8137   0.7989
Improvement   0.0254     0.0333   0.0391   0.0556

%µ = 10%
LSTM-AE       0.8860     0.8429   0.8107   0.8022
DAE           0.9139     0.8917   0.8662   0.8423
LSTM-RDAE     0.9277     0.9168   0.9035   0.8902
Improvement   0.0128     0.0251   0.0373   0.0479

%µ = 20%
LSTM-AE       0.9305     0.9142   0.8979   0.8741
DAE           0.9519     0.9414   0.9305   0.9187
LSTM-RDAE     0.9594     0.9536   0.9467   0.9414
Improvement   0.0075     0.0122   0.0162   0.0227

3) Small and Successive Attacks: Temporal correlation is further examined in this section, where slight but successive attacks are considered to investigate the detection performance on cumulative anomalies. In detail, a single attack with a small attack vector may be harmless to safe operation, but the compound impact of several such attacks should not be overlooked. Here, the attacks are assumed to last for at most three steps [34].

Fig. 9 simulates the LSTM-RDAE under various successive combined attack strengths. As the missing ratio grows, the detection of a one-shot attack becomes harder and the improvement in the detection rate introduced by successive anomalies becomes more significant, i.e. the detection rates on the three-step attacks (pink curves) are similar in all three conditions (a)-(c) regardless of the missing ratio $\gamma$. Fig. 9(d) highlights the selected buses with low detection rates when one-shot 10% false data and a 20% missing ratio are injected. In the majority of these buses, the detection rates when at most three steps are considered can be boosted by 10% to 25% compared with the one-shot attack and by 20% to 50% compared with the DAE.

Figure 9: Averaged detection rates under different combined FDIA-Availability attack strengths: (a) γ = 0.00, (b) γ = 0.10, (c) γ = 0.20, and (d) improvements on the selected buses with initial detection rate smaller than 80% under 10% FDIA attack and 20% availability attack.

4) FDIAs under Targeted Availability Attacks: In the previous sections, the measurements are lost completely at random (MCAR), while in practice the attacker may blind a certain area of RTUs to mislead the control center. Meanwhile, it is also realistic to have missing data around a central junction node. The measurements are attacked according to their state degrees, thus leading to an MAR scheme. For the $i$th state, we define its attack neighborhood $N_a(i)$ as the set containing the measurements directly connected to the contaminated measurements:

$$
I_a(i) = \{ j \mid H(j,i) \neq 0 \} \tag{24a}
$$

$$
N_a(i) = \{ k \mid H(k,j) \neq 0, \forall j \in I_a(i) \} \setminus I_a(i) \tag{24b}
$$

where $I_a(i)$ defines the contaminated measurement set due to the state attack on bus $i$, and the set difference $\setminus$ ensures that the contaminated measurements in set $I_a(i)$ are still available to the control center. In Table VI, FDIAs with %µ = 0.10 on buses 93 and 94 are simulated with the following target availability attack settings.

1) FDIA on bus 93 with $|I_a(93)|_0 = 5$ and blinding on set $I_d(93) = N_a(93)$, which accounts for an availability attack ratio $\gamma = 4.93\%$;
2) FDIA on bus 94 with $|I_a(94)|_0 = 11$ and blinding on set $I_d(94) = N_a(94)$, which accounts for an availability attack ratio $\gamma = 9.21\%$.
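The sets in (24) can be illustrated on a toy network. The sketch below is one possible reading of (24b), not necessarily the authors' exact procedure: the buses whose injection measurements are contaminated are collected, every measurement with a nonzero entry in one of their columns of $H$ is gathered, and $I_a(i)$ is removed. The 5-bus path network, the unit susceptances, the absence of slack-bus removal, and all function names are assumptions of this example.

```python
import numpy as np

def dc_measurement_matrix(n_bus, lines):
    """Toy DC measurement matrix H = [injections; flows] with unit susceptances."""
    H_flow = np.zeros((len(lines), n_bus))
    for r, (f, t) in enumerate(lines):
        H_flow[r, f], H_flow[r, t] = 1.0, -1.0
    H_inj = H_flow.T @ H_flow            # graph Laplacian maps angles to injections
    return np.vstack([H_inj, H_flow])

def contaminated_set(H, i):
    """I_a(i): measurements with a nonzero entry in column i, as in (24a)."""
    return set(np.flatnonzero(H[:, i]).tolist())

def attack_neighbourhood(H, i, n_bus):
    """One reading of (24b): measurements touching the contaminated buses, minus I_a(i)."""
    I_a = contaminated_set(H, i)
    buses = [j for j in I_a if j < n_bus]    # injection rows double as bus indices
    touched = set()
    for b in buses:
        touched |= set(np.flatnonzero(H[:, b]).tolist())
    return sorted(touched - I_a)

lines = [(0, 1), (1, 2), (2, 3), (3, 4)]      # path network 0-1-2-3-4
H = dc_measurement_matrix(5, lines)            # rows 0-4: injections, rows 5-8: flows
I_a = contaminated_set(H, 0)                   # attack on the state of bus 0
N_a = attack_neighbourhood(H, 0, 5)
```

In this toy case, attacking bus 0 contaminates the injections at buses 0 and 1 and the flow on line 0-1, and the neighborhood consists of the injection at bus 2 and the flow on line 1-2.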

The mesh graph around bus 93 is illustrated in Fig. 10. The red nodes and edges represent the contaminated measurements in set $I_a(93)$, whereas the black ones represent the attack neighborhood in set $N_a(93)$ for the target availability attack. Random availability attacks are also simulated in Table VI with the same missing ratios as the two cases. In general, masking the measurements in $N_a$ deteriorates the detection performance of both the DAE and the LSTM-RDAE methods, as the most relevant spatial correlations are now unknown to the control center. However, the deterioration of the proposed algorithm is smaller than that of the DAE, which again verifies the assumption in Fig. 3. In the worst case, a 77% detection rate is still maintained. It is also worth noting that different buses can have distinct sensitivities to the availability attack types and strengths, due to their various degrees and topological configurations.


Figure 10: Illustration of the proposed target availability attack on bus 93.

Table VI: Detection rates under target availability attacks.

             bus 93             bus 94
             target   random    target   random
DAE          0.389    0.838     0.834    0.861
LSTM-RDAE    0.773    0.946     0.904    0.925

5) Stealth Replay Attacks: If the attacker is located at the control center, e.g. an internal intruder, the attacker might have access to the previous measurements to impose a replay attack [7], [30]. To improve the attack stealthiness, we assume that the current measurement is replaced by the measurement at the same time on the previous day, e.g. $z_a(t) = z(t - t_0)$ where $t_0 = 288$. Although it has been argued that the replay attack may not be practical for the attacker in real life [32], [45], in general it serves as the strongest stealthy assault to bypass both BDD (6), (8) and point detection algorithms such as DAE. Indeed, the replay attack can be seen as an FDIA with $a(t) = z_a(t) - z(t)$, and the estimated state variation can be calculated by (4). As a result, the replay attack is tested as a theoretical instance of contextual anomaly.
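The replay construction can be sketched in a few lines. The function name and the toy measurement matrix are illustrative; `t0 = 288` corresponds to one day of 5 min samples.

```python
import numpy as np

def replay_attack(Z, t, t0=288):
    """Replace the measurement at time t by the one recorded t0 steps earlier.

    Z: measurement matrix of shape (m, N).
    Returns the replayed vector z_a(t) = z(t - t0) and the equivalent
    FDIA vector a(t) = z_a(t) - z(t).
    """
    if t < t0:
        raise ValueError("no measurement available t0 steps before t")
    z_a = Z[:, t - t0].copy()
    a = z_a - Z[:, t]
    return z_a, a

Z = np.arange(2 * 600, dtype=float).reshape(2, 600)   # toy data, m = 2, N = 600
z_a, a = replay_attack(Z, t=300)
```

Because the replayed vector is itself a valid normal measurement, a point detector sees nothing unusual; only its mismatch with the preceding samples in the window reveals the attack.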

The simulation results on one-shot replay attacks are given in Table VII under different $\gamma$s. The detection rates of DAE are around 5%, which is similar to the predefined FPR (= 5%) during the model verification stage. Thanks to the temporal exploitation property of the LSTM cells, the proposed algorithm can achieve a detection rate on the replay attack between 84% and 71%, depending on the rate of missing data.

Table VII: Averaged detection rates for the previous day's replay attacks.

Strategy     γ = 0     0.05     0.10     0.20
LSTM-RDAE    0.8424    0.8106   0.7740   0.7110
DAE          0.0628    0.0610   0.0567   0.0468

V. CONCLUSIONS

This paper investigates a semi-supervised detection algorithm for combined integrity-availability attacks in power systems, where model knowledge, sample labelling, and prior knowledge of the attack patterns are not required. We formulate the availability attack to further improve the stealth of FDIAs, which can confuse the control operation and deteriorate the false alarm ratio. The proposed LSTM-RDAE is purely data-driven and can explicitly fit the spatiotemporal complexities in the normal measurement sequences. Moreover, a completely random dropout layer is designed after the input layer to account for the varying availability attack ratios. The performance of the proposed detection framework is verified on the IEEE 118-bus system, where real-time load profiles are employed. Sensitivity analysis is given for parameter tuning, while various unseen combined attack scenarios, e.g. one-shot and successive state attacks, target availability attacks, and stealth replay attacks, are simulated during the test stage. By controlling the FPR under 5%, the simulation results verify that the proposed LSTM-RDAE is more accurate, with approximately 95% detection rate under moderate attacks, and more robust than the state-of-the-art deep machine learning counterparts in the literature.

REFERENCES

[1] Y. Yan, Y. Qian, H. Sharif, and D. Tipper, “A survey on smart grid communication infrastructures: Motivations, requirements and challenges,” IEEE communications surveys & tutorials, vol. 15, no. 1, pp. 5–20, 2012.

[2] C.-W. Ten, C.-C. Liu, and G. Manimaran, “Vulnerability assessment of cybersecurity for scada systems,” IEEE Transactions on Power Systems, vol. 23, no. 4, pp. 1836–1846, 2008.

[3] A. Sayghe, Y. Hu, I. Zografopoulos, X. Liu, R. G. Dutta, Y. Jin, and C. Konstantinou, “A survey of machine learning methods for detecting false data injection attacks in power systems,” arXiv preprint arXiv:2008.06926, 2020.

[4] V. Y. Pillitteri and T. L. Brewer, “Guidelines for smart grid cybersecurity,” Tech. Rep., 2014.

[5] R. M. Lee, M. J. Assante, and T. Conway, “Analysis of the cyber attack on the ukrainian power grid,” Electricity Information Sharing and Analysis Center (E-ISAC), vol. 388, 2016.

[6] A. Abur and A. G. Exposito, Power system state estimation: theory and implementation. CRC press, 2004.

[7] A. S. Musleh, G. Chen, and Z. Y. Dong, “A survey on the detection algorithms for false data injection attacks in smart grids,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2218–2234, 2020.

[8] Z. Tan, A. Jamdagni, X. He, P. Nanda, and R. P. Liu, “A system for denial-of-service attack detection based on multivariate correlation analysis,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 2, pp. 447–456, 2014.

[9] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid,” IEEE Transactions on Smart Grid, vol. 2, no. 4, pp. 645–658, 2011.

[10] X. Liu, Z. Li, X. Liu, and Z. Li, “Masking transmission line outages via false data injection attacks,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 7, pp. 1592–1602, 2016.

[11] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Transactions on Information and System Security (TISSEC), vol. 14, no. 1, pp. 1–33, 2011.

[12] G. Hug and J. A. Giampapa, “Vulnerability assessment of ac state estimation with respect to false data injection cyber-attacks,” IEEE Transactions on smart grid, vol. 3, no. 3, pp. 1362–1370, 2012.

[13] S. Ahmed, Y. Lee, S. Hyun, and I. Koo, “Unsupervised machine learning-based detection of covert data integrity assault in smart grid networks utilizing isolation forest,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 10, pp. 2765–2777, 2019.

[14] Q. Yang, J. Yang, W. Yu, D. An, N. Zhang, and W. Zhao, “On false data-injection attacks against power system state estimation: Modeling and countermeasures,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 717–729, 2014.

[15] T. Wu, W. Xue, H. Wang, C. Chung, G. Wang, J. Peng, and Q. Yang, “Extreme learning machine-based state reconstruction for automatic attack filtering in cyber physical power system,” IEEE Transactions on Industrial Informatics, 2020.

[16] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM computing surveys (CSUR), vol. 41, no. 3, pp. 1–58, 2009.


[17] G. Pang, C. Shen, L. Cao, and A. v. d. Hengel, “Deep learning foranomaly detection: A review,” arXiv preprint arXiv:2007.02500, 2020.

[18] Z. Chu, O. Kosut, and L. Sankar, “Detecting load redistribution attacksvia support vector models,” arXiv preprint arXiv:2003.06543, 2020.

[19] M. Cui, J. Wang, and B. Chen, “Flexible machine learning-based cyber-attack detection using spatiotemporal patterns for distribution systems,”IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1805–1808, 2020.

[20] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra,“Decision tree and svm-based data analytics for theft detection in smartgrid,” IEEE Transactions on Industrial Informatics, vol. 12, no. 3, pp.1005–1016, 2016.

[21] J. Zhang and H. Wang, “Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance,” Knowledge and Information Systems, vol. 10, no. 3, pp. 333–355, 2006.

[22] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407, 2019.

[23] Y. Zhang, J. Wang, and B. Chen, “Detecting false data injection attacks in smart grids: A semi-supervised deep learning approach,” IEEE Transactions on Smart Grid, pp. 1–1, 2020.

[24] S. A. Foroutan and F. R. Salmasi, “Detection of false data injection attacks against state estimation in smart grids based on a mixture Gaussian distribution learning method,” IET Cyber-Physical Systems: Theory & Applications, vol. 2, no. 4, pp. 161–171, 2017.

[25] M. Ozay, I. Esnaola, F. T. Yarman Vural, S. R. Kulkarni, and H. V. Poor, “Machine learning methods for attack detection in the smart grid,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 8, pp. 1773–1786, 2016.

[26] Z. Zhang, R. Deng, D. K. Y. Yau, P. Cheng, and J. Chen, “Analysis of moving target defense against false data injection attacks on power grid,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2320–2335, 2020.

[27] B. Li, G. Xiao, R. Lu, R. Deng, and H. Bao, “On feasibility and limitations of detecting false data injection attacks on power grid state estimation using D-FACTS devices,” IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 854–864, 2020.

[28] M. Higgins, F. Teng, and T. Parisini, “Stealthy MTD against unsupervised learning-based blind FDI attacks in power systems,” IEEE Transactions on Information Forensics and Security, pp. 1–1, 2020.

[29] A. Sagheer and M. Kotb, “Unsupervised pre-training of a deep LSTM-based stacked autoencoder for multivariate time series forecasting problems,” Scientific Reports, vol. 9, no. 1, pp. 1–16, 2019.

[30] Y. Mo and B. Sinopoli, “Secure control against replay attacks,” in 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2009, pp. 911–918.

[31] J. Zhao, G. Zhang, Z. Y. Dong, and M. La Scala, “Robust forecasting aided power system state estimation considering state correlations,” IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 2658–2666, 2018.

[32] J. Zhao, G. Zhang, Z. Y. Dong, and K. P. Wong, “Forecasting-aided imperfect false data injection attacks against power system nonlinear state estimation,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 6–8, 2015.

[33] S. K. Singh, K. Khanna, R. Bose, B. K. Panigrahi, and A. Joshi, “Joint-transformation-based detection of false data injection attacks in smart grid,” IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 89–97, 2018.

[34] Z. Yang, H. Liu, T. Bi, and Q. Yang, “Bad data detection algorithm for PMU based on spectral clustering,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 3, pp. 473–483, 2020.

[35] K. Pan, A. M. H. Teixeira, M. Cvetkovic, and P. Palensky, “Combined data integrity and availability attacks on state estimation in cyber-physical power grids,” in 2016 IEEE International Conference on Smart Grid Communications (SmartGridComm), 2016, pp. 271–277.

[36] K. Pan, A. Teixeira, M. Cvetkovic, and P. Palensky, “Cyber risk analysis of combined data attacks against power system state estimation,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 3044–3056, 2019.

[37] J. Tian, B. Wang, T. Li, F. Shang, and K. Cao, “Coordinated cyber-physical attacks considering DoS attacks in power systems,” International Journal of Robust and Nonlinear Control, vol. 30, no. 11, pp. 4345–4358, 2020.

[38] W. Qiu, Q. Tang, K. Zhu, W. Wang, Y. Liu, and W. Yao, “Detection of synchrophasor false data injection attack using feature interactive network,” IEEE Transactions on Smart Grid, pp. 1–1, 2020.

[39] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2505–2516, 2017.

[40] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.

[41] C. Wang, S. Tindemans, K. Pan, and P. Palensky, “Detection of false data injection attacks using the autoencoder approach,” arXiv preprint arXiv:2003.02229, 2020.

[42] Y. Lin and J. Wang, “Probabilistic deep autoencoder for power system measurement outlier detection and reconstruction,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1796–1798, 2020.

[43] W. Wang, P. Tang, L. Xiong, and X. Jiang, “RADAR: Recurrent autoencoder based detector for adversarial examples on temporal EHR,” 2020.

[44] H. Saadat et al., Power system analysis. McGraw-Hill, 1999, vol. 2.

[45] J. Zhao, G. Zhang, M. La Scala, Z. Y. Dong, C. Chen, and J. Wang, “Short-term state forecasting-aided method for detection of smart grid general false data injection attacks,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1580–1590, 2017.

[46] M. A. Rahman and H. Mohsenian-Rad, “False data injection attacks against nonlinear state estimation in smart power grids,” in 2013 IEEE Power & Energy Society General Meeting. IEEE, 2013, pp. 1–5.

[47] D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3, pp. 581–592, 1976.

[48] L. Wang, Q. Zhou, and S. Jin, “Physics-guided deep learning for power system state estimation,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 4, pp. 607–615, 2020.

[49] H. Wang, X. Wen, Y. Xu, B. Zhou, J.-C. Peng, and W. Liu, “Operating state reconstruction in cyber physical smart grid for automatic attack filtering,” IEEE Transactions on Industrial Informatics, 2020.

[50] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016.

[51] L. Zhu and N. Laptev, “Deep and confident prediction for time series at Uber,” in 2017 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2017, pp. 103–110.

[52] D. Xu, E. Ricci, Y. Yan, J. Song, and N. Sebe, “Learning deep representations of appearance and motion for anomalous event detection,” arXiv preprint arXiv:1510.01553, 2015.

[53] L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft, “Deep one-class classification,” in International Conference on Machine Learning, 2018, pp. 4393–4402.

[54] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[55] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103.

[56] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1019–1027.

[57] T. Kieu, B. Yang, C. Guo, and C. S. Jensen, “Outlier detection for time series with recurrent autoencoder ensembles,” in IJCAI, 2019, pp. 2725–2732.

[58] J. Chen, S. Sathe, C. Aggarwal, and D. Turaga, “Outlier detection with autoencoder ensembles,” in Proceedings of the 2017 SIAM International Conference on Data Mining. SIAM, 2017, pp. 90–98.

[59] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. 12, 2010.

[60] L. Hirth, J. Mühlenpfordt, I. Schlecht, and J. Weibezahn, “Time series data,” Jun 2019. [Online]. Available: https://data.open-power-system-data.org/time_series/2019-06-05

[61] J. Zhang, Y. Wang, Y. Weng, and N. Zhang, “Topology identification and line parameter estimation for non-PMU distribution network: A numerical method,” IEEE Transactions on Smart Grid, 2020.

[62] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, “MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, 2011.

[63] J. Yang, S. Zhang, Y. Xiang, J. Liu, J. Liu, X. Han, and F. Teng, “LSTM auto-encoder based representative scenario generation method for hybrid hydro-PV power system,” IET Generation, Transmission & Distribution, 2020.

[64] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.

[65] F. T. Liu, K. M. Ting, and Z. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413–422.