
High-Performance Intrusion Detection System using Deep Learning in Packet and Flow-Based Networks

Kaniz Farhana

Department of Computer Science & Engineering

Port City International University, Chittagong

Bangladesh


Maqsudur Rahman


Port City International University, Chittagong

Bangladesh


Muhammad Anwarul Azim


University of Chittagong, Bangladesh


Md. Tofael Ahmed

Department of Information & Communication

Technology, Comilla University, Bangladesh


ABSTRACT The information revolution, extensive cloud computing, and enormous network traffic have made securing systems against threats and attacks more crucial. Continuous monitoring of systems and networks for malicious incidents and vulnerabilities plays a major role in protecting software and hardware resources. Intrusion Detection Systems (IDS) have become a significant aspect of Internet and intranet security, where the patterns of network data change constantly over time and with new attacks. Much research concentrates on Deep Learning (DL) methods, which provide effective, accurate, and high-performance solutions when applied to big data related to the security and privacy of networks and system automation. In this paper, we investigate the performance of various DL techniques, namely Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), which are trained, validated, and tested using the CICIDS2017 dataset together with additional metadata containing various fields, including packet and flow-based network traffic. We then propose a Hybrid Bidirectional-RNN-LSTM model for multi-class and binary classification in the Keras and TensorFlow DL environments. With the selected important features, our proposed method achieves more than 99% accuracy for both binary and multi-class classification, which is higher than the accuracy reported in existing research. Other evaluation measures, such as the confusion matrix, precision, recall, F1-score, and Receiver Operating Characteristic (ROC), also show good results. These outcomes suggest that DL techniques are highly effective for detecting intrusions in packet and flow-based networks.

KEYWORDS Packet and Flow-based Network, Intrusion Detection System, Deep Learning, Deep Neural Network, Convolutional Neural Network, Simple Recurrent Neural Network, Gated Recurrent Unit, Long Short-Term Memory, Hybrid Bidirectional-RNN-LSTM, Big Data, Keras, Google TensorFlow.

1 INTRODUCTION

1.1 Background

Due to recent developments in network communications, threats from different types of sources have increased greatly, making the use of information and communication technology more vulnerable [1]. Computer networks can be investigated from many different perspectives. From a security point of view, it is essential to analyze both attack and normal network traffic. Therefore, capturing network traffic in packet and flow format is necessary for identifying intrusions into a system. In other words, intrusion detection and prevention become possible when the network traffic is available in packet and flow-based formats.

1.2 Intrusion Detection System

An Intrusion Detection System (IDS) is of great significance for tracking the presence of threats. Researchers have taken a strong interest in cybersecurity to find better ways


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 10(2): 55-66. The Society of Digital Information and Wireless Communications (SDIWC), 2021. ISSN: 2305-0011


of handling attacks through intrusion detection.

IDS research takes on the challenge of detecting intrusions efficiently across various real-world attack scenarios. An IDS monitors the network flows or packet patterns of the data to identify attacks. Host-based IDS (H-IDS) and Network-based IDS (N-IDS) are the two types of IDS [2]: H-IDS collects information from hosts such as computers, while N-IDS collects raw data packets from networks. By detection technique, intrusion detection systems are further divided into anomaly detection and misuse detection systems [2]. Anomaly detection identifies changes in the patterns of host-system activity or network traffic, whereas misuse detection recognizes attacks by matching activity against the signatures of already known attacks.
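To make the distinction between the two detection styles concrete, the toy sketch below contrasts them. It is an illustration only, not part of the method evaluated in this paper: the signature strings, the packets-per-second baseline, and the three-standard-deviation threshold are all hypothetical choices.

```python
import statistics
from typing import Sequence

# Hypothetical signatures of known attacks (misuse detection matches these).
KNOWN_SIGNATURES = (b"' OR 1=1 --", b"<script>")


def misuse_detect(payload: bytes) -> bool:
    """Return True if the payload contains any known attack signature."""
    return any(signature in payload for signature in KNOWN_SIGNATURES)


def anomaly_detect(value: float, baseline: Sequence[float], k: float = 3.0) -> bool:
    """Return True if value lies more than k standard deviations away from
    the mean of a baseline of normal observations (a simple anomaly test)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) > k * stdev


# Hypothetical packets-per-second readings observed during normal operation.
normal_rate = [100, 110, 95, 105, 98]
print(misuse_detect(b"id=1' OR 1=1 --"))  # True: matches a known signature
print(anomaly_detect(500, normal_rate))    # True: far outside the normal range
print(anomaly_detect(104, normal_rate))    # False: within the normal range
```

The trade-off is visible even at this scale: the signature check cannot flag an attack it has no signature for, while the statistical check can flag novel behaviour but will also raise false alarms on unusual yet benign bursts.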

The increasing number of real-time attacks makes it difficult for traditional intrusion detection approaches, including Machine Learning (ML) models, to handle them with good performance [3, 27].

1.3 Deep Learning

Deep Learning (DL) is a subset of ML, which in turn is a branch of Artificial Intelligence. DL approaches include unsupervised, supervised, and semi-supervised learning algorithms that construct output features and attack predictions [4]. Applying various DL models to big datasets with different attack scenarios and evaluating their performance has become a popular topic in IDS research [5-9]. DL techniques learn variations in the attributes, which can help them perform well on imbalanced datasets. DL uses multiple layers of transformation, and each layer learns from the output of the previous layer.

In DL, a Convolutional Neural Network (CNN) is a kind of deep neural network built from neurons with learnable weights. In a CNN, each neuron receives several inputs from a local region of the previous layer, computes their weighted sum using a filter whose weights are shared across positions, and passes the result through an activation function; the final layer produces the output. CNNs can learn high-quality feature representations. A Deep Neural Network (DNN) is an artificial neural network whose neurons are organized into a sequence of multiple layers. Each neuron receives inputs from the neurons of the previous layer and performs a simple computation, so the DNN maps the input to the output through a series of mathematical transformations. A Recurrent Neural Network (RNN), unlike a feed-forward network, has recurrent connections that give it an internal memory for processing sequences of inputs. It is called recurrent because it applies the same function to every element of the input sequence, with the current output depending on the previous computation. The inputs of an RNN are therefore related to each other, which makes it very useful for time series and sequential data. Long Short-Term Memory (LSTM) is a modified version of the RNN that mitigates the vanishing gradient problem of the RNN. LSTM models are trained with back-propagation (through time) and are suitable for processing, classification, and prediction. LSTM can also be used in unsupervised settings, but in this work it is trained with supervised learning. The Gated Recurrent Unit (GRU) is a gating mechanism in RNNs, similar to the LSTM unit but without an output gate. GRU mitigates the vanishing gradient problem by using an update gate and a reset gate. Because it has fewer parameters, GRU uses less memory and trains faster. A Bidirectional RNN (Bi-RNN) puts two independent RNNs together, so the network has both forward and backward information about the sequence at every step. In an RNN, the current state cannot draw on future inputs, whereas in a Bi-RNN the current state can also take future inputs into account. Because the forward and backward layers are not updated at the same time, additional processing is needed during back-propagation.
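The recurrence that distinguishes an RNN from a feed-forward network, and the way a Bi-RNN combines two independent RNNs, can be sketched in plain Python. This is a minimal illustration with hypothetical hand-picked weights, not the Keras layers used in this paper; it assumes the simple (Elman) form h_t = tanh(W x_t + U h_{t-1} + b) with h_0 = 0. LSTM and GRU cells add gates on top of this recurrence, and the Keras `Bidirectional` wrapper concatenates the two directions by default, as done here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(xs: List[Vector], W: Matrix, U: Matrix, b: Vector) -> List[Vector]:
    """Simple RNN: h_t = tanh(W x_t + U h_{t-1} + b), starting from h_0 = 0.
    The same W, U and b are reused at every time step."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        pre = [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
        h = [math.tanh(a) for a in pre]
        states.append(h)
    return states


def birnn_forward(xs: List[Vector], fwd: Tuple[Matrix, Matrix, Vector],
                  bwd: Tuple[Matrix, Matrix, Vector]) -> List[Vector]:
    """Bi-RNN: one RNN reads left-to-right, an independent one reads
    right-to-left, and their states are concatenated at each step."""
    forward = rnn_forward(xs, *fwd)
    backward = rnn_forward(xs[::-1], *bwd)[::-1]  # re-align to original order
    return [f + bk for f, bk in zip(forward, backward)]


# Hypothetical 1-dimensional weights; an input pulse appears only at the last step.
params = ([[1.0]], [[0.5]], [0.0])
sequence = [[0.0], [0.0], [1.0]]
print(rnn_forward(sequence, *params))           # zeros until the pulse arrives
print(birnn_forward(sequence, params, params))  # step 0 already reflects the pulse via the backward RNN
```

The forward states stay at zero until the pulse arrives, whereas the backward half of the Bi-RNN output is non-zero from the first step, which is the "access to future input" described above.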

1.4 Application Area

Few datasets offer valid and versatile scenarios for detecting intrusions in packet and flow-based networks [10]. Among the datasets released over the years, the CICIDS2017 dataset [11, 12], published by the University of New Brunswick, comes with several features that strongly influence classification and prediction.

1.5 Contribution

This paper focuses on intrusion detection using several DL techniques applied to the publicly available CICIDS2017 dataset for both multi-class and binary classification. We investigate multiple DL techniques, namely CNN, DNN, Simple RNN, LSTM, GRU, and Bidirectional RNN. We then propose a Hybrid Bi-RNN-LSTM model for binary and multi-class classification. We construct this hybrid DL model by combining an LSTM layer with a Bidirectional RNN in the final layer to achieve higher accuracy, using the Keras and Google TensorFlow libraries to create the DL environment.

1.6 Structure of Paper

The contents of the paper are organized as follows. Section 2 reviews recent research related to this paper. Section 3 describes the dataset used in the paper. Section 4 covers the techniques applied for data preprocessing. The methodology is described in Section 5. Section 6 discusses the test results and performance analysis. Finally, Section 7 concludes the paper with remarks on future work.

2 RELATED WORK

Intrusion detection and the performance analysis of DL have drawn researchers' attention because of the variety of attacks and the continually changing attack environment. Vinayakumar et al. [13] implemented a Multilayer Perceptron (MLP) and a Deep Belief Network (DBN) for intrusion detection. They carried out a comprehensive comparison of various classical ML techniques with MLP and DBN on the benchmark KDDCup'99 and NSL-KDD datasets, where the models showed good performance.

Serpil et al. [14] presented DNN, Shallow Neural Network, and Autoencoder approaches to detect malicious activities using the CICIDS2017 dataset. Their results differed across different sets of selected features; with all features, they obtained 98.4% accuracy.

Ulya et al. [15] proposed LSTM and DNN models for binary prediction of DoS and DDoS attacks. They trained their models on the CICIDS2017 dataset and obtained true positive rates of 99.9% and 99.8% for the LSTM and DNN models, respectively.

Kayvan et al. [16] published a hybrid anomaly-classification IDS that combines DL with binary optimization algorithms and used the CICIDS2017 dataset for performance measurement. The DNN, DNN with Binary Bat Algorithm (BBA), DNN with Binary Genetic Algorithm (BGA), and DNN with Binary Gravitational Search Algorithm (BGSA) methods achieved 96.427%, 97.023%, 96.480%, and 99.002% accuracy, respectively.

Zang et al. [17] proposed hierarchical network intrusion detection models built mainly from LSTM and CNN, which learn spatial and temporal features from real data. In experiments on the CICIDS2017 and CTU datasets, their proposed models achieved 99% accuracy and very high precision, recall, and F1-scores on both datasets.

Aksu et al. [18] performed a comparative analysis of DL and ML methods, applying CNN and Support Vector Machine (SVM) algorithms to the CICIDS2017 dataset to detect port scan attacks. Their models achieved 97.8% and 69.79% accuracy for DL and ML, respectively.

Lee et al. [19] proposed an AI-SIEM method that focuses on true positive and false positive rates. They used a combination of the NSL-KDD and CICIDS2017 datasets for their experiments and compared ML models with their proposed DL models, showing that the DL models gave better results and outperformed the ML models.

Navaporn et al. [20] proposed an IDS using TensorFlow to detect common attacks such as DoS, port scans, and network scans via ICMP, UDP, and TCP. They used the MAWILab'2017 dataset to train and evaluate Snort, RNN, Stacked RNN, and CNN models, and their results showed that the deep learning models performed better than Snort.

Roopak et al. [21] proposed four different DL models, compared them with ML algorithms on DDoS attack scenarios from the CICIDS2017 dataset, and obtained 97.16% accuracy for the combined CNN and LSTM model.

Vinayakumar et al. [22] conducted a comparative analysis of deep learning techniques. They treated network traffic data as time series and implemented several DL models, such as LSTM, RNN, and Internal Recurrent Neural Networks (IRNN), on the KDDCup'99, NSL-KDD, and UNSW-NB15 datasets, and reported better performance for RNN and IRNN.

In this paper, we focus on a comparative analysis of the DL techniques CNN, Simple RNN, LSTM, GRU, and DNN applied to the CICIDS2017 dataset. Finally, we propose a Bi-RNN-LSTM model that achieves higher accuracy than existing research results and performs best in terms of precision, recall, and F1-score.

3 DATASET

A vital issue with most IDS datasets is the lack of sufficiently varied network traffic patterns, including different kinds of attacks. Our paper uses the benchmark CICIDS2017 dataset for evaluation and comparison [11, 12]. It is an open-source dataset introduced by the Canadian Institute for Cybersecurity. The dataset has 79 features and contains benign traffic and 14 types of attack traffic, captured from Monday to Friday, with different sets of attacks within fixed periods [23]. It is fully labeled and provides packet and bidirectional flow-based records with additional metadata [24], which makes it suitable for our area of application. The second column of Table 1 shows the original number of records (instances) in the dataset for each attack type. For binary classification, the 14 attack types are labeled 1 and benign traffic is labeled 0.
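The binary labelling rule above amounts to a one-line mapping. This sketch assumes the benign class is spelled "Benign" as in Table 2; the raw CSV files may use different capitalization or padding, so the comparison normalizes case and surrounding whitespace (our assumption, not a documented step of the paper).

```python
def to_binary(label: str) -> int:
    """Map benign traffic to 0 and any of the 14 attack types to 1."""
    return 0 if label.strip().lower() == "benign" else 1


labels = ["Benign", "DDoS", "PortScan", " BENIGN ", "Heartbleed"]
print([to_binary(label) for label in labels])  # [0, 1, 1, 0, 1]
```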

4 DATA PREPROCESSING

DL depends heavily on data, and to achieve the best performance, data preprocessing is essential for transforming raw data into a reliable and compatible format. Data preprocessing also reduces ambiguities in the dataset.

4.1 Duplicate Removal

Duplicate records are removed from the dataset for both binary and multi-class classification to obtain more precise evaluation results; this reduces the total number of instances available to the models. The third column of Table 1 shows the number of instances in the dataset after removing duplicates.

Table 1. CICIDS2017 Dataset instances and duplicates

Attack Type | Original Number of Instances (Before Removing Duplicates) | Number of Instances (After Removing Duplicates)
Benign | 2273097 | 1950101
DDoS | 128027 | 128016
Port Scan | 158930 | 1958
Bot | 1966 | 1441
Infiltration | 36 | 36
Web Attack-Brute Force | 1507 | 1470
Web Attack-SQL Injection | 21 | 21
Web Attack-XSS | 652 | 652
FTP Patator | 7938 | 5933
SSH Patator | 5897 | 3217
Heartbleed | 11 | 11
DoS Goldeneye | 10293 | 10286
DoS Hulk | 231073 | 172849
DoS slowhttptest | 5499 | 5228

Duplicate values are removed from the CSV files

using spreadsheet software before feature

selection, class balancing and data

normalization.
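The authors removed duplicates with spreadsheet software. For readers who prefer a scripted step, the sketch below does the same thing in Python, keeping the first occurrence of each identical row; the column names and sample rows are hypothetical. With pandas, `DataFrame.drop_duplicates()` performs the equivalent operation.

```python
import csv
import io


def drop_duplicate_rows(rows):
    """Keep the first occurrence of every identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical CSV content with one exact duplicate record.
sample = "Flow Duration,Average Packet Size,Label\n10,71.4,Benign\n10,71.4,Benign\n5,3.2,DDoS\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)  # keep the header out of the comparison
rows = drop_duplicate_rows(reader)
print(len(rows))  # 2
```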

4.2 Feature Selection

When working with high-dimensional data, the number of features affects training time and increases the chance of overfitting. Feature selection can avoid this. Sharafaldin et al. [11] identified the best features of the CICIDS2017 dataset for each attack type, using a Random Forest regressor to rank the 79 features present in the dataset. For this research, the 23 best features plus the class label (24 columns in total) are selected [11, 12]. The selected features include flow duration, average packet size, active min, active mean, flag counts, etc. [23].

4.3 Class Balance

Table 1 shows that the amount of benign data is huge compared to the attack data. Benign records make up more than 80% of the data, which can bias training toward the majority benign class. This class imbalance can also lead to low accuracy with a high false alarm rate, or to overfitting. In this research, to reduce the effect of class imbalance, the Monday-WorkingHours.pcap_ISCX file, which contains only benign records, is not included in the final dataset.
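The size of the imbalance can be checked from the class counts in Table 2 and from the test-set counts in Table 3. A short sketch:

```python
# Instance counts per class, copied from Table 2.
TABLE2_COUNTS = {
    "Benign": 1950101, "Bot": 1441, "DDoS": 128016, "DoS Goldeneye": 10286,
    "DoS Hulk": 172849, "DoS Slowhttptest": 5228, "DoS Slowloris": 5385,
    "FTP-Patator": 5933, "Heartbleed": 11, "Infiltration": 36, "PortScan": 1958,
    "SSH-Patator": 3217, "Web Attack-Brute Force": 1470,
    "Web Attack-SQL Injection": 21, "Web Attack-XSS": 652,
}


def class_shares(counts):
    """Return each class's fraction of all instances."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = class_shares(TABLE2_COUNTS)
print(f"Benign share in Table 2: {shares['Benign']:.1%}")       # 85.3%
print(f"Benign share in the test set: {299932 / 367114:.1%}")  # 81.7% (Table 3)
print(min(shares, key=shares.get))                              # Heartbleed, the rarest class
```

The benign share in the test set (81.7%) is lower than in Table 2 (85.3%), but it is still above 80%, and the rarest classes remain several orders of magnitude smaller than the benign class.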

4.4 Data Normalization

The features in the dataset have different value ranges, and DL model layers are sensitive to the scale of their inputs. Therefore, the data need to be normalized to make learning smoother for the DL model layers. Min-max normalization is applied to scale every feature to the range 0 to 1. The snippet below shows the values of the selected features for one record [26]. A snippet of the dataset before normalization:

98306862, 357, 71.4, 0, 3506.022, 9830686,

31100000, 54, 24600000, 316, 19700000, 0,

0.050861, 0.061033, 0, 0, 0, 1087.091, 357, 0,

235, 13009, 13009, DoS Hulk

The same snippet after min-max normalization:

0.819206, 0.00035, 0.018445, 0, 0.466333,

0.081921, 0.366745, 0, 0.205, 0.000002,

0.164167, 0, 0, 0, 0, 0, 0, 0.418662, 0.00035,

0.000015, 0.003601, 0.000132, 0.000132, DoS

Hulk
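Min-max normalization maps each feature value n to n' = (n - min) / (max - min) * (newmax - newmin) + newmin, with newmin = 0 and newmax = 1 here. A minimal per-column sketch follows. The handling of constant columns (mapped to newmin) is our assumption, since the text does not say how they were treated, and the text also does not specify on which data split the minimum and maximum were computed. scikit-learn's `MinMaxScaler(feature_range=(0, 1))` implements the same transform.

```python
from typing import List


def min_max_normalize(column: List[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Scale values linearly so the column minimum maps to new_min and the
    column maximum maps to new_max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: avoid division by zero (assumed convention).
        return [new_min for _ in column]
    scale = (new_max - new_min) / (hi - lo)
    return [(n - lo) * scale + new_min for n in column]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(min_max_normalize([7.0, 7.0]))       # [0.0, 0.0]
```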

4.5 Encoding Class Label to Numeric

For multi-class classification, the class labels must be converted from string names to numeric values before the dataset is passed to the DL models. Similarly, for binary classification, the class labels must be replaced with binary values. The new class labels are created by label encoding rather than by arbitrary numbering, to avoid biases caused by an implied ordering of the numbers assigned to the attack classes. The LabelBinarizer() and label_binarize() functions from scikit-learn are used to binarize the labels in a one-vs-all fashion for a fixed set of classes. After label encoding, the class label of each record becomes a binary array over all 15 classes, with a 1 in the position of its class. Table 2 shows the class distribution after label encoding.

Table 2. Class distribution after label encoding

Class Number | Class Name | Number of Instances
Class 0 | Benign | 1950101
Class 1 | Bot | 1441
Class 2 | DDoS | 128016
Class 3 | DoS Goldeneye | 10286
Class 4 | DoS Hulk | 172849
Class 5 | DoS Slowhttptest | 5228
Class 6 | DoS Slowloris | 5385
Class 7 | FTP-Patator | 5933
Class 8 | Heartbleed | 11
Class 9 | Infiltration | 36
Class 10 | PortScan | 1958
Class 11 | SSH-Patator | 3217
Class 12 | Web Attack-Brute Force | 1470
Class 13 | Web Attack-SQL Injection | 21
Class 14 | Web Attack-XSS | 652
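The encoding in Table 2 can be reproduced with a small pure-Python stand-in for scikit-learn's `LabelBinarizer`. Like `LabelBinarizer`, the sketch orders classes by sorting their names, which yields exactly the class numbers of Table 2. The helper names are hypothetical, and the label strings are taken from Table 2 and may differ slightly from those in the raw CSV files.

```python
from typing import Dict, List, Sequence

CLASS_NAMES = [  # names as listed in Table 2, deliberately unordered here
    "DDoS", "Benign", "PortScan", "Bot", "DoS Hulk", "DoS Goldeneye",
    "DoS Slowhttptest", "DoS Slowloris", "FTP-Patator", "Heartbleed",
    "Infiltration", "SSH-Patator", "Web Attack-Brute Force",
    "Web Attack-SQL Injection", "Web Attack-XSS",
]


def fit_classes(labels: Sequence[str]) -> List[str]:
    """Return the sorted unique class names; list position = class number."""
    return sorted(set(labels))


def binarize(labels: Sequence[str], classes: List[str]) -> List[List[int]]:
    """One-hot encode each label as a 0/1 vector over all classes."""
    index: Dict[str, int] = {name: i for i, name in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        encoded.append(row)
    return encoded


classes = fit_classes(CLASS_NAMES)
print(classes.index("Benign"), classes.index("Web Attack-XSS"))  # 0 14
print(binarize(["Bot"], classes)[0])  # a single 1 at position 1, zeros elsewhere
```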

4.6 Training, Validation, and Test Split

The combined dataset is split into three sets, training, validation, and test, for both multi-class and binary classification. The combined dataset with 24 selected columns (23 features plus the class label) is divided into 1101341 (60%), 367114 (20%), and 367114 (20%) instances, respectively. Table 3 shows the number of test set instances for each attack type.
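The split proportions can be checked, and a split produced, with the sketch below. It performs a plain seeded random shuffle; the seed and the rounding rule are our assumptions, and the text does not say whether the authors' split was stratified by class (with classes as small as Heartbleed, a stratified split would be a reasonable alternative).

```python
import random
from typing import List, Sequence, Tuple


def split_sizes(n: int, train_frac: float = 0.6, val_frac: float = 0.2) -> Tuple[int, int, int]:
    """Return (train, validation, test) sizes; the test set takes the remainder."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return n_train, n_val, n - n_train - n_val


def train_val_test_split(records: Sequence, seed: int = 42, train_frac: float = 0.6,
                         val_frac: float = 0.2) -> Tuple[List, List, List]:
    """Shuffle a copy of the records with a fixed seed and cut it 60/20/20."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_sizes(len(shuffled), train_frac, val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


print(split_sizes(1101341 + 367114 + 367114))  # (1101341, 367114, 367114)
```

Applied to the combined dataset size of 1835569 records, the 60/20/20 rule reproduces the three set sizes stated above.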

5 METHODOLOGY

5.1 Process Design

We focus on the performance of multiple DL techniques, namely DNN, CNN, Simple RNN, GRU, and LSTM, and of our proposed hybrid Bidirectional-RNN-LSTM model for binary and multi-class classification. The overall process for applying the different DL methods and our proposed method is depicted in Figure 1.

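Min-max normalization (Section 4.4), where n is an original feature value, min and max are the minimum and maximum of that feature, and newmin and newmax bound the target range (0 and 1 in this paper):

$$ n' = \frac{n - \min}{\max - \min}\,(\text{newmax} - \text{newmin}) + \text{newmin} $$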


Table 3. Test set instance distribution according to the attack classes

Multi-class | Test Set Instances (Multi-class)
0 Benign | 299932
1 Bot | 299
2 DDoS | 25718
3 DoS Goldeneye | 2008
4 DoS Hulk | 34458
5 DoS Slowhttptest | 1067
6 DoS Slowloris | 1038
7 FTP-Patator | 1173
8 Heartbleed | 2
9 Infiltration | 9
10 PortScan | 400
11 SSH-Patator | 629
12 Web Attack Brute Force | 250
13 Web Attack SQL Injection | 4
14 Web Attack XSS | 127

Binary Class | Test Set Instances (Binary)
0 Benign | 299932
1 Attack | 67182

Figure 1. Design of overall processes

5.2 Algorithm

The following algorithm describes how the dataset is processed and how the DL methods, including our proposed method, are evaluated.

Algorithm
Step 1. Start.
Input: preprocessed dataset with 24 selected features and label-encoded class types.
Step 2. Split the dataset into training, validation, and test sets.
Step 3. Build the model by adding multiple layers to the DL network.
Step 4. Compile and validate the model to calculate the accuracy and class probabilities.
Step 5. Evaluate the model on the test set to obtain the evaluation metric scores and the confusion matrix.
Step 6. Calculate the Receiver Operating Characteristic (ROC).
Step 7. End.

5.3 DL Model Construction

All the DL models are built with input, hidden, and output layers for both binary and multi-class classification. We used the TensorFlow and Keras libraries through Jupyter Notebook as the DL environment. Every node in the DL layers is fully connected. The input shape is set to the 23 selected attributes, excluding the class attribute, which serves as the target of the output layer. The Sigmoid function is used in the final layer for binary classification, while for multi-class classification the Softmax function is applied to the final layer, producing probabilities for all attack classes. For both classifications, the CNN model uses layers with 32, 64, 64, and 128 nodes, together with two max-pooling layers. The Simple RNN, LSTM, GRU, and DNN models each have four layers of 128 nodes. In the Bi-RNN-LSTM model, all layers have 256 nodes. Dropout layers are also applied between DL layers to reduce overfitting.
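The two output activations differ in what they return: the sigmoid turns a single score into the probability of the attack class, while the softmax turns one score per class into a probability distribution over all classes. A plain-Python sketch of both follows; in Keras these are typically selected with `activation="sigmoid"` or `activation="softmax"` on the final `Dense` layer, and the logits below are made up for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Probability of the positive (attack) class from a single logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # equivalent form that avoids overflow for large negative z
    return e / (1.0 + e)


def softmax(logits: List[float]) -> List[float]:
    """Probability for every class; the outputs sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))  # 0.5
probs = softmax([2.0, 1.0, 0.1])
print(probs.index(max(probs)))  # 0: the class with the largest logit wins
```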

5.4 Working Process

The working process for the CNN, Simple RNN, LSTM, GRU, DNN, and Bi-RNN-LSTM models follows the steps of the algorithm in Section 5.2. After the input phase in Step 1, the dataset is split into training, validation, and test sets in Step 2. All the models are built on the training set with multiple neural layers in Step 3, and each model is compiled and validated on the validation set in Step 4. In Step 5, the model is evaluated to obtain the confusion matrix, precision, recall, accuracy, and F1-score. Finally, ROC scores are calculated in Step 6 to assess the performance of the models.



6 RESULTS AND ANALYSIS

6.1 Evaluation Metrics

Accuracy is the primary evaluation metric in this research. Besides accuracy, precision, recall, F1-score, the confusion matrix, and ROC are used for both binary and multi-class classification.

A confusion matrix describes a classification model's performance on test data for which the true values are known.

Table 4. Confusion Matrix

Predicted \ Actual | Positive (1) | Negative (0)
Positive (1) | True Positive (TP) | False Positive (FP)
Negative (0) | False Negative (FN) | True Negative (TN)

Table 4 represents the confusion matrix for binary classification.

True Positive (TP) is the number of attack records correctly predicted as attacks, False Positive (FP) is the number of normal (benign) records incorrectly predicted as attacks, True Negative (TN) is the number of normal records correctly predicted as normal, and False Negative (FN) is the number of attack records incorrectly predicted as normal (benign) [25].

Precision is the ratio of correctly predicted positive cases to all cases predicted as positive.

Recall is the ratio of correctly predicted positive cases to all cases that actually belong to the positive class.

The F1-score is the harmonic mean of precision and recall. It is more informative than accuracy when the class distribution is uneven, because it accounts for both false positives and false negatives.

ROC is a graphical illustration of a classification model's ability to discriminate between classes; it plots the true positive rate against the false positive rate as the decision threshold varies.
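In the notation of Table 4, the metrics described above are computed with the standard definitions below; the ROC curve plots TPR against FPR as the classification threshold varies.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} = TPR &= \frac{TP}{TP + FN}, &
FPR &= \frac{FP}{FP + TN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```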

6.2 DL Methods Comparison

The models are evaluated on the test set. Table 5 gives the results for binary classification; among the models, Bi-RNN-LSTM achieves the highest precision (99.00%), recall (98.69%), F1-score (98.85%), and accuracy (99.58%).

Table 5. Test Result for Binary Classification (percentage)

DL Method | CNN | Simple RNN | LSTM | GRU | DNN | Bi-RNN-LSTM
Accuracy (Binary) | 99.51 | 99.45 | 99.42 | 99.35 | 99.46 | 99.58
Precision | 98.66 | 98.61 | 98.76 | 98.54 | 98.17 | 99.00
Recall | 97.63 | 98.40 | 98.03 | 97.92 | 98.59 | 98.69
F1 score | 98.63 | 98.51 | 98.40 | 98.23 | 98.53 | 98.85

Table 6 shows the accuracy scores for multi-class classification, where the Bi-RNN-LSTM model achieves the highest accuracy, 99.58%. Hence, the Bi-RNN-LSTM model gives the best overall results among the evaluated models for both binary and multi-class classification.

Table 6. Test Result for Multi-class Classification (Accuracy in percentage)

DL Method | CNN | Simple RNN | LSTM | GRU | DNN | Bi-RNN-LSTM
Accuracy (Multi) | 99.53 | 99.48 | 99.45 | 99.34 | 99.51 | 99.58

Tables 7 to 9 give the precision, recall, F1-score, and weighted average values for multi-class classification. All the models classify benign traffic (class 0) almost perfectly: its precision, recall, and F1-score round to 1.00 for every model.

Among the six models, Bi-RNN-LSTM gives a weighted average of 1.00 for precision, recall, and F1-score. The DNN and CNN models give weighted averages of 1.00, 1.00, and 0.99 for precision, recall, and F1-score, respectively, and the Simple RNN, GRU, and LSTM models give a weighted average of 0.99 for all three metrics.
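Assuming the weighted averages in Tables 7 to 9 are support-weighted, as in the "weighted avg" row of scikit-learn's classification report, the Bi-RNN-LSTM weighted precision can be recomputed from the rounded per-class values in Table 7 and the test-set supports in Table 3. The sketch also computes the unweighted (macro) mean for contrast; because the inputs are rounded, both results are approximate.

```python
# Test-set support per class (Table 3), classes 0 to 14.
SUPPORT = [299932, 299, 25718, 2008, 34458, 1067, 1038, 1173, 2, 9, 400, 629, 250, 4, 127]
# Bi-RNN-LSTM per-class precision (Table 7), already rounded to two decimals.
PRECISION = [1.00, 0.99, 1.00, 0.99, 0.99, 0.95, 0.98, 0.99, 1.00, 1.00, 0.99, 1.00, 0.62, 0.00, 1.00]


def weighted_average(scores, support):
    """Average of per-class scores weighted by each class's support."""
    return sum(s * n for s, n in zip(scores, support)) / sum(support)


def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


print(round(weighted_average(PRECISION, SUPPORT), 2))  # 1.0
print(round(macro_average(PRECISION), 2))              # 0.9
```

Because benign records dominate the support, the weighted average rounds to 1.00 even though class 13 has a precision of 0.00 and class 12 only 0.62; the macro average makes such weak classes more visible, consistent with the class-level analysis in Section 6.3.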



Table 7. Precision scores for Multi-class Classification

Class | CNN | Simple RNN | LSTM | GRU | DNN | Bi-RNN-LSTM
0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
1 | 1.00 | 0.99 | 0.89 | 0.96 | 1.00 | 0.99
2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
3 | 0.99 | 0.97 | 0.97 | 0.97 | 0.99 | 0.99
4 | 0.98 | 0.99 | 0.98 | 0.97 | 0.99 | 0.99
5 | 0.91 | 0.92 | 0.91 | 0.92 | 0.94 | 0.95
6 | 1.00 | 0.98 | 0.96 | 0.98 | 1.00 | 0.98
7 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99
8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
9 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
10 | 0.98 | 0.98 | 0.92 | 0.98 | 0.94 | 0.99
11 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
12 | 1.00 | 0.73 | 0.67 | 0.77 | 0.65 | 0.62
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
14 | 1.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00
W. avg. | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 1.00

Table 8. Recall scores for Multi-class Classification

Class | CNN | Simple RNN | LSTM | GRU | DNN | Bi-RNN-LSTM
0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
1 | 0.80 | 0.48 | 0.82 | 0.50 | 0.52 | 0.51
2 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00
3 | 0.98 | 0.98 | 0.97 | 0.98 | 0.98 | 0.99
4 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98
5 | 0.97 | 0.98 | 0.97 | 0.98 | 0.99 | 0.99
6 | 0.98 | 0.99 | 0.98 | 0.98 | 0.97 | 0.99
7 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99
8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
9 | 0.33 | 0.00 | 0.00 | 0.11 | 0.11 | 0.33
10 | 0.88 | 0.89 | 0.89 | 0.88 | 0.90 | 0.88
11 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93
12 | 0.10 | 0.53 | 0.10 | 0.10 | 0.91 | 0.91
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
14 | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01
W. avg. | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 1.00

6.3 Class Result Analysis

The precision, recall, and F1-scores for attack classes 9, 13, and 14 (Infiltration, Web Attack-SQL Injection, and Web Attack-XSS) are relatively low, close to 0 in most cases, for all the models. The DNN, CNN, GRU, and Bi-RNN-LSTM models fail completely on class 13, while the Simple RNN and LSTM models cannot classify any of classes 9, 13, and 14.

Table 9. F1-scores for Multi-class Classification

Class | CNN | Simple RNN | LSTM | GRU | DNN | Bi-RNN-LSTM
0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
1 | 0.89 | 0.65 | 0.86 | 0.65 | 0.69 | 0.67
2 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00
3 | 0.98 | 0.98 | 0.97 | 0.98 | 0.98 | 0.99
4 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.99
5 | 0.94 | 0.95 | 0.94 | 0.95 | 0.96 | 0.97
6 | 0.99 | 0.98 | 0.97 | 0.98 | 0.98 | 0.98
7 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99
8 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
9 | 0.50 | 0.00 | 0.00 | 0.20 | 0.20 | 0.50
10 | 0.93 | 0.93 | 0.91 | 0.92 | 0.92 | 0.93
11 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
12 | 0.18 | 0.61 | 0.17 | 0.17 | 0.76 | 0.74
13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
14 | 0.02 | 0.00 | 0.00 | 0.02 | 0.02 | 0.02
W. avg. | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00

The imbalance among the classes and the small number of instances of these attack types (for example, only 9 Infiltration and 4 SQL Injection records in the test set, see Table 3) lead to these low scores, and the models are unable to detect those attacks.

6.4 Proposed and Existing Method Comparison

Table 10. Accuracy Comparison with existing research

This research (accuracy, %)
Model | Binary Classification | Multi-class Classification
CNN | 99.51 | 99.53
Simple RNN | 99.45 | 99.48
LSTM | 99.42 | 99.45
GRU | 99.35 | 99.34
DNN | 99.46 | 99.51
Bi-RNN-LSTM | 99.58 | 99.58

Kayvan et al. [16] (Binary Classification accuracy, %)
Model | Accuracy
DNN | 96.43
DNN+BBA | 97.02
DNN+BGA | 96.48
DNN+BGSA | 99.00

Lee et al. [19] (Multi-class Classification accuracy, %)
Model | Accuracy
SVM | 96.8
Naive Bayes | 62.1
Random Forest | 97.9
Decision Tree | 97.9
KNN | 97.8
EP-FCNN | 99.5
EP-CNN | 98.8
EP-LSTM | 98.6



Compared with the existing research findings in [16, 19], the investigated DL models perform well on the CICIDS2017 dataset. For binary classification, all six models exceed the best accuracy reported in [16] (99.00%). For multi-class classification, the CNN, DNN, and Bi-RNN-LSTM models exceed the best accuracy reported in [19] (99.5%). The proposed Bi-RNN-LSTM model produces the highest accuracy among all models. Table 10 compares our accuracy with the existing works.

6.5 Confusion Matrix

Figure 2. Confusion matrix for binary classification models

The confusion matrices for all binary classification models are shown in Figure 2, and those for the multi-class classification models are shown in Figures 3.1 and 3.2, where the x-axis represents the predicted label and the y-axis represents the true label.

According to the confusion matrices for binary classification, of the 299932 benign instances, the CNN and Bi-RNN-LSTM models classified 299705 and 299269 instances correctly, respectively. Of the 67182 attack instances, Bi-RNN-LSTM detected the most attacks, 66301.
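These counts are enough to recompute the Bi-RNN-LSTM binary scores. Treating attacks as the positive class, 299932 - 299269 = 663 benign records are false positives and 67182 - 66301 = 881 attacks are false negatives. The sketch below applies the standard formulas to these counts; it reproduces the accuracy, recall, and F1-score in Table 5 to two decimals, while precision comes out as 99.01% against the 99.00% reported.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int):
    """Return (accuracy, precision, recall, F1) for the positive (attack) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return accuracy, precision, recall, f1


# Bi-RNN-LSTM binary test counts stated in the text.
tn = 299269        # benign records classified as benign
fp = 299932 - tn   # 663 benign records flagged as attacks
tp = 66301         # attacks detected
fn = 67182 - tp    # 881 attacks missed

scores = binary_metrics(tp, tn, fp, fn)
print([round(100 * m, 2) for m in scores])  # [99.58, 99.01, 98.69, 98.85]
```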

For multi-class classification, the Bi-RNN-LSTM model correctly classified 299473 benign records, which is higher than any other model according to the confusion matrices.

6.6 ROC

Figures 4 and 5 show the ROC curves for all binary and all multi-class classification models, respectively. The ROC score (area under the ROC curve) is 1, which is considered perfect performance.
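The ROC score summarizes the curve as the area under it (AUC), which equals the probability that a randomly chosen attack record receives a higher score than a randomly chosen benign record. The pairwise sketch below computes it directly from that definition, counting ties as one half; it is quadratic in the number of records and meant only for illustration (scikit-learn's `roc_auc_score` handles full test sets), and the scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical attack-probability scores: three attacks (1) and three benign records (0).
print(roc_auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0, 0]))  # 8 of 9 pairs ranked correctly
print(roc_auc([0.9, 0.8, 0.1], [1, 1, 0]))                          # 1.0: perfect separation
```

Because AUC measures ranking quality across all thresholds rather than the errors at one threshold, an AUC that rounds to 1 is compatible with the small number of misclassifications visible in the confusion matrices.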

7 CONCLUSION

Due to the huge expansion of digital information, system automation, and the Internet, security threats are a major concern, and IDSs play an important role in detecting network intrusions. In this paper, we evaluated the performance of several IDS models based on DL techniques; all the models were trained, validated, and tested on the real-world CICIDS2017 dataset. The data preprocessing phase was designed to help the models train smoothly, reduce bias toward particular attack classes, and avoid overfitting. The DL models were investigated with the goal of achieving better accuracy.

We proposed the Bi-RNN-LSTM model, which showed the highest accuracy, precision, recall, and F1-score for binary classification, at 99.58%, 99.00%, 98.69%, and 98.85%, respectively. The same model achieved 99.58% accuracy for multi-class classification. For the attack classes Infiltration, Web Attack-SQL Injection, and Web Attack-XSS, the per-class precision, recall, and F1-scores are low in most cases for all the models. The imbalanced number of instances among the classes and the small number of instances of these attack types lead to such low scores, leaving the models unable to detect those attack classes.



Figure 3.1 Confusion matrix for multi-class classification models

Figure 4. ROC Binary Classification

Figure 3.2 Confusion matrix for multi-class classification models

Figure 5. ROC multi-class Classification



In the future, we intend to design an IDS that can detect attack classes with few instances in the datasets. The gap observed in this study is the unavailability of labeled data for some attack types, so collecting more data on those attack types would be a worthwhile investment. Training on new labeled datasets with more attack types and instances could lead to significant advances in cybersecurity research.

The evaluation results of deep learning methods also vary depending on how many times the models are trained or retrained, and designing an IDS that detects attack classes with few instances poses a great challenge; both are promising areas for future research.

REFERENCES

[1] Vaidya, T.: 2001-2013: Survey and analysis of major cyberattacks. arXiv preprint arXiv:1507.06673 (2015).

[2] Chowdhury, M.M.U., Hammond, F., Konowicz, G., Xin, C., Wu, H., Li, J.: A few-shot deep learning approach for improved intrusion detection. 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, pp. 456-462 (2017).

[3] Azwar, H., et al.: Intrusion Detection in secure network for Cybersecurity systems using Machine Learning and Data Mining. 2018 IEEE 5th International Conference on Engineering Technologies and Applied Sciences (ICETAS), Bangkok, Thailand, pp. 1-9 (2018).

[4] Shrestha, A., Mahmood, A.: Review of Deep Learning Algorithms and Architectures. In IEEE Access, vol. 7, pp. 53040-53065 (2019).

[5] Naseer, S., et al.: Enhanced Network Anomaly Detection Based on Deep Neural Networks. In IEEE Access, vol. 6, pp. 48231-48246 (2018).

[6] Karatas, G., et al.: Deep Learning in Intrusion Detection Systems. International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Ankara, Turkey, pp. 113-116 (2018).

[7] Khan, F.A., et al.: A Novel Two-Stage Deep Learning Model for Efficient Network Intrusion Detection. In IEEE Access, vol. 7, pp. 30373-30385 (2019).

[8] Shone, N., et al.: A Deep Learning Approach to Network Intrusion Detection. In IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41-50 (2018).

[9] Farahnakian, F., Heikkonen, J.: A deep auto-encoder based approach for intrusion detection system. 20th International Conference on Advanced Communication Technology (ICACT), Chuncheon-si Gangwon-do, Korea (South), pp. 1-1 (2018).

[10] Sommer, R., Paxson, V.: Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. IEEE Symposium on Security and Privacy, IEEE, pp. 305-316 (2010).

[11] Sharafaldin, I., et al.: Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. 4th International Conference on Information Systems Security and Privacy, pp. 108-116 (2018).

[12] Sharafaldin, I., et al.: A Detailed Analysis of the CICIDS2017 Data Set. International Conference on Information Systems Security and Privacy, pp. 172-188 (2019).

[13] Vinayakumar, R., Soman, K.P., Poornachandran, P.: Evaluating effectiveness of shallow and deep networks to intrusion detection system. International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, pp. 1282-1289 (2017).

[14] Ustebay, S., Turgut, Z., Aydin, M.A.: Cyber Attack Detection by Using Neural Network Approaches: Shallow Neural Network, Deep Neural Network and AutoEncoder. Communications in Computer and Information Science, vol. 1039. Springer, Cham (2019).

[15] Sabeel, U., Heydari, S.S., Mohanka, H., Bendhaou, Y., Elgazzar, K., El-Khatib, K.: Evaluation of Deep Learning in Detecting Unknown Network Attacks. 2019 International Conference on Smart Applications, Communications and Networking (SmartNets), Sharm El Sheikh, Egypt, pp. 1-6 (2019).

[16] Atefi, K., Hashim, H., Khodadadi, T.: A Hybrid Anomaly Classification with Deep Learning (DL) and Binary Algorithms (BA) as Optimizer in the Intrusion Detection System (IDS). 16th IEEE International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia, pp. 29-34 (2020).

[17] Zhang, Y., Chen, X., Jin, L., Wang, X., Guo, D.: Network Intrusion Detection: Based on Deep Hierarchical Network and Original Flow Data. In IEEE Access, vol. 7, pp. 37004-37016 (2019).

[18] Aksu, D., Aydin, M.A.: Detecting Port Scan Attempts with Comparative Analysis of Deep Learning and Support Vector Machine Algorithms. International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Ankara, Turkey, pp. 77-80 (2018).

[19] Lee, J., Kim, J., Kim, I., Han, K.: Cyber Threat Detection Based on Artificial Neural Networks Using Event Profiles. In IEEE Access, vol. 7, pp. 165607-165626 (2019).

[20] Chockwanich, N., Visoottiviseth, V.: Intrusion Detection by Deep Learning with TensorFlow. 21st International Conference on Advanced Communication Technology (ICACT), PyeongChang Kwangwoon_Do, Korea (South), pp. 654-659 (2019).

[21] Roopak, M., Tian, G.Y., Chambers, J.: Deep Learning Models for Cyber Security in IoT Networks. 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, pp. 0452-0457 (2019).



[22] Vinayakumar, R., Soman, K.P., Poornachandran, P.: A comparative analysis of deep learning approaches for network intrusion detection systems (N-IDSs): Deep learning for N-IDSs. International Journal of Digital Crime and Forensics, vol. 11, pp. 65-89 (2019).

[23] UNB: Intrusion Detection Evaluation Dataset (CICIDS2017). University of New Brunswick. https://www.unb.ca/cic/datasets/ids-2017.html

[24] Ring, M., et al.: A Survey of Network-based Intrusion Detection Data Sets. arXiv preprint arXiv:1903.02460 (2019).

[25] Buczak, A.L., Guven, E.: A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. In IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153-1176 (2016).

[26] Farhana, K., Rahman, M., Ahmed, Md.T.: An intrusion detection system for packet and flow based networks using deep neural network approach. International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 5 (2020).

[27] Azim, M.A., Tanvir, Islam, M.K.: Network Traffic Classification Using Ensemble Learning with Time Related Features. International Journal of New Computer Architectures and Their Applications, vol. 10, no. 2, pp. 23-31 (2020).


