High-Performance Intrusion Detection System using Deep Learning in Packet and
Flow-Based Networks
Kaniz Farhana
Department of Computer Science & Engineering
Port City International University, Chittagong
Bangladesh
Maqsudur Rahman
Department of Computer Science & Engineering
Port City International University, Chittagong
Bangladesh
Muhammad Anwarul Azim
Department of Computer Science & Engineering
University of Chittagong, Bangladesh
Md. Tofael Ahmed
Department of Information & Communication
Technology, Comilla University, Bangladesh
ABSTRACT
The information revolution, extensive cloud
computing, and enormous network traffic have made
protecting systems from threats and attacks more
crucial. Continuous monitoring of systems and
networks for malicious incidents and vulnerabilities
plays a major role in protecting software and
hardware resources. The Intrusion Detection System
has become a significant aspect of the security of
the Internet and intranets, where the pattern of data
on networks constantly changes with time and new
attacks. Much research concentrates on Deep
Learning (DL) methods, which provide effective
solutions with high accuracy and performance when
applied to big data related to the security and
privacy of networks and system automation. In this
paper, we investigate the performance of various DL
techniques, namely Deep Neural Network,
Convolutional Neural Network, Recurrent Neural
Network (RNN), Long Short-Term Memory (LSTM),
and Gated Recurrent Unit, which are trained,
validated, and tested using the CICIDS2017 dataset
with some additional metadata containing various
fields, including packet and flow-based network
traffic. We then propose a Hybrid
Bidirectional-RNN-LSTM model for multi-class and
binary classification in the Keras and TensorFlow DL
environments. With the selected important features,
our proposed method produced more than 99%
accuracy for both binary and multi-class
classification, which is higher than existing
research. Evaluation metrics such as the confusion
matrix, precision, recall, f1-score, and Receiver
Operating Characteristics showed good results. These
outcomes indicate that DL techniques are highly
effective for detecting intrusion in packet and
flow-based networks.
KEYWORDS
Packet and Flow-based Network, Intrusion Detection
System, Deep Learning, Deep Neural Network,
Convolutional Neural Network, Simple Recurrent
Neural Network, Gated Recurrent Unit, Long
Short-Term Memory, Hybrid Bidirectional-RNN-LSTM,
Big data, Keras, and Google TensorFlow.
1 INTRODUCTION
1.1 Background
Due to recent developments in network
communications, threats from different types of
sources have increased a great deal, which makes
the use of information and communication
technology more vulnerable [1]. Computer networks
can be investigated from many different views.
From the security point of view, it is essential to
analyze network traffic of both attack and normal
nature. Therefore, capturing the network traffic in
packet and flow format is necessary for
identifying intrusion into the system. In other
words, intrusion detection and prevention become
possible when the traffic is available from the
network in packet and flow-based format.
1.2 Intrusion Detection System
An Intrusion Detection System (IDS) is of great
significance for tracking the presence of threats.
Researchers take a massive interest in
cybersecurity to investigate a more optimal way
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 10(2): 55-66The Society of Digital Information and Wireless Communications (SDIWC), 2021 ISSN: 2305-0011
of handling attacks through intrusion detection.
Research on IDS takes on the challenge of various
real-world attack scenarios and aims to detect
intrusions efficiently. An IDS follows the network
flows or packet patterns of the data to identify an
attack. Host-based IDS (H-IDS) and Network-based
IDS (N-IDS) are two types of IDS [2]. H-IDS
collects information from hosts, computers, etc.,
and N-IDS collects raw data packets from networks.
In terms of the detection technique, anomaly
detection and misuse detection systems are also
distinguished [2]. Anomaly detection detects
pattern changes in the host system or the network
traffic, and misuse detection identifies intrusions
by matching activity against already known attack
patterns and past attack signatures.
The increasing number of real-time attacks makes it
much more difficult to handle them and to achieve
good performance using traditional intrusion
detection approaches, including Machine Learning
(ML) models [3, 27].
1.3 Deep Learning
Deep Learning (DL), a subset of ML, is a branch of
Artificial Intelligence. DL approaches include
unsupervised, supervised, or semi-supervised
learning algorithms to construct output features
and attack predictions [4]. The application of
various DL models on big datasets with different
attack scenarios and performance evaluation has
become a popular topic for IDS research [5-9].
DL techniques learn the variation of the
attributes and can show strong performance on
imbalanced datasets. DL works with multiple layers
of transformation, and each layer learns from the
output of its previous layer.
In DL, a Convolutional Neural Network (CNN) is a
kind of deep neural network built up of neurons
that have learnable weights. In a CNN every neuron
receives various inputs, calculates their weighted
sum, and passes it through an activation function;
the final layer gives the output. CNN produces
high-quality representations.
A Deep Neural Network (DNN) is an artificial neural
network whose neurons are arranged in a sequence of
multiple layers. Neurons receive inputs from the
previous layer and perform a simple computation.
The DNN applies a sequence of mathematical
transformations to turn the input into the output.
A Recurrent Neural Network (RNN) extends the
feed-forward network with an internal memory that
lets it process sequences of inputs. It is
recurrent because it applies the same function to
each input, while the current output depends on the
previous computation. All the inputs in the RNN are
related to each other. It is very useful for time
series and sequential data.
Long Short-Term Memory (LSTM) is a modified version
of RNN that addresses the vanishing gradient
problem of RNN. LSTM is trained using
back-propagation and is suitable for processing,
classification, and prediction. Although it can be
used for unsupervised learning, it is commonly
trained in a supervised manner.
Gated Recurrent Unit (GRU) is a gating mechanism in
RNN similar to the LSTM unit but without an output
gate. GRU addresses the vanishing gradient problem
by using an update gate and a reset gate. GRU uses
less memory and trains faster, as it has fewer
parameters to train.
A Bidirectional RNN (Bi-RNN) puts two independent
RNNs together, so the network has both forward and
backward information about the sequence at every
step. In a plain RNN the current state cannot reach
future inputs, whereas in a Bi-RNN the current
state also has access to future inputs. The input
and output layers cannot be updated at the same
time. Therefore, additional processing is needed
when back-propagation is executed.
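The recurrence described above can be illustrated with a minimal sketch in plain Python. It assumes a scalar hidden state, a tanh activation, and hypothetical weights (w_x, w_h, b); none of these come from the paper, and the models in this study use Keras layers rather than this toy code.

```python
import math

def simple_rnn(xs, w_x, w_h, b):
    """Run a scalar simple RNN over xs and return the hidden state at every step."""
    h = 0.0
    states = []
    for x in xs:
        # The same function is applied at every step; h carries the memory.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

def bidirectional_rnn(xs, fwd_params, bwd_params):
    """Pair a forward pass with a backward pass over the same sequence."""
    forward = simple_rnn(xs, *fwd_params)
    # Process the reversed sequence, then re-align to the original order.
    backward = simple_rnn(xs[::-1], *bwd_params)[::-1]
    return list(zip(forward, backward))

xs = [0.5, -1.0, 2.0]        # hypothetical input sequence
params = (0.8, 0.5, 0.0)     # hypothetical (w_x, w_h, b)
out = bidirectional_rnn(xs, params, params)
```

Each element of out holds one state that summarizes the inputs up to that step and one that summarizes the inputs from that step onward, which is the sense in which a Bi-RNN state can see future inputs.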
1.4 Application Area
There are not many datasets available with valid
and versatile scenarios for detecting intrusion in
packet and flow-based networks [10]. Among the
datasets made available over the years, the
CICIDS2017 dataset [11, 12], published by the
University of New Brunswick, comes with several
features that have a great influence on
classification and prediction.
1.5 Contribution
This paper focuses on intrusion detection using
several DL techniques applied to the publicly
available CICIDS2017 dataset for both multi-class
and binary classification. We investigate multiple
DL techniques: CNN, DNN, Simple RNN, LSTM, GRU,
and Bidirectional RNN. We then propose a Hybrid
Bi-RNN-LSTM model for binary and multi-class
classification. We construct this hybrid DL model
by combining the LSTM layer with Bidirectional RNN
in the final layer to achieve higher accuracy,
using the Keras and Google TensorFlow libraries to
create the DL environment.
1.6 Structure of Paper
The contents of the paper are presented as
follows. Section 2 gives insight into recent
research related to this paper. Section 3 explains
the dataset used in the paper. Section 4 covers
the techniques applied for data preprocessing. The
methodology is described in section 5. In section
6, test results and performance analysis are
discussed. Finally, section 7 concludes the paper
with remarks on future work.
2 RELATED WORK
Intrusion detection and the performance analysis
of DL have drawn the attention of researchers due
to the variety of attacks and continued changes in
the attacking environment. Vinayakumar et al. [13]
implemented Multilayer Perceptron (MLP) and Deep
Belief Net (DBN) for intrusion detection. They
carried out a comprehensive comparison of various
classical ML techniques, MLP, and DBN using the
benchmark datasets KDDcup'99 and NSLKDD, where
good performance was shown.
Serpil et al. [14] presented different approaches
using DNN, Shallow Neural Network, and Auto
Encoder to detect malicious activities using the
CICIDS2017 dataset. Their results differed across
different sets of selected features. With all
features, the accuracy was 98.4%.
Ulya et al. [15] proposed LSTM and DNN binary
prediction for DoS and DDoS attacks. They trained
their models on the CICIDS2017 dataset and obtained
true positive rates of 99.9% and 99.8% for the LSTM
and DNN models respectively.
Kayvan et al. [16] published an IDS for hybrid
anomaly classification using multiple DL
techniques with a binary algorithm optimizer,
using the CICIDS2017 dataset for the performance
measure. DNN alone and DNN combined with the
Binary Bat Algorithm (BBA), Binary Genetic
Algorithm (BGA), and Binary Gravitational Search
Algorithm (BGSA) achieved 96.427%, 97.023%,
96.480%, and 99.002% accuracy respectively.
Zang et al. [17] proposed hierarchical network
intrusion detection models designed mainly with
LSTM and CNN, which learned temporal and spatial
features from the real data. For the experiment,
they used the CICIDS2017 and CTU datasets. On both
datasets their proposed models produced 99%
accuracy and very high scores for precision,
recall, and f1-score.
Aksu et al. [18] performed a comparative
analysis of DL and ML methods applying CNN
and Support Vector Machine (SVM) algorithms
on the CICIDS2017 dataset to detect port scan
attacks. Their models achieved 97.8% and
69.79% accuracy for DL and ML respectively.
An AI-SIEM method was proposed in [19] by
Lee et al., which focused on the true positive and
false-positive rates. They used the combination
of NSLKDD and CICIDS2017 datasets for
experimental purposes. Performance evaluation
among ML and proposed DL models was
conducted to show that their models gave better
results and outperformed ML models.
Navaporn et al. [20] proposed an IDS using
TensorFlow to detect popular attacks such as DoS,
port scan, and network scan via ICMP, UDP, and
TCP. The MAWILab'2017 dataset was used to train
and evaluate the performance of Snort, RNN,
Stacked RNN, and CNN models. Their evaluation
results showed that the deep learning models
performed better than Snort.
Roopak et al. [21] proposed four different DL
models, compared them to ML algorithms, and
evaluated them on DDoS attack scenarios from the
CICIDS2017 dataset. They found 97.16% accuracy for
the combined CNN and LSTM model.
A comparative analysis of deep learning techniques
was conducted in [22] by Vinayakumar et al. This
work treated network traffic data as time series
and implemented several DL models such as LSTM,
RNN, and Internal Recurrent Neural Networks (IRNN)
using the KDDcup99, NSLKDD, and UNSW-NB15
datasets. RNN and IRNN performed better.
In this paper, we focus on the comparative analysis
of the DL techniques CNN, Simple RNN, LSTM, GRU,
and DNN applied to the CICIDS2017 dataset. Finally,
we propose Bi-RNN-LSTM, which shows high accuracy
compared to existing research results and performs
better with respect to precision, recall, and
f1-score.
3 DATASET
The lack of sufficient types of network traffic
patterns, including different kinds of attacks, is
a vital issue with most IDS datasets. Our paper
uses the benchmark dataset CICIDS2017 for
evaluation and comparison [11, 12]. It is an
open-source dataset introduced by the Canadian
Institute for Cybersecurity. This dataset has 79
features and contains benign traffic and 14 types
of attack traffic, captured from Monday to Friday
with different sets of attacks within a fixed
period [23]. The dataset is fully labeled, with
packet and bidirectional flow-based records and
additional metadata [24], and is therefore suitable
for our area of application. The second column of
Table 1 shows the original number of records or
instances present in the dataset according to the
attack types. For binary classification, the 14
attack types are labeled 1 and benign traffic is
labeled 0.
4 DATA PREPROCESSING
DL is data-driven, and to achieve the best
performance data preprocessing is essential for
transforming raw data into a reliable and
compatible format. Data preprocessing reduces the
ambiguities in the dataset.
4.1 Duplicate Removal
Duplicate records are removed from the dataset for
both binary and multi-class classification to get
more precise evaluation results, which reduces the
total number of instances available to the model.
The third column of Table 1 shows the number of
instances present in the dataset after removing
duplicates.
Table 1. CICIDS2017 Dataset instances and duplicates
Attack Types                Original Number of     Number of Instances
                            Instances (Before      (After Removing
                            Removing Duplicates)   Duplicates)
Benign                      2273097                1950101
DDoS                        128027                 128016
Port Scan                   158930                 1958
Bot                         1966                   1441
Infiltration                36                     36
Web Attack-Brute Force      1507                   1470
Web Attack-SQL Injection    21                     21
Web Attack-XSS              652                    652
FTP Patator                 7938                   5933
SSH Patator                 5897                   3217
Heartbleed                  11                     11
DoS Goldeneye               10293                  10286
DoS Hulk                    231073                 172849
DoS slowhttptest            5499                   5228
Duplicate values are removed from the CSV files
using spreadsheet software before feature
selection, class balancing and data
normalization.
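The paper removes duplicates with spreadsheet software. As an alternative, the same step could be scripted; the following is a minimal sketch in plain Python, where the three-column sample and its column names are hypothetical stand-ins for a CICIDS2017 CSV file.

```python
import csv
import io

def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical sample standing in for one CICIDS2017 CSV file.
sample_csv = io.StringIO(
    "Flow Duration,Average Packet Size,Label\n"
    "357,71.4,DoS Hulk\n"
    "357,71.4,DoS Hulk\n"
    "120,60.0,BENIGN\n"
)
header, *records = list(csv.reader(sample_csv))
deduplicated = drop_duplicate_rows(records)
```

Keeping the first occurrence preserves the original record order, which makes the result easy to compare against the untouched file.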
4.2. Feature Selection
When working with high-dimensional data, a large
number of features increases the training time and
the chance of overfitting. This can be mitigated
by feature selection. Sharafaldin et al. [11]
identified the best features for the CICIDS2017
dataset based on the attack types. A Random Forest
regressor was used to identify the best features
among the 79 features present in the dataset. For
this research, 24 features, including the class
type feature, are selected as the best [11, 12].
The selected features include flow duration,
average packet size, active min, active mean, flag
counts, etc. [23].
4.3 Class Balance
From Table 1, it can be observed that the amount
of benign data is huge compared to the real attack
data. More than 80% of the data is benign, which
can bias the model towards the majority benign
class during training. This class imbalance can
also lead to low accuracy with a high false alarm
rate or to an overfitted result. In this research,
to reduce the effect of class imbalance, the file
named Monday-WorkingHours.pcap_ISCX, which
contains benign records only, has not been
included in the final set of data used.
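The degree of imbalance can be checked with a few lines of Python. The sketch below uses the per-class instance counts after duplicate removal as listed in Table 2; the variable names are illustrative only.

```python
# Instance counts after duplicate removal, copied from Table 2.
class_counts = {
    "Benign": 1950101, "Bot": 1441, "DDoS": 128016,
    "DoS Goldeneye": 10286, "DoS Hulk": 172849,
    "DoS Slowhttptest": 5228, "DoS Slowloris": 5385,
    "FTP-Patator": 5933, "Heartbleed": 11, "Infiltration": 36,
    "PortScan": 1958, "SSH-Patator": 3217,
    "Web Attack-Brute Force": 1470,
    "Web Attack-SQL Injection": 21, "Web Attack-XSS": 652,
}

total = sum(class_counts.values())
# Fraction of all records that belongs to each class.
shares = {name: count / total for name, count in class_counts.items()}
benign_share = shares["Benign"]
rarest_class = min(class_counts, key=class_counts.get)
```

With these counts the benign share is roughly 85%, consistent with the "more than 80%" figure above, and the rarest class has only 11 records.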
4.4 Data Normalization
The features present in the dataset have different
ranges of values, and DL model layers are sensitive
to the scale of the input data. Therefore, data
needs to be normalized to make the learning process
smoother for the DL model layers. Max-min
normalization is applied to normalize the features
within the 0 to 1 range. Values in the snippet of
the dataset are the values of selected features
from [26]. A snippet of the dataset before
normalization:
98306862, 357, 71.4, 0, 3506.022, 9830686,
31100000, 54, 24600000, 316, 19700000, 0,
0.050861, 0.061033, 0, 0, 0, 1087.091, 357, 0,
235, 13009, 13009, DoS Hulk
The above snippet of the dataset after the max-
min normalization:
0.819206, 0.00035, 0.018445, 0, 0.466333,
0.081921, 0.366745, 0, 0.205, 0.000002,
0.164167, 0, 0, 0, 0, 0, 0, 0.418662, 0.00035,
0.000015, 0.003601, 0.000132, 0.000132, DoS
Hulk
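A minimal sketch of max-min normalization for a single feature column follows. The column values are hypothetical, and the function name min_max_normalize is introduced here for illustration; it is not the code used in the paper.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to new_min.
        return [new_min for _ in column]
    return [
        (value - lo) / (hi - lo) * (new_max - new_min) + new_min
        for value in column
    ]

# Hypothetical values for one feature (for example, a flow duration).
flow_duration = [0, 357, 60000000, 120000000]
normalized = min_max_normalize(flow_duration)
```

The minimum and maximum are taken per feature, so every column is mapped to the same 0 to 1 range regardless of its original unit.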
4.5 Encoding Class Label to Numeric
For multi-class classification, class labels must
be changed from string names into numeric values
before passing the dataset into the DL models.
Similarly, for binary classification, binary values
must replace the class labels. These new class
labels are created by label encoding rather than by
arbitrary numbering, to avoid biases caused by the
implied hierarchy of the numbers given to the
attack class labels. The LabelBinarizer() and
label_binarize() functions from sklearn are used to
perform the transformation with a fixed set of
classes. After label encoding, the class label of
each record becomes an array of binary values over
all 15 classes. Table 2 shows the class
distribution after label encoding.
Table 2. Class distribution after label encoding
Class Numbers   Class Names                 Instance Numbers
Class 0         Benign                      1950101
Class 1         Bot                         1441
Class 2         DDoS                        128016
Class 3         DoS Goldeneye               10286
Class 4         DoS Hulk                    172849
Class 5         DoS Slowhttptest            5228
Class 6         DoS Slowloris               5385
Class 7         FTP-Patator                 5933
Class 8         Heartbleed                  11
Class 9         Infiltration                36
Class 10        PortScan                    1958
Class 11        SSH-Patator                 3217
Class 12        Web Attack-Brute Force      1470
Class 13        Web Attack-SQL Injection    21
Class 14        Web Attack-XSS              652
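The effect of label binarization can be sketched without any library. The helper names below are hypothetical; the sketch assumes that class columns follow the sorted class names, which is how sklearn's LabelBinarizer orders its classes and which agrees with the alphabetical class numbering in Table 2.

```python
def binarize_labels(labels):
    """One-hot encode string labels; columns follow the sorted class names."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = [
        [1 if index[label] == i else 0 for i in range(len(classes))]
        for label in labels
    ]
    return classes, encoded

def to_binary(labels, benign="Benign"):
    """Map benign traffic to 0 and every attack type to 1."""
    return [0 if label == benign else 1 for label in labels]

labels = ["DoS Hulk", "Benign", "Bot", "Benign"]  # hypothetical records
classes, one_hot = binarize_labels(labels)
binary = to_binary(labels)
```

Because every class gets its own column, no class is numerically "larger" than another, which is the bias that plain integer numbering could introduce.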
4.6 Training, Validation, and Test Split
The combined dataset is split into three sets,
namely training, validation, and test sets, for
multi-class and binary classification. The combined
dataset with 24 selected features (including the
class label) has been divided into 1101341 (60%),
367114 (20%), and 367114 (20%) instances
respectively for both multi-class and binary
classification. Table 3 shows the number of
instances in the test set according to the attack
types.
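A 60/20/20 split can be sketched as follows. The paper does not state whether the split was shuffled or stratified, so the seeded random shuffle here is an assumption, and the function names are hypothetical. The total of 1835569 records is the sum of the three set sizes given above.

```python
import random

def split_sizes(n, val_frac=0.2, test_frac=0.2):
    """Return (train, validation, test) sizes for n records."""
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    return n - n_val - n_test, n_val, n_test

def split_indices(n, seed=0):
    """Shuffle record indices and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train, n_val, _ = split_sizes(n)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )

sizes = split_sizes(1835569)
train_idx, val_idx, test_idx = split_indices(1000)
```

For 1835569 records, split_sizes gives the same three set sizes as reported in this section.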
5 METHODOLOGY
5.1 Process Design
We focus on the performances of multiple DL
techniques, DNN, CNN, Simple RNN, GRU,
LSTM, and our proposed hybrid Bidirectional-
RNN-LSTM model for binary and multi-class
classification. The design of the overall
processes for applying different DL methods and
our proposed method is depicted in Figure 1.
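Max-min normalization (Section 4.4), where min and
max are the observed bounds of a feature and
[new_min, new_max] is the target range:
$$ n' = \frac{n - min}{max - min}\,(new_{max} - new_{min}) + new_{min} $$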
Table 3. Test set instances distribution according to the
attack classes
Multi-Classes                    Test set instance (Multi-class)
0   Benign                       299932
1   Bot                          299
2   DDoS                         25718
3   DoS Goldeneye                2008
4   DoS Hulk                     34458
5   DoS Slowhttptest             1067
6   DoS Slowloris                1038
7   FTP-Patator                  1173
8   Heartbleed                   2
9   Infiltration                 9
10  PortScan                     400
11  SSH-Patator                  629
12  Web Attack Brute Force       250
13  Web Attack SQL Injection     4
14  Web Attack XSS               127
Binary Classes                   Test set instance (Binary class)
0   Benign                       299932
1   Attack                       67182
Figure 1. Design of overall processes
5.2 Algorithm
The procedure for working with the dataset and
experimenting with the DL methods, including our
proposed method, is given in the following
algorithm.
Algorithm
Step 1. Start
        Input: Preprocessed dataset with 24
        selected features and label-encoded class
        types
Step 2. Splitting the dataset into train,
        validation, and test sets
Step 3. Building the model by adding multiple
        layers to the DL network and training it
Step 4. Compiling and validating the model to
        calculate the accuracy and class
        probabilities
Step 5. Model evaluation on the test set to get
        the evaluation metrics scores and
        confusion matrix
Step 6. Calculation of Receiver Operating
        Characteristics (ROC)
Step 7. End.
5.3 DL Model Construction
All the DL models are built with input, hidden,
and output layers for both binary and multi-class
classification. We used the TensorFlow and Keras
libraries through Jupyter Notebook as the DL
environment. Every node in the DL layers is fully
connected. The input shape is set to the 23
selected attributes; the class attribute is
excluded from the input and serves as the target
of the output layer. For binary classification,
the Sigmoid function is used on the final layer.
For multi-class classification, the Softmax
function is applied to the final layer, which
produces the probabilities for all attack classes.
For both classifications, the CNN model stacks
layers of 32, 64, 64, and 128 nodes together with
two max-pooling layers. For the Simple RNN, LSTM,
GRU, and DNN models, all four layers have 128
nodes. In the Bi-RNN-LSTM model, all layers have
256 nodes. Dropout layers are also applied in
between the DL layers to reduce overfitting in the
neural network layers.
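The two output activations can be illustrated in plain Python. This is a sketch of the mathematical functions only, with hypothetical raw scores (logits); the actual models rely on the Keras implementations.

```python
import math

def sigmoid(z):
    """Squash one raw score into a probability for the positive (attack) class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn one raw score per class into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a final layer.
attack_probability = sigmoid(2.0)
class_probabilities = softmax([4.0, 1.0, 0.5])
predicted_class = max(
    range(len(class_probabilities)), key=class_probabilities.__getitem__
)
```

Sigmoid needs a single output node for the binary case, while Softmax needs one node per class (15 in the multi-class setting), and the predicted class is the one with the highest probability.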
5.4 Working Process
The working process for CNN, Simple RNN,
LSTM, GRU, DNN, and Bi-RNN-LSTM models
follows the steps summarized below.
After the input phase in step 1, the dataset is
split into training, validation, and test sets in
step 2. In step 3, all the models are built with
multiple neural layers and trained on the training
set. The models are then compiled and validated
with the validation set in step 4. In step 5, the
model is evaluated on the test set to get the
confusion matrix, precision, recall, accuracy, and
f1-score. Finally, ROC scores are calculated in
step 6 to assess the performance of the models.
6 RESULT AND ANALYSIS
6.1 Evaluation Metrics
The primary evaluation metric for this research is
accuracy. The other evaluation metrics are
precision, recall, f1-score, confusion matrix, and
ROC for both binary and multi-class
classification.
A confusion matrix describes a classification
model's performance on test data for which the true
values are known.
Table 4. Confusion Matrix
                                 Actual Values
                                 Positive (1)           Negative (0)
Predicted Values  Positive (1)   True Positive (TP)     False Positive (FP)
                  Negative (0)   False Negative (FN)    True Negative (TN)
Table 4 represents the confusion matrix for binary
classification.
True Positive (TP) is the number of correctly
predicted positive values (attack type), False
Positive (FP) is the number of benign instances
incorrectly predicted as attack type, True Negative
(TN) is the number of correctly predicted negative
values (normal type), and False Negative (FN) is
the number of attack instances inaccurately
predicted as normal (benign) type [25].
Precision is the ratio between the correctly
predicted positive cases and all cases predicted as
positive.
Recall is the ratio between the correctly predicted
positive cases and all the cases in the actual
positive class.
The f1-score is the harmonic mean of precision and
recall. The f1-score is more useful if the class
distribution is uneven, as it takes both false
positives and false negatives into account.
ROC is the graphical illustration that shows the
capacity of a classification model. The two
parameters used in ROC are the true positive rate
and the false positive rate.
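The metrics above can be computed directly from the four confusion-matrix counts. The sketch below uses the Bi-RNN-LSTM binary counts reported in Section 6.5 (299269 of 299932 benign and 66301 of 67182 attack instances classified correctly) and assumes that "attack" is the positive class; the function name is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, f1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }

# Counts for Bi-RNN-LSTM (binary), taken from the text of Section 6.5.
tn, tp = 299269, 66301
fp = 299932 - tn   # benign instances flagged as attack
fn = 67182 - tp    # attack instances missed
scores = binary_metrics(tp, fp, fn, tn)
```

Under these assumptions the recall, f1-score, and accuracy agree with the Bi-RNN-LSTM column of Table 5 after rounding, and the precision comes out at about 99.01% against the reported 99.00%.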
6.2 DL Methods Comparison
The models are evaluated on the test set. Table 5
gives the results for binary classification. Among
the models, the Bi-RNN-LSTM model presents the
highest precision (99.00%), recall (98.69%),
f1-score (98.85%), and accuracy (99.58%).
Table 5. Test Result for Binary Classification (percentage)
DL Method           CNN     Simple RNN   LSTM    GRU     DNN     Bi-RNN-LSTM
Accuracy (Binary)   99.51   99.45        99.42   99.35   99.46   99.58
Precision           98.66   98.61        98.76   98.54   98.17   99.00
Recall              97.63   98.40        98.03   97.92   98.59   98.69
F1 score            98.63   98.51        98.40   98.23   98.53   98.85
Accuracy scores for the multi-class classification
are shown in Table 6, where the highest accuracy,
99.58%, is achieved by the Bi-RNN-LSTM model.
Hence, the Bi-RNN-LSTM model gives the best overall
result for both binary and multi-class
classification.
Table 6. Test Result for Multi-class Classification
(Accuracy in percentage)
DL Method           CNN     Simple RNN   LSTM    GRU     DNN     Bi-RNN-LSTM
Accuracy (Multi)    99.53   99.48        99.45   99.34   99.51   99.58
Precision, recall, f1-score, and weighted average
values for multi-class classification are given in
Tables 7 to 9. All the models classify benign
traffic almost perfectly, as the reported
precision, recall, and f1-score for the benign
class are 1.00 in all cases.
Among the six models, Bi-RNN-LSTM gives a weighted
average of 1.00 for precision, recall, and
f1-score. The DNN and CNN models give weighted
averages of 1.00, 1.00, and 0.99 for precision,
recall, and f1-score respectively, and the Simple
RNN, GRU, and LSTM models give a weighted average
of 0.99 for each of precision, recall, and
f1-score.
Table 7. Precision scores for Multi-class Classification
Class     CNN    Simple RNN   LSTM   GRU    DNN    Bi-RNN-LSTM
0         1.00   1.00         1.00   1.00   1.00   1.00
1         1.00   0.99         0.89   0.96   1.00   0.99
2         1.00   1.00         1.00   1.00   1.00   1.00
3         0.99   0.97         0.97   0.97   0.99   0.99
4         0.98   0.99         0.98   0.97   0.99   0.99
5         0.91   0.92         0.91   0.92   0.94   0.95
6         1.00   0.98         0.96   0.98   1.00   0.98
7         0.99   1.00         0.99   0.99   0.99   0.99
8         1.00   1.00         1.00   1.00   1.00   1.00
9         1.00   0.00         0.00   1.00   1.00   1.00
10        0.98   0.98         0.92   0.98   0.94   0.99
11        1.00   1.00         1.00   1.00   1.00   1.00
12        1.00   0.73         0.67   0.77   0.65   0.62
13        0.00   0.00         0.00   0.00   0.00   0.00
14        1.00   0.00         0.00   1.00   1.00   1.00
W. avg.   1.00   0.99         0.99   0.99   1.00   1.00
Table 8. Recall scores for Multi-class Classification
Class     CNN    Simple RNN   LSTM   GRU    DNN    Bi-RNN-LSTM
0         1.00   1.00         1.00   1.00   1.00   1.00
1         0.80   0.48         0.82   0.50   0.52   0.51
2         1.00   1.00         1.00   1.00   0.99   1.00
3         0.98   0.98         0.97   0.98   0.98   0.99
4         0.99   0.98         0.99   0.98   0.99   0.98
5         0.97   0.98         0.97   0.98   0.99   0.99
6         0.98   0.99         0.98   0.98   0.97   0.99
7         0.99   0.99         0.98   0.99   0.98   0.99
8         1.00   1.00         1.00   1.00   1.00   1.00
9         0.33   0.00         0.00   0.11   0.11   0.33
10        0.88   0.89         0.89   0.88   0.90   0.88
11        0.93   0.93         0.93   0.93   0.93   0.93
12        0.10   0.53         0.10   0.10   0.91   0.91
13        0.00   0.00         0.00   0.00   0.00   0.00
14        0.01   0.00         0.00   0.01   0.01   0.01
W. avg.   1.00   0.99         0.99   0.99   1.00   1.00
6.3 Class Result Analysis
It can be seen that the precision, recall, and f1-
scores for attack classes 9, 13, and 14 have
relatively low values, almost 0 in most cases for
all the models. DNN, CNN, GRU, and Bi-RNN-
LSTM models fail to classify class 13 attacks
completely, while Simple RNN and LSTM could
not classify class 9, 13, and 14 attacks,
representing Infiltration, Web Attack-SQL
injection, and Web Attack-XSS attacks
respectively.
Table 9. F1-scores for Multi-class Classification
Class     CNN    Simple RNN   LSTM   GRU    DNN    Bi-RNN-LSTM
0         1.00   1.00         1.00   1.00   1.00   1.00
1         0.89   0.65         0.86   0.65   0.69   0.67
2         1.00   1.00         1.00   1.00   0.99   1.00
3         0.98   0.98         0.97   0.98   0.98   0.99
4         0.99   0.98         0.99   0.98   0.99   0.99
5         0.94   0.95         0.94   0.95   0.96   0.97
6         0.99   0.98         0.97   0.98   0.98   0.98
7         0.99   0.99         0.99   0.99   0.98   0.99
8         1.00   1.00         1.00   1.00   1.00   1.00
9         0.50   0.00         0.00   0.20   0.20   0.50
10        0.93   0.93         0.91   0.92   0.92   0.93
11        0.96   0.96         0.96   0.96   0.96   0.96
12        0.18   0.61         0.17   0.17   0.76   0.74
13        0.00   0.00         0.00   0.00   0.00   0.00
14        0.02   0.00         0.00   0.02   0.02   0.02
W. avg.   0.99   0.99         0.99   0.99   0.99   1.00
The imbalanced number of instances among the
classes and the low number of attack instances
result in such low evaluation scores for these
classes, and the models are unable to detect those
attacks.
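The effect of imbalance on the reported averages can be illustrated with a short calculation. The sketch below takes the Bi-RNN-LSTM recall column from Table 8 and the test-set class sizes from Table 3, and compares the weighted average with an unweighted (macro) average. The macro average is not reported in the paper and is added here only for illustration.

```python
# Bi-RNN-LSTM recall per class (Table 8), classes 0 to 14.
recall = [1.00, 0.51, 1.00, 0.99, 0.98, 0.99, 0.99, 0.99,
          1.00, 0.33, 0.88, 0.93, 0.91, 0.00, 0.01]
# Test-set instances per class (Table 3), classes 0 to 14.
support = [299932, 299, 25718, 2008, 34458, 1067, 1038, 1173,
           2, 9, 400, 629, 250, 4, 127]

def weighted_average(scores, weights):
    """Average of scores where each class counts in proportion to its size."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def macro_average(scores):
    """Average of scores where every class counts equally."""
    return sum(scores) / len(scores)

weighted_recall = weighted_average(recall, support)
macro_recall = macro_average(recall)
```

The weighted recall rounds to 1.00, as in the last row of Table 8, because the benign class dominates the test set, while the macro recall is only about 0.77 and makes the weak minority classes visible.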
6.4 Proposed and Existing Method Comparison
Table 10. Accuracy Comparison with existing research
This research
Models          Binary Classification   Multi-class Classification
CNN             99.51                   99.53
Simple RNN      99.45                   99.48
LSTM            99.42                   99.45
GRU             99.35                   99.34
DNN             99.46                   99.51
Bi-RNN-LSTM     99.58                   99.58
Kayvan et al. [16]
Models          Binary Classification
DNN             96.43
DNN+BBA         97.02
DNN+BGA         96.48
DNN+BGSA        99.00
Lee et al. [19]
Models          Multi-class Classification
SVM             96.8
Naive Bayes     62.1
Random Forest   97.9
Decision Tree   97.9
KNN             97.8
EP-FCNN         99.5
EP-CNN          98.8
EP-LSTM         98.6
Compared to the existing research findings in
[16, 19], the investigated DL models give better
performance results for both binary and multi-
class classification based on the CICIDS2017
dataset. The proposed Bi-RNN-LSTM model
produces the highest accuracy among all models.
Table 10 shows the comparison of accuracy with
the existing works.
6.5 Confusion Matrix
Figure 2. Confusion matrix for binary classification
models
The Confusion matrix for all binary
classification models is shown in Figure 2 and
for multi-class classification models, the
confusion matrix is shown in Figure 3.1 and 3.2
respectively, where the x-axis represents the
predicted label and the y-axis represents the true
label.
According to the confusion matrices for the binary
classification models, among 299932 benign
instances, the CNN and Bi-RNN-LSTM models
classified 299705 and 299269 instances correctly,
respectively. Among 67182 attack instances,
Bi-RNN-LSTM detected the largest number of attacks,
66301.
For multi-class classification, the Bi-RNN-LSTM
model correctly classified 299473 benign records,
more than any other model according to the
confusion matrices.
6.6 ROC
Figure 4 and Figure 5 show the ROC curves for all
binary and all multi-class classification models
respectively. The ROC score is 1, which is
considered perfect performance.
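How an ROC curve and its score (area under the curve) are obtained can be sketched in plain Python. The labels and predicted scores below are hypothetical, the function names are illustrative, and tied scores are not treated specially in this sketch.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points for all thresholds."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one sample at a time, from highest score to lowest.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]                     # 1 = attack, 0 = benign
scores = [0.95, 0.90, 0.80, 0.60, 0.30, 0.10]   # hypothetical attack probabilities
roc_score = area_under_curve(roc_curve_points(labels, scores))
```

A score of 1 is reached only when every attack receives a higher predicted probability than every benign record; in this toy example one benign record outranks one attack, so the score falls below 1.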
7 CONCLUSION
Due to the huge expansion of digital information,
system automation, and the Internet, security
threats are a big concern. IDS plays an important
role in detecting intrusion into the network. This
paper evaluates the performance of several IDS
models built with DL techniques. All the models are
trained, validated, and tested on the real-world
CICIDS2017 dataset. The data preprocessing phase is
designed to make the models work smoothly, without
bias among attack classes and without overfitting.
DL models are investigated to achieve better
accuracy.
We proposed the Bi-RNN-LSTM model, which showed
the highest accuracy, precision, recall, and
f1-score values of 99.58%, 99.00%, 98.69%, and
98.85% respectively for binary classification. The
same model achieved 99.58% accuracy for multi-class
classification. For the attack classes
Infiltration, Web Attack-SQL Injection, and Web
Attack-XSS, the per-class scores are low in most
cases for all the models. The imbalanced number of
instances among the classes and the low number of
attack instances result in such low evaluation
scores, leaving the models unable to detect those
attack classes.
Figure 3.1 Confusion matrix for multi-class classification
models
Figure 4. ROC Binary Classification
Figure 3.2 Confusion matrix for multi-class classification
models
Figure 5. ROC multi-class Classification
In the future, we intend to work on designing an
IDS for detecting classes with a low number of
instances in datasets. The gap observed through
this study is the unavailability of labeled data
for some attack types, and it would be a beneficial
investment to collect more data on those attack
types. Training on new labeled datasets with more
attack types and instances can lead to significant
advances in cybersecurity investigation.
Deep learning methods show some variation in
assessment, depending on how many times the models
need to be trained or retrained, and designing an
IDS for detecting classes with few instances poses
a great challenge. Both are fertile areas for
future research.
REFERENCES
[1] Vaidya, T., 2001-2013.: Survey and analysis of major
cyberattacks. arXiv preprint arXiv: 1507.06673
(2015).
[2] Chowdhury, M.M.U., Hammond, F., Konowicz, G.,
Xin, C., Wu, H. and Li, J.: A few-shot deep learning
approach for improved intrusion detection. 2017 IEEE
8th Annual Ubiquitous Computing, Electronics and
Mobile Communication Conference (UEMCON),
New York, NY, 2017, pp. 456-462 (2017).
[3] Azwar, H., et al.: Intrusion Detection in secure
network for Cybersecurity systems using Machine
Learning and Data Mining. 2018 IEEE 5th
International Conference on Engineering
Technologies and Applied Sciences (ICETAS),
Bangkok, Thailand, pp. 1-9 (2018).
[4] Shrestha, A., Mahmood, A.: Review of Deep Learning
Algorithms and Architectures. In IEEE Access, vol. 7,
pp. 53040-53065 (2019).
[5] Naseer, S., et al.: Enhanced Network Anomaly
Detection Based on Deep Neural Networks. In IEEE
Access, vol. 6, pp. 48231-48246 (2018).
[6] Karatas, G., et al.: Deep Learning in Intrusion
Detection Systems. International Congress on Big
Data, Deep Learning and Fighting Cyber Terrorism
(IBIGDELFT), Ankara, Turkey, pp. 113-116 (2018).
[7] Khan, F.A., et al.: A Novel Two-Stage Deep Learning
Model for Efficient Network Intrusion Detection. In
IEEE Access, vol. 7, pp. 30373-30385 (2019).
[8] Shone, N., et al.: A Deep Learning Approach to
Network Intrusion Detection. In IEEE Transactions on
Emerging Topics in Computational Intelligence, vol.
2, no. 1, pp. 41-50 (2018).
[9] Farahnakian, F., Heikkonen, J.: A deep auto-encoder
based approach for intrusion detection system. 20th
International Conference on Advanced
Communication Technology (ICACT), Chuncheon-si
Gangwon-do, Korea (South), pp. 1-1 (2018).
[10] Sommer, R., Paxson, V.: Outside the Closed world:
On Using Machine Learning for Network Intrusion
Detection. IEEE Symposium on Security and Privacy,
IEEE, pp. 305-316 (2010). [11] Sharafaldin, I., et al.: Toward Generating a New
Intrusion Detection Dataset and Intrusion Traffic
Characterization. 4th International Conference on
Information Systems Security and Privacy, pp. 108-
116 (2018).
[12] Sharafaldin, I., et al.: A Detailed Analysis of the
CICIDS2017 Data Set. International Conference on
Information Systems Security and Privacy, pp. 172-
188 (2019).
[13] Vinayakumar, R., Soman, K.P., Poornachandran, P.:
Evaluating effectiveness of shallow and deep
networks to intrusion detection system. International
Conference on Advances in Computing,
Communications and Informatics (ICACCI), Udupi,
2017, pp. 1282-1289 (2017).
[14] Ustebay, S., Turgut, Z., Aydin, M.A.: Cyber Attack
Detection by Using Neural Network Approaches:
Shallow Neural Network, Deep Neural Network and
AutoEncoder. Communications in Computer and
Information Science, vol 1039. Springer, Cham
(2019).
[15] Sabeel, U., Heydari, S.S., Mohanka, H., Bendhaou,
Y., Elgazzar, K., El-Khatib, K.: Evaluation of Deep
Learning in Detecting Unknown Network Attacks.
2019 International Conference on Smart Applications,
Communications and Networking (SmartNets), Sharm
El Sheik, Egypt, 2019, pp. 1-6 (2019),
[16] Atefi, K., Hashim, H., Khodadadi, T.: A Hybrid
Anomaly Classification with Deep Learning (DL) and
Binary Algorithms (BA) as Optimizer in the Intrusion
Detection System (IDS). 16th IEEE International
Colloquium on Signal Processing & Its Applications
(CSPA), Langkawi, Malaysia, 2020, pp. 29-34 (2020).
[17] Zhang, Y., Chen, X., Jin, L., Wang, X., Guo, D.:
Network Intrusion Detection: Based on Deep
Hierarchical Network and Original Flow Data. In
IEEE Access, vol. 7, pp. 37004-37016 (2019).
[18] Aksu, D., Ali Aydin, M.: Detecting Port Scan
Attempts with Comparative Analysis of Deep
Learning and Support Vector Machine Algorithms.
International Congress on Big Data, Deep Learning
and Fighting Cyber Terrorism (IBIGDELFT),
ANKARA, Turkey, 2018, pp. 77-80 (2018).
[19] Lee, J., Kim, J., Kim, I., Han, K.: Cyber Threat
Detection Based on Artificial Neural Networks Using
Event Profiles. In IEEE Access, vol. 7, pp. 165607-
165626 (2019).
[20] Chockwanich, N., Visoottiviseth, V.: Intrusion
Detection by Deep Learning with TensorFlow. 21st
International Conference on Advanced
Communication Technology (ICACT), PyeongChang
Kwangwoon_Do, Korea (South), pp. 654-659 (2019).
[21] Roopak, M., YunTian G., Chambers, J.: Deep
Learning Models for Cyber Security in IoT Networks.
2019 IEEE 9th Annual Computing and
Communication Workshop and Conference (CCWC),
Las Vegas, NV, USA, pp. 0452-0457 (2019).
65
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 10(2): 55-66The Society of Digital Information and Wireless Communications (SDIWC), 2021 ISSN: 2305-0011
[22] Vinayakumar, R., Dr. Soman, K.P., Poornachandran,
P.: A comparative analysis of deep learning
approaches for network intrusion detection systems
(N-IDSS): Deep learning for N-IDSs. International
Journal of Digital Crime and Forensics, vol. 11, pp.
65-89, (2019).
[23] UNB, Intrusion Detection Evaluation Dataset
(CICIDS2017). University of New Brunswick.
https://www.unb.ca/cic/datasets/ids-2017.html
[24] M. Ring, et al.: A Survey of Network-based Intrusion
Detection Data Sets. arXiv: 1903.02460 (2019).
[25] Buczak A.L., Guven, E.: A Survey of Data Mining
and Machine Learning Methods for Cyber Security
Intrusion Detection. In IEEE Communications
Surveys & Tutorials, vol. 18, no. 2, pp. 1153-1176,
Secondquarter (2016).
[26] Farhana, K., Rahman, M., Ahmed, Md.T.: An
intrusion detection system for packet and flow based
networks using deep neural network approach.
International Journal of Electrical and Computer
Engineering (IJECE), vol. 10, no. 5 (2020).
[27] Azim, M. A., Tanvir, & Islam, M. K.: Network Traffic
Classification Using Ensemble Learning with Time
Related Features. International Journal of New
Computer Architectures and Their Applications, vol.
10, no. 2, pp. 23-31, (2020).
66
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 10(2): 55-66The Society of Digital Information and Wireless Communications (SDIWC), 2021 ISSN: 2305-0011