Multi-Agent Artificial Immune System for Network Intrusion Detection
Under Supervision of
Prof. Sanaa El-Ola Hanafi
Prof. Aboul ella Hassanien*
Faculty of Computers and Information, Cairo University
Cairo University
Faculty of Computers and Information
Information Technology Department
MultiAgent artificial immune system for network intrusion detection
Amira Sayed Abdel-Aziz*
• Scientific Research Group in Egypt (SRGE) http://www.egyptscience.net
2
Agenda
Introduction
Problem Definition
Motivation
Preliminaries: Network Intrusion Detection, Artificial Immune Systems, Negative Selection Algorithm, MultiAgent Systems
Proposed Approaches
Results and Discussion
Conclusions and Future Work
3
Introduction
4
Introduction
Network and information security
are of high importance.
Research is continuous in these fields to keep up
with the increasing complexity of attacks.
Intrusion Detection is a major research area that:
Aims to identify suspicious activities in a monitored
system,
from authorized and unauthorized users,
by monitoring and analyzing the system activities.
5
Problem Definition
Problems with anomaly intrusion detection
Cannot give much detail about detected anomalies.
High false alarm rate.
Centralization problem for network intrusion
detection systems – having a single point of
failure.
6
Motivation
Similarity between an anomaly intrusion
detection system and the immune system.
Applying Negative Selection Algorithm, where
it is better and more efficient to define what is
normal than to define what is anomalous.
Solving problems mentioned in anomaly
intrusion detection system.
7
Motivation
Combining multiple techniques to build the
system, as a single technique is not enough
for best results.
Building a multiagent system as a distributed
system to replace centralized intrusion
detection system.
8
In this thesis, a multi-agent
anomaly network intrusion
detection system is implemented,
inspired by biological immunity, to
detect and classify network
attacks.
9
Preliminaries
10
Preliminaries – Network Intrusion Detection
An Intrusion Detection System (IDS) is a system built to detect outside and inside intruders to an environment by collecting and analyzing its behavior data.
11
Preliminaries – Artificial Immune Systems
Artificial Immune Systems (AIS) are a set of techniques inspired by the Human Immune System.
AIS lies at the intersection of immunology, computer science, and engineering.
12
Preliminaries – Artificial Immune Systems
Human
Immune
System
Tolerant
Robust
Decentralized
Adaptive
Self-protecting
Diverse
Dynamic
13
Preliminaries – Artificial Immune Systems
The Human Immune System (HIS) has different cells with many different roles, which has led to a number of algorithms of differing complexity that can accomplish a range of tasks.
14
Preliminaries – Artificial Immune Systems
AIS Techniques:
Clonal Selection Algorithm.
Negative Selection Algorithm.
Idiotypic Network Approaches.
Danger Theory.
Dendritic Cell Algorithm.
15
Preliminaries – Negative Selection Algorithm
The Negative Selection Algorithm (NSA) is an
artificial immune system technique that is
based on self/non-self discrimination.
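The self/non-self idea can be illustrated with a minimal sketch. It assumes binary-string samples and a Hamming-radius matching rule; both are illustrative choices, not the representation used in the thesis. The names `generate_detectors`, `is_anomalous`, and `radius` are hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def generate_detectors(self_set, n_detectors, length, radius, rng, max_tries=10000):
    """Negative selection: keep only random candidates that match no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        # A candidate "matches" a self sample when it lies within `radius` of it.
        if all(hamming(candidate, s) > radius for s in self_set):
            detectors.append(candidate)
    return detectors

def is_anomalous(sample, detectors, radius):
    """A sample is flagged as non-self when any detector matches it."""
    return any(hamming(sample, d) <= radius for d in detectors)

rng = random.Random(0)
self_set = [(0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1)]
detectors = generate_detectors(self_set, n_detectors=5, length=8, radius=1, rng=rng)
```

By construction, no self sample is ever flagged, because every surviving detector lies farther than `radius` from all of them.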
16
Preliminaries – MultiAgent Systems
A Multi-Agent System (MAS) is a
computerized system that is composed of
intelligent entities called agents, that interact
with each other and the surrounding
environment.
A MAS is a dynamic system, where the agents
may unintentionally affect the environment in
unpredictable ways.
17
Preliminaries – MultiAgent Systems
Cooperation
Autonomy
Adaptation
The agents are software agents that
usually collaborate with each other to
achieve certain goals.
18
Proposed MultiAgent Artificial Immune System for Network Intrusion Detection
19
Proposed Approaches
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm.
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation.
Approach 3: Feature Selection for Anomaly Detectors Generation.
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification.
Approach 5: Multiagent AIS for Network Intrusion Detection and Classification.
20
Proposed Approaches
Approach 1: AIS with GADG (Genetic Algorithm for Detectors Generation) system for anomaly network intrusion detection, with different distance measures used while generating the anomaly detectors.
D. Dasgupta and F. Gonzalez, “An Immunity-based Technique to Characterize Intrusions in Computer Networks”, IEEE Transactions on Evolutionary Computation, Vol. 6(3), pp. 281-291, 2002.
Start
Define self space S as a collection of strings to represent normal activity
Generate set of detectors R using GA
Use detectors to detect anomalous connections
End
21
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
GADG algorithm
22
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
The detectors of the AIS are represented by rules, where the features in a feature vector are x1 to xn, and the detectors (rules) in a detector set are R1 to Rm.
23
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
A variability value is used to define the high and low limits of each feature’s value.
Based on the variability size, the self space can be narrow or wide.
24
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
To calculate the fitness of an individual (or a rule) in the GA, two things are taken into account:
the number of elements in the training sample that can be included in a rule’s hyper-cube,
and the volume of the hyper-cube that the rule represents.
Consequently, the fitness is calculated from these two quantities.
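The slide's own fitness equation is not preserved in this transcript. As a hedged illustration only, the sketch below combines the two factors in one plausible way: a rule is rewarded for the volume of its hyper-cube and penalized for each normal training sample it covers. The exact form, the `penalty` weight, and the representation of a rule as a list of (low, high) intervals are assumptions.

```python
from math import prod

def covers(rule, sample):
    """True when every feature value falls inside the rule's [low, high] interval."""
    return all(low <= x <= high for (low, high), x in zip(rule, sample))

def volume(rule):
    """Volume of the hyper-cube spanned by the rule's intervals."""
    return prod(high - low for low, high in rule)

def fitness(rule, normal_samples, penalty=1.0):
    """Assumed form: larger non-self volume is better, covering self samples is worse."""
    covered = sum(covers(rule, s) for s in normal_samples)
    return volume(rule) - penalty * covered

rule = [(0.0, 0.5), (0.0, 0.5)]             # one interval per feature
normal = [(0.25, 0.25), (0.9, 0.9)]         # toy normal (self) samples
score = fitness(rule, normal)               # volume 0.25, one covered sample
```

A larger `penalty` pushes the GA toward rules that avoid the self space entirely, at the cost of smaller detectors.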
25
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
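The Euclidean and Minkowski distances were each used as the distance measure in the GADG, for the sake of comparison.
Euclidean distance measure:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Minkowski distance measure:
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
where x and y are two feature vectors, n is the number of features, and p is the Minkowski metric order, which can take any positive value up to infinity (including real values between 0 and 1).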
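The two measures can be sketched in a few lines of plain Python. The Euclidean distance is the Minkowski distance with p = 2; the function names are illustrative.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance, the special case p = 2."""
    return minkowski(x, y, 2)

d2 = euclidean((0, 0), (3, 4))        # straight-line distance
d1 = minkowski((0, 0), (3, 4), 1)     # p = 1 gives the Manhattan distance
d_half = minkowski((0, 0), (3, 4), 0.5)
```

For orders below 1 the distance between the same pair of points grows, so the choice of p changes how strongly differences spread across several features are weighted.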
26
Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
The algorithm was originally applied to the real-valued features in the NSL-KDD data set.
The real-valued features are not enough to detect all types of attacks, so the algorithm should be extended to include features of different types.
Amira Sayed A. Aziz, Mostafa Salama, Aboul ella Hassanien, and Sanaa El-Ola Hanafi. "Detectors generation using genetic algorithm for a negative selection inspired anomaly network intrusion detection system." In 2012 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 597-602. IEEE, September 2012.
Amira Sayed A. Aziz, Mostafa A. Salama, Aboul ella Hassanien, and Sanaa El-Ola Hanafi. "Artificial Immune System Inspired Intrusion Detection System Using Genetic Algorithm." Informatica (03505596) 36, no. 4 (December 2012). (Impact factor =1.12).
27
Proposed Approaches
Approach 2: Applying continuous features discretization to create homogeneity between features of different types.
Start
Apply EWB for continuous features discretization
Define self space S as a collection of strings to represent normal activity
Generate set of detectors R using GA
Use detectors to detect anomalous connections
End
28
Continuous Features Discretization for Anomaly Detectors Generation
The problem with using different features is
that they have different data types: binary,
categorical, and continuous (real and integer).
So, continuous feature discretization should
be applied to:
cover a wide range of values in a way that can
represent each region uniquely,
create some sort of homogeneity between
feature values to apply the GA.
29
Continuous Features Discretization for Anomaly Detectors Generation
Equal-width interval binning (EWB) is the simplest method for data discretization.
The range of values is divided into k equally sized bins, where k is a parameter supplied by the user as the required number of bins.
The bin width is calculated as
$$w = \frac{x_{\max} - x_{\min}}{k}$$
where x_max and x_min are the largest and smallest observed values of the feature.
The equal-width interval binning algorithm is a global, unsupervised, and static discretization algorithm.
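A minimal sketch of equal-width binning follows. It maps each value to a bin index from 0 to k - 1; the function name and the choice to clamp the maximum value into the last bin are illustrative assumptions.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins, indexed 0..k-1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

bins = equal_width_bins([0, 2.5, 5, 7.5, 10], k=4)   # bin width is 2.5
```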
30
Continuous Features Discretization for Anomaly Detectors Generation
The fitness is measured by calculating the matching percentage between an individual and the normal samples, as:
$$\text{fitness} = \frac{a}{A}$$
where a is the number of samples matching the individual by 100%, and A is the total number of normal samples.
Three distance measures were used for comparison in the GADG algorithm: Euclidean, Minkowski, and Hamming.
31
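Continuous Features Discretization for Anomaly Detectors Generation
The Hamming distance is calculated as:
$$d(x, y) = \sum_{i=1}^{n} \delta(x_i, y_i), \qquad \delta(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{otherwise} \end{cases}$$
where n is the number of features.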
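On discretized features, the Hamming distance simply counts the features whose bin labels differ. The sketch below also shows a matching-based fitness as a fraction of normal samples that match an individual exactly; reading "matching by 100%" as a Hamming distance of zero is an assumption, and the function names are illustrative.

```python
def hamming(x, y):
    """Count the features on which two equal-length vectors disagree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of features")
    return sum(a != b for a, b in zip(x, y))

def matching_fitness(individual, normal_samples):
    """Fraction of normal samples that the individual matches on every feature."""
    exact = sum(hamming(individual, s) == 0 for s in normal_samples)
    return exact / len(normal_samples)

distance = hamming((1, 2, 3), (1, 0, 3))
share = matching_fitness((1, 2), [(1, 2), (1, 2), (0, 2), (1, 3)])
```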
32
Continuous Features Discretization for Anomaly Detectors Generation
The group of features used in the application was proposed by an algorithm from an earlier paper.
Still, we need to find the set of features that would give the best results in the proposed approach.
Hence, a feature selection technique should be applied.
Amira Sayed A. Aziz, Ahmad Taher Azar, Aboul Ella Hassanien, and Sanaa El-Ola Hanafy. "Continuous Features Discretization for Anomaly Intrusion Detectors Generation." In Soft Computing in Industrial Applications (Proceedings of the 17th Online World Conference on Soft Computing in Industrial Applications, December 2012), pp. 209-221. Springer International Publishing, 2014.
33
Proposed Approaches
Approach 3: Comparative analysis between different feature selection techniques: CFS, SFFS, SFBS, and PCA.
Start
Apply EWB for continuous features discretization
Apply feature selection technique to select best feature set
Define self space S as a collection of strings to represent normal activity
Generate set of detectors R using GA
Use detectors to detect anomalous connections
End
34
Feature Selection for Anomaly Detectors Generation
An accurate mapping to a lower-dimensional feature space is needed, so that no information is lost by discarding important and basic features.
A feature is good when it is relevant but not redundant to the other relevant features.
Feature selection is an essential machine learning technique that is important and efficient in building classification systems.
When used to reduce features, it results in lower computation costs and better classification performance.
35
Feature Selection for Anomaly Detectors Generation
Correlation Feature Selection (CFS) is a heuristic approach that evaluates the worth of a feature subset, where a feature is considered good if it is highly correlated to the class but not to the other features.
Sequential-Floating Forward Selection (SFFS) starts with an empty set, then at each iteration it adds the next best feature. After each forward step, SFFS performs a backward step that discards the worst feature of the subset. The backward steps are performed as long as the objective function is increasing.
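The floating search can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: the `score` callable stands in for the objective function, the additive toy weights are hypothetical, and a backward step is accepted only when it beats the best score seen so far for the smaller subset size.

```python
def sffs(features, score, target_size):
    """Sequential-Floating Forward Selection over a list of feature names."""
    selected = []
    best_for_size = {}  # best objective value seen for each subset size
    while len(selected) < target_size:
        # Forward step: add the feature that maximizes the objective.
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [best]
        size = len(selected)
        best_for_size[size] = max(score(selected),
                                  best_for_size.get(size, float("-inf")))
        # Backward steps: drop the worst feature while the objective improves.
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if score(reduced) > best_for_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_for_size[len(reduced)] = score(reduced)
            else:
                break
    return selected

# Hypothetical additive objective: each feature contributes a fixed weight.
weights = {"duration": 3.0, "src_bytes": 2.0, "flag": 1.0}
chosen = sffs(list(weights), lambda subset: sum(weights[f] for f in subset), 2)
```

SFBS mirrors this procedure: it starts from the full set, removes features, and conditionally adds them back.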
36
Feature Selection for Anomaly Detectors Generation
Sequential-Floating Backward Selection (SFBS) starts with the full set of features, then it sequentially removes the feature that least reduces the objective function value. Then, SFBS performs forward steps after each backward step, as long as the objective function increases.
Principal Components Analysis (PCA) is a way to find and highlight similarities and differences between data by identifying the existing patterns.
37
Feature Selection for Anomaly Detectors Generation
The proposed AIS for anomaly intrusion detection gives very good detection results so far, but the problem with anomaly intrusion detection is that data records are only labeled as either normal or anomalous.
Hence, a classifier is needed to label the detected anomalies with their correct attack class.
Amira Sayed A. Aziz, Ahmad Taher Azar, Mostafa A. Salama, Aboul Ella Hassanien, and Sanaa El-Ola Hanafy. "Genetic algorithm with different feature selection techniques for anomaly detectors generation." In Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on, pp. 769-774. IEEE, September 2013.
38
Proposed Approaches
Approach 4: Multi-layer AIS for network intrusion detection and classification.
Start
Apply EWB for continuous features discretization
Apply feature selection technique to select best feature set
Define self space S as a collection of strings to represent normal activity
Generate set of detectors R using GA
Use detectors to detect anomalous connections
Pass the detected anomalies to a classifier to label them with their proper attack class
End
39
Multi-layer Hybrid System for Anomalies Detection and Classification
In anomaly NIDS, traffic is usually classified into
either normal or anomaly.
Hence, a multi-category classifier to label the
detected anomalies with their right attack classes is
needed.
Many classifiers were applied for a comparative
analysis, to find which classifier would best classify
the detected anomalies: Naïve Bayes, Multi-layer
Perceptron Neural Network, and Decision Trees.
40
Multi-layer Hybrid System for Anomalies Detection and Classification
A Naïve Bayesian classifier is a simple
probabilistic classifier based on applying
Bayes theorem with strong naive
independence assumptions.
A Multi-Layer Perceptron (MLP) is an Artificial
Neural Network structured as a finite acyclic
graph, where neurons of the i-th layer serve as
input features for neurons of the (i+1)-th layer.
41
Multi-layer Hybrid System for Anomalies Detection and Classification
A Decision Tree (DT) is a structure of layered
nodes (a hierarchical organization of rules),
where a non-terminal node represents a
decision on a particular data item and a leaf
(terminal) node represents a class.
Four types of decision trees were tested: J48
(C4.5) decision tree, Naïve Bayes Tree (NBTree),
Best-First Tree (BFTree), and Random Forest
(RFTree).
42
Multi-layer Hybrid System for Anomalies Detection and Classification
The network intrusion detection system is a centralized
system, so it faces problems of processing
overload and a single point of failure.
The AIS is distributed by nature, so the
intrusion detection system is better implemented as a
multi-agent system.
Amira Sayed A. Aziz, Aboul Ella Hassanien, Ahmad Taher Azar, and Sanaa El-Ola Hanafi. "Machine Learning Techniques for Anomalies Detection and Classification." The International Conference on Advances in Security of Information and Communication Networks SecNet 2013, pp. 219-229. Springer Berlin Heidelberg, September 2013.
Amira Sayed A. Aziz, Aboul ella Hassanien, Sanaa El-Ola Hanafy, M.F. Tolba, "Multi-layer hybrid machine learning techniques for anomalies detection and classification approach", 2013 13th International Conference on Hybrid Intelligent Systems (HIS), pp. 216-221, IEEE, December 2013.
43
Proposed Approaches
Approach 5: Multiagent AIS for network intrusion detection and classification.
Main Agent:
Apply EWB for continuous features discretization
Apply feature selection technique to select best feature set
Define self space S as a collection of strings to represent normal activity
Generate set of detectors R using GA
Detector Agent:
Use detectors to detect anomalous connections
Pass the detected anomalies to a classifier to label them with their proper attack class
44
The Final Proposed System Model
The proposed model is a multi-agent system that applies an AIS technique for anomaly network intrusion detection and classification.
45
Main Agent
The task of the main agent is to use the
training data to prepare everything the
detector agents need to carry out the
detection and classification processes.
46
Detector Agent
47
Detector Agent
The system can be very robust against the failure of one or two agents.
A MAS is also scalable, as it is easier to add agents to add new capabilities to the system, which would be more complex in a monolithic system.
Amira Sayed A. Aziz, Sanaa El-Ola Hanafi, Aboul ella Hassanien, "Multi-Agent Artificial Immune System for Network Intrusion Detection and Classification", SOCO14, 9th International Conference on Soft Computing Models in Industrial and Environmental Applications - Bilbao, Spain, 25 - 27 June 2014.
48
Results and Discussion
49
Data Set
The approaches were run against the
NSL-KDD IDS evaluation data set.
There are four general types of attacks in the
data set: Denial of Service (DoS), Probe, User
to Root (U2R), and Remote to Local (R2L).
50
Data Set
Denial-of-Service (DoS) attack: flooding the
network with useless traffic.
Probe attack: a program used for monitoring or
collecting data about network activity.
User-to-Root (U2R) attack: a user attempts to gain
root-level privileges.
Remote-to-Local (R2L) attack: a user attempts to gain
local access through a remote connection.
51
Data Set
The distributions of normal and attacks
records in the NSL-KDD data set.
            Total Records   Normal           DoS              Probe            U2R           R2L
Train_20%   25192           13449 (53.39%)   9234 (36.65%)    2289 (9.09%)     11 (0.04%)    209 (0.83%)
Train_All   125973          67343 (53.46%)   45927 (36.46%)   11656 (9.25%)    52 (0.04%)    995 (0.79%)
Test+       22544           9711 (43.08%)    7458 (33.08%)    2421 (10.74%)    200 (0.89%)   2754 (12.22%)
52
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
Settings
53
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
* M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, “A Detailed Analysis of the KDD Cup 99 data set”, In Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2009.
54
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
Average Detection Rates (Minkowski) vs. variation values
55
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
Average Detection Rates (Minkowski) vs. threshold values
56
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
True Positives Rates (Minkowski)
57
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
True Negatives Rates (Minkowski)
58
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
Average True Positives Rates (Minkowski) vs. variation values
59
Approach 1: Intrusion Detection System inspired by Artificial Immune System Using Genetic Algorithm
Average True Negatives Rates (Minkowski) vs. variation values
60
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation
* S.T. Powers and J. He, “A hybrid artificial immune system and Self-Organising Map for Network Intrusion Detection”, Information Sciences, Vol. 178(15), pp. 3024-3042, 2008.
61
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation
Settings
62
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation
Average Detection Rates
63
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation
Average True Positives Rates
64
Approach 2: Continuous Features Discretization for Anomaly Detectors Generation
Average True Negatives Rates
65
Approach 3: Feature Selection for Anomaly Detectors Generation
Settings
66
Approach 3: Feature Selection for Anomaly Detectors Generation
Selected Features
67
Approach 3: Feature Selection for Anomaly Detectors Generation
Average Detection Rates (Accuracy)
68
Approach 3: Feature Selection for Anomaly Detectors Generation
Average True Positives Rates (Sensitivity)
69
Approach 3: Feature Selection for Anomaly Detectors Generation
Average True Negatives Rates (Specificity)
70
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
True Positives Rates (Euclidean)
71
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
True Positives Rates (Minkowski)
72
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
DoS Attack Classification Results
73
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
Probe Attack Classification Results
74
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
R2L Attack Classification Results
75
Approach 4: Multi-layer Hybrid System for Anomalies Detection and Classification
U2R Attack Classification Results
76
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
Settings:
For the GADG process, the population size was 600,
the number of generations was 1000, and the threshold value was
0.8.
These values gave the best results with the features
selected by SFFS in proposed approach 3.
In the experiment, 26 features were selected by SFFS.
For the classifiers, the Train_20percent data was used for
training, as the classifiers gave very good
results without having to use all the training records.
77
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
For the anomaly detection process, the detection rate is calculated from the successfully detected true positives (anomalies) and true negatives (normal).
89.78% of the attacks were successfully detected as anomalies.
           Total Normal (TN)   Total Anomalies (TP)   DoS      Probe    U2R      R2L
Detected   7996                11521                  6939     2303     159      2120
Rate       82.34%              89.78%                 93.04%   95.13%   79.50%   76.98%
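The rates in this table can be reproduced from the Test+ class totals given in the data set slide. The sketch below is plain arithmetic on those reported counts; the overall accuracy it derives is computed here from the same numbers and is not a figure stated on the slides.

```python
# Test+ class totals and correctly detected counts, as reported in the slides.
test_totals = {"Normal": 9711, "DoS": 7458, "Probe": 2421, "U2R": 200, "R2L": 2754}
detected = {"Normal": 7996, "DoS": 6939, "Probe": 2303, "U2R": 159, "R2L": 2120}

# Per-class rate: correctly handled records over all records of that class.
rates = {name: 100 * detected[name] / test_totals[name] for name in test_totals}

attack_classes = [name for name in test_totals if name != "Normal"]
total_attacks = sum(test_totals[name] for name in attack_classes)
true_positives = sum(detected[name] for name in attack_classes)
tp_rate = 100 * true_positives / total_attacks

# Derived here: share of all test records handled correctly at the detection stage.
accuracy = 100 * (true_positives + detected["Normal"]) / sum(test_totals.values())
```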
78
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
The anomaly data is then classified by going through the NB classifier first to label the R2L and U2R attacks; the remaining (other) anomalies then go through the BFTree classifier, which labels the DoS and Probe attacks and labels the false alarms as normal.
          Normal   Anomalies    DoS         Probe       U2R     R2L
Labeled   1505     8046/11521   5440/6939   1826/2303   2/159   778/2120
Rate      87.76%   72.16%       78.40%      79.29%      1.26%   36.70%
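The two-stage labeling can be sketched as a simple cascade. The classifiers here are hypothetical stand-ins written as plain functions with made-up thresholds and feature names; in the described system the first stage would be a trained Naïve Bayes model and the second a trained BFTree.

```python
def classify_anomaly(record, first_stage, second_stage):
    """Label one detected anomaly using a two-stage cascade."""
    label = first_stage(record)
    if label in ("r2l", "u2r"):
        return label
    # Everything else goes to the second stage: dos, probe, or a false alarm.
    return second_stage(record)

# Hypothetical stand-in for the first-stage (Naive Bayes) classifier.
def toy_first_stage(record):
    if record.get("root_shell"):
        return "u2r"
    if record.get("failed_logins", 0) > 3:
        return "r2l"
    return "other"

# Hypothetical stand-in for the second-stage (BFTree) classifier.
def toy_second_stage(record):
    if record.get("count", 0) > 100:
        return "dos"
    if record.get("distinct_ports", 0) > 20:
        return "probe"
    return "normal"

labels = [
    classify_anomaly(r, toy_first_stage, toy_second_stage)
    for r in ({"root_shell": True}, {"count": 500}, {"distinct_ports": 40}, {})
]
```

Putting the classifier that handles the rare classes first keeps those records from being absorbed by the majority classes in the second stage.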
79
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
Combining the previous results, the final
results of successfully detected and labeled
data records are shown in the table below.
          Normal      Anomalies    DoS         Probe       U2R     R2L
Labeled   9501/9711   8046/12833   5440/7458   1826/2421   2/200   778/2754
Rate      97.84%      62.70%       72.94%      75.42%      1.00%   28.25%
80
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
U2R and R2L attacks were clearly not
recognized by the classifiers as well as the other
attacks.
This is due to their very low representation in
the training data.
81
Approach 5: Multi-agent Artificial Immune System for Network Intrusion Detection
The false classification results of the detected
anomalies are:
82
Conclusions and Future Work
83
Conclusion
A multi-layer hybrid machine learning intrusion
detection system, inspired by immune systems and
the negative selection approach, was designed and
developed to achieve high efficiency and improve
detection and classification accuracy.
The final application was able to detect 90% of the
attacks as anomalies, with the false positives (false
alarms) rate reduced from 17% to only 2%.
84
Conclusion
Different distance measurement functions
were applied for the generation of detectors in
the genetic algorithm, including the Minkowski
distance function and the Euclidean distance
function.
With all values used within the GA, the
Minkowski distance function gave better
detection rates.
85
Conclusion
It was shown that the decision trees give the
best results in general. The Naïve Bayes
classifier gives the best results with the attacks
that are least represented in the data set or
have very few training records.
A multi-agent multi-layer artificial immune
system for network intrusion detection was
implemented and tested.
86
Conclusion
The system has the advantage of being
lightweight, as well as being a distributed system
where each detector agent detects and
classifies only the anomalies directed at its own
host.
87
Future Work
For future research, we can extend the
functionality of the multi-agent system, and
involve more features to be able to detect
behavioral attacks on the host, such as R2L
attacks.
Trust dialogues should be adapted in the
system for the communication between the
agents.
88
Thank You
Questions?