

Research Article
A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features

P. Amudha,1 S. Karthik,2 and S. Sivakumari1

1Department of CSE, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore 641 108, India
2Department of CSE, SNS College of Technology, Coimbatore 641 035, India

Correspondence should be addressed to P. Amudha; amudharul@gmail.com

Received 20 January 2015; Revised 19 May 2015; Accepted 31 May 2015

Academic Editor: Giuseppe A. Trunfio

Copyright © 2015 P. Amudha et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intrusion detection has become a main part of network security due to the huge number of attacks which affect computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, in this paper a hybrid algorithm is proposed to integrate Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) to predict the intrusion detection problem. The algorithms are combined together to find out better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and test its effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the intrusion detection KDDCup'99 benchmark dataset from the UCI Machine Learning Repository is used. The performance of the proposed method is compared with the other machine learning algorithms and found to be significantly different.

1. Introduction

Due to the tremendous growth in the field of information technology, one of the significant challenging issues is network security. Hence, the intrusion detection system (IDS), which is an indispensable component of the network, needs to be secured. The traditional IDS is unable to handle newly arising attacks. The main goal of IDSs is to identify and distinguish normal and abnormal network connections in an accurate and quick manner, which is considered one of the main issues in intrusion detection because of the large number of attributes or features. To address this aspect, data-mining-based network intrusion detection is widely used to identify how and where intrusions occur. Related to achieving real-time intrusion detection, researchers have investigated several methods of performing feature selection. Reducing the number of features by selecting the important features is critical to improve the accuracy and speed of classification algorithms. Hence, selecting the differentiating features and developing the best classifier model in terms of high accuracy and detection rates are the main focus of this work.

Research on machine learning or data mining considers intrusion detection as a classification problem, implementing algorithms such as Naïve Bayes, genetic algorithms, neural networks, Support Vector Machines, and decision trees. In order to improve the accuracy of an individual classifier, a popular approach is to combine classifiers. Recently, the application of swarm intelligence techniques to intrusion detection has gained prominence among the research community [1]. Swarm intelligence draws on the communal behaviour of social insect colonies or other animal societies to implement algorithms [2]. The potential of swarm intelligence makes it a perfect candidate for IDS, which needs to distinguish normal and abnormal behaviors from a large amount of data.

The main objectives of this work are (1) to select important features using two feature selection methods, namely, the single feature selection method and the random feature selection method, and (2) to propose a hybrid optimization algorithm based on the Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO) algorithms for classifying the intrusion detection dataset. The studies on ABC and PSO indicate

Hindawi Publishing Corporation, The Scientific World Journal, Volume 2015, Article ID 574589, 15 pages, http://dx.doi.org/10.1155/2015/574589


that ABC has powerful global search ability but poor local search ability [3], while PSO has powerful local search ability but poor global search ability [4]. In order to provide both a powerful global search capability and a local search capability, in this paper a hybridized model called MABC-EPSO is proposed, which brings the two algorithms together so that the computation process may benefit from both advantages. In this hybrid algorithm, the local search and global search abilities are balanced to obtain higher-quality results. The KDDCUP'99 intrusion detection dataset developed by the MIT Lincoln Laboratory is used in the experiments to find the accuracy of the proposed hybrid approach.

The rest of this paper is organized as follows. Section 2 provides an overview of related work. Section 3 presents the principles of PSO and ABC. Section 4 describes the methodology: dataset description and preprocessing, the proposed feature selection methods, and the hybrid approach. Section 5 gives performance metrics, experimental results, and discussions. Finally, the conclusion is given in Section 6.

2. Related Work

Being related to achieving real-time intrusion detection, researchers have investigated several methods of performing feature selection. Kohavi and John [4] described the feature subset selection problem in supervised learning, which involves identifying the relevant or useful features in a dataset and giving only that subset to the learning algorithm. Real-life intrusion detection datasets contain redundant or insignificant features. The redundant features make it harder to detect possible intrusion patterns [5]. With the increasing applications of classification algorithms and feature selection methods to intrusion detection datasets, a comprehensive list of a few such works is given in [6–23].

Machine learning algorithms such as neural networks [9] and fuzzy clustering [14] have been applied to IDS to construct good detection models. The support vector machine (SVM) [24] has become a popular research method in intrusion detection due to its good generalization performance and the sparse representation of its solution. Satpute et al. [25] enhanced the performance of intrusion detection systems by combining PSO and its variants with machine learning techniques for the detection of anomalies in network intrusion detection systems. Chung and Wahid [26] proposed a novel simplified swarm optimization (SSO) algorithm as a rule-based classifier and for feature selection for classifying audio data. The algorithm is more flexible and cost-effective in solving complex computing environments. Revathi and Malathi [10, 11] proposed hybrid simplified swarm optimization to preprocess the data, compared the proposed approach with a new hybridized approach of PSO with Random Forest, and found that the proposed method provided a high detection rate and an optimal solution.

Karaboga and Basturk [27] proposed the Artificial Bee Colony (ABC) algorithm based on a particular intelligent behaviour of honeybee swarms. By understanding the basic behaviour characteristics of foragers, the ABC algorithm was developed and compared with differential evolution, Particle Swarm Optimization, and evolutionary algorithms for multidimensional and multimodal numeric problems. Karaboga and Akay [28] proposed the ABC algorithm for an anomaly-based network intrusion detection system to optimize the solution. The proposed method was divided into four stages: parameterization, training stage, testing stage, and detection stage. D. D. Kumar and B. Kumar [29] applied the ABC algorithm to anomaly-based IDS and used feature selection techniques to reduce the number of features used for detection and classification. Mustafa Servet Kiran and Mesut Gunduz [30] proposed a hybridization of PSO and ABC for different continuous optimization problems, in which the information exchange between the particle swarm and the bee colony helps in increasing the global and local search abilities of the hybrid approach.

3. Theoretical Background

The following subsections provide the necessary background to understand the problem.

3.1. Particle Swarm Optimization. Particle Swarm Optimization (PSO) is one of the popular heuristic techniques which has been successfully applied in many different application areas; however, it suffers from premature convergence, especially in high-dimension multimodal problems.

The algorithm of the standard PSO is as follows:

(1) Initialize a population of particles with randomly chosen positions and velocities.

(2) Calculate the fitness value of each particle in the population.

(3) If the fitness value of particle i is better than its pbest value, then set the fitness value as the new pbest of particle i.

(4) If pbest is updated and it is better than the current gbest, then set gbest to the current pbest value of particle i.

(5) Update the particle's velocity and position according to (1) and (2).

(6) If the best fitness value or the maximum generation is met, then stop the process; otherwise, repeat the process from step 2.

In PSO, a swarm consists of N particles in a D-dimensional search space. The ith particle is represented as X_i = (x_i1, x_i2, ..., x_iD). The best previous position (pbest) of any particle is P_i = (p_i1, p_i2, ..., p_iD), and the velocity of particle i is V_i = (v_i1, v_i2, ..., v_iD). The global best particle in the whole swarm is denoted by P_g, and it represents the fittest particle [31]. During each iteration, each particle updates its velocity according to the following equation:

v_id^t = v_id^(t-1) + c_1 * rand_1 * (p_id - x_id^(t-1)) + c_2 * rand_2 * (p_gd - x_id^(t-1))    (1)


where c_1 and c_2 denote the acceleration coefficients, d = 1, 2, ..., D, and rand_1 and rand_2 are random numbers uniformly distributed within [0, 1].

Each particle then moves to a new potential position as in the following equation:

x_id^t = x_id^(t-1) + v_id^t,    d = 1, 2, ..., D    (2)
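As a concrete illustration, the update equations (1) and (2) and steps (1)-(6) above can be sketched in Python as follows. The inertia weight w and the coefficient values are illustrative assumptions added for numerical stability (the paper's equation (1) corresponds to w = 1), and the sphere objective is only a stand-in test function.

```python
import random

def pso(fitness, dim, n_particles=30, max_iter=100, lo=-15.0, hi=15.0,
        c1=1.5, c2=1.5, w=0.72):
    """Minimise `fitness` with standard PSO, following equations (1) and (2).
    w is an inertia weight added here for stability (w = 1 recovers eq. (1))."""
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                      # best position seen per particle
    pbest_val = [fitness(p) for p in pbest]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best position

    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # equation (1): velocity update
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # equation (2): position update
                x[i][d] += v[i][d]
            f = fitness(x[i])
            if f < pbest_val[i]:                   # step (3): update pbest
                pbest[i], pbest_val[i] = x[i][:], f
                if f < gbest_val:                  # step (4): update gbest
                    gbest, gbest_val = x[i][:], f
    return gbest, gbest_val

# usage: minimise the 2-dimensional sphere function
best, val = pso(lambda p: sum(t * t for t in p), dim=2)
```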

3.2. Artificial Bee Colony. The Artificial Bee Colony (ABC) algorithm is an optimization algorithm based on the intelligent foraging behaviour of honey bee swarms, proposed by Karaboga and Basturk [27]. The artificial bee colony comprises three groups: scout bees, onlooker bees, and employed bees. The bee which carries out a random search is known as a scout bee. The bee which visits a food source is an employed bee. The bee which waits in the dance region is an onlooker bee; onlooker bees together with scouts are also called unemployed bees. The employed and unemployed bees search for good food sources around the hive. The employed bees share the stored food source information with the onlooker bees. The number of food sources is equal to the number of employed bees and also equal to the number of onlooker bees. The solutions of employed bees which cannot be enhanced within a fixed number of trials become scouts, and their solutions are abandoned [28]. In the context of optimization, the number of food sources in the ABC algorithm represents the number of solutions in the population. The position of a good food source indicates the location of a promising solution to the optimization problem [27].

The four main phases of the ABC algorithm are as follows.

Initialization Phase. The scout bees randomly generate the population size (SN) of food sources. The input vector x_m, which contains D variables, represents a food source, where D is the dimension of the search space of the objective function to be optimized. Using (3), initial food sources are produced randomly:

x_m = l_i + rand(0, 1) * (u_i - l_i)    (3)

where u_i and l_i are the upper and lower bounds of the solution space of the objective function and rand(0, 1) is a random number within the range [0, 1].

Employed Bee Phase. The employed bee finds a new food source within the neighbourhood of its current food source. The employed bees memorize the higher-quality food source and share it with the onlooker bees. Equation (4) determines the neighbour food source v_mi:

v_mi = x_mi + phi_mi * (x_mi - x_ki)    (4)

where i is a randomly selected parameter index, x_k is a randomly selected food source, and phi_mi is a random number within the range [-1, 1]. Suitable tuning for specific problems can be made using this parameter range. The fitness of food sources, which is needed to find the global optimal solution, is calculated by (5), and a greedy selection is made between x_m and v_m:

fit_i = 1 / (f_i + 1),    if f_i >= 0
fit_i = 1 + |f_i|,        if f_i < 0    (5)

where f_i represents the objective value of the ith solution.

Onlooker Bee Phase. Onlooker bees examine the effectiveness of food sources by observing the waggle dance in the dance region and then randomly select a rich food source. Then the bees perform a random search in the neighbourhood of the food source using (4). The quality of a food source is evaluated by its profitability p_i using the following equation:

p_i = fit_i / (sum over n = 1..SN of fit_n)    (6)

where fit_i denotes the fitness of the solution represented by food source i and SN denotes the total number of food sources, which is equal to the number of employed bees.
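The selection in (6) is fitness-proportionate (roulette-wheel) selection; a minimal sketch follows, where the fitness values would come from (5):

```python
import random

def select_food_source(fitnesses):
    """Pick index i with probability p_i = fit_i / sum(fit_n), as in (6)."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, fit in enumerate(fitnesses):
        acc += fit
        if r <= acc:
            return i
    return len(fitnesses) - 1  # guard against floating-point round-off

# a source holding 60% of the total fitness is chosen about 60% of the time
counts = [0, 0, 0]
for _ in range(10000):
    counts[select_food_source([1.0, 3.0, 6.0])] += 1
```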

Scout Phase. If the effectiveness of a food source cannot be improved within the fixed number of trials, then the scout bees abandon the solution and randomly search for a new solution by using (3) [29].

The pseudocode of the ABC algorithm is given in Algorithm 1.

4. Methodology

4.1. Research Framework. In this study, the framework of the proposed work is given as follows:

(i) Data preprocessing: it prepares the data for classification and removes unused features and duplicate instances.

(ii) Feature selection: it determines the feature subsets, using the SFSM and RFSM methods, that contribute to the classification.

(iii) Hybrid classification: it performs classification using the MABC-EPSO algorithm to enhance the classification accuracy for the KDDCUP'99 dataset.

The objective of this study is to help the network administrator in preprocessing the network data using feature selection methods and in performing classification using a hybrid algorithm which aims to fit a classifier model to the prescribed data.

4.2. Data Source and Dataset Description. In this section, we provide a brief description of the KDDCup'99 dataset [30], which is derived from the UCI Machine Learning Repository [31]. In 1998, for the DARPA intrusion detection evaluation program, a simulated environment was set up by the MIT Lincoln Lab to obtain raw TCP/IP dump data for a local-area network (LAN) in order to perform a comparison of various intrusion detection methods. The environment functioned like a real one, including both background network traffic and


Input: initial solutions.
Output: optimal solution.
BEGIN
  Generate the initial population x_m, m = 1, 2, ..., SN
  Evaluate the fitness (fit_i) of the population
  set cycle = 1
  repeat
    FOR (employed phase):
      Produce a new solution v_m using (4)
      Calculate fit_i
      Apply greedy selection process
    Calculate the probability P_i using (6)
    FOR (onlooker phase):
      Select a solution x_m depending on P_i
      Produce a new solution v_m
      Calculate fit_i
      Apply greedy selection process
    IF (scout phase):
      There is an abandoned solution for the scout depending on limit
      THEN replace it with a new solution produced randomly by (3)
    Memorize the best solution so far
    cycle = cycle + 1
  until cycle = MCN
END

Algorithm 1: Artificial Bee Colony.
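Algorithm 1 can be rendered as a runnable Python sketch for a simple continuous minimisation objective; the parameter values (SN, limit, MCN) and the bounds are illustrative assumptions, and equations (3)-(6) appear where marked:

```python
import random

def abc_minimise(f, dim, sn=20, limit=20, mcn=200, lo=-15.0, hi=15.0):
    """Artificial Bee Colony following Algorithm 1, minimising objective f."""

    def fitness(x):                         # equation (5)
        val = f(x)
        return 1.0 / (val + 1.0) if val >= 0 else 1.0 + abs(val)

    def neighbour(m):                       # equation (4)
        k = random.choice([j for j in range(sn) if j != m])
        i = random.randrange(dim)
        phi = random.uniform(-1.0, 1.0)
        v = foods[m][:]
        v[i] += phi * (foods[m][i] - foods[k][i])
        v[i] = max(lo, min(hi, v[i]))       # keep within bounds
        return v

    def greedy(m, v):                       # greedy selection between x_m and v_m
        if fitness(v) > fitness(foods[m]):
            foods[m], trials[m] = v, 0
        else:
            trials[m] += 1

    # initialization phase, equation (3)
    foods = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(sn)]
    trials = [0] * sn
    best = min(foods, key=f)[:]

    for _ in range(mcn):
        for m in range(sn):                 # employed bee phase
            greedy(m, neighbour(m))
        fits = [fitness(x) for x in foods]
        total = sum(fits)
        for _ in range(sn):                 # onlooker bee phase, probability (6)
            r, acc, m = random.uniform(0.0, total), 0.0, sn - 1
            for j, ft in enumerate(fits):
                acc += ft
                if r <= acc:
                    m = j
                    break
            greedy(m, neighbour(m))
        for m in range(sn):                 # scout phase: abandon exhausted sources
            if trials[m] > limit:
                foods[m] = [random.uniform(lo, hi) for _ in range(dim)]
                trials[m] = 0
        cand = min(foods, key=f)            # memorize the best solution so far
        if f(cand) < f(best):
            best = cand[:]
    return best

# usage: minimise the 2-dimensional sphere function
best = abc_minimise(lambda x: sum(t * t for t in x), dim=2)
```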

Table 1: Distribution of connection types in the 10% KDDCup'99 dataset (% of occurrence).

               DoS     Probe   U2R    R2L    Total attack   Total normal
Training data  79.24   0.83    0.01   0.23   80.31          19.69
Testing data   73.90   1.34    0.07   5.20   80.51          19.49

a wide variety of attacks. A version of the 1998 DARPA dataset, KDDCup'99, is now widely accepted as a standard benchmark dataset and has received much attention in the research community of intrusion detection. The main motivation for using the KDDCup'99 dataset is to show that the proposed method has the advantage of becoming an efficient classification algorithm when applied to intrusion detection systems. In this paper, the 10% KDDCup'99 dataset is used for experimentation. The distribution of connection types and sample sizes in the 10% KDDCUP dataset is shown in Tables 1 and 2. The feature information of the 10% KDDCUP dataset is shown in Table 3. The dataset consists of one type of normal data and 22 different attack types categorized into 4 classes, namely, denial of service (DoS), Probe, user-to-root (U2R), and remote-to-login (R2L).

4.3. Data Preprocessing. Data preprocessing is a time-consuming task which prepares the data for subsequent analysis as required by the intrusion detection system model. The main aim of data preprocessing is to transform the raw network data into a form suitable for further analysis. Figure 1 illustrates the steps involved in data processing and

Table 2: Sample size in the 10% KDDCUP dataset.

Category of attack   Attack name (instances)
Normal               Normal (97277)
DoS                  Neptune (107201), Smurf (280790), Pod (264), Teardrop (979), Land (21), Back (2203)
Probe                Portsweep (1040), IPsweep (1247), Nmap (231), Satan (1589)
U2R                  Bufferoverflow (30), LoadModule (9), Perl (3), Rootkit (10)
R2L                  Guesspassword (53), Ftpwrite (8), Imap (12), Phf (4), Multihop (7), Warezmaster (20), Warezclient (1020)

Table 3: Feature information of the 10% KDDCUP dataset.

Dataset characteristics     Multivariate
Attribute characteristics   Categorical, integer
Associated task             Classification
Area                        Computer
Number of instances         494020
Number of attributes        42
Number of classes           1 normal class, 4 attack classes

how raw input data are processed for further statistical measures.

Various statistical analyses such as feature selection, dimensionality reduction, and normalization are essential to select significant features from the dataset. If the dataset contains duplicate instances, then the classification algorithms


[Figure 1: Data preprocessing. Network audit data pass through preprocessing (filling missing values, removing duplicate instances, and feature selection or dimensionality reduction) and then data analysis (association mining, classification, clustering), producing an alarm/alert.]

Input: dataset X with n features.
Output: vital features.
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  for i = 1, 2, ..., n
    X(i) = x_(i)    // one-dimensional feature vector
    Apply SVM classifier
  Sort features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    Select the features
End

Algorithm 2: Single feature selection method.
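A sketch of Algorithm 2 in Python. The `evaluate` callback stands in for the paper's SVM classifier (returning accuracy and detection rate for a single feature); the threshold values and the toy scorer below are purely illustrative assumptions:

```python
def single_feature_selection(n_features, evaluate, acc_threshold, dr_threshold):
    """Algorithm 2: score each feature on its own, then keep the features
    whose accuracy and detection rate both exceed the thresholds.

    evaluate(i) -> (accuracy, detection_rate) for a classifier trained on
    the one-dimensional feature vector of feature i (an SVM in the paper).
    """
    scores = [(*evaluate(i), i) for i in range(n_features)]
    scores.sort(reverse=True)               # sort by classifier accuracy
    return [i for acc, dr, i in scores
            if acc > acc_threshold and dr > dr_threshold]

# toy usage: feature i scores (0.5 + i/100) on both metrics, so only the
# higher-numbered features clear the 0.555 thresholds
selected = single_feature_selection(
    n_features=10,
    evaluate=lambda i: (0.5 + i / 100, 0.5 + i / 100),
    acc_threshold=0.555, dr_threshold=0.555)
# selected -> [9, 8, 7, 6]
```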

Table 4: Details of instances in the dataset.

Class    Before removing duplicates   After removing duplicates   Selected instances
Normal   97277                        87832                       8783
DoS      391458                       54572                       7935
Probe    4107                         2131                        2131
U2R      52                           52                          52
R2L      1126                         999                         999
Total    494020                       145586                      19900

consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large number of duplicate instances will make learning algorithms partial towards the frequently occurring instances and will inhibit them from learning the infrequent instances, which are generally more harmful to networks. Also, the existence of these duplicate instances will cause the evaluation results to be biased towards the methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection. Hence, duplicate instances are removed so that the classifiers will not be partial towards the more frequently occurring instances. The details of the instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class was selected, and four new sets of data were generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes of the intrusion detection dataset may lead to a complex intrusion detection model and reduce detection accuracy.
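The duplicate-elimination step described above amounts to keeping only the first occurrence of each identical record; a minimal sketch, with records represented as hypothetical tuples of feature values plus the class label:

```python
def drop_duplicates(records):
    """Remove exact duplicate records, keeping the first occurrence
    and preserving the original order (records must be hashable)."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

# toy records: (duration, protocol, label); the repeated smurf row is dropped
rows = [(0, "icmp", "smurf"), (0, "icmp", "smurf"), (1, "tcp", "normal")]
# drop_duplicates(rows) -> [(0, "icmp", "smurf"), (1, "tcp", "normal")]
```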

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a timely manner. In this paper, feature subsets are formed based on the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the features in the datasets, which aims to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature selection method (SFSM) uses a one-dimensional feature vector. In the first iteration, only the first attribute is considered and evaluated for accuracy using the Support Vector Machine classifier. In the second iteration, only the second attribute is considered for evaluation, and so on. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted, and the vital features selected are those whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively. The pseudocode of the single feature selection algorithm is given in Algorithm 2.

4.4.2. Random Feature Selection Method. In this method, the features are removed randomly and evaluated using the classifier. In the first iteration, all the features are evaluated using the SVM classifier; then, by deleting one feature, the dataset is updated and re-evaluated with the classifier. The importance of


Input: dataset X with n features.
Output: vital features.
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  Let S = X
  for each of the n features x_i in X do
    Delete x_i from X
    S = S - x_i    // update feature subset
    Apply SVM classifier
  end
  Sort the features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    S = S - x_i    // selecting vital features
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using the SFSM method.

Dataset              Selected features                                                              Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6      20
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                       15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31               17
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13                           14

Table 6: List of features selected using the RFSM method.

Dataset              Selected features                                   Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31  14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37, 12     13
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32    13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20           11

the provided feature is calculated. In the second iteration, another feature is removed randomly from the dataset, and the dataset is updated. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of the subsets identified as a percentage of the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques have been used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing times. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the level of performance of intrusion detection of the most used classification techniques nowadays by using optimization methods like PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high-performance IDS, and their combination has the advantage of providing a more reliable solution for today's data-intensive computing processes.

The Artificial Bee Colony algorithm is a newly proposed optimization algorithm and is becoming a hot topic in computational intelligence nowadays. Because of its high probability of avoiding local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm. Moreover, the Particle Swarm Optimization algorithm can help to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both of their advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other one is the particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part. The inferior solution is replaced by the better solution and is substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm; set the maximum cycle index of the algorithm; set the search range of the solution; set the other constants needed in both ABC and PSO.


[Figure 2: Flowchart of the proposed hybrid MABC-EPSO model. After data preprocessing of the network audit data and feature selection using SFSM and RFSM, the parameters of EPSO and MABC are initialized and the fitness values evaluated. The EPSO branch updates particle positions and determines pbest and gbest, while the MABC branch runs the employed, onlooker, and scout bee phases and determines the best of MABC. The gbest of EPSO and the best of MABC are compared each iteration until the termination condition is satisfied, and the best solution is selected.]

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3 In bee colony to evaluate the fitness value of eachsolution an employee bee is assigned using (5)The employeebee selects a new candidate solution from the nearby foodsources and then uses greedy selectionmethod by calculatingthe Rastrigin function as follows

    Min f(x) = 10n + Σ_{i=1}^{n} [x_i^2 − 10 cos(2π x_i)].    (7)

A multimodal function is said to contain more than one local optimum. A function of variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge involved in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization problem can easily be confined in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented using the Artificial Bee Colony algorithm and named the Modified Artificial Bee Colony (MABC) algorithm. In (7), f_i is the Rastrigin function, whose value is 0 at its global minimum (0, 0, ..., 0). This function is chosen because it is considered to be one of the best test functions for finding the global minimum. The initialization range for the function is [−15, 15]. The function uses cosine modulation to produce many local minima; thus, the function is multimodal.
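Written out in code, the benchmark in (7) is straightforward; the following is a minimal Python sketch (not taken from the paper):

```python
import math

def rastrigin(x):
    """Rastrigin function, Eq. (7): multimodal and separable, with
    global minimum f(0, ..., 0) = 0; the paper initializes in [-15, 15]^n."""
    n = len(x)
    return 10 * n + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)
```

At the origin every cosine term cancels the constant, which is why the global minimum value is exactly 0.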

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected among the neighbors of a previous solution. It also checks the fitness value and, if this value is better than the previous one, it substitutes the old position with the new one; otherwise it retains the old position. The objective of the scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined maximum cycle number (MCN). After all the choices have been made, the best solution generated in that iteration is called MABC_best.
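Steps 3–5 can be illustrated with a short sketch. The paper's equations (5) and (6) appear earlier in the text and are not reproduced here, so the fitness-proportional selection and greedy replacement below follow the standard ABC formulation and should be read as an assumption:

```python
import random

def onlooker_select(fitnesses):
    """Roulette-wheel selection used by onlooker bees: food source i is
    chosen with probability fit_i / sum(fit), the standard ABC rule."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if r <= acc:
            return i
    return len(fitnesses) - 1  # guard against floating-point drift

def greedy_replace(old_pos, new_pos, fitness):
    """Greedy selection of Step 4: keep the candidate only if it improves fitness."""
    return new_pos if fitness(new_pos) > fitness(old_pos) else old_pos
```

A source with zero fitness mass is never selected, so better food sources attract proportionally more onlookers.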

Step 6. As the initial velocity has a large effect on balancing the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight (ω) [35] is used to control the velocity, and hence the velocity update equation becomes as follows:

    v_id^t = ω · v_id^(t−1) + c1 · rand1 · (p_id − x_id^(t−1)) + c2 · rand2 · (p_gd − x_id^(t−1)).    (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, a linearly decreasing inertia weight [36] as in (9) is used to enhance the efficiency and performance of PSO. It is found experimentally that decreasing the inertia weight from 0.9 to 0.4 provides the optimal results:

    w_k = w_max − ((w_max − w_min) / iter_max) × k.    (9)

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSO_best.
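Equations (8) and (9) can be combined into a small sketch of the EPSO update; the parameter values follow the text (w decreasing from 0.9 to 0.4, c1 = c2 = 2), while the function names are illustrative only:

```python
import random

def inertia_weight(k, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (9)."""
    return w_max - (w_max - w_min) / iter_max * k

def velocity_update(v, x, pbest, gbest, w, c1=2.0, c2=2.0):
    """EPSO velocity update, Eq. (8), applied per dimension d."""
    return [w * v[d]
            + c1 * random.random() * (pbest[d] - x[d])
            + c2 * random.random() * (gbest[d] - x[d])
            for d in range(len(v))]
```

Early iterations (w near 0.9) favor global exploration; late iterations (w near 0.4) favor local refinement.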

Step 7. The minimum of the values MABC_best and EPSO_best is called Best and is defined as

    Best = EPSO_best,  if EPSO_best ≤ MABC_best,
           MABC_best,  if MABC_best ≤ EPSO_best.    (10)

Step 8. If the termination condition is satisfied, end the process and report the best solution; otherwise, return to Step 2.
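The cooperative exchange between the two halves of the colony (Steps 2–8) can be sketched as follows. This is a minimal illustration of the sharing rule with a placeholder fitness function, not the authors' implementation:

```python
def cooperative_step(abc_part, pso_part, fitness):
    """One iteration of the cooperative exchange (minimization): each half
    of the colony evolves on its own, then the half holding the better
    solution shares it, replacing the other half's worst member."""
    best_abc = min(abc_part, key=fitness)
    best_pso = min(pso_part, key=fitness)
    if fitness(best_abc) < fitness(best_pso):
        # ABC half found the better solution; overwrite the PSO half's worst
        worst = max(range(len(pso_part)), key=lambda i: fitness(pso_part[i]))
        pso_part[worst] = list(best_abc)
    else:
        # PSO half found the better solution; overwrite the ABC half's worst
        worst = max(range(len(abc_part)), key=lambda i: fitness(abc_part[i]))
        abc_part[worst] = list(best_pso)
    return min(best_abc, best_pso, key=fitness)
```

Each call returns the Best of (10); iterating until the maximum cycle number mirrors the loop described above.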

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters set are bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight ω in (11) varies from 0.9 to 0.7 linearly with the iterations. Also, the acceleration coefficients c1 and c2 are both set to 2. The upper and lower bounds for v (v_min, v_max) are set as the maximum upper and lower bounds of x:

    v_id^t = ω · v_id^(t−1) + c1 · rand(0, 1) · (p_id − x_id^(t−1)) + c2 · rand(0, 1) · (p_gd − x_id^(t−1)).    (11)

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

                   Predicted Normal         Predicted Attack
Actual Normal      True Negative (TN)       False Positive (FP)
Actual Attack      False Negative (FN)      True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified.
True Negative (TN): the number of normal records that are correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to evaluate the performance of classification algorithms. Hence, sensitivity and specificity are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called the detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

    Classification accuracy = (TP + TN) / (TP + TN + FP + FN).    (12)

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

    Sensitivity = TP / (TP + FN).    (13)

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

    Specificity = TN / (TN + FP).    (14)

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

    False Alarm Rate (FAR) = FP / (TN + FP).    (15)
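The four metrics in (12)–(15) follow directly from the confusion-matrix counts of Table 7; a minimal sketch:

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (12)-(15) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (12)
    sensitivity = tp / (tp + fn)                 # Eq. (13), the detection rate
    specificity = tn / (tn + fp)                 # Eq. (14)
    far = fp / (tn + fp)                         # Eq. (15)
    return accuracy, sensitivity, specificity, far
```

Note that specificity and FAR are complementary: they always sum to 1 because both share the denominator TN + FP.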

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving the classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm    Average accuracy (%)    Feature selection method
C4.5 [6]                    99.11                   All features
                            98.69                   Genetic algorithm
                            98.84                   Best-first
                            99.41                   Correlation feature selection
BayesNet [6]                99.53                   All features
                            99.52                   Genetic algorithm
                            98.91                   Best-first
                            98.92                   Correlation feature selection
ABC-SVM [7]                 92.768                  Binary ABC
PSO-SVM [7]                 83.88
GA-SVM [7]                  80.73
KNN [8]                     98.24                   All features
                            98.11                   Fast feature selection
Bayes Classifier [8]        76.09                   All features
                            71.94                   Fast feature selection
ANN [9]                     81.57                   Feature reduction
SSO-RF [10, 11]             92.7                    SSO
Hybrid SSO [12]             97.67                   SSO
RSDT [13]                   97.88                   Rough set
ID3 [13]                    97.665                  All features
C4.5 [13]                   97.582
FC-ANN [14]                 96.71                   All features
Proposed MABC-EPSO          88.59                   All features
                            99.32                   Single feature selection method
                            99.82                   Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
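The fold partitioning described above can be sketched with the standard library alone; the seed and helper name are illustrative:

```python
import random

def ten_fold_indices(n_samples, seed=42):
    """Randomly partition sample indices into 10 folds; in each run one
    fold is held out for testing and the other nine are used for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        test_fold = folds[k]
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        yield train, test_fold
```

Every index lands in exactly one test fold across the 10 runs, which is what makes the averaged accuracy an unbiased use of the whole dataset.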

5.2. Results and Discussion. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38] such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC are tested on the KDDCup'99 dataset. For each classification algorithm, the default control parameters are used. In Table 8, the results are reported for the accuracy rate obtained by various classification algorithms using different feature selection methods.

The performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

[Figure 3: Accuracy comparison of classifiers for the DoS dataset. Bars show the accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets.]

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates (%) of classifiers using the SFSM feature selection method and Friedman ranks (in parentheses).

Dataset               NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal      82.57 (6)   87.11 (4)   87.96 (3)   84.7 (5)    90.82 (2)   99.50 (1)
Probe + 10% normal    82.68 (5)   82.6 (6)    83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal      86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal      84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.8 (1)
Average rank          5.25        4.75        4.25        3.75        2           1

[Figure 4: Accuracy comparison of classifiers for the probe dataset. Bars show the accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets.]

[Figure 5: Accuracy comparison of classifiers for the R2L dataset. Bars show the accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets.]

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, J48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

[Figure 6: Accuracy comparison of classifiers for the U2R dataset. Bars show the accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets.]

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned over four datasets are considered, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and their ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best algorithm getting the rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm; Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM and χ² = 15.712 and F_F = 10.992 for RFSM are computed. Having four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013 and the P value < 0.05. So we reject the null hypothesis, and the differences among classifiers are significant.
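The Friedman statistics quoted above can be reproduced from the ranks in Table 9; the small sketch below computes the chi-square and Iman-Davenport F_F values, which essentially match the reported 15.716 and 11.005 up to rounding:

```python
def friedman(rank_rows):
    """Friedman chi-square and Iman-Davenport F_F from per-dataset ranks.
    rank_rows: one list of ranks per dataset, one entry per algorithm."""
    n = len(rank_rows)       # number of datasets
    k = len(rank_rows[0])    # number of algorithms
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    chi2 = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    ff = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, ff

# Ranks from Table 9 (SFSM): columns NB, J48, RBF, SVM, ABC, MABC-EPSO
sfsm_ranks = [
    [6, 4, 3, 5, 2, 1],  # DoS + 10% normal
    [5, 6, 4, 3, 2, 1],  # Probe + 10% normal
    [4, 6, 5, 3, 2, 1],  # R2L + 10% normal
    [6, 3, 5, 4, 2, 1],  # U2R + 10% normal
]
chi2, ff = friedman(sfsm_ranks)
```

F_F is then compared against the F(5, 15) critical value of 2.9013 mentioned above.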

The means of several groups are compared by estimating the variances among groups and within a group using the ANOVA test. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates (%) using the RFSM feature selection method and Friedman ranks (in parentheses).

Dataset               NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal      83.04 (6)   90.05 (4)   88.83 (5)   94.02 (3)   96.43 (2)   99.81 (1)
Probe + 10% normal    84.01 (5)   82.72 (6)   85.94 (4)   95.87 (3)   97.31 (2)   99.86 (1)
R2L + 10% normal      86.32 (4)   83.10 (6)   86.11 (5)   97.04 (3)   98.96 (2)   99.80 (1)
U2R + 10% normal      85.15 (6)   88.42 (5)   88.98 (4)   95.91 (3)   98.96 (2)   99.80 (1)
Average rank          5.25        5.25        4.5         3           2           1

Table 11: ANOVA results for accuracy rate of classifiers.

SFSM method
Source of variation    SS         df    MS         F          P value    F-crit
Between groups         781.5143   5     156.3029   31.89498   <0.05      2.772853
Within groups          88.20985   18    4.900547
Total                  869.7241   23

RFSM method
Between groups         879.4307   5     175.8861   48.54728   <0.05      2.772853
Within groups          65.21375   18    3.622986
Total                  944.6444   23

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the P value and the value of F are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, which states that there are significant differences between MABC-EPSO and ABC and the other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed and reveals that there is a statistically significant difference between SVM, ABC, and MABC-EPSO and the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
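A one-way ANOVA of the kind summarized in Table 11 can be computed directly; the sketch below is generic and is illustrated on a small synthetic example rather than the paper's data:

```python
def one_way_anova(groups):
    """One-way ANOVA over a list of groups of observations.
    Returns (SS_between, SS_within, df_between, df_within, F)."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    # Between-groups sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-groups sum of squares: deviations of each value from its group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_vals) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f
```

The null hypothesis is rejected when F exceeds the critical value for (df_between, df_within), exactly as done with F-crit = 2.773 in Table 11.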

In Table 12, the results are reported for the detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

[Figure 7: Comparison on sensitivity using the SFSM method. Bars show the sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets.]

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows comparable performance to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                     Average detection rate (%)    Feature selection method
Naïve Bayes [15]                             92.27                         Genetic algorithm
C4.5 [15]                                    92.1                          Genetic algorithm
Random forest [15]                           89.21                         Genetic algorithm
Random tree [15]                             88.98                         Genetic algorithm
REP tree [15]                                89.11                         Genetic algorithm
Neurotree [15]                               98.38                         Genetic algorithm
GMDH based neural network [16]               93.7                          Information gain
                                             97.5                          Gain ratio
                                             95.3                          GMDH
Neural network [17]                          81.57                         Feature reduction
Hybrid evolutionary neural network [18]      91.51                         Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]          97.75                         PCA
Ensemble Bayesian combination [20]           93.35                         All features
Voting + J48 + Rule [21]                     97.47                         All features
Voting + AdaBoost + J48 [21]                 97.38
Rough set neural network algorithm [22]      90                            All features
PSO based fuzzy system [23]                  93.7                          All features
Proposed MABC-EPSO                           87.16                         All features
                                             98.09                         Single feature selection method
                                             98.67                         Random feature selection method

[Figure 8: Comparison on sensitivity using the RFSM method. Bars show the sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets.]

Finally, the multiple comparison test concluded that MABC-EPSO differs significantly from all the other classification algorithms at the 0.05 significance level (P = 0.05). However, there is no statistically significant difference between the SVM and ABC algorithms.

Experiments were conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

[Figure 9: Comparison on specificity using the SFSM method. Bars show the specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets.]

for all datasets. Also, the proposed hybrid approach using SFSM shows a comparable performance with the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

SFSM
Source of variation    SS         df    MS         F          P value    F-crit
Between groups         659.6518   5     131.9304   52.5347    <0.05      2.772853
Within groups          45.20339   18    2.511299
Total                  704.8551   23

RFSM
Between groups         617.818    5     123.5636   23.53957   <0.05      2.772853
Within groups          94.48535   18    5.249186
Total                  712.3033   23

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time (ms) of classification algorithms using the SFSM and RFSM feature selection methods.

                      SFSM                                                RFSM
Dataset               NB      J48    RBF    SVM    ABC    MABC-EPSO      NB     J48    RBF    SVM    ABC    MABC-EPSO
DoS + 10% normal      10.20   4.7    3.8    2.86   2.78   2.22           9.95   3.95   3.28   2.59   2.07   1.5
Probe + 10% normal    5.33    3.12   3.05   2.36   2.24   1.87           4.15   3.01   3.19   2.11   1.97   1.69
U2R + 10% normal      4.75    3.81   3.08   2.21   2.16   1.98           4.01   3.46   2.79   1.80   1.78   0.65
R2L + 10% normal      3.98    4.97   3.01   2.46   2.23   2.0            3.12   3.23   2.55   1.42   1.37   1.46

[Figure 10: Comparison on specificity using the RFSM method. Bars show the specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets.]

is comparatively less than with the SFSM method. Based on the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

[Figure 11: Performance comparison on false alarm rate of classifiers. Bars show the FAR of SVM, ABC, and MABC-EPSO under the SFSM and RFSM methods on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets.]

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons amongst different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms other classifiers. The Friedman test and the ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


[Figure 12: Training time (ms) of MABC-EPSO on the DoS, Probe, U2R, and R2L (each + 10% normal) datasets using the All, SFSM, and RFSM feature sets.]

the ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to select which classification algorithm was significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.




It has been observed that ABC has a powerful global search ability but poor local search ability [3], while PSO has a powerful local search ability but poor global search ability [4]. In order to provide both a powerful global search capability and a local search capability, in this paper a hybridized model called MABC-EPSO is proposed, which brings the two algorithms together so that the computation process may benefit from both advantages. In this hybrid algorithm, the local search and global search abilities are balanced to obtain higher-quality results. The KDDCup'99 intrusion detection dataset developed by MIT Lincoln Laboratory is used for experiments to find the accuracy of the proposed hybrid approach.

The rest of this paper is organized as follows. Section 2 provides an overview of related work. Section 3 presents the principles of PSO and ABC. Section 4 describes the methodology, dataset description and preprocessing, the proposed feature selection methods, and the hybrid approach. Section 5 gives performance metrics, experimental results, and discussions. Finally, the conclusion is given in Section 6.

2. Related Work

In relation to achieving real-time intrusion detection, researchers have investigated several methods of performing feature selection. Kohavi and John [4] described the feature subset selection problem in supervised learning, which involves identifying the relevant or useful features in a dataset and giving only that subset to the learning algorithm. Real-life intrusion detection datasets contain redundant or insignificant features, and the redundant features make it harder to detect possible intrusion patterns [5]. With the increasing application of classification algorithms and feature selection methods to intrusion detection datasets, a comprehensive list of such literature is given in [6–23].

Machine learning algorithms such as neural networks [9] and fuzzy clustering [14] have been applied to IDS to construct good detection models. Support Vector Machine (SVM) [24] has become a popular research method in intrusion detection due to its good generalization performance and the sparse representation of its solution. Satpute et al. [25] enhanced the performance of intrusion detection systems by combining PSO and its variants with machine learning techniques for the detection of anomalies in network intrusion detection systems. Chung and Wahid [26] proposed a novel simplified swarm optimization (SSO) algorithm as a rule-based classifier and for feature selection when classifying audio data; the algorithm is more flexible and cost-effective for solving complex computing environments. Revathi and Malathi [10, 11] proposed hybrid simplified swarm optimization to preprocess the data, compared it with a new hybridized approach of PSO with Random Forest, and found that the proposed method provided a high detection rate and an optimal solution.

Karaboga and Basturk [27] proposed the Artificial Bee Colony (ABC) algorithm based on a particular intelligent behaviour of honeybee swarms. By modelling the basic behavioural characteristics of foragers, the ABC algorithm was developed and compared with differential evolution, Particle Swarm Optimization, and an evolutionary algorithm on multidimensional and multimodal numeric problems. Karaboga and Akay [28] proposed an ABC algorithm for an anomaly-based network intrusion detection system to optimize the solution; the proposed method comprises four stages: parameterization, training, testing, and detection. D. D. Kumar and B. Kumar [29] applied the ABC algorithm to an anomaly-based IDS and used feature selection techniques to reduce the number of features used for detection and classification. Kiran and Gunduz [30] proposed a hybridization of PSO and ABC for different continuous optimization problems, in which the information exchange between the particle swarm and the bee colony helps in increasing the global and local search abilities of the hybrid approach.

3. Theoretical Background

The following subsections provide the necessary background to understand the problem.

3.1. Particle Swarm Optimization. Particle Swarm Optimization (PSO) is one of the popular heuristic techniques and has been successfully applied in many different application areas; however, it suffers from premature convergence, especially in high-dimensional multimodal problems.

The algorithm of the standard PSO is as follows.

(1) Initialize a population of particles with randomly chosen positions and velocities.

(2) Calculate the fitness value of each particle in the population.

(3) If the fitness value of particle i is better than its pbest value, then set the fitness value as the new pbest of particle i.

(4) If pbest is updated and it is better than the current gbest, then set gbest to the current pbest value of particle i.

(5) Update the particle's velocity and position according to (1) and (2).

(6) If the best fitness value or the maximum generation is met, then stop the process; otherwise, repeat the process from step 2.

In PSO, a swarm consists of $N$ particles in a $D$-dimensional search space. The $i$th particle is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. The best previous position (pbest) of any particle is $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the velocity of particle $i$ is $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The global best particle in the whole swarm is denoted by $P_g$, and it represents the fittest particle [31]. During each iteration, each particle updates its velocity according to the following equation:

$$v_{id}^{t} = v_{id}^{t-1} + c_1 \cdot \text{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \text{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right) \quad (1)$$


where $c_1$ and $c_2$ denote the acceleration coefficients, $d = 1, 2, \ldots, D$, and $\text{rand}_1$ and $\text{rand}_2$ are random numbers uniformly distributed within $[0, 1]$.

Each particle then moves to a new potential position as in the following equation:

$$x_{id}^{t} = x_{id}^{t-1} + v_{id}^{t}, \quad d = 1, 2, \ldots, D \quad (2)$$
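As an illustration, one iteration of the updates in (1) and (2) can be sketched in code. The sketch below represents the swarm as plain lists and uses $c_1 = c_2 = 2$ as example values; it is not the authors' implementation.

```python
import random

def pso_step(positions, velocities, pbest, gbest, c1=2.0, c2=2.0):
    """One iteration of standard PSO: update each velocity by eq. (1),
    then move each particle by eq. (2)."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

Note that a particle sitting at both its pbest and the gbest with zero velocity stays put, since both attraction terms vanish.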

3.2. Artificial Bee Colony. The Artificial Bee Colony (ABC) algorithm is an optimization algorithm based on the intelligent foraging behaviour of honey bee swarms, proposed by Karaboga and Basturk [27]. The artificial bee colony comprises three groups: scout bees, onlooker bees, and employed bees. The bee which carries out random search is known as a scout bee; the bee which visits a food source is an employed bee; the bee which waits in the dance region is an onlooker bee, and onlooker bees together with scouts are also called unemployed bees. The employed and unemployed bees search for good food sources around the hive, and the employed bees share the stored food source information with the onlooker bees. The number of food sources is equal to the number of employed bees and also to the number of onlooker bees. The solutions of employed bees that cannot be enhanced within a fixed number of trials are abandoned, and those bees become scouts [28]. In the context of optimization, the number of food sources in the ABC algorithm represents the number of solutions in the population, and the position of a good food source indicates the location of a promising solution to the optimization problem [27].

The four main phases of the ABC algorithm are as follows.

Initialization Phase. The scout bees randomly generate a population of SN food sources. The input vector $x_m$, which contains $D$ variables, represents a food source, where $D$ is the dimension of the search space of the objective function to be optimized. Using (3), the initial food sources are produced randomly:

$$x_{mi} = l_i + \text{rand}(0, 1) \times (u_i - l_i) \quad (3)$$

where $u_i$ and $l_i$ are the upper and lower bounds of the solution space of the objective function and $\text{rand}(0, 1)$ is a random number within the range $[0, 1]$.

Employed Bee Phase. The employed bee searches for a new food source within the neighbourhood of its food source. The employed bees memorize the higher-quality food sources and share them with the onlooker bees. Equation (4) determines the neighbour food source $v_{mi}$:

$$v_{mi} = x_{mi} + \phi_{mi} \left(x_{mi} - x_{ki}\right) \quad (4)$$

where $i$ is a randomly selected parameter index, $x_k$ is a randomly selected food source, and $\phi_{mi}$ is a random number within the range $[-1, 1]$. Suitable tuning for specific problems can be made using this parameter range. The fitness of food sources, which is needed to find the global optimal solution, is calculated by (5), and a greedy selection is made between $x_m$ and $v_m$:

$$\text{fit}_i = \begin{cases} \dfrac{1}{f_i + 1}, & f_i \ge 0 \\ 1 + \left|f_i\right|, & f_i < 0 \end{cases} \quad (5)$$

where $f_i$ represents the objective value of the $i$th solution.

Onlooker Bee Phase. Onlooker bees examine the effectiveness of food sources by observing the waggle dance in the dance region and then randomly select a rich food source. The bees then perform a random search in the neighbourhood of the food source using (4). The quality of a food source is evaluated by its profitability $p_i$ using the following equation:

$$p_i = \frac{\text{fit}_i}{\sum_{n=1}^{SN} \text{fit}_n} \quad (6)$$

where $\text{fit}_i$ denotes the fitness of the solution represented by food source $i$ and SN denotes the total number of food sources, which is equal to the number of employed bees.

Scout Phase. If the effectiveness of a food source cannot be improved within the fixed number of trials, then the scout bees abandon the solution and randomly search for a new one using (3) [29].

The pseudocode of the ABC algorithm is given in Algorithm 1.

4. Methodology

4.1. Research Framework. In this study, the framework of the proposed work is given as follows:

(i) Data preprocessing: prepare the data for classification by removing unused features and duplicate instances.

(ii) Feature selection: determine the feature subsets, using the SFSM and RFSM methods, that contribute to the classification.

(iii) Hybrid classification: perform classification using the MABC-EPSO algorithm to enhance the classification accuracy on the KDDCUP'99 dataset.

The objective of this study is to help the network administrator in preprocessing the network data using feature selection methods and in performing classification using a hybrid algorithm which aims to fit a classifier model to the prescribed data.

4.2. Data Source and Dataset Description. In this section, we provide a brief description of the KDDCup'99 dataset [30], which is derived from the UCI Machine Learning Repository [31]. In 1998, for the DARPA intrusion detection evaluation program, a simulated environment was set up by the MIT Lincoln Lab to obtain raw TCP/IP dump data for a local-area network (LAN) in order to compare various intrusion detection methods. The environment functioned like a real one, including both background network traffic and


Input: initial solutions.
Output: optimal solution.
BEGIN
  Generate the initial population x_m, m = 1, 2, ..., SN
  Evaluate the fitness fit_i of the population
  set cycle = 1
  repeat
    FOR (employed phase)
      Produce a new solution v_m using (4)
      Calculate fit_i
      Apply greedy selection process
    Calculate the probability P_i using (6)
    FOR (onlooker phase)
      Select a solution x_m depending on P_i
      Produce a new solution v_m
      Calculate fit_i
      Apply greedy selection process
    IF (scout phase)
      There is an abandoned solution for the scout depending on limit
      THEN replace it with a new solution randomly produced by (3)
    Memorize the best solution so far
    cycle = cycle + 1
  until cycle = MCN
END

Algorithm 1: Artificial Bee Colony.
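For concreteness, Algorithm 1 can be rendered as runnable code. The sketch below is a minimal illustration on a toy sphere objective, not the authors' implementation; the colony size, limit, and cycle count are arbitrary example values.

```python
import random

def abc_optimize(f, dim, lb, ub, sn=10, limit=5, mcn=100, seed=1):
    """Minimal Artificial Bee Colony (Algorithm 1): employed, onlooker,
    and scout phases minimizing objective f over [lb, ub]^dim."""
    rng = random.Random(seed)

    def new_source():           # eq. (3): random point in the search box
        return [lb + rng.random() * (ub - lb) for _ in range(dim)]

    def fitness(x):             # eq. (5)
        fx = f(x)
        return 1.0 / (1.0 + fx) if fx >= 0 else 1.0 + abs(fx)

    foods = [new_source() for _ in range(sn)]
    trials = [0] * sn
    best = min(foods, key=f)
    for _ in range(mcn):
        def try_improve(m):     # eq. (4) plus greedy selection
            k = rng.choice([j for j in range(sn) if j != m])
            i = rng.randrange(dim)
            v = foods[m][:]
            v[i] += rng.uniform(-1, 1) * (foods[m][i] - foods[k][i])
            v[i] = max(lb, min(ub, v[i]))
            if fitness(v) > fitness(foods[m]):
                foods[m], trials[m] = v, 0
            else:
                trials[m] += 1

        for m in range(sn):                  # employed phase
            try_improve(m)
        total = sum(fitness(x) for x in foods)
        for _ in range(sn):                  # onlooker phase: roulette by eq. (6)
            r, acc, m = rng.random() * total, 0.0, 0
            for j in range(sn):
                acc += fitness(foods[j])
                if acc >= r:
                    m = j
                    break
            try_improve(m)
        for m in range(sn):                  # scout phase
            if trials[m] > limit:
                foods[m], trials[m] = new_source(), 0
        cand = min(foods, key=f)
        if f(cand) < f(best):
            best = cand
    return best
```

On the 2-dimensional sphere function, the colony quickly concentrates near the origin, since greedy selection only ever keeps improving sources.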

Table 1: Distribution of connection types in the 10% KDDCup'99 dataset (% of occurrence).

                 DoS      Probe   U2R    R2L    Total attack   Total normal
Training data    79.24    0.83    0.01   0.23   80.31          19.69
Testing data     73.90    1.34    0.07   5.20   81.51          19.49

a wide variety of attacks. A version of the 1998 DARPA dataset, KDDCup'99, is now widely accepted as a standard benchmark dataset and has received much attention in the intrusion detection research community. The main motivation for using the KDDCup'99 dataset is to show that the proposed method has the advantage of becoming an efficient classification algorithm when applied to intrusion detection systems. In this paper, the 10% KDDCup'99 dataset is used for experimentation. The distribution of connection types and the sample size in the 10% KDDCUP dataset are shown in Tables 1 and 2, and the feature information is shown in Table 3. The dataset consists of one type of normal data and 22 different attack types categorized into 4 classes, namely, denial of service (DoS), Probe, user-to-root (U2R), and remote-to-login (R2L).

4.3. Data Preprocessing. Data preprocessing is a time-consuming task which prepares the data for subsequent analysis as required by the intrusion detection system model. The main aim of data preprocessing is to transform the raw network data into a form suitable for further analysis. Figure 1 illustrates the steps involved in data processing and

Table 2: Sample size in the 10% KDDCUP dataset.

Category of attack   Attack name (count)
Normal               Normal (97277)
DoS                  Neptune (107201), Smurf (280790), Pod (264), Teardrop (979), Land (21), Back (2203)
Probe                Portsweep (1040), IPsweep (1247), Nmap (231), Satan (1589)
U2R                  Bufferoverflow (30), LoadModule (9), Perl (3), Rootkit (10)
R2L                  Guesspassword (53), Ftpwrite (8), Imap (12), Phf (4), Multihop (7), Warezmaster (20), Warezclient (1020)

Table 3: Feature information of the 10% KDDCUP dataset.

Dataset characteristics     Multivariate
Attribute characteristics   Categorical, integer
Associated task             Classification
Area                        Computer
Number of instances         494020
Number of attributes        42
Number of classes           1 normal class, 4 attack classes

how raw input data are processed for further statistical measures.

Various statistical analyses such as feature selection, dimensionality reduction, and normalization are essential to select significant features from the dataset. If the dataset contains duplicate instances, then the classification algorithms


Figure 1: Data preprocessing. (Network audit data pass through filling of missing values, removal of duplicate instances, and feature selection or dimensionality reduction before data analysis by association mining, classification, or clustering, which raises an alarm/alert.)

Input: dataset X with n features.
Output: vital features.
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  for i = 1, 2, ..., n
    X(i) = x_(i)    // one-dimensional feature vector
    Apply SVM classifier
  Sort features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    Select the features
End

Algorithm 2: Single feature selection method.

Table 4: Details of instances in the dataset.

         Before removing duplicates   After removing duplicates   Selected instances
Normal   97277                        87832                       8783
DoS      391458                       54572                       7935
Probe    4107                         2131                        2131
U2R      52                           52                          52
R2L      1126                         999                         999
Total    494020                       145586                      19900

consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large number of duplicate instances will make learning algorithms partial towards the frequently occurring instances and will inhibit them from learning the infrequent instances, which are generally more harmful to networks. The existence of these duplicate instances will also cause the evaluation results to be biased towards methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection; hence, duplicate instances are removed so that the classifiers will not be partial towards the more frequently occurring instances. The details of the instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class was selected, and four new sets of data were generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes of the intrusion detection dataset may lead to a complex intrusion detection model and reduce the detection accuracy.

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a timely manner. In this paper, feature subsets are formed based on the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the number of features in the datasets, aiming to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature method (SFSM) uses one-dimensional feature vectors. In the first iteration, it considers only the first attribute and evaluates it by calculating the accuracy using the Support Vector Machine classifier. In the second iteration, it considers only the next attribute for evaluation. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted, and the vital features, whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively, are selected. The pseudocode of the single feature selection algorithm is given in Algorithm 2.
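A schematic rendering of Algorithm 2 follows. The `evaluate` callable here is a hypothetical stand-in for training and testing the SVM on a one-dimensional feature vector; it is assumed to return an (accuracy, detection rate) pair.

```python
def single_feature_selection(X, y, evaluate, acc_thr, dr_thr):
    """SFSM sketch (Algorithm 2): score each feature alone, sort by
    accuracy, and keep the features beating both thresholds."""
    n_features = len(X[0])
    results = []
    for j in range(n_features):
        column = [[row[j]] for row in X]   # one-dimensional feature vector
        acc, dr = evaluate(column, y)      # stand-in for SVM train/test
        results.append((j, acc, dr))
    results.sort(key=lambda t: t[1], reverse=True)
    return [j for j, acc, dr in results if acc > acc_thr and dr > dr_thr]
```

In practice `evaluate` would wrap an SVM with cross-validation; here it is kept abstract so the selection logic stays visible.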

4.4.2. Random Feature Selection Method. In this method, features are removed randomly and the remaining subset is evaluated using the classifier. In the first iteration, all the features are evaluated using the SVM classifier; then one feature is deleted, the dataset is updated, and the classifier efficiency is measured. The importance of


Input: dataset X with n features.
Output: vital features.
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  Let S = X
  for all n features of X do
    Delete x_i from X
    S = S − x_i    // update feature subset
    Apply SVM classifier
  end
  Sort the features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    S = S − x_i    // selecting vital features
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using the SFSM method.

Dataset              Selected features                                                              Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6      20
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                       15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31               17
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13                           14

Table 6: List of features selected using the RFSM method.

Dataset              Selected features                                      Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31    14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37           12
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32      13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20             11

the provided feature is calculated. In the second iteration, another feature is removed randomly from the dataset and the dataset is updated. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.
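Algorithm 3 leaves some details implicit; the sketch below fixes one plausible reading, in which a randomly chosen feature is deleted at each step and the reduced subset is scored, so each feature is credited with the score recorded when it was removed. The `evaluate` callable is again a hypothetical stand-in for SVM evaluation on a feature subset, returning an (accuracy, detection rate) pair.

```python
def random_feature_selection(features, evaluate, acc_thr, dr_thr, rng):
    """RFSM sketch (Algorithm 3, one reading): repeatedly delete a random
    feature, score the reduced subset, then keep features whose recorded
    scores beat both baseline thresholds."""
    remaining = list(features)
    scored = []
    while len(remaining) > 1:
        victim = rng.choice(remaining)        # random feature to delete
        remaining.remove(victim)
        acc, dr = evaluate(remaining)         # stand-in for SVM train/test
        scored.append((victim, acc, dr))
    scored.sort(key=lambda t: t[1], reverse=True)
    return [f for f, acc, dr in scored if acc > acc_thr and dr > dr_thr]
```

A deterministic `rng` stub makes the behaviour reproducible for inspection; in actual use a seeded `random.Random` instance would be passed.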

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the sizes of the subsets identified as a percentage of the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques have been used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing times. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the performance level of intrusion detection for the most widely used classification techniques by using optimization methods like PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high-performance IDS; their combination has the advantage of providing a more reliable solution for today's data-intensive computing processes.

The Artificial Bee Colony algorithm is a recently proposed optimization algorithm and is becoming a hot topic in computational intelligence. Because of its high probability of avoiding local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm; moreover, the Particle Swarm Optimization algorithm can help to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other possesses particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part; the inferior solution is replaced by the better solution and substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm, the maximum cycle index of the algorithm, the search range of the solution, and the other constants needed in both ABC and PSO.


Figure 2: Flowchart of the proposed hybrid MABC-EPSO model. (Network audit data are preprocessed and features are selected using SFSM and RFSM; the EPSO branch evaluates fitness, updates particle positions, and determines pbest and gbest, while the MABC branch runs the employed, onlooker, and scout bee phases; the gbest of EPSO and the best of MABC are compared each iteration until the termination condition is satisfied and the best solution is selected.)

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3. In the bee colony, to evaluate the fitness value of each solution, an employed bee is assigned using (5). The employed bee selects a new candidate solution from the nearby food sources and then uses the greedy selection method by calculating the Rastrigin function:

$$\min f(x) = 10n + \sum_{i=1}^{n} \left[x_i^2 - 10 \cos\left(2\pi x_i\right)\right] \quad (7)$$

A multimodal function is said to contain more than one local optimum. A function of variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization algorithm can easily be trapped in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented within the Artificial Bee Colony algorithm, and the result is named the Modified Artificial Bee Colony (MABC) algorithm. Here, $f_i$ is the Rastrigin function, whose value is 0 at its global minimum $(0, 0, \ldots, 0)$. This function is chosen because it is considered one of the best test functions for finding the global minimum. The initialization range for the function is $[-15, 15]$. The function has cosine modulation to produce many local minima; thus, it is multimodal.

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise, it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected from among the neighbours of a previous solution. It also checks the fitness value and, if this value is better than the previous one, substitutes the old position with the new one; otherwise, it retains the old position. The objective of the scout bees is to determine new random food sources to substitute for the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined number of cycles


(MCN). After all the choices have been made, the best solution generated in that iteration is called MABCbest.

Step 6. As the initial velocity has a large effect on the balance between the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight ($\omega$) [35] is used to control the velocity, and hence the velocity update equation becomes:

$$v_{id}^{t} = \omega \cdot v_{id}^{t-1} + c_1 \cdot \text{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \text{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right) \quad (8)$$

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, a linearly decreasing inertia weight [36], as in (9), is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

$$w_k = w_{\max} - \frac{w_{\max} - w_{\min}}{\text{iter}_{\max}} \times k \quad (9)$$
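The linearly decreasing inertia weight of (9) can be sketched as follows; the defaults $w_{\max} = 0.9$ and $w_{\min} = 0.4$ follow the range quoted above.

```python
def inertia(k, iter_max, w_max=0.9, w_min=0.4):
    """Eq. (9): inertia weight decreasing linearly from w_max to w_min
    as iteration k runs from 0 to iter_max."""
    return w_max - (w_max - w_min) / iter_max * k
```

Early iterations thus favour global exploration (large weight), while later iterations favour local refinement (small weight).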

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSObest.

Step 7. The minimum of the values MABCbest and EPSObest is called Best and is defined as

$$\text{Best} = \begin{cases} \text{EPSO}_{\text{best}}, & \text{if } \text{EPSO}_{\text{best}} \le \text{MABC}_{\text{best}} \\ \text{MABC}_{\text{best}}, & \text{if } \text{MABC}_{\text{best}} \le \text{EPSO}_{\text{best}} \end{cases} \quad (10)$$

Step 8. If the termination condition is satisfied, then end the process and report the best solution; otherwise, return to Step 2.
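Steps 1 to 8 can be condensed into a schematic main loop. The `mabc_iterate` and `epso_iterate` callables below are hypothetical stand-ins for one cycle of each component population, each returning its current best solution and objective value; the selection between them follows (10), and the winner is shared with both parts on the next cycle.

```python
def hybrid_mabc_epso(mabc_iterate, epso_iterate, max_cycles):
    """Schematic MABC-EPSO loop: run both populations each cycle, pick
    the better of MABCbest and EPSObest via eq. (10), and share it."""
    best, best_val = None, float("inf")
    shared = None
    for _ in range(max_cycles):
        mabc_sol, mabc_val = mabc_iterate(shared)
        epso_sol, epso_val = epso_iterate(shared)
        if epso_val <= mabc_val:            # eq. (10)
            cycle_best, cycle_val = epso_sol, epso_val
        else:
            cycle_best, cycle_val = mabc_sol, mabc_val
        shared = cycle_best                 # inferior part receives the winner
        if cycle_val < best_val:
            best, best_val = cycle_best, cycle_val
    return best, best_val
```

This keeps the cooperation logic separate from the two metaheuristics themselves, which can be plugged in as closures over their own populations.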

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters are set to bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight $\omega$ in (11) varies from 0.9 to 0.7 linearly with the iterations, and the acceleration coefficients $c_1$ and $c_2$ are set to 2. The upper and lower bounds for $v$ ($v_{\min}$, $v_{\max}$) are set as the maximum upper and lower bounds of $x$:

$$v_{id}^{t} = \omega v_{id}^{t-1} + c_1 \text{rand}(0, 1) \left(p_{id} - x_{id}^{t-1}\right) + c_2 \text{rand}(0, 1) \left(p_{gd} - x_{id}^{t-1}\right) \quad (11)$$

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

Actual    Predicted Normal       Predicted Attack
Normal    True Negative (TN)     False Positive (FP)
Attack    False Negative (FN)    True Positive (TP)

True Positive (TP): the number of attacks correctly identified. True Negative (TN): the number of normal records correctly classified. False Positive (FP): the number of normal records incorrectly classified. False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics such as accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms; hence, they are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called detection rate. The numbers of instances predicted correctly and incorrectly by a classification model are summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

$$\text{Classification accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \quad (12)$$

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

$$\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} \quad (13)$$

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

$$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \quad (14)$$

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

$$\text{False Alarm Rate (FAR)} = \frac{\text{FP}}{\text{TN} + \text{FP}} \quad (15)$$

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving the classifier reliability. In 10-fold cross-validation, the original data are divided randomly into 10 parts; during each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm   Average accuracy (%)   Feature selection method
C4.5 [6]                   99.11                  All features
                           98.69                  Genetic algorithm
                           98.84                  Best-first
                           99.41                  Correlation feature selection
BayesNet [6]               99.53                  All features
                           99.52                  Genetic algorithm
                           98.91                  Best-first
                           98.92                  Correlation feature selection
ABC-SVM [7]                92.768                 Binary ABC
PSO-SVM [7]                83.88
GA-SVM [7]                 80.73
KNN [8]                    98.24                  All features
                           98.11                  Fast feature selection
Bayes Classifier [8]       76.09                  All features
                           71.94                  Fast feature selection
ANN [9]                    81.57                  Feature reduction
SSO-RF [10, 11]            92.7                   SSO
Hybrid SSO [12]            97.67                  SSO
RSDT [13]                  97.88                  Rough set
ID3 [13]                   97.665                 All features
C4.5 [13]                  97.582
FC-ANN [14]                96.71                  All features
Proposed MABC-EPSO         88.59                  All features
                           99.32                  Single feature selection method
                           99.82                  Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
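The 10-fold procedure amounts to the following index bookkeeping. The paper partitions the data randomly; a deterministic round-robin split is used here only for brevity of illustration.

```python
def k_fold_indices(n, k=10):
    """Split n sample indices into k disjoint folds; each fold serves
    once as the test partition while the rest are used for training."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for t in range(k):
        test = folds[t]
        train = [i for f in range(k) if f != t for i in folds[f]]
        splits.append((train, test))
    return splits
```

The classifier would be trained and scored once per (train, test) pair, and the 10 scores averaged.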

5.2. Results and Discussions. The main motivation is to show that the proposed hybrid method has the advantage of becoming an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38], such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC, are tested on the KDDCup'99 dataset. For each classification algorithm, the default control parameters are used. In Table 8, the accuracy rates obtained by the various classification algorithms using different feature selection methods are reported.

The performance comparison of the classifiers on accuracy rate is given in Figures 3-6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches, respectively. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers for the DoS dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under all features, SFSM, and RFSM).

is increased significantly to 94.36% and 99.32%, respectively. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using the SFSM feature selection method and Friedman ranks (rank in parentheses).

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     82.57 (6)   87.11 (4)   87.96 (3)   84.7 (5)    90.82 (2)   99.50 (1)
Probe + 10% normal   82.68 (5)   82.6 (6)    83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal     86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal     84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.8 (1)
Average rank         5.25        4.75        4.25        3.75        2           1

[Figure 4: Accuracy comparison of classifiers for probe dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets).]

[Figure 5: Accuracy comparison of classifiers for R2L dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets).]

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rates of the NB, J48, and RBF classifiers are comparatively higher with the RFSM method than with SFSM and the full feature set.

[Figure 6: Accuracy comparison of classifiers for U2R dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO under the All, SFSM, and RFSM feature sets).]

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned are considered over the four datasets, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and their ranks computed through the Friedman test (the ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting rank 1, the second best getting rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM, and χ² = 15.712 and F_F = 10.992 for RFSM, are computed. With four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013 and the P value is < 0.05. So we reject the null hypothesis, and the differences among the classifiers are significant.
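As a concrete check, the average ranks and the Friedman/Iman-Davenport statistics for the SFSM case can be recomputed from the Table 9 accuracies alone. The following sketch (plain Python, no external libraries) reproduces the average-rank row exactly and the quoted χ² and F_F values up to rounding:

```python
# Accuracy rates (%) from Table 9 (SFSM); rows = datasets
# (DoS, Probe, R2L, U2R + 10% normal), columns = NB, J48, RBF, SVM, ABC, MABC-EPSO.
acc = [
    [82.57, 87.11, 87.96, 84.70, 90.82, 99.50],
    [82.68, 82.60, 83.72, 85.67, 96.58, 99.27],
    [86.15, 82.55, 85.16, 90.61, 92.72, 99.24],
    [84.06, 87.16, 85.54, 85.97, 97.31, 99.80],
]
n, k = len(acc), len(acc[0])          # n = 4 datasets, k = 6 classifiers

# Rank the classifiers within each dataset (rank 1 = highest accuracy; no ties here).
rank_sums = [0.0] * k
for row in acc:
    order = sorted(range(k), key=lambda j: -row[j])
    for r, j in enumerate(order, start=1):
        rank_sums[j] += r
mean_ranks = [s / n for s in rank_sums]   # matches the "average rank" row

# Friedman chi-square and the Iman-Davenport F statistic.
chi2 = 12 * n / (k * (k + 1)) * sum((m - (k + 1) / 2) ** 2 for m in mean_ranks)
f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
print(mean_ranks, round(chi2, 3), round(f_f, 3))
```

This yields mean ranks [5.25, 4.75, 4.25, 3.75, 2.0, 1.0], χ² ≈ 15.714, and F_F ≈ 11.0, in agreement with the values reported in the text.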

The ANOVA test compares the means of several groups by estimating the variances among groups and within a group. Here the null hypothesis, which is set as all


Table 10: Accuracy rates using RFSM feature selection method and Friedman ranks.

Dataset            | NB        | J48       | RBF       | SVM       | ABC       | MABC-EPSO
DoS + 10% normal   | 83.04 (6) | 90.05 (4) | 88.83 (5) | 94.02 (3) | 96.43 (2) | 99.81 (1)
Probe + 10% normal | 84.01 (5) | 82.72 (6) | 85.94 (4) | 95.87 (3) | 97.31 (2) | 99.86 (1)
R2L + 10% normal   | 86.32 (4) | 83.10 (6) | 86.11 (5) | 97.04 (3) | 98.96 (2) | 99.80 (1)
U2R + 10% normal   | 85.15 (6) | 88.42 (5) | 88.98 (4) | 95.91 (3) | 98.96 (2) | 99.80 (1)
Average rank       | 5.25      | 5.25      | 4.5       | 3         | 2         | 1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM method
Between groups      | 781.5143 | 5  | 156.3029 | 31.89498 | <0.05   | 2.772853
Within groups       | 88.20985 | 18 | 4.900547 |          |         |
Total               | 869.7241 | 23 |          |          |         |
RFSM method
Between groups      | 879.4307 | 5  | 175.8861 | 48.54728 | <0.05   | 2.772853
Within groups       | 65.21375 | 18 | 3.622986 |          |         |
Total               | 944.6444 | 23 |          |          |         |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the P value and the value of F are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, and it states that there are significant differences of MABC-EPSO and ABC with the other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, and it reveals that there is a statistically significant difference of SVM, ABC, and MABC-EPSO with the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
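For readers who want to verify the ANOVA arithmetic, the sums of squares in Table 11 can be approximated directly from the rounded Table 9 accuracies (groups = classifiers, one observation per dataset). The values below land close to, but not exactly on, the tabulated ones, since the published accuracies are rounded to two decimals:

```python
# One-way ANOVA on the SFSM accuracies of Table 9: groups = classifiers,
# observations = the four datasets (DoS, Probe, R2L, U2R + 10% normal).
groups = {
    "NB":        [82.57, 82.68, 86.15, 84.06],
    "J48":       [87.11, 82.60, 82.55, 87.16],
    "RBF":       [87.96, 83.72, 85.16, 85.54],
    "SVM":       [84.70, 85.67, 90.61, 85.97],
    "ABC":       [90.82, 96.58, 92.72, 97.31],
    "MABC-EPSO": [99.50, 99.27, 99.24, 99.80],
}
all_vals = [v for g in groups.values() for v in g]
grand = sum(all_vals) / len(all_vals)

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
df_between, df_within = len(groups) - 1, len(all_vals) - len(groups)   # 5 and 18
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 2))
```

The computed F (about 32) comfortably exceeds F-critical = 2.773, consistent with the rejection of the null hypothesis reported above.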

In Table 12, the results are reported for detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

[Figure 7: Comparison on sensitivity using SFSM method (sensitivity (%) of each classifier on the DoS, Probe, U2R, and R2L + 10% normal datasets).]

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a comparable performance to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.
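All of the metrics compared in this section derive from the four confusion-matrix counts; the sketch below uses made-up counts (not values from the paper) purely to show the definitions:

```python
# Sensitivity (detection rate), specificity, and false alarm rate from
# confusion-matrix counts. The counts passed in below are illustrative only.
def ids_metrics(tp, fn, tn, fp):
    """tp: attacks caught, fn: attacks missed, tn: normal passed, fp: false alarms."""
    sensitivity = tp / (tp + fn)        # detection rate
    specificity = tn / (tn + fp)
    false_alarm_rate = fp / (fp + tn)   # equals 1 - specificity
    return sensitivity, specificity, false_alarm_rate

sens, spec, far = ids_metrics(tp=980, fn=20, tn=995, fp=5)
print(sens, spec, far)   # → 0.98 0.995 0.005
```

Note that the false alarm rate is the complement of specificity, which is why the FAR values in Figure 11 mirror the specificity comparison.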

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                | Average detection rate (%) | Feature selection method
Naïve Bayes [15]                        | 92.27 | Genetic algorithm
C4.5 [15]                               | 92.1  | Genetic algorithm
Random forest [15]                      | 89.21 | Genetic algorithm
Random tree [15]                        | 88.98 | Genetic algorithm
REP tree [15]                           | 89.11 | Genetic algorithm
Neurotree [15]                          | 98.38 | Genetic algorithm
GMDH based neural network [16]          | 93.7  | Information gain
                                        | 97.5  | Gain ratio
                                        | 95.3  | GMDH
Neural network [17]                     | 81.57 | Feature reduction
Hybrid evolutionary neural network [18] | 91.51 | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]     | 97.75 | PCA
Ensemble Bayesian combination [20]      | 93.35 | All features
Voting + J48 + Rule [21]                | 97.47 | All features
Voting + AdaBoost + J48 [21]            | 97.38 | All features
Rough set neural network algorithm [22] | 90    | All features
PSO based fuzzy system [23]             | 93.7  | All features
Proposed MABC-EPSO                      | 87.16 | All features
                                        | 98.09 | Single feature selection method
                                        | 98.67 | Random feature selection method

[Figure 8: Comparison on sensitivity using RFSM method (sensitivity (%) of each classifier on the DoS, Probe, U2R, and R2L + 10% normal datasets).]

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the classification algorithms at the 0.05 significance level (P = 0.05). However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

[Figure 9: Comparison on specificity using SFSM method (specificity (%) of each classifier on the DoS, Probe, U2R, and R2L + 10% normal datasets).]

for all datasets. Also, the proposed hybrid approach using SFSM shows a comparable performance with the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM
Between groups      | 659.6518 | 5  | 131.9304 | 52.5347  | <0.05   | 2.772853
Within groups       | 45.20339 | 18 | 2.511299 |          |         |
Total               | 704.8551 | 23 |          |          |         |
RFSM
Between groups      | 617.818  | 5  | 123.5636 | 23.53957 | <0.05   | 2.772853
Within groups       | 94.48535 | 18 | 5.249186 |          |         |
Total               | 712.3033 | 23 |          |          |         |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time of classification algorithms using SFSM and RFSM feature selection methods.

                   | SFSM                                          | RFSM
Dataset            | NB    | J48  | RBF  | SVM  | ABC  | MABC-EPSO | NB   | J48  | RBF  | SVM  | ABC  | MABC-EPSO
DoS + 10% normal   | 10.20 | 4.7  | 3.8  | 2.86 | 2.78 | 2.22      | 9.95 | 3.95 | 3.28 | 2.59 | 2.07 | 1.5
Probe + 10% normal | 5.33  | 3.12 | 3.05 | 2.36 | 2.24 | 1.87      | 4.15 | 3.01 | 3.19 | 2.11 | 1.97 | 1.69
U2R + 10% normal   | 4.75  | 3.81 | 3.08 | 2.21 | 2.16 | 1.98      | 4.01 | 3.46 | 2.79 | 1.80 | 1.78 | 0.65
R2L + 10% normal   | 3.98  | 4.97 | 3.01 | 2.46 | 2.23 | 2.0       | 3.12 | 3.23 | 2.55 | 1.42 | 1.37 | 1.46

[Figure 10: Comparison on specificity using RFSM method (specificity (%) of each classifier on the DoS, Probe, U2R, and R2L + 10% normal datasets).]

is comparatively less than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

[Figure 11: Performance comparison on false alarm rate of classifiers (FAR of SVM, ABC, and MABC-EPSO under SFSM and RFSM on the DoS, Probe, U2R, and R2L + 10% normal datasets).]

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers are significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and the ANOVA test were applied to check whether the classification algorithms are significantly different. Based on the conclusion of


[Figure 12: Training time (ms) of MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets using the All, SFSM, and RFSM feature sets.]

the ANOVA test, the null hypotheses were rejected where they were significant. Post hoc analysis using Tukey's test was applied to identify which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.
[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.
[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.
[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261.
[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.
[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.
[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.
[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.
[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.
[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.
[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.
[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.
[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.
[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.
[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.
[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.
[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.
[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.
[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.
[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.
[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.
[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.
[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.
[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.
[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.
[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.
[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.
[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.
[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.
[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.
[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.
[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.
[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.
[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.
[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.


where c1 and c2 denote the acceleration coefficients, d = 1, 2, …, D, and rand1 and rand2 are random numbers uniformly distributed within [0, 1].

Each particle then moves to a new potential position as in the following equation:

x_id^t = x_id^(t−1) + v_id^t,   d = 1, 2, …, D.   (2)
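Equations (1)-(2) amount to one velocity-then-position update per particle. A minimal sketch, assuming the standard inertia weight w alongside the coefficients c1 and c2 named above:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=None):
    """Return the updated (position, velocity) of one particle in D dimensions.
    w is the standard inertia weight (an assumption; only c1, c2 appear above)."""
    rng = rng or random.Random(0)
    new_v, new_x = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()      # rand1, rand2 in [0, 1]
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])      # cognitive pull toward personal best
              + c2 * r2 * (gbest[d] - x[d]))     # social pull toward global best
        new_v.append(vd)
        new_x.append(x[d] + vd)                  # equation (2)
    return new_x, new_v

x1, v1 = pso_step([0.0, 0.5], [0.1, -0.1], pbest=[1.0, 1.0], gbest=[2.0, 2.0])
print(x1, v1)
```

In a full optimizer, this step is applied to every particle each iteration, after which the personal and global bests are refreshed from the new positions.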

3.2. Artificial Bee Colony. The Artificial Bee Colony (ABC) algorithm is an optimization algorithm based on the intelligent foraging behaviour of a honey bee swarm, proposed by Karaboga and Basturk [27]. The Artificial Bee Colony comprises three groups: scout bees, onlooker bees, and employed bees. The bee which carries out a random search is known as a scout bee. The bee which visits a food source is an employed bee. The bee which waits in the dance region is an onlooker bee, and the onlooker bee together with the scout is also called an unemployed bee. The employed and unemployed bees search for good food sources around the hive. The employed bees share the stored food source information with the onlooker bees. The number of food sources is equal to the number of employed bees and also to the number of onlooker bees. The solutions of the employed bees which cannot be enhanced within a fixed number of trials become scouts, and their solutions are abandoned [28]. In the context of optimization, the number of food sources in the ABC algorithm represents the number of solutions in the population. The position of a good food source indicates the location of a promising solution to the optimization problem [27].

The four main phases of the ABC algorithm are as follows.

Initialization Phase. The scout bees randomly generate the population size (SN) of food sources. The input vector x_m, which contains D variables, represents a food source, where D is the dimension of the search space of the objective function to be optimized. Using (3), initial food sources are produced randomly:

x_mi = l_i + rand(0, 1) × (u_i − l_i),   (3)

where u_i and l_i are the upper and lower bounds of the solution space of the objective function and rand(0, 1) is a random number within the range [0, 1].

Employed Bee Phase. The employed bee searches for a new food source within the neighbourhood of its food source. The employed bees memorize food sources of higher quality and share them with the onlooker bees. Equation (4) determines the neighbour food source v_mi:

v_mi = x_mi + φ_mi (x_mi − x_ki),   (4)

where i is a randomly selected parameter index, x_k is a randomly selected food source, and φ_mi is a random number within the range [−1, 1]. Suitable tuning for specific problems can be made using this parameter range. The fitness of food sources, which is needed to find the global optimal solution, is calculated by (5), and a greedy selection is made between x_m and v_m:

fit_i = 1 / (f_i + 1)   if f_i ≥ 0,
fit_i = 1 + |f_i|        if f_i < 0,   (5)

where f_i represents the objective value of the ith solution.

Onlooker Bee Phase. Onlooker bees examine the effectiveness of food sources by observing the waggle dance in the dance region and then randomly select a rich food source. Then the bees perform a random search in the neighbourhood of the food source using (4). The quality of a food source is evaluated by its profitability P_i using the following equation:

p_i = fit_i / Σ_{n=1}^{SN} fit_n,   (6)

where fit_i denotes the fitness of the solution represented by food source i and SN denotes the total number of food sources, which is equal to the number of employed bees.

Scout Phase. If the effectiveness of a food source cannot be improved within the fixed number of trials, then the scout bees discard the solution and randomly search for a new one using (3) [29].

The pseudocode of the ABC algorithm is given in Algorithm 1.

4. Methodology

4.1. Research Framework. In this study, the framework of the proposed work is given as follows:

(i) Data preprocessing: prepare the data for classification and remove unused features and duplicate instances.

(ii) Feature selection: determine, using the SFSM and RFSM methods, the feature subset that contributes to the classification.

(iii) Hybrid classification: perform classification using the MABC-EPSO algorithm to enhance the classification accuracy for the KDDCup'99 dataset.

The objective of this study is to help the network administrator in preprocessing the network data using feature selection methods and in performing classification using a hybrid algorithm which aims to fit a classifier model to the prescribed data.

4.2. Data Source and Dataset Description. In this section we provide a brief description of the KDDCup'99 dataset [30], which is derived from the UCI Machine Learning Repository [31]. In the 1998 DARPA intrusion detection evaluation program, to perform a comparison of various intrusion detection methods, a simulated environment was set up by the MIT Lincoln Lab to obtain raw TCP/IP dump data for a local-area network (LAN). The environment functioned like a real one, including both background network traffic and


Input: initial solutions
Output: optimal solution
BEGIN
  Generate the initial population x_m, m = 1, 2, …, SN
  Evaluate the fitness (fit_i) of the population
  Set cycle = 1
  repeat
    FOR (employed phase):
      Produce a new solution v_m using (4)
      Calculate fit_i
      Apply greedy selection process
    Calculate the probability P_i using (6)
    FOR (onlooker phase):
      Select a solution x_m depending on P_i
      Produce a new solution v_m
      Calculate fit_i
      Apply greedy selection process
    IF (scout phase): there is an abandoned solution for the scout, depending on limit
      THEN replace it with a new solution randomly produced by (3)
    Memorize the best solution so far
    cycle = cycle + 1
  until cycle = MCN
END

Algorithm 1: Artificial Bee Colony.
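Algorithm 1 can be turned into a short runnable program; the sketch below minimises the sphere function, with SN, limit, and MCN set to illustrative values (they are not parameters reported by the paper):

```python
import random

rng = random.Random(3)
D, SN, LIMIT, MCN = 2, 10, 20, 200     # illustrative parameter choices
LB, UB = -5.0, 5.0

def f(x):                               # objective to minimise (sphere)
    return sum(xi * xi for xi in x)

def fitness(x):                         # equation (5)
    fx = f(x)
    return 1.0 / (fx + 1.0) if fx >= 0 else 1.0 + abs(fx)

def random_source():                    # equation (3)
    return [LB + rng.random() * (UB - LB) for _ in range(D)]

def neighbour(m, sources):              # equation (4), one dimension at a time
    i = rng.randrange(D)
    k = rng.choice([j for j in range(SN) if j != m])
    phi = rng.uniform(-1.0, 1.0)
    v = sources[m][:]
    v[i] = sources[m][i] + phi * (sources[m][i] - sources[k][i])
    return v

sources = [random_source() for _ in range(SN)]
trials = [0] * SN
best = min(sources, key=f)

for cycle in range(MCN):
    def try_improve(m):                 # greedy selection shared by both phases
        v = neighbour(m, sources)
        if fitness(v) > fitness(sources[m]):
            sources[m], trials[m] = v, 0
        else:
            trials[m] += 1
    for m in range(SN):                 # employed bee phase
        try_improve(m)
    total = sum(fitness(s) for s in sources)
    probs = [fitness(s) / total for s in sources]   # equation (6)
    for _ in range(SN):                 # onlooker bee phase
        try_improve(rng.choices(range(SN), weights=probs)[0])
    for m in range(SN):                 # scout phase
        if trials[m] >= LIMIT:
            sources[m], trials[m] = random_source(), 0
    best = min(best, min(sources, key=f), key=f)

print(round(f(best), 8))
```

On this two-dimensional toy problem the best objective value drops close to zero well within the 200 cycles, which is all the sketch is meant to demonstrate.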

Table 1: Distribution of connection types in 10% KDDCup'99 dataset (% of occurrence).

Data          | DoS   | Probe | U2R  | R2L  | Total attack | Total normal
Training data | 79.24 | 0.83  | 0.01 | 0.23 | 80.31        | 19.69
Testing data  | 73.90 | 1.34  | 0.07 | 5.20 | 81.51        | 19.49

a wide variety of attacks. A version of the 1998 DARPA dataset, KDDCup'99, is now widely accepted as a standard benchmark dataset and has received much attention in the intrusion detection research community. The main motivation for using the KDDCup'99 dataset is to show that the proposed method has the advantage of being an efficient classification algorithm when applied to an intrusion detection system. In this paper, the 10% KDD Cup'99 dataset is used for experimentation. The distribution of connection types and the sample sizes in the 10% KDDCUP dataset are shown in Tables 1 and 2. The feature information of the 10% KDDCUP dataset is shown in Table 3. The dataset consists of one type of normal data and 22 different attack types categorized into 4 classes, namely, denial of service (DoS), Probe, user-to-root (U2R), and remote-to-login (R2L).

4.3. Data Preprocessing. Data preprocessing is the time-consuming task which prepares the data for subsequent analysis, as required for an intrusion detection system model. The main aim of data preprocessing is to transform the raw network data into a form suitable for further analysis. Figure 1 illustrates the steps involved in data processing and

Table 2: Sample size in 10% KDDCUP dataset.

Category of attack | Attack name (count)
Normal             | Normal (97,277)
DoS                | Neptune (107,201), Smurf (280,790), Pod (264), Teardrop (979), Land (21), Back (2,203)
Probe              | Portsweep (1,040), IPsweep (1,247), Nmap (231), Satan (1,589)
U2R                | Buffer_overflow (30), LoadModule (9), Perl (3), Rootkit (10)
R2L                | Guess_password (53), Ftp_write (8), Imap (12), Phf (4), Multihop (7), Warezmaster (20), Warezclient (1,020)

Table 3: Feature information of 10% KDDCUP dataset.

Dataset characteristics   | Multivariate
Attribute characteristics | Categorical, integer
Associated task           | Classification
Area                      | Computer
Number of instances       | 494,020
Number of attributes      | 42
Number of classes         | 1 normal class, 4 attack classes

how raw input data are processed for further statistical measures.

Various statistical analyses such as feature selection, dimensionality reduction, and normalization are essential to select significant features from the dataset. If the dataset contains duplicate instances, then the classification algorithms


[Figure 1: Data preprocessing — network audit data passes through filling of missing values, removal of duplicate instances, and feature selection or dimensionality reduction before data analysis (association mining, classification, clustering) raises an alarm/alert.]

Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, …, x_n}, where n represents the number of features in the dataset
  for i = 1, 2, …, n
    X(i) = x_(i)   (one-dimensional feature vector)
    Apply SVM classifier
  Sort features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    Select the features
End

Algorithm 2: Single feature selection method.
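The wrapper loop of Algorithm 2 can be sketched with the classifier abstracted behind an evaluation callback; the paper uses an SVM, so the toy score table here is purely illustrative:

```python
# Wrapper-style rendering of Algorithm 2: score each feature on its own,
# then keep those above both thresholds. `evaluate` stands in for training
# and testing an SVM on the one-dimensional feature vector.
def single_feature_selection(n_features, evaluate, acc_threshold, dr_threshold):
    """evaluate(i) -> (accuracy, detection_rate) using only feature i."""
    scored = [(i, *evaluate(i)) for i in range(n_features)]
    scored.sort(key=lambda t: t[1], reverse=True)    # sort by accuracy
    return [i for i, acc, dr in scored
            if acc > acc_threshold and dr > dr_threshold]

# Toy per-feature results standing in for the SVM evaluations.
table = {0: (0.91, 0.90), 1: (0.62, 0.55), 2: (0.88, 0.86), 3: (0.95, 0.93)}
print(single_feature_selection(4, table.get, acc_threshold=0.85, dr_threshold=0.80))
# → [3, 0, 2]
```

The returned list is already sorted by accuracy, mirroring the "sort, then threshold" order of the pseudocode.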

Table 4: Details of instances in the dataset.

Class  | Before removing duplicates | After removing duplicates | Selected instances
Normal | 97,277                     | 87,832                    | 8,783
DoS    | 391,458                    | 54,572                    | 7,935
Probe  | 4,107                      | 2,131                     | 2,131
U2R    | 52                         | 52                        | 52
R2L    | 1,126                      | 999                       | 999
Total  | 494,020                    | 145,586                   | 19,900

consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large amount of duplicate instances will make learning algorithms partial towards the frequently occurring instances and will inhibit them from learning infrequent instances, which are generally more harmful to networks. Also, the existence of these duplicate instances will cause the evaluation results to be biased towards methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection. Hence, duplicate instances are removed so the classifiers will not be partial towards more frequently occurring instances. The details of instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class is selected, and four new sets of data are generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes

of intrusion detection dataset may lead to complex intrusiondetection model and reduce detection accuracy
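As a sketch of the duplicate-elimination step described above (not the authors' code), exact-duplicate records can be removed in a single pass with a set; the record tuples below are made-up illustration values:

```python
# Illustrative sketch: remove exact-duplicate connection records
# before training, keeping the first occurrence of each record.

def remove_duplicates(records):
    """Return records with exact duplicates removed, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

records = [
    (0, "tcp", "http", 181, "normal"),
    (0, "tcp", "http", 181, "normal"),   # exact duplicate
    (0, "icmp", "ecr_i", 1032, "smurf"),
]
print(len(records), "->", len(remove_duplicates(records)))  # 3 -> 2
```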

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a well-timed manner. In this paper, feature subsets are formed based on the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the features in the datasets, aiming to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature selection method (SFSM) uses a one-dimensional feature vector. In the first iteration it considers only the first attribute, which is evaluated for accuracy using the Support Vector Machine classifier. In the second iteration it considers only the second attribute, and so on. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted, and the vital features whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively, are selected. The pseudocode of the single feature selection algorithm is given in Algorithm 2.
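The SFSM loop above can be sketched as follows; `evaluate` is a hypothetical stand-in for training the SVM on a single feature and returning its accuracy and detection rate, and the threshold values are assumptions:

```python
# Sketch of the single feature selection method (SFSM), not the
# authors' implementation: score each feature alone, sort by accuracy,
# keep those above both thresholds.

def sfsm(n_features, evaluate, acc_threshold, dr_threshold):
    """Score each feature individually, then keep the vital ones."""
    scored = []
    for i in range(n_features):
        acc, dr = evaluate(i)          # classify using only feature i
        scored.append((acc, dr, i))
    scored.sort(reverse=True)          # best accuracy first
    return [i for acc, dr, i in scored
            if acc > acc_threshold and dr > dr_threshold]

# Toy stand-in scores instead of a real SVM run:
scores = {0: (0.91, 0.90), 1: (0.62, 0.55), 2: (0.88, 0.86)}
vital = sfsm(3, lambda i: scores[i], acc_threshold=0.8, dr_threshold=0.8)
print(vital)  # [0, 2]
```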

4.4.2. Random Feature Selection Method. In this method, features are removed randomly and the remainder is evaluated using the classifier. In the first iteration all the features are evaluated using the SVM classifier; then, after deleting one feature, the dataset is updated and the classifier efficiency is recorded. The importance of

6 The Scientific World Journal

Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  Let S = X
  for all x_i in X do
    Delete x_i from X
    S = S - x_i  // update feature subset
    Apply SVM classifier
  end
  Sort the features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    S = S - x_i  // selecting vital features
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using the SFSM method.

Dataset            | Selected features                                                         | Number of features
DoS + 10% normal   | 24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6 | 20
Probe + 10% normal | 11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                  | 15
R2L + 10% normal   | 16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31          | 17
U2R + 10% normal   | 27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13                      | 14

Table 6: List of features selected using the RFSM method.

Dataset            | Selected features                                  | Number of features
DoS + 10% normal   | 4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31 | 14
Probe + 10% normal | 27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37, 12    | 13
R2L + 10% normal   | 24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32   | 13
U2R + 10% normal   | 6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20          | 11

the provided feature is calculated. In the second iteration another feature is removed randomly from the dataset and the subset is updated. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.
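The RFSM procedure can likewise be sketched as below; `evaluate` again stands in for an SVM run on the remaining feature subset, and the thresholds are assumptions rather than the paper's exact values:

```python
# Sketch of the random feature selection method (RFSM), not the
# authors' implementation: delete features one at a time in random
# order, re-score the remaining subset, and keep the deletions whose
# scores stay above both thresholds.
import random

def rfsm(n_features, evaluate, acc_threshold, dr_threshold, seed=0):
    rng = random.Random(seed)
    remaining = list(range(n_features))
    importance = []                      # (acc, dr, removed feature)
    while len(remaining) > 1:
        victim = rng.choice(remaining)
        remaining.remove(victim)         # update feature subset
        acc, dr = evaluate(tuple(remaining))
        importance.append((acc, dr, victim))
    importance.sort(reverse=True)        # descending accuracy
    return [f for acc, dr, f in importance
            if acc > acc_threshold and dr > dr_threshold]
```

With a toy `evaluate` that always returns (0.9, 0.9), `rfsm(4, ...)` removes three features and reports all three as vital, since each deletion leaves the score above the 0.8 thresholds.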

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of each subset relative to the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques were used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing time. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the performance level of the most widely used classification techniques for intrusion detection by using optimization methods such as PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high performance IDS; their combination has the advantage of providing a more reliable solution for today's data-intensive computing processes.

The Artificial Bee Colony algorithm is a recently proposed optimization algorithm and is becoming a hot topic in computational intelligence. Because of its high probability of avoiding local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm. Moreover, the Particle Swarm Optimization algorithm can help to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both of their advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other follows particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part. The inferior solution is replaced by the better solution and is substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm, the maximum cycle number of the algorithm, and the search range of the solution, and set the other constants needed in both ABC and PSO.


Figure 2: Flowchart of the proposed hybrid MABC-EPSO model. Network audit data passes through data preprocessing and feature selection using SFSM and RFSM; the EPSO branch evaluates the fitness value, updates particle positions, and determines the pbest and gbest of EPSO, while the MABC branch runs the employed, onlooker, and scout bee phases and determines the best of MABC; the gbest of EPSO and the best of MABC are compared until the termination condition is satisfied and the best solution is selected.

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3. In the bee colony, an employed bee is assigned to evaluate the fitness value of each solution using (5). The employed bee selects a new candidate solution from the nearby food sources and then uses the greedy selection method by calculating the Rastrigin function as follows:

Min f(x) = 10n + Σ_{i=1}^{n} [x_i^2 − 10 cos(2π x_i)].  (7)

A multimodal function is said to contain more than one local optimum. A function of variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization algorithm can easily be confined in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented within the Artificial Bee Colony algorithm, and the result is named the Modified Artificial Bee Colony (MABC) algorithm. In (7), f is the Rastrigin function, whose value is 0 at its global minimum (0, 0, ..., 0). This function is chosen because it is considered one of the best test functions for finding the global minimum. The initialization range for the function is [−15, 15]. The function uses cosine modulation to produce many local minima; thus, the function is multimodal.
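For reference, the Rastrigin fitness in (7) is only a few lines of code; that it vanishes at the origin follows from cos(0) = 1:

```python
# The Rastrigin benchmark function of (7), used as the fitness
# function of the modified ABC; global minimum is 0 at x = (0,...,0).
import math

def rastrigin(x):
    n = len(x)
    return 10 * n + sum(xi**2 - 10 * math.cos(2 * math.pi * xi)
                        for xi in x)

print(rastrigin([0.0, 0.0, 0.0]))  # 0.0 at the global minimum
```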

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected from the neighbors of a previous solution. It also checks the fitness value; if this value is better than the previous one, it substitutes the new position for the old one, otherwise it retains the old position. The objective of scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined number of cycles


(MCN). After all the choices have been made, the best solution generated in that iteration is called MABC_best.

Step 6. As the initial velocity has a large effect on balancing the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight (ω) [35] is used to control the velocity, and hence the velocity update equation becomes (8):

v_id^t = ω · v_id^{t−1} + c_1 · rand_1 · (p_id − x_id^{t−1}) + c_2 · rand_2 · (p_gd − x_id^{t−1}).  (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, the linearly decreasing inertia weight [36] in (9) is used to enhance the efficiency and performance of PSO. It is found experimentally that decreasing the inertia weight from 0.9 to 0.4 provides the optimal results:

w_k = w_max − ((w_max − w_min) / iter_max) × k.  (9)

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSO_best.

Step 7. The minimum of the values MABC_best and EPSO_best is called Best and is defined as

Best = EPSO_best, if EPSO_best ≤ MABC_best;
Best = MABC_best, if MABC_best ≤ EPSO_best.  (10)

Step 8. If the termination condition is satisfied, end the process and report the best solution; otherwise, return to Step 2.

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters set are bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight ω in (11) varies from 0.9 to 0.7 linearly with the iterations. Also, the acceleration coefficients c_1 and c_2 are set to 2. The upper and lower bounds for v (v_min, v_max) are set as the maximum upper and lower bounds of x:

v_id^t = ω v_id^{t−1} + c_1 rand(0, 1) (p_id − x_id^{t−1}) + c_2 rand(0, 1) (p_gd − x_id^{t−1}).  (11)
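Under these settings, the inertia schedule (9) and per-dimension velocity update (8)/(11) can be sketched as below; the function names are illustrative, not the authors':

```python
# Sketch of the EPSO velocity update (8)/(11) with the linearly
# decreasing inertia weight of (9); names and bounds are illustrative.
import random

def inertia(k, iter_max, w_max=0.9, w_min=0.4):
    """Equation (9): linear decrease from w_max to w_min over iterations."""
    return w_max - (w_max - w_min) / iter_max * k

def velocity_update(v, x, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """Equations (8)/(11) applied per dimension d."""
    return [w * v[d]
            + c1 * rng.random() * (pbest[d] - x[d])
            + c2 * rng.random() * (gbest[d] - x[d])
            for d in range(len(v))]

print(round(inertia(0, 100), 2))    # 0.9 at the first iteration
print(round(inertia(100, 100), 2))  # 0.4 at the last iteration
```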

5. Experimental Work

This section describes the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

Actual | Predicted: Normal    | Predicted: Attack
Normal | True Negative (TN)   | False Positive (FP)
Attack | False Negative (FN)  | True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified.
True Negative (TN): the number of normal records that are correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. Performance metrics such as accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms; hence sensitivity and specificity are chosen as the parametric indices for the classification task. In the intrusion detection problem, sensitivity can also be called the detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

Classification accuracy = (TP + TN) / (TP + TN + FP + FN).  (12)

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

Sensitivity = TP / (TP + FN).  (13)

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

Specificity = TN / (TN + FP).  (14)

The false alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

False Alarm Rate (FAR) = FP / (TN + FP).  (15)

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used to improve the reliability of the classifiers. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm | Average accuracy (%) | Feature selection method
C4.5 [6]             | 99.11  | All features
                     | 98.69  | Genetic algorithm
                     | 98.84  | Best-first
                     | 99.41  | Correlation feature selection
BayesNet [6]         | 99.53  | All features
                     | 99.52  | Genetic algorithm
                     | 98.91  | Best-first
                     | 98.92  | Correlation feature selection
ABC-SVM [7]          | 92.768 | Binary ABC
PSO-SVM [7]          | 83.88  |
GA-SVM [7]           | 80.73  |
KNN [8]              | 98.24  | All features
                     | 98.11  | Fast feature selection
Bayes Classifier [8] | 76.09  | All features
                     | 71.94  | Fast feature selection
ANN [9]              | 81.57  | Feature reduction
SSO-RF [10, 11]      | 92.7   | SSO
Hybrid SSO [12]      | 97.67  | SSO
RSDT [13]            | 97.88  | Rough set
ID3 [13]             | 97.665 | All features
C4.5 [13]            | 97.582 |
FC-ANN [14]          | 96.71  | All features
Proposed MABC-EPSO   | 88.59  | All features
                     | 99.32  | Single feature selection method
                     | 99.82  | Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
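The 10-fold procedure described above can be sketched as an index partition; this is an illustrative outline, not the authors' implementation:

```python
# Sketch of 10-fold cross-validation: shuffle the sample indices,
# split them into 10 disjoint folds, and let each fold serve as the
# test set exactly once while the other nine are used for training.
import random

def ten_fold_indices(n_samples, seed=0):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]   # 10 disjoint partitions
    for k in range(10):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, test

splits = list(ten_fold_indices(100))
print(len(splits))                           # 10 runs
print(len(splits[0][0]), len(splits[0][1]))  # 90 train, 10 test
```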

5.2. Results and Discussion. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38], such as Naïve Bayes (NB), which is a statistical classifier, decision tree (j48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC, are tested on the KDDCup'99 dataset. For each classification algorithm, the default control parameters are used. In Table 8, the accuracy rates obtained by the various classification algorithms using different feature selection methods are reported.

The performance comparison of the classifiers on accuracy rate is given in Figures 3-6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers (Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO) for the DoS dataset using all features, SFSM, and RFSM.

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using the SFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset            | NB        | J48       | RBF       | SVM       | ABC       | MABC-EPSO
DoS + 10% normal   | 82.57 (6) | 87.11 (4) | 87.96 (3) | 84.7 (5)  | 90.82 (2) | 99.50 (1)
Probe + 10% normal | 82.68 (5) | 82.6 (6)  | 83.72 (4) | 85.67 (3) | 96.58 (2) | 99.27 (1)
R2L + 10% normal   | 86.15 (4) | 82.55 (6) | 85.16 (5) | 90.61 (3) | 92.72 (2) | 99.24 (1)
U2R + 10% normal   | 84.06 (6) | 87.16 (3) | 85.54 (5) | 85.97 (4) | 97.31 (2) | 99.8 (1)
Average rank       | 5.25      | 4.75      | 4.25      | 3.75      | 2         | 1

Figure 4: Accuracy comparison of classifiers for the probe dataset using all features, SFSM, and RFSM.

Figure 5: Accuracy comparison of classifiers for the R2L dataset using all features, SFSM, and RFSM.

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, j48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

Figure 6: Accuracy comparison of classifiers for the U2R dataset using all features, SFSM, and RFSM.

In order to test the significance of the differences among the classifiers, the six classification algorithms previously mentioned are considered over the four datasets, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and the ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranks the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best the rank of 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and j48 are the least performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM and χ² = 15.712 and F_F = 10.992 for RFSM are computed. Having four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013, and the P value < 0.05. So we reject the null hypothesis, and the differences among the classifiers are significant.
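The Friedman statistic can be reproduced from the average ranks in Table 9 (N = 4 datasets, k = 6 classifiers); the small difference from the reported 15.716 is rounding:

```python
# Friedman chi-square and its F-distributed variant, computed from
# the SFSM average ranks in Table 9 (N = 4 datasets, k = 6 classifiers).

def friedman(avg_ranks, n_datasets):
    k = len(avg_ranks)
    chi2 = (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    ff = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, ff

chi2, ff = friedman([5.25, 4.75, 4.25, 3.75, 2.0, 1.0], n_datasets=4)
print(round(chi2, 2), round(ff, 2))  # 15.71 11.0
```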

The means of several groups are compared using the ANOVA test by estimating the variances among groups and within a group. Here the null hypothesis, which is set as all


Table 10: Accuracy rates of classifiers using the RFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset            | NB        | J48       | RBF       | SVM       | ABC       | MABC-EPSO
DoS + 10% normal   | 83.04 (6) | 90.05 (4) | 88.83 (5) | 94.02 (3) | 96.43 (2) | 99.81 (1)
Probe + 10% normal | 84.01 (5) | 82.72 (6) | 85.94 (4) | 95.87 (3) | 97.31 (2) | 99.86 (1)
R2L + 10% normal   | 86.32 (4) | 83.10 (6) | 86.11 (5) | 97.04 (3) | 98.96 (2) | 99.80 (1)
U2R + 10% normal   | 85.15 (6) | 88.42 (5) | 88.98 (4) | 95.91 (3) | 98.96 (2) | 99.80 (1)
Average rank       | 5.25      | 5.25      | 4.5       | 3         | 2         | 1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM method
Between groups      | 781.5143 | 5  | 156.3029 | 31.89498 | <0.05   | 2.772853
Within groups       | 88.20985 | 18 | 4.900547 |          |         |
Total               | 869.7241 | 23 |          |          |         |
RFSM method
Between groups      | 879.4307 | 5  | 175.8861 | 48.54728 | <0.05   | 2.772853
Within groups       | 65.21375 | 18 | 3.622986 |          |         |
Total               | 944.6444 | 23 |          |          |         |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the P value and the value of F are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than the F-critical value (2.773) for the significance level of 5%. Tukey's post hoc test states that there are significant differences of MABC-EPSO and ABC with the other classifiers, but not among NB, j48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between the algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than the F-critical value (2.773) for the significance level of 5%. Tukey's post hoc test reveals that there is a statistically significant difference of SVM, ABC, and MABC-EPSO with the other classifiers, but not among NB, j48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
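The F values in Table 11 follow from the SS and df columns as F = (SS_between / df_between) / (SS_within / df_within); the numbers below are the SFSM row of Table 11:

```python
# One-way ANOVA F statistic from sums of squares and degrees of
# freedom, reproducing the SFSM F value reported in Table 11.

def anova_f(ss_between, df_between, ss_within, df_within):
    ms_between = ss_between / df_between   # variance between groups
    ms_within = ss_within / df_within      # variance within groups
    return ms_between / ms_within

f = anova_f(781.5143, 5, 88.20985, 18)
print(round(f, 2))  # 31.89
```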

In Table 12, the detection rates obtained by various classification algorithms using different feature selection methods are reported. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7-10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using the SFSM method for the DoS + 10% normal, probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets.

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a performance comparable to the other classifier combinations. The performance of NB, j48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than the F-critical value (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                    | Average detection rate (%) | Feature selection method
Naïve Bayes [15]                            | 92.27 | Genetic algorithm
C4.5 [15]                                   | 92.1  | Genetic algorithm
Random forest [15]                          | 89.21 | Genetic algorithm
Random tree [15]                            | 88.98 | Genetic algorithm
REP tree [15]                               | 89.11 | Genetic algorithm
Neurotree [15]                              | 98.38 | Genetic algorithm
GMDH-based neural network [16]              | 93.7  | Information gain
                                            | 97.5  | Gain ratio
                                            | 95.3  | GMDH
Neural network [17]                         | 81.57 | Feature reduction
Hybrid evolutionary neural network [18]     | 91.51 | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]         | 97.75 | PCA
Ensemble Bayesian combination [20]          | 93.35 | All features
Voting + j48 + Rule [21]                    | 97.47 | All features
Voting + AdaBoost + j48 [21]                | 97.38 |
Rough set neural network algorithm [22]     | 90    | All features
PSO based fuzzy system [23]                 | 93.7  | All features
Proposed MABC-EPSO                          | 87.16 | All features
                                            | 98.09 | Single feature selection method
                                            | 98.67 | Random feature selection method

Figure 8: Comparison on sensitivity using the RFSM method for the four datasets.

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences from all the classification algorithms with 0.05 (P = 0.05) as the significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using the SFSM method for the four datasets.

for all datasets. Also, the proposed hybrid approach using SFSM shows a performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach is significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably higher when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM
Between groups      | 659.6518 | 5  | 131.9304 | 52.5347  | <0.05   | 2.772853
Within groups       | 45.20339 | 18 | 2.511299 |          |         |
Total               | 704.8551 | 23 |          |          |         |
RFSM
Between groups      | 617.818  | 5  | 123.5636 | 23.53957 | <0.05   | 2.772853
Within groups       | 94.48535 | 18 | 5.249186 |          |         |
Total               | 712.3033 | 23 |          |          |         |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time (ms) of classification algorithms using the SFSM and RFSM feature selection methods.

                   | SFSM                                          | RFSM
Dataset            | NB    | J48  | RBF  | SVM  | ABC  | MABC-EPSO | NB   | J48  | RBF  | SVM  | ABC  | MABC-EPSO
DoS + 10% normal   | 10.20 | 4.7  | 3.8  | 2.86 | 2.78 | 2.22      | 9.95 | 3.95 | 3.28 | 2.59 | 2.07 | 1.5
Probe + 10% normal | 5.33  | 3.12 | 3.05 | 2.36 | 2.24 | 1.87      | 4.15 | 3.01 | 3.19 | 2.11 | 1.97 | 1.69
U2R + 10% normal   | 4.75  | 3.81 | 3.08 | 2.21 | 2.16 | 1.98      | 4.01 | 3.46 | 2.79 | 1.80 | 1.78 | 0.65
R2L + 10% normal   | 3.98  | 4.97 | 3.01 | 2.46 | 2.23 | 2.0       | 3.12 | 3.23 | 2.55 | 1.42 | 1.37 | 1.46

Figure 10: Comparison on specificity using the RFSM method for the four datasets.

is comparatively smaller than with the SFSM method. Based on the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of SVM, ABC, and MABC-EPSO using the SFSM and RFSM methods for the four datasets.

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and the ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of the


Figure 12: Training time of MABC-EPSO on the DoS + 10% normal, probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets using all features, SFSM, and RFSM.

ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to identify which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to that of the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S XWu andWBanzhaf ldquoTheuse of computational intelligencein intrusion detection systems a reviewrdquo Applied Soft Comput-ing Journal vol 10 no 1 pp 1ndash35 2010

[2] E Bonabeau M Dorigo and G Theraulaz Swarm IntelligenceFrom Natural to Artificial Intelligence Oxford University PressOxford UK 1999

[3] G Zhu and S Kwong ldquoGbest-guided artificial bee colony algo-rithm for numerical function optimizationrdquoAppliedMathemat-ics and Computation vol 217 no 7 pp 3166ndash3173 2010

[4] R Kohavi and G H John ldquoWrappers for feature subset selec-tionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash324 1997

[5] W Lee and S J Stolfo ldquoA framework for constructing featuresandmodels for intrusion detection systemsrdquoACMTransactionson Information and System Security vol 3 no 4 pp 227ndash261

[6] H Nguyen K Franke and S Petrovic ldquoImproving effectivenessof intrusion detection by correlation feature selectionrdquo inProceedings of the 5th International Conference on AvailabilityReliability and Security (ARES rsquo10) pp 17ndash24 February 2010

[7] J Wang T Li and R Ren ldquoA real time IDSs based on artificialbee colony-support vector machine algorithmrdquo in Proceedingsof the 3rd International Workshop on Advanced ComputationalIntelligence (IWACI rsquo10) pp 91ndash96 IEEE Suzhou ChinaAugust 2010

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.


4 The Scientific World Journal

Input: initial solutions
Output: optimal solution
BEGIN
  Generate the initial population x_m, m = 1, 2, ..., SN
  Evaluate the fitness (fit_i) of the population
  Set cycle = 1
  repeat
    FOR each employed bee (employed phase):
      Produce a new solution v_m using (4)
      Calculate fit_i
      Apply greedy selection process
      Calculate the probability P_i using (6)
    FOR each onlooker bee (onlooker phase):
      Select a solution x_m depending on P_i
      Produce a new solution v_m
      Calculate fit_i
      Apply greedy selection process
    IF (scout phase): there is an abandoned solution for the scout, depending on limit,
      THEN replace it with a new solution produced randomly by (3)
    Memorize the best solution so far
    cycle = cycle + 1
  until cycle = MCN
END

Algorithm 1: Artificial Bee Colony.

Table 1: Distribution of connection types in the 10% KDDCup'99 dataset (% of occurrence).

                DoS     Probe   U2R    R2L    Total attack   Total normal
Training data   79.24   0.83    0.01   0.23   80.31          19.69
Testing data    73.90   1.34    0.07   5.20   81.51          19.49

wide variety of attacks. A version of the 1998 DARPA dataset, KDDCup'99, is now widely accepted as a standard benchmark dataset and has received much attention in the intrusion detection research community. The main motivation for using the KDDCup'99 dataset is to show that the proposed method becomes an efficient classification algorithm when applied to the intrusion detection system. In this paper, the 10% KDDCup'99 dataset is used for experimentation. The distribution of connection types and the sample size in the 10% KDDCup dataset are shown in Tables 1 and 2. The feature information of the 10% KDDCup dataset is shown in Table 3. The dataset consists of one type of normal data and 22 different attack types categorized into four classes, namely, denial of service (DoS), Probe, user-to-root (U2R), and remote-to-login (R2L).

4.3. Data Preprocessing. Data preprocessing is the time-consuming task which prepares the data for subsequent analysis, as required by the intrusion detection system model. The main aim of data preprocessing is to transform the raw network data into a form suitable for further analysis. Figure 1 illustrates the steps involved in data processing and

Table 2: Sample size in the 10% KDDCup dataset.

Category of attack   Attack name (instances)
Normal               Normal (97277)
DoS                  Neptune (107201), Smurf (280790), Pod (264), Teardrop (979), Land (21), Back (2203)
Probe                Portsweep (1040), IPsweep (1247), Nmap (231), Satan (1589)
U2R                  Bufferoverflow (30), LoadModule (9), Perl (3), Rootkit (10)
R2L                  Guesspassword (53), Ftpwrite (8), Imap (12), Phf (4), Multihop (7), Warezmaster (20), Warezclient (1020)

Table 3: Feature information of the 10% KDDCup dataset.

Dataset characteristics     Multivariate
Attribute characteristics   Categorical, integer
Associated task             Classification
Area                        Computer
Number of instances         494020
Number of attributes        42
Number of classes           1 normal class, 4 attack classes

how raw input data are processed for further statistical measures.

Various statistical analyses, such as feature selection, dimensionality reduction, and normalization, are essential to select significant features from the dataset. If the dataset contains duplicate instances, then the classification algorithms

Figure 1: Data preprocessing (network audit data → fill missing values → remove duplicate instances → feature selection or dimensionality reduction → data analysis by association mining, classification, and clustering → alarm/alert).

Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  for i = 1, 2, ..., n:
    X(i) = x_(i)   // one-dimensional feature vector
    Apply SVM classifier
  Sort features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    Select the features
End

Algorithm 2: Single feature selection method.

Table 4: Details of instances in the dataset.

Class    Before removing duplicates   After removing duplicates   Selected instances
Normal   97277                        87832                       8783
DoS      391458                       54572                       7935
Probe    4107                         2131                        2131
U2R      52                           52                          52
R2L      1126                         999                         999
Total    494020                       145586                      19900

consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large amount of duplicate instances will make learning algorithms partial towards the frequently occurring instances and will inhibit them from learning the infrequent instances, which are generally more harmful to networks. The existence of these duplicate instances will also cause the evaluation results to be biased by methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection. Hence, duplicate instances are removed so that the classifiers will not be partial towards more frequently occurring instances. The details of instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class is selected, and four new sets of data are generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes of the intrusion detection dataset may lead to a complex intrusion detection model and reduce detection accuracy.
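The duplicate-elimination step above can be illustrated with a minimal Python sketch; the function name and the list-of-rows record representation are illustrative assumptions, not part of the paper.

```python
def drop_duplicates(rows):
    """Remove duplicate connection records, preserving first-seen order,
    so learners are not biased toward frequently occurring instances."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)          # hashable view of one connection record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

A pass of this kind over the 494,020 raw records would yield the 145,586 distinct instances reported in Table 4.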

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a timely manner. In this paper, feature subsets are formed using the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the features in the datasets, aiming to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature method (SFSM) uses a one-dimensional feature vector. In the first iteration, only the first attribute is considered and evaluated for accuracy using the Support Vector Machine classifier. In each subsequent iteration, only the corresponding attribute is considered for evaluation. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted, and the vital features selected are those whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively. The pseudocode of the single feature selection algorithm is given in Algorithm 2.
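The per-feature scoring loop of SFSM can be sketched as follows. This is a simplified illustration: `evaluate` is a hypothetical callback standing in for the paper's cross-validated SVM (it returns accuracy and detection rate for a single-feature dataset), and the threshold semantics follow the description above.

```python
def single_feature_selection(X, y, evaluate, acc_threshold, dr_threshold):
    """SFSM sketch: score each feature in isolation, then keep those whose
    accuracy and detection rate both clear the thresholds."""
    scores = []
    for i in range(len(X[0])):
        col = [row[i] for row in X]       # one-dimensional feature vector
        acc, dr = evaluate(col, y)        # e.g. a cross-validated SVM (assumption)
        scores.append((acc, dr, i))
    scores.sort(reverse=True)             # best accuracy first
    return [i for acc, dr, i in scores
            if acc > acc_threshold and dr > dr_threshold]
```

In practice `evaluate` could wrap any classifier; the paper uses an SVM.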

4.4.2. Random Feature Selection Method. In this method, the features are removed randomly and the remaining subset is evaluated using the classifier. In the first iteration, all the features are evaluated using the SVM classifier; then, by deleting one feature, the dataset is updated and evaluated using the classifier's efficiency. The importance of


Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  Let S = X
  for all x_i in X do
    Delete x_i from X
    S = S − x_i   // update feature subset
    Apply SVM classifier
  end
  Sort the features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    S = S − x_i   // selecting vital features
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using the SFSM method.

Dataset              Selected features                                                          Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6   20
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                    15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31            17
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13                        14

Table 6: List of features selected using the RFSM method.

Dataset              Selected features                                   Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31  14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37, 12     13
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32    13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20           11

the provided feature is calculated. In the second iteration, another feature is removed randomly from the dataset, which is then updated. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.
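One possible reading of the RFSM loop is sketched below. This is an interpretation, not the paper's exact procedure: `evaluate` is a hypothetical callback scoring the reduced feature subset, and the removal order is driven by a seeded random generator.

```python
import random

def random_feature_selection(X, y, evaluate, acc_threshold, dr_threshold, seed=0):
    """RFSM sketch: repeatedly delete a randomly chosen feature, score the
    reduced subset, and keep the features whose removal-step accuracy and
    detection rate clear both thresholds."""
    rng = random.Random(seed)
    remaining = list(range(len(X[0])))
    steps = []
    while len(remaining) > 1:
        f = rng.choice(remaining)          # delete a random feature
        remaining.remove(f)
        acc, dr = evaluate(remaining, y)   # score the subset without f
        steps.append((acc, dr, f))
    steps.sort(reverse=True)               # descending order of accuracy
    return [f for acc, dr, f in steps
            if acc > acc_threshold and dr > dr_threshold]
```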

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of each subset relative to the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques have been used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing times. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the performance of the most widely used classification techniques for intrusion detection by using optimization methods like PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high performance IDS; their combination has the advantage of providing a more reliable solution to today's data-intensive computing processes.

The Artificial Bee Colony algorithm is a recently proposed optimization algorithm and is becoming a hot topic in computational intelligence. Because of its high probability of avoiding local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm. Moreover, the Particle Swarm Optimization algorithm can help to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both of their advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other uses particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part. The inferior solution is replaced by the better solution and is substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm, the maximum cycle index of the algorithm, the search range of the solution, and the other constants needed in both ABC and PSO.

Figure 2: Flowchart of the proposed hybrid MABC-EPSO model (network audit data → data preprocessing → feature selection using SFSM and RFSM → initialize the parameters of EPSO and MABC and evaluate fitness → EPSO branch: update particle positions, determine pbest and gbest; MABC branch: employed, onlooker, and scout bee phases, determine the best of MABC → compare the gbest of EPSO and the best of MABC; if the termination condition is satisfied, select the best solution).

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3. In the bee colony, an employed bee is assigned to evaluate the fitness value of each solution using (5). The employed bee selects a new candidate solution from the nearby food sources and then uses the greedy selection method by calculating the Rastrigin function as follows:

    Min f(x) = 10n + Σ_{i=1}^{n} [x_i² − 10 cos(2πx_i)].  (7)

A multimodal function is said to contain more than one local optimum. A function of variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization process can easily be confined in a local optimum. Hence, the classical benchmark Rastrigin function [34] is implemented using the Artificial Bee Colony algorithm, named the Modified Artificial Bee Colony (MABC) algorithm. In (7), f is the Rastrigin function, whose value is 0 at its global minimum (0, 0, ..., 0). This function is chosen because it is considered to be one of the best test functions for finding the global minimum. The initialization range for the function is [−15, 15]. The function uses cosine modulation to produce many local minima; thus, the function is multimodal.

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise, it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected among the neighbors of a previous solution. It also checks the fitness value, and if this value is better than the previous one, it substitutes the old position with the new one; otherwise, it retains the old position. The objective of the scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined maximum number of cycles (MCN). After all the choices have been made, the best solution generated in that iteration is called MABC_best.

Step 6. As the initial velocity has a large effect on balancing the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight (ω) [35] is used to control the velocity, and hence the velocity update equation (8) becomes

    v_id^t = ω · v_id^(t−1) + c_1 · rand_1 · (p_id − x_id^(t−1)) + c_2 · rand_2 · (p_gd − x_id^(t−1)).  (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, a linearly decreasing inertia weight [36], as in (9), is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

    w_k = w_max − ((w_max − w_min) / iter_max) × k.  (9)

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSO_best.

Step 7. The minimum of MABC_best and EPSO_best is called Best and is defined as

    Best = EPSO_best, if EPSO_best ≤ MABC_best;
           MABC_best, if MABC_best ≤ EPSO_best.  (10)

Step 8. If the termination condition is satisfied, then end the process and report the best solution; otherwise, return to Step 2.

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters are set as follows: bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight ω in (11) varies from 0.9 to 0.7 linearly with the iterations, and the acceleration coefficients c_1 and c_2 are both set to 2. The upper and lower bounds for v (v_min, v_max) are set as the maximum upper and lower bounds of x:

    v_id^t = ω v_id^(t−1) + c_1 rand(0, 1)(p_id − x_id^(t−1)) + c_2 rand(0, 1)(p_gd − x_id^(t−1)).  (11)

5. Experimental Work

This section presents the performance metrics used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

Actual    Predicted Normal       Predicted Attack
Normal    True Negative (TN)     False Positive (FP)
Attack    False Negative (FN)    True Positive (TP)

True Positive (TP): the number of attacks correctly identified.
True Negative (TN): the number of normal records correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. Performance metrics such as accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms; hence, they are chosen as the parametric indices for the classification task. In the intrusion detection problem, sensitivity can also be called the detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

    Classification accuracy = (TP + TN) / (TP + TN + FP + FN).  (12)

Sensitivity (True Positive Fraction) is the percentage of attack connections correctly classified in the testing dataset:

    Sensitivity = TP / (TP + FN).  (13)

Specificity (True Negative Fraction) is the percentage of normal connections correctly classified in the testing dataset:

    Specificity = TN / (TN + FP).  (14)

The false alarm rate (FAR) is the percentage of normal connections incorrectly classified in the testing and training datasets:

    False Alarm Rate (FAR) = FP / (TN + FP).  (15)

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme and is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used to improve classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm   Average accuracy (%)   Feature selection method
C4.5 [6]                   99.11                  All features
                           98.69                  Genetic algorithm
                           98.84                  Best-first
                           99.41                  Correlation feature selection
BayesNet [6]               99.53                  All features
                           99.52                  Genetic algorithm
                           98.91                  Best-first
                           98.92                  Correlation feature selection
ABC-SVM [7]                92.768                 Binary ABC
PSO-SVM [7]                83.88
GA-SVM [7]                 80.73
KNN [8]                    98.24                  All features
                           98.11                  Fast feature selection
Bayes classifier [8]       76.09                  All features
                           71.94                  Fast feature selection
ANN [9]                    81.57                  Feature reduction
SSO-RF [10, 11]            92.7                   SSO
Hybrid SSO [12]            97.67                  SSO
RSDT [13]                  97.88                  Rough set
ID3 [13]                   97.665                 All features
C4.5 [13]                  97.582
FC-ANN [14]                96.71                  All features
Proposed MABC-EPSO         88.59                  All features
                           99.32                  Single feature selection method
                           99.82                  Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results over the 10 folds gives the test accuracy of the algorithm [37].
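The 10-fold split described above can be sketched without any library support; the function name and seeding are illustrative assumptions.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and split them into k folds; each fold
    serves once as the test set while the remaining folds train."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]
```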

5.2. Results and Discussion. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38], such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC, are tested on the KDDCup'99 dataset. For each classification algorithm, the default control parameters are used. In Table 8, the accuracy rates obtained by the various classification algorithms using different feature selection methods are reported.

The performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers for the DoS dataset (all features, SFSM, and RFSM across Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO).

is increased significantly, to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using the SFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     82.57 (6)   87.11 (4)   87.96 (3)   84.7 (5)    90.82 (2)   99.50 (1)
Probe + 10% normal   82.68 (5)   82.6 (6)    83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal     86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal     84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.8 (1)
Average rank         5.25        4.75        4.25        3.75        2           1

Figure 4: Accuracy comparison of classifiers for the probe dataset (all features, SFSM, and RFSM across Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO).

Figure 5: Accuracy comparison of classifiers for the R2L dataset (all features, SFSM, and RFSM across Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO).

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rates of the NB, J48, and RBF classifiers are comparatively high with the RFSM method compared to SFSM and the full feature set.

Figure 6: Accuracy comparison of classifiers for the U2R dataset (all features, SFSM, and RFSM across Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO).

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned over the four datasets are considered, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and the ranks computed through the Friedman test (ranks are given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranks the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best the rank of 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM and χ² = 15.712 and F_F = 10.992 for RFSM are computed. Having four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013, and the P value is less than 0.05. So we reject the null hypothesis, and the differences among the classifiers are significant.
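The Friedman statistics can be reproduced from the rank tables. Using the per-dataset ranks of Table 9, the sketch below yields χ² ≈ 15.714 and F_F ≈ 11.0, matching the reported 15.716 and 11.005 up to rounding.

```python
def friedman(ranks):
    """Friedman chi-square and Iman-Davenport F statistic from a
    (datasets x algorithms) table of ranks."""
    N, k = len(ranks), len(ranks[0])
    R = [sum(col) / N for col in zip(*ranks)]          # average rank per algorithm
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in R) - k * (k + 1) ** 2 / 4)
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)         # Iman-Davenport correction
    return chi2, ff
```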

The ANOVA test compares the means of several groups by estimating the variances among groups and within a group. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates of classifiers using the RFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     83.04 (6)   90.05 (4)   88.83 (5)   94.02 (3)   96.43 (2)   99.81 (1)
Probe + 10% normal   84.01 (5)   82.72 (6)   85.94 (4)   95.87 (3)   97.31 (2)   99.86 (1)
R2L + 10% normal     86.32 (4)   83.10 (6)   86.11 (5)   97.04 (3)   98.96 (2)   99.80 (1)
U2R + 10% normal     85.15 (6)   88.42 (5)   88.98 (4)   95.91 (3)   98.96 (2)   99.80 (1)
Average rank         5.25        5.25        4.5         3           2           1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation   SS         df   MS         F          P value   F-crit
SFSM method
Between groups        781.5143   5    156.3029   31.89498   <0.05     2.772853
Within groups         88.20985   18   4.900547
Total                 869.7241   23
RFSM method
Between groups        879.4307   5    175.8861   48.54728   <0.05     2.772853
Within groups         65.21375   18   3.622986
Total                 944.6444   23

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested; the P value and the value of F are also computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than the F-critical value (2.773) at the 5% significance level. Tukey's post hoc test indicates that there are significant differences between MABC-EPSO and ABC and the other classifiers, but not among NB, J48, RBF, and SVM. There are also significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than the F-critical value (2.773) at the 5% significance level. Tukey's post hoc test reveals that there is a statistically significant difference between SVM, ABC, and MABC-EPSO and the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.

In Table 12, the detection rates obtained by various classification algorithms using different feature selection methods are reported. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rates of SVM, ABC, and the proposed MABC-EPSO increase significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using the SFSM method (sensitivity, %, of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a performance comparable to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773)

12 The Scientific World Journal

Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                    Average detection rate (%)   Feature selection method
Naïve Bayes [15]                            92.27                        Genetic algorithm
C4.5 [15]                                   92.1                         Genetic algorithm
Random forest [15]                          89.21                        Genetic algorithm
Random tree [15]                            88.98                        Genetic algorithm
REP tree [15]                               89.11                        Genetic algorithm
Neurotree [15]                              98.38                        Genetic algorithm
GMDH based neural network [16]              93.7                         Information gain
                                            97.5                         Gain ratio
                                            95.3                         GMDH
Neural network [17]                         81.57                        Feature reduction
Hybrid evolutionary neural network [18]     91.51                        Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]         97.75                        PCA
Ensemble Bayesian combination [20]          93.35                        All features
Voting + j48 + Rule [21]                    97.47                        All features
Voting + AdaBoost + j48 [21]                97.38
Rough set neural network algorithm [22]     90                           All features
PSO based fuzzy system [23]                 93.7                         All features
Proposed MABC-EPSO                          87.16                        All features
                                            98.09                        Single feature selection method
                                            98.67                        Random feature selection method

Figure 8: Comparison on sensitivity using the RFSM method (sensitivity, %, of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

Finally, the multiple comparison test concluded that MABC-EPSO differs significantly from all the other classification algorithms at the 0.05 significance level (P = 0.05). However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using the SFSM method (specificity, %, of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

for all datasets. Also, the proposed hybrid approach using SFSM shows a performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach is significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably higher when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation    SS          df    MS          F          P value    F-crit
SFSM
  Between groups       659.6518     5    131.9304    52.5347    <0.05      2.772853
  Within groups         45.20339   18      2.511299
  Total                704.8551    23
RFSM
  Between groups       617.818      5    123.5636    23.53957   <0.05      2.772853
  Within groups         94.48535   18      5.249186
  Total                712.3033    23

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time of classification algorithms using the SFSM and RFSM feature selection methods.

SFSM:
Dataset              Naïve Bayes   J48    RBF    SVM    ABC    MABC-EPSO
DoS + 10% normal     10.20         4.7    3.8    2.86   2.78   2.22
Probe + 10% normal   5.33          3.12   3.05   2.36   2.24   1.87
U2R + 10% normal     4.75          3.81   3.08   2.21   2.16   1.98
R2L + 10% normal     3.98          4.97   3.01   2.46   2.23   2.0

RFSM:
Dataset              Naïve Bayes   J48    RBF    SVM    ABC    MABC-EPSO
DoS + 10% normal     9.95          3.95   3.28   2.59   2.07   1.5
Probe + 10% normal   4.15          3.01   3.19   2.11   1.97   1.69
U2R + 10% normal     4.01          3.46   2.79   1.80   1.78   0.65
R2L + 10% normal     3.12          3.23   2.55   1.42   1.37   1.46

Figure 10: Comparison on specificity using the RFSM method (specificity, %, of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

is comparatively less than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers (false alarm rate of SVM, ABC, and MABC-EPSO under both SFSM and RFSM on the DoS, Probe, U2R, and R2L + 10% normal datasets).

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons amongst different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time (ms) of MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets using all features, SFSM, and RFSM.

the ANOVA test, the null hypotheses were rejected where significant. Post hoc analysis using Tukey's test was applied to identify which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to that of the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Mab, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.


Figure 1: Data preprocessing. Network audit data undergoes preprocessing (filling missing values, removing duplicate instances, and feature selection or dimensionality reduction) followed by data analysis (association mining, classification, and clustering) to generate alarms/alerts.

Input: Dataset X with n features
Output: Vital features
Begin
    Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
    for i = 1, 2, ..., n
        X(i) = x_(i)    // one-dimensional feature vector
        Apply SVM classifier
    end for
    Sort features based on classifier accuracy (acc)
    If acc > acc_threshold and detection rate > dr_threshold then
        Select the features
End

Algorithm 2: Single feature selection method.

Table 4: Details of instances in the dataset.

Category   Before removing duplicates   After removing duplicates   Selected instances
Normal     97,277                       87,832                      8,783
DoS        391,458                      54,572                      7,935
Probe      4,107                        2,131                       2,131
U2R        52                           52                          52
R2L        1,126                        999                         999
Total      494,020                      145,586                     19,900

consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large number of duplicate instances will make learning algorithms partial towards the frequently occurring instances and will inhibit them from learning infrequent instances, which are generally more harmful to networks. The existence of these duplicate instances will also cause the evaluation results to be biased towards methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection. Hence, duplicate instances are removed so that the classifiers will not be partial towards more frequently occurring instances. The details of the instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class is selected, and four new sets of data are generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes

of the intrusion detection dataset may lead to a complex intrusion detection model and reduce detection accuracy.
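Duplicate removal of the kind described above can be sketched in a few lines; the records below are hypothetical stand-ins for KDD'99 connection records (tuples of feature values):

```python
# Order-preserving de-duplication of connection records. dict.fromkeys
# keeps the first occurrence of each distinct record.
records = [
    ("tcp", "http", "SF", 181, 5450),   # hypothetical connection record
    ("tcp", "http", "SF", 181, 5450),   # exact duplicate
    ("udp", "domain_u", "SF", 45, 44),
]
unique = list(dict.fromkeys(records))
print(len(records), "->", len(unique))  # -> 3 -> 2
```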

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a timely manner. In this paper, feature subsets are formed based on the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the features in the datasets, aiming to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature method (SFSM) uses a one-dimensional feature vector. In the first iteration, only the first attribute is considered and evaluated for accuracy using the Support Vector Machine classifier. In the second iteration, only the second attribute is considered, and so on. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted, and the vital features whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively, are selected. The pseudocode of the single feature selection algorithm is given in Algorithm 2.
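The per-feature loop above can be sketched as follows; the `evaluate` callable stands in for training and cross-validating the SVM on a single feature, and the toy scorer is an assumption for illustration only:

```python
# Minimal sketch of SFSM: score each feature in isolation, sort by
# accuracy, and keep only those clearing both thresholds.
def sfsm(n_features, evaluate, acc_threshold, dr_threshold):
    scored = []
    for i in range(n_features):          # one feature at a time
        acc, dr = evaluate(i)            # e.g. 10-fold CV of an SVM on feature i
        scored.append((acc, dr, i))
    scored.sort(reverse=True)            # descending classifier accuracy
    return [i for acc, dr, i in scored
            if acc > acc_threshold and dr > dr_threshold]

# toy evaluator (assumption): odd-numbered features are informative
toy = lambda i: (0.95, 0.9) if i % 2 else (0.6, 0.5)
print(sfsm(6, toy, acc_threshold=0.8, dr_threshold=0.7))  # -> [5, 3, 1]
```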

4.4.2. Random Feature Selection Method. In this method, features are removed randomly and the remaining set is evaluated using the classifier. In the first iteration, all the features are evaluated using the SVM classifier; then, by deleting one feature, the dataset is updated and the classifier efficiency is recomputed. The importance of


Input: Dataset X with n features
Output: Vital features
Begin
    Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
    Let S = X
    for all x_i in X do
        Delete x_i from X
        S = S − x_i    // update feature subset
        Apply SVM classifier
    end for
    Sort the features based on classifier accuracy (acc)
    If acc > acc_threshold and detection rate > dr_threshold then
        S = S − x_i    // selecting vital features
End

Algorithm 3: Random feature selection method.
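A literal reading of Algorithm 3 can be sketched as below. The paper is ambiguous about whether a feature's importance is the accuracy with or without it, so this sketch simply scores each deletion by the accuracy of the remaining subset, as the pseudocode states; `evaluate` stands in for the SVM, and the toy scorer is an assumption for illustration only:

```python
# Sketch of RFSM: features leave the subset one at a time in random
# order; the classifier is re-run on the shrinking subset after each
# deletion, and deletions are ranked by the resulting accuracy.
import random

def rfsm(features, evaluate, acc_threshold, dr_threshold):
    remaining = list(features)
    scores = []
    for feat in random.sample(list(features), len(features)):
        remaining.remove(feat)              # delete x_i, update the subset S
        acc, dr = evaluate(remaining)       # classifier efficiency without x_i
        scores.append((acc, dr, feat))
    scores.sort(reverse=True)               # descending classifier accuracy
    return [f for acc, dr, f in scores
            if acc > acc_threshold and dr > dr_threshold]

# toy scorer (assumption): accuracy rises as noisy features {0, 2, 4} leave
noisy = {0, 2, 4}
toy = lambda subset: (0.8 + 0.03 * (3 - len(noisy & set(subset))), 0.85)
random.seed(7)
selected = rfsm([0, 1, 2, 3, 4, 5], toy, acc_threshold=0.8, dr_threshold=0.7)
print(selected)   # deletions that pushed accuracy above the all-feature baseline
```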

Table 5: List of features selected using the SFSM method.

Dataset              Selected features                                             Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11,         20
                     9, 7, 14, 1, 30, 6
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28       15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17,          17
                     41, 10, 31
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13           14

Table 6: List of features selected using the RFSM method.

Dataset              Selected features                                             Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31            14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37, 12               13
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32              13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20                     11

the provided feature is calculated. In the second iteration, another feature is removed randomly from the dataset, which is then updated. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of each subset relative to the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques have been used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing times. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the performance level of the most widely used intrusion detection classification techniques by using optimization methods such as PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high performance IDS; their combination has the advantage of providing a more reliable solution for today's data-intensive computing processes.

The Artificial Bee Colony algorithm is a recently proposed optimization algorithm and is becoming a hot topic in computational intelligence. Because of its high probability of avoiding local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm. Moreover, the Particle Swarm Optimization algorithm can help to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both of their advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other possesses particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part. The inferior solution is replaced by the better solution and is substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm, set the maximum cycle number (MCN) of the algorithm, set the search range of the solution, and set the other constants needed in both ABC and PSO.


Figure 2: Flowchart of the proposed hybrid MABC-EPSO model. Network audit data undergoes preprocessing and feature selection (SFSM and RFSM); the parameters of EPSO and MABC are initialized and fitness values evaluated. The EPSO branch updates particle positions and determines pbest and gbest; the MABC branch runs the employed, onlooker, and scout bee phases and determines the best of MABC. The gbest of EPSO and the best of MABC are compared in each iteration until the termination condition is satisfied and the best solution is selected.

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3. In the bee colony, an employed bee is assigned to evaluate the fitness value of each solution using (5). The employed bee selects a new candidate solution from the nearby food sources and then uses the greedy selection method by calculating the Rastrigin function as follows:

Min f(x) = 10n + Σ_{i=1}^{n} [x_i^2 − 10 cos(2π x_i)].  (7)

A multimodal function contains more than one local optimum. A function of n variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization algorithm can easily become confined in a local optimum. Hence, the classical benchmark Rastrigin function [34] is implemented using the

Artificial Bee Colony algorithm, and the result is named the Modified Artificial Bee Colony (MABC) algorithm. In (7), f is the Rastrigin function, whose value is 0 at its global minimum (0, 0, ..., 0). This function is chosen because it is considered one of the best test functions for finding the global minimum. The initialization range for the function is [−15, 15]. The function is modulated with a cosine term to produce many local minima; thus, the function is multimodal.
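The Rastrigin function of (7) is short to implement; a direct sketch:

```python
# Rastrigin fitness function: highly multimodal and separable, with
# global minimum f(0, ..., 0) = 0, initialized here over [-15, 15].
import math

def rastrigin(x):
    return 10 * len(x) + sum(xi**2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

print(rastrigin([0.0, 0.0, 0.0]))        # -> 0.0 (global minimum)
print(round(rastrigin([1.0, 1.0]), 6))   # -> 2.0 (a local minimum)
```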

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise, it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected from among the neighbors of a previous solution. It also checks the fitness value, and if this value is better than the previous one, it substitutes the new position for the old one; otherwise, it retains the old position. The objective of the scout bees is to determine new random food sources to substitute the solutions that cannot be improved after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined number of cycles


(MCN). After all the choices have been made, the best solution generated in that iteration is called MABCbest.
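Equation (6) is not reproduced in this excerpt; the standard ABC onlooker selection probability, p_i = fit_i / Σ_j fit_j, is assumed in this sketch:

```python
# Roulette-wheel probabilities used by onlooker bees to pick a food
# source in proportion to its shared fitness (standard ABC form,
# assumed here since the paper's (6) lies outside this excerpt).
def onlooker_probabilities(fitness):
    total = sum(fitness)
    return [f / total for f in fitness]

print(onlooker_probabilities([1.0, 3.0, 4.0]))  # -> [0.125, 0.375, 0.5]
```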

Step 6. As the initial velocity has a large effect on balancing the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight (ω) [35] is used to control the velocity, and hence the velocity update equation becomes (8):

v_id^t = ω · v_id^(t−1) + c1 · rand1 · (p_id − x_id^(t−1)) + c2 · rand2 · (p_gd − x_id^(t−1)).  (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, a linearly decreasing inertia weight [36], as in (9), is used to enhance the efficiency and performance of PSO. It is found experimentally that varying the inertia weight from 0.9 to 0.4 provides the optimal results:

w_k = w_max − ((w_max − w_min) / iter_max) × k.  (9)

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location in that iteration is called EPSObest.
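The linearly decreasing inertia weight of (9) and the velocity update of (8) can be sketched as follows (w_max = 0.9 and w_min = 0.4 follow Step 6; c1 = c2 = 2 follow the parameter settings below):

```python
# Inertia weight of (9) driving the one-dimensional velocity update of (8).
import random

def inertia(k, iter_max, w_max=0.9, w_min=0.4):
    # decreases linearly from w_max at k = 0 to w_min at k = iter_max
    return w_max - (w_max - w_min) / iter_max * k

def velocity_update(v, x, pbest, gbest, w, c1=2.0, c2=2.0):
    r1, r2 = random.random(), random.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

print(inertia(0, 100))    # -> 0.9 at the first iteration
print(inertia(100, 100))  # -> 0.4 at the last iteration
```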

Step 7. The minimum of the values MABCbest and EPSObest is called Best and is defined as follows:

Best = { EPSObest,  if EPSObest ≤ MABCbest
       { MABCbest,  if MABCbest ≤ EPSObest.  (10)

Step 8. If the termination condition is satisfied, end the process and report the best solution; otherwise, return to Step 2.

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters set are bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight ω in (11) varies from 0.9 to 0.7 linearly with the iterations. Also, the acceleration coefficients c1 and c2 are set to 2. The upper and lower bounds for v (v_min, v_max) are set as the maximum upper and lower bounds of x:

v_id^t = ω · v_id^(t−1) + c1 · rand(0, 1) · (p_id − x_id^(t−1)) + c2 · rand(0, 1) · (p_gd − x_id^(t−1)).  (11)

5. Experimental Work

This section describes the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

                  Predicted
Actual            Normal                  Attack
Normal            True Negative (TN)      False Positive (FP)
Attack            False Negative (FN)     True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified.
True Negative (TN): the number of normal records that are correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms. Hence, sensitivity and specificity are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called the detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

Classification accuracy = (TP + TN) / (TP + TN + FP + FN).  (12)

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

Sensitivity = TP / (TP + FN).   (13)

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

Specificity = TN / (TN + FP).   (14)

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

False Alarm Rate (FAR) = FP / (TN + FP).   (15)
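For concreteness, a small sketch (illustrative code, not from the paper) computing (12)–(15) from raw confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (detection rate), specificity, and FAR
    from confusion-matrix counts, following (12)-(15)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of attacks caught
    specificity = tn / (tn + fp)   # fraction of normal traffic passed
    far = fp / (tn + fp)           # note: specificity + FAR = 1
    return accuracy, sensitivity, specificity, far
```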

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving the classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining

The Scientific World Journal 9

Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm    Average accuracy (%)    Feature selection method
C4.5 [6]                    99.11                   All features
                            98.69                   Genetic algorithm
                            98.84                   Best-first
                            99.41                   Correlation feature selection
BayesNet [6]                99.53                   All features
                            99.52                   Genetic algorithm
                            98.91                   Best-first
                            98.92                   Correlation feature selection
ABC-SVM [7]                 92.768                  Binary ABC
PSO-SVM [7]                 83.88
GA-SVM [7]                  80.73
KNN [8]                     98.24                   All features
                            98.11                   Fast feature selection
Bayes classifier [8]        76.09                   All features
                            71.94                   Fast feature selection
ANN [9]                     81.57                   Feature reduction
SSO-RF [10, 11]             92.7                    SSO
Hybrid SSO [12]             97.67                   SSO
RSDT [13]                   97.88                   Rough set
ID3 [13]                    97.665                  All features
C4.5 [13]                   97.582
FC-ANN [14]                 96.71                   All features
Proposed MABC-EPSO          88.59                   All features
                            99.32                   Single feature selection method
                            99.82                   Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
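The 10-fold split described above can be sketched in a few lines of stdlib Python (a hypothetical helper shown only to make the procedure concrete):

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists: each of the k folds serves as the
    test set exactly once, and the other k-1 folds form the train set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # random partition of the data
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```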

5.2. Results and Discussions. The main motivation is to show that the proposed hybrid method has the advantage of becoming an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38] such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC are tested on the KDD Cup'99 dataset. For each classification algorithm, their default control parameters are used. In Table 8, the results are reported for the accuracy rate obtained by various classification algorithms using different feature selection methods.

The performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers for DoS dataset.

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using SFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     82.57 (6)   87.11 (4)   87.96 (3)   84.7 (5)    90.82 (2)   99.50 (1)
Probe + 10% normal   82.68 (5)   82.6 (6)    83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal     86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal     84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.8 (1)
Average rank         5.25        4.75        4.25        3.75        2           1

Figure 4: Accuracy comparison of classifiers for probe dataset.

Figure 5: Accuracy comparison of classifiers for R2L dataset.

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, J48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

Figure 6: Accuracy comparison of classifiers for U2R dataset.

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned over four datasets are considered, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and their ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way, and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best algorithm getting the rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J48 are the least performing algorithms. Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM and χ² = 15.712 and F_F = 10.992 for RFSM are computed. Having four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013 and the P value < 0.05. So we reject the null hypothesis, and the differences among classifiers are significant.
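The Friedman statistic can be reproduced from the Table 9 accuracies with a short stdlib sketch (illustrative code; the values agree with the reported χ² ≈ 15.716 and F_F ≈ 11.005 up to rounding of the tabulated accuracies):

```python
def friedman(table):
    """Friedman chi-square and Iman-Davenport F_F for a
    datasets-by-algorithms table of accuracies (higher is better)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:                  # rank 1 = best accuracy in the row
        order = sorted(range(k), key=lambda j: row[j], reverse=True)
        for r, j in enumerate(order, start=1):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)
    ff = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, ff

# Rows: datasets; columns: NB, J48, RBF, SVM, ABC, MABC-EPSO (Table 9, SFSM)
sfsm = [
    [82.57, 87.11, 87.96, 84.70, 90.82, 99.50],
    [82.68, 82.60, 83.72, 85.67, 96.58, 99.27],
    [86.15, 82.55, 85.16, 90.61, 92.72, 99.24],
    [84.06, 87.16, 85.54, 85.97, 97.31, 99.80],
]
chi2, ff = friedman(sfsm)
```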

The means of several groups are compared by estimating the variances among groups and within a group using the ANOVA test. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates using RFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     83.04 (6)   90.05 (4)   88.83 (5)   94.02 (3)   96.43 (2)   99.81 (1)
Probe + 10% normal   84.01 (5)   82.72 (6)   85.94 (4)   95.87 (3)   97.31 (2)   99.86 (1)
R2L + 10% normal     86.32 (4)   83.10 (6)   86.11 (5)   97.04 (3)   98.96 (2)   99.80 (1)
U2R + 10% normal     85.15 (6)   88.42 (5)   88.98 (4)   95.91 (3)   98.96 (2)   99.80 (1)
Average rank         5.25        5.25        4.5         3           2           1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation    SS         df    MS         F          P value    F-crit
SFSM method
Between groups         781.5143   5     156.3029   31.89498   <0.05      2.772853
Within groups          88.20985   18    4.900547
Total                  869.7241   23
RFSM method
Between groups         879.4307   5     175.8861   48.54728   <0.05      2.772853
Within groups          65.21375   18    3.622986
Total                  944.6444   23

*SS: sum of squared deviations about mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the P value and the value of F are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, which states that there are significant differences among MABC-EPSO and ABC with the other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, and it reveals that there is a statistically significant difference among SVM, ABC, and MABC-EPSO with the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
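The one-way ANOVA behind Table 11 can likewise be checked in plain Python on the Table 9 (SFSM) accuracies (a sketch; the resulting F differs slightly from the tabulated 31.895 because the published accuracies are rounded):

```python
def one_way_anova(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(x for g in groups for x in g) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # variance among group means
    ms_within = ss_within / (n - k)     # variance within groups
    return ms_between / ms_within

# Columns of Table 9 (SFSM): one group of four accuracies per classifier
groups = [
    [82.57, 82.68, 86.15, 84.06],   # NB
    [87.11, 82.60, 82.55, 87.16],   # J48
    [87.96, 83.72, 85.16, 85.54],   # RBF
    [84.70, 85.67, 90.61, 85.97],   # SVM
    [90.82, 96.58, 92.72, 97.31],   # ABC
    [99.50, 99.27, 99.24, 99.80],   # MABC-EPSO
]
f_stat = one_way_anova(groups)      # far above F-crit = 2.773, so reject H0
```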

In Table 12, the results are reported for the detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using SFSM method.

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a comparable performance to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                    Average detection rate (%)    Feature selection method
Naïve Bayes [15]                            92.27                         Genetic algorithm
C4.5 [15]                                   92.1
Random forest [15]                          89.21
Random tree [15]                            88.98
REP tree [15]                               89.11
Neurotree [15]                              98.38
GMDH based neural network [16]              93.7                          Information gain
                                            97.5                          Gain ratio
                                            95.3                          GMDH
Neural network [17]                         81.57                         Feature reduction
Hybrid evolutionary neural network [18]     91.51                         Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]         97.75                         PCA
Ensemble Bayesian combination [20]          93.35                         All features
Voting + J48 + Rule [21]                    97.47                         All features
Voting + AdaBoost + J48 [21]                97.38
Rough set neural network algorithm [22]     90                            All features
PSO based fuzzy system [23]                 93.7                          All features
Proposed MABC-EPSO                          87.16                         All features
                                            98.09                         Single feature selection method
                                            98.67                         Random feature selection method

Figure 8: Comparison on sensitivity using RFSM method.

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the classification algorithms with 0.05 (P = 0.05) as the significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using SFSM method.

for all datasets. Also, the proposed hybrid approach using SFSM shows a comparable performance with the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation    SS         df    MS         F          P value    F-crit
SFSM
Between groups         659.6518   5     131.9304   52.5347    <0.05      2.772853
Within groups          45.20339   18    2.511299
Total                  704.8551   23
RFSM
Between groups         617.818    5     123.5636   23.53957   <0.05      2.772853
Within groups          94.48535   18    5.249186
Total                  712.3033   23

*SS: sum of squared deviations about mean; df: degrees of freedom; MS: variance.

Table 14: Training time of classification algorithms using SFSM and RFSM feature selection methods.

                     SFSM                                                  RFSM
Dataset              NB      J48    RBF    SVM    ABC    MABC-EPSO         NB     J48    RBF    SVM    ABC    MABC-EPSO
DoS + 10% normal     10.20   4.7    3.8    2.86   2.78   2.22              9.95   3.95   3.28   2.59   2.07   1.5
Probe + 10% normal   5.33    3.12   3.05   2.36   2.24   1.87              4.15   3.01   3.19   2.11   1.97   1.69
U2R + 10% normal     4.75    3.81   3.08   2.21   2.16   1.98              4.01   3.46   2.79   1.80   1.78   0.65
R2L + 10% normal     3.98    4.97   3.01   2.46   2.23   2.0               3.12   3.23   2.55   1.42   1.37   1.46

Figure 10: Comparison on specificity using RFSM method.

is comparatively less than with the SFSM method. According to the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers.

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons amongst different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time of MABC-EPSO.

the ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to select which classification algorithm was significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz.

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.




Input: Dataset X with n features
Output: Vital features
Begin
    Let X = {x1, x2, ..., xn}, where n represents the number of features in the dataset
    Let S = X
    for i = 1 to n do
        Delete a randomly chosen feature xi from X
        S = S − xi    // update feature subset
        Apply SVM classifier and record its accuracy
    end for
    Sort the features based on classifier accuracy (acc)
    if acc > acc threshold and detection rate > dr threshold then
        select those features as vital features
    end if
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using SFSM method.

Dataset              Selected features                                                              Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6      20
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                       15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31               17
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11, 13                           14

Table 6: List of features selected using RFSM method.

Dataset              Selected features                                        Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31       14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37, 12          13
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32         13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20                11

the provided feature is calculated. In the second iteration, another feature is removed randomly from the dataset and the subset is updated. The process is repeated until only one feature is left. After calculating each feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3.
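The loop above can be sketched as follows; this is illustrative code, not the authors' implementation, and `evaluate` is a hypothetical stand-in for training the SVM of Algorithm 3 on the remaining subset and returning its accuracy:

```python
import random

def random_feature_selection(features, evaluate, acc_threshold, rng=None):
    """Drop one randomly chosen feature per round, score the classifier on the
    remaining subset, then keep the features whose removal score beats the
    threshold (cf. Algorithm 3)."""
    rng = rng or random.Random(0)
    remaining = list(features)
    scores = {}
    while len(remaining) > 1:            # stop when only one feature is left
        f = rng.choice(remaining)
        remaining.remove(f)
        scores[f] = evaluate(remaining)  # e.g. SVM accuracy without feature f
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [f for f in ranked if scores[f] > acc_threshold]
```

A real run would also compare the detection rate against its own threshold, as the pseudocode does.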

Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of the subsets identified as a percentage of the full feature set.

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques were used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing time. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the level of performance of intrusion detection of the most used classification techniques nowadays by using optimization methods like PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high performance IDS, and their combination has the advantage of providing a more reliable solution to today's data intensive computing processes.

The Artificial Bee Colony algorithm is a newly proposed optimization algorithm and is becoming a hot topic in computational intelligence nowadays. Because of its high probability of avoiding the local optima, it can make up for the disadvantage of the Particle Swarm Optimization algorithm. Moreover, the Particle Swarm Optimization algorithm can help us to find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both of their advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2.

In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony, and the other one is the particle swarm intelligence. Assuming that there is cooperation between the two parts, in each iteration the part which finds out the better solution will share its achievement with the other part. The inferior solution will be replaced by the better solution and will be substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm, set the maximum circle index of the algorithm, set the search range of the solution, and set the other constants needed in both ABC and PSO.


[Flowchart omitted: network audit data undergoes data preprocessing and feature selection (SFSM and RFSM); the EPSO branch evaluates fitness, updates particle positions, and determines pbest and gbest, while the MABC branch runs the employed, onlooker, and scout bee phases; the gbest of EPSO and the best of MABC are compared each iteration until the termination condition is satisfied, and the best solution is selected.]

Figure 2: Flowchart of the proposed hybrid MABC-EPSO model.

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, calculate the fitness value of each particle and take the best location as the global best location.

Step 3. In the bee colony, to evaluate the fitness value of each solution, an employee bee is assigned using (5). The employee bee selects a new candidate solution from the nearby food sources and then uses the greedy selection method by calculating the Rastrigin function as follows:

Min f(x) = 10n + Σ_{i=1}^{n} [x_i² − 10 cos(2πx_i)].   (7)
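The benchmark of (7) is straightforward to implement; a minimal sketch:

```python
import math

def rastrigin(x):
    """Rastrigin benchmark of (7): 10n + sum(x_i^2 - 10 cos(2 pi x_i)).
    Global minimum 0 at x = (0, ..., 0); many local minima elsewhere."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)
```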

A multimodal function is said to contain more than one local optimum. A function of n variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge involved in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization algorithm can easily be confined in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented using the Artificial Bee Colony algorithm, and the result is named the Modified Artificial Bee Colony (MABC) algorithm. In (7), f_i is the Rastrigin function, whose value is 0 at its global minimum (0, 0, ..., 0). This function is chosen because it is considered to be one of the best test functions for finding the global minimum. The initialization range for the function is [−15, 15]. The function uses cosine modulation to produce many local minima; thus, the function is multimodal.
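As a quick concreteness check, (7) can be evaluated directly; the sketch below (the function name and test points are our own choices, not part of the paper) confirms the global minimum of 0 at the origin and a worse local minimum nearby:

```python
import math

def rastrigin(x):
    """Rastrigin benchmark of (7): 10*n + sum(x_i^2 - 10*cos(2*pi*x_i))."""
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

print(rastrigin([0.0, 0.0, 0.0]))  # 0.0 (global minimum at the origin)
print(rastrigin([1.0, 1.0]))       # 2.0 (a local minimum away from the origin)
```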

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise, it keeps the previous solution. Based on the information shared by employee bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected among the neighbors of a previous solution. It also checks the fitness value, and if this value is better than the previous one, it will substitute the old position with the new one; otherwise, it retains the old position. The objective of scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined number of cycles


(MCN). After all the choices have been made, the best solution generated in that iteration is called MABCbest.
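Equations (5) and (6) appear earlier in the paper; Steps 3-5 can be sketched with the standard ABC fitness transform and roulette-wheel selection (an assumption of this illustration, since the exact expressions are not repeated in this section, and all names are ours):

```python
def abc_fitness(f):
    """Standard ABC fitness transform (assumed form of (5)): lower objective -> higher fitness."""
    return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

def onlooker_probabilities(objectives):
    """Onlooker selection probabilities (assumed form of (6)): fitness share per food source."""
    fits = [abc_fitness(f) for f in objectives]
    total = sum(fits)
    return [fi / total for fi in fits]

def greedy_select(old_f, new_f):
    """Greedy selection of Step 4: keep whichever candidate has the better fitness."""
    return new_f if abc_fitness(new_f) > abc_fitness(old_f) else old_f

probs = onlooker_probabilities([0.0, 1.0, 3.0])
print([round(p, 3) for p in probs])  # [0.571, 0.286, 0.143]
```

The best food source (objective 0.0) receives the largest share of onlooker visits, which is the behavior Step 4 describes.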

Step 6. As the initial velocity has a large effect on the balance between the exploration and exploitation processes of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight (ω) [35] is used to control the velocity, and hence the velocity update equation becomes as follows:

\[v_{id}^{t} = \omega \cdot v_{id}^{t-1} + c_1 \cdot \mathrm{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \mathrm{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right). \tag{8}\]

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, the linearly decreasing inertia weight [36] of (9) is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

\[w_k = w_{\max} - \frac{w_{\max} - w_{\min}}{\mathrm{iter}_{\max}} \times k. \tag{9}\]

In the particle swarm, after the comparison among the solutions that each particle has experienced and the comparison among the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSObest.
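Equations (8) and (9) together give the EPSO update; a minimal sketch (function and parameter names are ours, with w_max = 0.9 and w_min = 0.4 as stated above):

```python
import random

def inertia(k, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight of (9)."""
    return w_max - (w_max - w_min) / iter_max * k

def epso_velocity(v, x, pbest, gbest, k, iter_max, c1=2.0, c2=2.0):
    """Velocity update of (8), applied dimension by dimension."""
    w = inertia(k, iter_max)
    return [w * vd + c1 * random.random() * (pd - xd) + c2 * random.random() * (gd - xd)
            for vd, xd, pd, gd in zip(v, x, pbest, gbest)]

print(inertia(0, 100), inertia(100, 100))  # decays from w_max (0.9) to w_min (0.4)
```

Early cycles (large ω) favor global exploration; later cycles (small ω) favor local refinement, which is exactly the trade-off the text describes.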

Step 7. The minimum of the values MABCbest and EPSObest is called Best and is defined as

\[\mathrm{Best} = \begin{cases} \mathrm{EPSO}_{\mathrm{best}}, & \text{if } \mathrm{EPSO}_{\mathrm{best}} \le \mathrm{MABC}_{\mathrm{best}}, \\ \mathrm{MABC}_{\mathrm{best}}, & \text{if } \mathrm{MABC}_{\mathrm{best}} \le \mathrm{EPSO}_{\mathrm{best}}. \end{cases} \tag{10}\]

Step 8. If the termination condition is satisfied, end the process and report the best solution; otherwise, return to Step 2.
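Steps 1-8 reduce to running the two searches side by side and keeping the better incumbent each cycle, as in (10). A skeletal sketch (the `mabc_step` and `epso_step` callables stand in for the bee phases and particle updates described above and are hypothetical):

```python
def hybrid_mabc_epso(mabc_step, epso_step, max_cycles):
    """Run MABC and EPSO in parallel and return the overall best solution.

    Each step callable returns (solution, objective_value) for that cycle's best
    candidate; per (10), Best is whichever of EPSObest and MABCbest is smaller.
    """
    best = None
    for _ in range(max_cycles):
        mabc_sol, mabc_val = mabc_step()   # employed, onlooker, and scout phases
        epso_sol, epso_val = epso_step()   # velocity and position updates
        cand = (epso_sol, epso_val) if epso_val <= mabc_val else (mabc_sol, mabc_val)
        if best is None or cand[1] < best[1]:
            best = cand
    return best

# Toy stand-ins: each "search" proposes a fixed candidate.
print(hybrid_mabc_epso(lambda: ("mabc", 3.0), lambda: ("epso", 1.0), 5))  # ('epso', 1.0)
```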

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters are set as bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight ω in (11) varies from 0.9 to 0.7 linearly with the iterations. Also, the acceleration coefficients c_1 and c_2 are set as 2. The upper and lower bounds for v (v_min, v_max) are set as the maximum upper and lower bounds of x:

\[v_{id}^{t} = \omega v_{id}^{t-1} + c_1 \mathrm{rand}(0,1)\left(p_{id} - x_{id}^{t-1}\right) + c_2 \mathrm{rand}(0,1)\left(p_{gd} - x_{id}^{t-1}\right). \tag{11}\]

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

Actual | Predicted: Normal | Predicted: Attack
Normal | True Negative (TN) | False Positive (FP)
Attack | False Negative (FN) | True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified. True Negative (TN): the number of normal records that are correctly classified. False Positive (FP): the number of normal records incorrectly classified. False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics like accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms. Hence, sensitivity and specificity are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

\[\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{12}\]

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

\[\text{Sensitivity} = \frac{TP}{TP + FN}. \tag{13}\]

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

\[\text{Specificity} = \frac{TN}{TN + FP}. \tag{14}\]

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

\[\text{False Alarm Rate (FAR)} = \frac{FP}{TN + FP}. \tag{15}\]
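Equations (12)-(15) follow directly from the confusion-matrix counts of Table 7; a small helper (names are our own) makes the definitions concrete. Note that specificity uses TN/(TN + FP), so specificity and FAR are complementary (FAR = 1 − specificity):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (detection rate), specificity, and FAR per (12)-(15)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "far": fp / (tn + fp),
    }

m = metrics(tp=90, tn=95, fp=5, fn=10)
print(m)  # accuracy 0.925, sensitivity 0.9, specificity 0.95, far 0.05
```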

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving the classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm | Average accuracy (%) | Feature selection method
C4.5 [6] | 99.11 | All features
C4.5 [6] | 98.69 | Genetic algorithm
C4.5 [6] | 98.84 | Best-first
C4.5 [6] | 99.41 | Correlation feature selection
BayesNet [6] | 99.53 | All features
BayesNet [6] | 99.52 | Genetic algorithm
BayesNet [6] | 98.91 | Best-first
BayesNet [6] | 98.92 | Correlation feature selection
ABC-SVM [7] | 92.768 | Binary ABC
PSO-SVM [7] | 83.88 | Binary ABC
GA-SVM [7] | 80.73 | Binary ABC
KNN [8] | 98.24 | All features
KNN [8] | 98.11 | Fast feature selection
Bayes Classifier [8] | 76.09 | All features
Bayes Classifier [8] | 71.94 | Fast feature selection
ANN [9] | 81.57 | Feature reduction
SSO-RF [10, 11] | 92.7 | SSO
Hybrid SSO [12] | 97.67 | SSO
RSDT [13] | 97.88 | Rough set
ID3 [13] | 97.665 | All features
C4.5 [13] | 97.582 | All features
FC-ANN [14] | 96.71 | All features
Proposed MABC-EPSO | 88.59 | All features
Proposed MABC-EPSO | 99.32 | Single feature selection method
Proposed MABC-EPSO | 99.82 | Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
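The 10-fold protocol described above can be sketched as follows (the evaluator is a placeholder for training and testing the classifier, and all names are our own):

```python
import random

def ten_folds(n, seed=0):
    """Randomly partition n record indices into 10 roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

def cross_validate(n, evaluate, seed=0):
    """Each fold serves as the test set exactly once; the 10 scores are averaged."""
    folds = ten_folds(n, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Dummy evaluator that just reports the training fraction: nine-tenths per run.
print(round(cross_validate(100, lambda train, test: len(train) / 100), 3))  # 0.9
```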

5.2. Results and Discussion. The main motivation is to show that the proposed hybrid method has the advantage of becoming an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38] such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC are tested on the KDDCup'99 dataset. For each classification algorithm, their default control parameters are used. In Table 8, the results are reported for the accuracy rate obtained by various classification algorithms using different feature selection methods.

The performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers for DoS dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO using the All, SFSM, and RFSM feature sets).

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using SFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset | NB | J48 | RBF | SVM | ABC | MABC-EPSO
DoS + 10% normal | 82.57 (6) | 87.11 (4) | 87.96 (3) | 84.7 (5) | 90.82 (2) | 99.50 (1)
Probe + 10% normal | 82.68 (5) | 82.6 (6) | 83.72 (4) | 85.67 (3) | 96.58 (2) | 99.27 (1)
R2L + 10% normal | 86.15 (4) | 82.55 (6) | 85.16 (5) | 90.61 (3) | 92.72 (2) | 99.24 (1)
U2R + 10% normal | 84.06 (6) | 87.16 (3) | 85.54 (5) | 85.97 (4) | 97.31 (2) | 99.8 (1)
Average rank | 5.25 | 4.75 | 4.25 | 3.75 | 2 | 1

Figure 4: Accuracy comparison of classifiers for probe dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO using the All, SFSM, and RFSM feature sets).

Figure 5: Accuracy comparison of classifiers for R2L dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO using the All, SFSM, and RFSM feature sets).

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, J48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

Figure 6: Accuracy comparison of classifiers for U2R dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO using the All, SFSM, and RFSM feature sets).

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned over four datasets are considered and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and their ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best algorithm getting the rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM and χ² = 15.712 and F_F = 10.992 for RFSM are computed. Having four datasets and six classification algorithms, the distribution of F_F is based on the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013 and the P value < 0.05. So we reject the null hypothesis, and the differences among classifiers are significant.
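The reported statistics can be reproduced from the average ranks alone; the sketch below uses the standard Friedman chi-square and Iman-Davenport F formulas (our implementation, not the authors' code):

```python
def friedman(avg_ranks, n):
    """Friedman chi-square and Iman-Davenport F for k algorithms over n datasets."""
    k = len(avg_ranks)
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_f

# Average ranks of the six classifiers over four datasets (Table 9, SFSM).
chi2, f_f = friedman([5.25, 4.75, 4.25, 3.75, 2.0, 1.0], n=4)
print(round(chi2, 3), round(f_f, 3))  # 15.714 11.0
```

These match the reported SFSM values of 15.716 and 11.005 up to rounding of the input ranks.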

The means of several groups are compared using the ANOVA test by estimating the variances among groups and within a group. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates of classifiers using RFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset | NB | J48 | RBF | SVM | ABC | MABC-EPSO
DoS + 10% normal | 83.04 (6) | 90.05 (4) | 88.83 (5) | 94.02 (3) | 96.43 (2) | 99.81 (1)
Probe + 10% normal | 84.01 (5) | 82.72 (6) | 85.94 (4) | 95.87 (3) | 97.31 (2) | 99.86 (1)
R2L + 10% normal | 86.32 (4) | 83.10 (6) | 86.11 (5) | 97.04 (3) | 98.96 (2) | 99.80 (1)
U2R + 10% normal | 85.15 (6) | 88.42 (5) | 88.98 (4) | 95.91 (3) | 98.96 (2) | 99.80 (1)
Average rank | 5.25 | 5.25 | 4.5 | 3 | 2 | 1

Table 11: ANOVA results for accuracy rate of classifiers.

SFSM method
Source of variation | SS | df | MS | F | P value | F-crit
Between groups | 7815.143 | 5 | 1563.029 | 31.89498 | <0.05 | 2.772853
Within groups | 882.0985 | 18 | 49.00547 | | |
Total | 8697.241 | 23 | | | |

RFSM method
Source of variation | SS | df | MS | F | P value | F-crit
Between groups | 8794.307 | 5 | 1758.861 | 48.54728 | <0.05 | 2.772853
Within groups | 652.1375 | 18 | 36.22986 | | |
Total | 9446.444 | 23 | | | |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the P value and the value of F are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test states that there are significant differences among MABC-EPSO and ABC with the other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test reveals that there is a statistically significant difference among SVM, ABC, and MABC-EPSO with the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
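For reference, the quantities in Table 11 (SS, df, MS, F) come from a standard one-way ANOVA decomposition, which can be computed from raw per-group accuracy samples in a few lines (a generic sketch with toy data, not the authors' measurements):

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a one-way ANOVA."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b, df_w = len(groups) - 1, n - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f

# Three toy groups with well-separated means give a large F.
ssb, ssw, df_b, df_w, f = one_way_anova([[1.0, 2.0], [5.0, 6.0], [9.0, 10.0]])
print(df_b, df_w, round(f, 1))  # 2 3 64.0
```

Comparing the resulting F against the critical value at the chosen significance level is exactly the decision rule applied in Table 11.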

In Table 12, the results are reported for the detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using SFSM method (sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a comparable performance to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm | Average detection rate (%) | Feature selection method
Naïve Bayes [15] | 92.27 | Genetic algorithm
C4.5 [15] | 92.1 | Genetic algorithm
Random forest [15] | 89.21 | Genetic algorithm
Random tree [15] | 88.98 | Genetic algorithm
REP tree [15] | 89.11 | Genetic algorithm
Neurotree [15] | 98.38 | Genetic algorithm
GMDH based neural network [16] | 93.7 | Information gain
GMDH based neural network [16] | 97.5 | Gain ratio
GMDH based neural network [16] | 95.3 | GMDH
Neural network [17] | 81.57 | Feature reduction
Hybrid evolutionary neural network [18] | 91.51 | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19] | 97.75 | PCA
Ensemble Bayesian combination [20] | 93.35 | All features
Voting + j48 + Rule [21] | 97.47 | All features
Voting + AdaBoost + j48 [21] | 97.38 | All features
Rough set neural network algorithm [22] | 90 | All features
PSO based fuzzy system [23] | 93.7 | All features
Proposed MABC-EPSO | 87.16 | All features
Proposed MABC-EPSO | 98.09 | Single feature selection method
Proposed MABC-EPSO | 98.67 | Random feature selection method

Figure 8: Comparison on sensitivity using RFSM method (sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the classification algorithms with 0.05 (P = 0.05) as the significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using SFSM method (specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

for all datasets. Also, the proposed hybrid approach using SFSM shows a performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

SFSM
Source of variation | SS | df | MS | F | P value | F-crit
Between groups | 6596.518 | 5 | 1319.304 | 52.5347 | <0.05 | 2.772853
Within groups | 452.0339 | 18 | 25.11299 | | |
Total | 7048.551 | 23 | | | |

RFSM
Source of variation | SS | df | MS | F | P value | F-crit
Between groups | 6178.18 | 5 | 1235.636 | 23.53957 | <0.05 | 2.772853
Within groups | 944.8535 | 18 | 52.49186 | | |
Total | 7123.033 | 23 | | | |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time (ms) of classification algorithms using SFSM and RFSM feature selection methods.

Dataset | SFSM: Naïve Bayes | J48 | RBF | SVM | ABC | MABC-EPSO | RFSM: Naïve Bayes | J48 | RBF | SVM | ABC | MABC-EPSO
DoS + 10% normal | 10.20 | 4.7 | 3.8 | 2.86 | 2.78 | 2.22 | 9.95 | 3.95 | 3.28 | 2.59 | 2.07 | 1.5
Probe + 10% normal | 5.33 | 3.12 | 3.05 | 2.36 | 2.24 | 1.87 | 4.15 | 3.01 | 3.19 | 2.11 | 1.97 | 1.69
U2R + 10% normal | 4.75 | 3.81 | 3.08 | 2.21 | 2.16 | 1.98 | 4.01 | 3.46 | 2.79 | 1.80 | 1.78 | 0.65
R2L + 10% normal | 3.98 | 4.97 | 3.01 | 2.46 | 2.23 | 2.0 | 3.12 | 3.23 | 2.55 | 1.42 | 1.37 | 1.46

Figure 10: Comparison on specificity using RFSM method (specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

is comparatively less than with the SFSM method. According to the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers (FAR of SVM, ABC, and MABC-EPSO under the SFSM and RFSM methods on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets).

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons amongst different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time of MABC-EPSO (ms) on the DoS + 10% normal, Probe + 10% normal, U2R + 10% normal, and R2L + 10% normal datasets using the All, SFSM, and RFSM feature sets.

the ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to select which classification algorithm was significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz.

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

The Scientific World Journal 7

No

Yes

Initialize the parameters of EPSO and MABC

Evaluate the fitness value

Calculate the particle Employed bee phase

Update particle positions

Update the best

Onlooker bee phase

Scout bee phase

Determine the best of MABC

Is the termination condition satisfied

Select the best solution

EPSO MABC

Data preprocessing

Feature selection using SFSM and RFSM

Network audit data

Determine the gbest of EPSO and the best of MABC

Determine the pbest and gbest of EPSO

Figure 2 Flowchart of the proposed hybrid MABC-EPSO model

Step 2 (initialization of the colony) Generate a colony witha specific number of individuals Bee colony is divided intotwo categories employed foragers and unemployed foragersaccording to each individualrsquos fitness value on the other handas a particle swarm calculate the fitness value of each particleand take the best location as the global best location

Step 3 In bee colony to evaluate the fitness value of eachsolution an employee bee is assigned using (5)The employeebee selects a new candidate solution from the nearby foodsources and then uses greedy selectionmethod by calculatingthe Rastrigin function as follows

Min $f(x) = 10n + \sum_{i=1}^{n}\left[x_i^2 - 10\cos(2\pi x_i)\right]$.  (7)

A multimodal function is said to contain more than one local optimum. A function of variables is separable if it can be modified as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge involved in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization problem can be easily confined in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented using the

Artificial Bee Colony algorithm and named as Modified Artificial Bee Colony (MABC) algorithm. In (7), $f(x)$ is the Rastrigin function, whose value is 0 at its global minimum $(0, 0, \ldots, 0)$. This function is chosen because it is considered to be one of the best test functions for finding the global minimum. The initialization range for the function is $[-15, 15]$. The function uses cosine modulation to produce many local minima; thus the function is multimodal.
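As a concrete reference for the benchmark in (7), a minimal sketch (not the authors' implementation) is:

```python
import math

def rastrigin(x):
    """Rastrigin benchmark, Eq. (7): f(x) = 10n + sum(x_i^2 - 10 cos(2 pi x_i)).

    Multimodal, with the global minimum f = 0 at x = (0, ..., 0);
    the initialization range quoted in the text is [-15, 15].
    """
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

print(rastrigin([0.0, 0.0, 0.0]))  # 0.0 at the global minimum
```

The cosine term is what creates the many local minima mentioned above; away from integer coordinates the function value rises quickly.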

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise it keeps the previous solution. Based on the information shared by employee bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).
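The probabilistic choice in Step 4 can be sketched as follows. The paper's exact equation (6) is not reproduced in this excerpt, so the standard ABC fitness-proportional rule is assumed here:

```python
import random

def onlooker_probabilities(fitnesses):
    """Fitness-proportional selection probabilities for onlooker bees
    (standard ABC rule, assumed here in place of the paper's Eq. (6))."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def select_food_source(fitnesses, rng=random.random):
    """Roulette-wheel choice of a food-source index."""
    r = rng()
    cumulative = 0.0
    for i, p in enumerate(onlooker_probabilities(fitnesses)):
        cumulative += p
        if r <= cumulative:
            return i
    return len(fitnesses) - 1  # guard against floating-point round-off
```

Food sources with higher fitness are proportionally more likely to attract onlooker bees, which concentrates the search around promising solutions.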

Step 5. An onlooker bee constructs a new solution selected among the neighbors of a previous solution. It also checks the fitness value, and if this value is better than the previous one, it substitutes the old position with the new one; otherwise it retains the old position. The objective of scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined number of cycles


(MCN). After all the choices have been made, the best solution generated in that iteration is called MABCbest.

Step 6. As the initial velocity has a large effect on balancing the exploration and exploitation process of the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight ($\omega$) [35] is used to control the velocity, and hence the velocity update equation becomes as follows:

$v_{id}^{t} = \omega \cdot v_{id}^{t-1} + c_1 \cdot \mathrm{rand}_1 \cdot (p_{id} - x_{id}^{t-1}) + c_2 \cdot \mathrm{rand}_2 \cdot (p_{gd} - x_{id}^{t-1})$.  (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, the linearly decreasing inertia weight [36] in (9) is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

$w_k = w_{\max} - \dfrac{w_{\max} - w_{\min}}{\mathrm{iter}_{\max}} \times k$.  (9)

In the particle swarm, after comparing the solutions that each particle has experienced and the solutions that all the particles have ever experienced, the best location in that iteration is called EPSObest.
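Equations (8) and (9) can be sketched directly; this is a minimal illustration, with $c_1 = c_2 = 2$ and the weight bounds taken from the values quoted in the text:

```python
import random

def inertia_weight(k, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (9)."""
    return w_max - (w_max - w_min) / iter_max * k

def update_velocity(v, x, pbest, gbest, w, c1=2.0, c2=2.0):
    """EPSO velocity update, Eq. (8), applied per dimension."""
    return [w * v[d]
            + c1 * random.random() * (pbest[d] - x[d])
            + c2 * random.random() * (gbest[d] - x[d])
            for d in range(len(x))]
```

Early iterations (large $w$) favor global exploration; as $w$ decays toward $w_{\min}$, the update increasingly pulls particles toward their personal and global bests.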

Step 7. The minimum of MABCbest and EPSObest is called Best and is defined as

$\mathrm{Best} = \begin{cases} \mathrm{EPSO}_{\mathrm{best}}, & \text{if } \mathrm{EPSO}_{\mathrm{best}} \le \mathrm{MABC}_{\mathrm{best}} \\ \mathrm{MABC}_{\mathrm{best}}, & \text{if } \mathrm{MABC}_{\mathrm{best}} \le \mathrm{EPSO}_{\mathrm{best}} \end{cases}$  (10)

Step 8. If the termination condition is satisfied, then end the process and report the best solution; otherwise, return to Step 2.
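Steps 7 and 8 amount to running both searches each cycle and keeping the better incumbent. A minimal skeleton, assuming `epso_step` and `mabc_step` are callables (hypothetical names) that return each algorithm's best solution for the current cycle:

```python
def hybrid_search(epso_step, mabc_step, fitness, max_cycles):
    """Skeleton of the MABC-EPSO loop (Steps 2-8): each cycle both
    searches advance, and the better of their two incumbents
    (Step 7, Eq. (10)) is retained; minimization is assumed."""
    best = None
    for _ in range(max_cycles):
        epso_best = epso_step()
        mabc_best = mabc_step()
        # Step 7: Best is whichever candidate has the lower fitness.
        cycle_best = epso_best if fitness(epso_best) <= fitness(mabc_best) else mabc_best
        if best is None or fitness(cycle_best) < fitness(best):
            best = cycle_best
    return best  # Step 8: report the best solution found
```

The two metaheuristics thus act as parallel searchers over the same fitness landscape, with the hybrid simply tracking whichever has the better solution at each cycle.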

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters set are bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight $\omega$ in (11) varies from 0.9 to 0.7 linearly with the iterations. Also, the acceleration coefficients $c_1$ and $c_2$ are set as 2. The upper and lower bounds for $v$ ($v_{\min}$, $v_{\max}$) are set as the maximum upper and lower bounds of $x$:

$v_{id}^{t} = \omega v_{id}^{t-1} + c_1\,\mathrm{rand}(0,1)(p_{id} - x_{id}^{t-1}) + c_2\,\mathrm{rand}(0,1)(p_{gd} - x_{id}^{t-1})$.  (11)

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

Actual \ Predicted | Normal              | Attack
Normal             | True Negative (TN)  | False Positive (FP)
Attack             | False Negative (FN) | True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified.
True Negative (TN): the number of normal records that are correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics like accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms. Hence, sensitivity and specificity are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

$\text{Classification accuracy} = \dfrac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$.  (12)

Sensitivity (True Positive Fraction) is the percentage of attack connections correctly classified in the testing dataset:

$\text{Sensitivity} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$.  (13)

Specificity (True Negative Fraction) is the percentage of normal connections correctly classified in the testing dataset:

$\text{Specificity} = \dfrac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$.  (14)

False alarm rate (FAR) is the percentage of normal connections incorrectly classified in the testing and training dataset:

$\text{FAR} = \dfrac{\mathrm{FP}}{\mathrm{TN} + \mathrm{FP}}$.  (15)
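Equations (12)-(15) can all be computed from the four confusion-matrix counts of Table 7; a small sketch:

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (detection rate), specificity, and
    false alarm rate from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (12)
        "sensitivity": tp / (tp + fn),                 # Eq. (13)
        "specificity": tn / (tn + fp),                 # Eq. (14)
        "false_alarm_rate": fp / (tn + fp),            # Eq. (15)
    }

metrics = detection_metrics(tp=90, tn=80, fp=20, fn=10)
```

Note that specificity and FAR are complements over the same denominator: specificity + FAR = 1.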

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining


Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm | Average accuracy (%) | Feature selection method
C4.5 [6]                 | 99.11  | All features
C4.5 [6]                 | 98.69  | Genetic algorithm
C4.5 [6]                 | 98.84  | Best-first
C4.5 [6]                 | 99.41  | Correlation feature selection
BayesNet [6]             | 99.53  | All features
BayesNet [6]             | 99.52  | Genetic algorithm
BayesNet [6]             | 98.91  | Best-first
BayesNet [6]             | 98.92  | Correlation feature selection
ABC-SVM [7]              | 92.768 | Binary ABC
PSO-SVM [7]              | 83.88  |
GA-SVM [7]               | 80.73  |
KNN [8]                  | 98.24  | All features
KNN [8]                  | 98.11  | Fast feature selection
Bayes Classifier [8]     | 76.09  | All features
Bayes Classifier [8]     | 71.94  | Fast feature selection
ANN [9]                  | 81.57  | Feature reduction
SSO-RF [10, 11]          | 92.7   | SSO
Hybrid SSO [12]          | 97.67  | SSO
RSDT [13]                | 97.88  | Rough set
ID3 [13]                 | 97.665 | All features
C4.5 [13]                | 97.582 | All features
FC-ANN [14]              | 96.71  | All features
Proposed MABC-EPSO       | 88.59  | All features
Proposed MABC-EPSO       | 99.32  | Single feature selection method
Proposed MABC-EPSO       | 99.82  | Random feature selection method

nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
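The 10-fold split described above can be sketched as follows (illustrative only; beyond random division, the authors' exact partitioning is not specified):

```python
import random

def ten_fold_indices(n, seed=0):
    """Yield (train, test) index lists for 10-fold cross-validation:
    the data is shuffled once, split into 10 folds, and each fold
    serves as the test partition exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Each of the 10 runs trains on the other nine folds, and the 10 test accuracies are averaged to give the reported figure.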

5.2. Results and Discussions. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38] such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC are tested on the KDDCup'99 dataset. For each classification algorithm, their default control parameters are used. In Table 8, the results are reported for the accuracy rate obtained by various classification algorithms using different feature selection methods.

The performance comparison of the classifiers on accuracy rate is given in Figures 3-6. The results show that on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

[Figure 3: Accuracy (%) comparison of classifiers (Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO) for the DoS dataset under the All, SFSM, and RFSM feature sets.]

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using the SFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset            | NB        | J48       | RBF       | SVM       | ABC       | MABC-EPSO
DoS + 10% normal   | 82.57 (6) | 87.11 (4) | 87.96 (3) | 84.7 (5)  | 90.82 (2) | 99.50 (1)
Probe + 10% normal | 82.68 (5) | 82.6 (6)  | 83.72 (4) | 85.67 (3) | 96.58 (2) | 99.27 (1)
R2L + 10% normal   | 86.15 (4) | 82.55 (6) | 85.16 (5) | 90.61 (3) | 92.72 (2) | 99.24 (1)
U2R + 10% normal   | 84.06 (6) | 87.16 (3) | 85.54 (5) | 85.97 (4) | 97.31 (2) | 99.8 (1)
Average rank       | 5.25      | 4.75      | 4.25      | 3.75      | 2         | 1

[Figure 4: Accuracy (%) comparison of classifiers (Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO) for the probe dataset under the All, SFSM, and RFSM feature sets.]

[Figure 5: Accuracy (%) comparison of classifiers (Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO) for the R2L dataset under the All, SFSM, and RFSM feature sets.]

is also observed that on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, J48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

[Figure 6: Accuracy (%) comparison of classifiers (Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO) for the U2R dataset under the All, SFSM, and RFSM feature sets.]

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned over four datasets are considered, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and their ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best algorithm getting the rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics $\chi^2 = 15.716$ and $F_F = 11.005$ for SFSM and $\chi^2 = 15.712$ and $F_F = 10.992$ for RFSM are computed. Having four datasets and six classification algorithms, the distribution of $F_F$ is based on the $F$ distribution with $6 - 1 = 5$ and $(6 - 1) \times (4 - 1) = 15$ degrees of freedom. The critical value of $F(5, 15)$ for $\alpha = 0.05$ is 2.9013, and the $P$ value $< 0.05$. So we reject the null hypothesis, and the differences among classifiers are significant.
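The Friedman statistic can be recomputed from the accuracies in Table 9. The sketch below does this in plain Python (no tie correction, since Table 9 contains no tied accuracies); it lands at about 15.71, close to the reported 15.716:

```python
def friedman_statistic(scores):
    """Friedman chi-square for k algorithms compared over n datasets.
    `scores` is a list of n rows, each with k accuracies (higher is
    better); ranks within a row run from 1 (best) to k (worst)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Rows of Table 9 (SFSM): columns NB, J48, RBF, SVM, ABC, MABC-EPSO.
table9 = [
    [82.57, 87.11, 87.96, 84.70, 90.82, 99.50],  # DoS + 10% normal
    [82.68, 82.60, 83.72, 85.67, 96.58, 99.27],  # Probe + 10% normal
    [86.15, 82.55, 85.16, 90.61, 92.72, 99.24],  # R2L + 10% normal
    [84.06, 87.16, 85.54, 85.97, 97.31, 99.80],  # U2R + 10% normal
]
print(round(friedman_statistic(table9), 2))  # 15.71
```

With $\chi^2$ well above the 5% critical value for 5 degrees of freedom, the rank differences among the classifiers are significant.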

The means of several groups are compared using the ANOVA test by estimating the variances among groups and within a group. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates using the RFSM feature selection method and Friedman ranks (ranks in parentheses).

Dataset            | NB        | J48       | RBF       | SVM       | ABC       | MABC-EPSO
DoS + 10% normal   | 83.04 (6) | 90.05 (4) | 88.83 (5) | 94.02 (3) | 96.43 (2) | 99.81 (1)
Probe + 10% normal | 84.01 (5) | 82.72 (6) | 85.94 (4) | 95.87 (3) | 97.31 (2) | 99.86 (1)
R2L + 10% normal   | 86.32 (4) | 83.10 (6) | 86.11 (5) | 97.04 (3) | 98.96 (2) | 99.80 (1)
U2R + 10% normal   | 85.15 (6) | 88.42 (5) | 88.98 (4) | 95.91 (3) | 98.96 (2) | 99.80 (1)
Average rank       | 5.25      | 5.25      | 4.5       | 3         | 2         | 1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM method:
Between groups      | 781.5143 | 5  | 156.3029 | 31.89498 | <0.05   | 2.772853
Within groups       | 88.20985 | 18 | 4.900547 |          |         |
Total               | 869.7241 | 23 |          |          |         |
RFSM method:
Between groups      | 879.4307 | 5  | 175.8861 | 48.54728 | <0.05   | 2.772853
Within groups       | 65.21375 | 18 | 3.622986 |          |         |
Total               | 944.6444 | 23 |          |          |         |
*SS: sum of squared deviations about mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the $P$ value and the value of $F$ are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated $F(5, 18) = 31.895$ is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, which states that there are significant differences among MABC-EPSO and ABC with other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated $F(5, 18) = 48.547$ is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test is performed, and it reveals that there is a statistically significant difference among SVM, ABC, and MABC-EPSO with other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
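The F statistic of Table 11 is a one-way ANOVA over the per-classifier accuracy groups. A pure-Python sketch, using the SFSM accuracies of Table 9 as the six groups (so the result should land near the reported F = 31.895):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, as summarized in Table 11."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    return ms_between / ms_within

# Each group is one classifier's accuracies over the four datasets (Table 9, SFSM).
groups = [
    [82.57, 82.68, 86.15, 84.06],  # NB
    [87.11, 82.60, 82.55, 87.16],  # J48
    [87.96, 83.72, 85.16, 85.54],  # RBF
    [84.70, 85.67, 90.61, 85.97],  # SVM
    [90.82, 96.58, 92.72, 97.31],  # ABC
    [99.50, 99.27, 99.24, 99.80],  # MABC-EPSO
]
print(one_way_anova_f(groups) > 2.773)  # True: exceeds F-crit(5, 18)
```

Since the computed F exceeds the critical value, the null hypothesis of equal population means is rejected, and the pairwise Tukey comparison described above is then warranted.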

In Table 12, the results are reported for the detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7-10. The results show that on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

[Figure 7: Comparison on sensitivity (%) using the SFSM method, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (+10% normal) datasets.]

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a performance comparable to other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated $F(5, 18) = 52.535$ and $F(5, 18) = 23.539$ are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                  | Average detection rate (%) | Feature selection method
Naïve Bayes [15]                          | 92.27 | Genetic algorithm
C4.5 [15]                                 | 92.1  | Genetic algorithm
Random forest [15]                        | 89.21 | Genetic algorithm
Random tree [15]                          | 88.98 | Genetic algorithm
REP tree [15]                             | 89.11 | Genetic algorithm
Neurotree [15]                            | 98.38 | Genetic algorithm
GMDH based neural network [16]            | 93.7  | Information gain
GMDH based neural network [16]            | 97.5  | Gain ratio
GMDH based neural network [16]            | 95.3  | GMDH
Neural network [17]                       | 81.57 | Feature reduction
Hybrid evolutionary neural network [18]   | 91.51 | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]       | 97.75 | PCA
Ensemble Bayesian combination [20]        | 93.35 | All features
Voting + J48 + Rule [21]                  | 97.47 | All features
Voting + AdaBoost + J48 [21]              | 97.38 | All features
Rough set neural network algorithm [22]   | 90    | All features
PSO based fuzzy system [23]               | 93.7  | All features
Proposed MABC-EPSO                        | 87.16 | All features
Proposed MABC-EPSO                        | 98.09 | Single feature selection method
Proposed MABC-EPSO                        | 98.67 | Random feature selection method

[Figure 8: Comparison on sensitivity (%) using the RFSM method, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (+10% normal) datasets.]

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the classification algorithms with 0.05 ($P = 0.05$) as the significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

[Figure 9: Comparison on specificity (%) using the SFSM method, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (+10% normal) datasets.]

for all datasets. Also, the proposed hybrid approach using SFSM shows a performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation | SS       | df | MS       | F        | P value | F-crit
SFSM:
Between groups      | 659.6518 | 5  | 131.9304 | 52.5347  | <0.05   | 2.772853
Within groups       | 45.20339 | 18 | 2.511299 |          |         |
Total               | 704.8551 | 23 |          |          |         |
RFSM:
Between groups      | 617.818  | 5  | 123.5636 | 23.53957 | <0.05   | 2.772853
Within groups       | 94.48535 | 18 | 5.249186 |          |         |
Total               | 712.3033 | 23 |          |          |         |
*SS: sum of squared deviations about mean; df: degrees of freedom; MS: variance.

Table 14: Training time (ms) of classification algorithms using the SFSM and RFSM feature selection methods.

SFSM:
Dataset            | Naïve Bayes | J48  | RBF  | SVM  | ABC  | MABC-EPSO
DoS + 10% normal   | 10.20       | 4.7  | 3.8  | 2.86 | 2.78 | 2.22
Probe + 10% normal | 5.33        | 3.12 | 3.05 | 2.36 | 2.24 | 1.87
U2R + 10% normal   | 4.75        | 3.81 | 3.08 | 2.21 | 2.16 | 1.98
R2L + 10% normal   | 3.98        | 4.97 | 3.01 | 2.46 | 2.23 | 2.0

RFSM:
Dataset            | Naïve Bayes | J48  | RBF  | SVM  | ABC  | MABC-EPSO
DoS + 10% normal   | 9.95        | 3.95 | 3.28 | 2.59 | 2.07 | 1.5
Probe + 10% normal | 4.15        | 3.01 | 3.19 | 2.11 | 1.97 | 1.69
U2R + 10% normal   | 4.01        | 3.46 | 2.79 | 1.80 | 1.78 | 0.65
R2L + 10% normal   | 3.12        | 3.23 | 2.55 | 1.42 | 1.37 | 1.46

[Figure 10: Comparison on specificity (%) using the RFSM method, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L (+10% normal) datasets.]

is comparatively less than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

[Figure 11: Performance comparison on false alarm rate of SVM, ABC, and MABC-EPSO under the SFSM and RFSM methods on the DoS, Probe, U2R, and R2L (+10% normal) datasets.]

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons amongst different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


[Figure 12: Training time (ms) of MABC-EPSO on the DoS, Probe, U2R, and R2L (+10% normal) datasets using the All, SFSM, and RFSM feature sets.]

the ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to select which classification algorithm was significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.


Page 8: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

8 The Scientific World Journal

(MCN) After all the choices have been made the best solu-tion generated in that iteration is called MABCbest

Step 6. As the initial velocity has a large effect on the balance between the exploration and exploitation processes of the swarm, this proposed Enhanced Particle Swarm Optimization (EPSO) algorithm uses an inertia weight ($\omega$) [35] to control the velocity, and hence the velocity update equation (8) becomes

$$v_{id}^{t} = \omega \cdot v_{id}^{t-1} + c_1 \cdot \mathrm{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \mathrm{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right)$$ (8)

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, the linearly decreasing inertia weight [36] in (9) is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

$$w_k = w_{\max} - \frac{w_{\max} - w_{\min}}{\mathrm{iter}_{\max}} \times k$$ (9)
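As a concrete illustration, the inertia-weighted velocity update of (8) and the linearly decreasing weight of (9) can be sketched in Python as follows. This is a minimal sketch under our own naming, not the authors' implementation; the defaults $c_1 = c_2 = 2$ follow the parameter settings given later in this section.

```python
import random

def inertia_weight(k, k_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight of Eq. (9) at iteration k."""
    return w_max - (w_max - w_min) / k_max * k

def update_velocity(v, x, p_best, g_best, w, c1=2.0, c2=2.0):
    """Inertia-weighted velocity update of Eq. (8) for one particle.

    v, x     : current velocity and position of the particle
    p_best   : personal best position, g_best: global best position
    """
    r1, r2 = random.random(), random.random()  # rand_1, rand_2 in Eq. (8)
    return [w * v[d]
            + c1 * r1 * (p_best[d] - x[d])
            + c2 * r2 * (g_best[d] - x[d])
            for d in range(len(v))]
```

At the first iteration `inertia_weight` returns $w_{\max}$ and at the last it returns $w_{\min}$, so the search gradually shifts from global to local exploration.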

In the particle swarm, after the comparison among the solutions that each particle has experienced and the comparison among the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSObest.

Step 7. The minimum of the values MABCbest and EPSObest is called Best and is defined as

$$\mathrm{Best} = \begin{cases} \mathrm{EPSObest}, & \text{if } \mathrm{EPSObest} \le \mathrm{MABCbest} \\ \mathrm{MABCbest}, & \text{if } \mathrm{MABCbest} \le \mathrm{EPSObest} \end{cases}$$ (10)

Step 8. If the termination condition is satisfied, then end the process and report the best solution; otherwise, return to Step 2.

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters are set as follows: bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight $\omega$ in (11) varies from 0.9 to 0.7 linearly with the iterations, and the acceleration coefficients $c_1$ and $c_2$ are both set to 2. The upper and lower bounds for $v$ ($v_{\min}$, $v_{\max}$) are set as the maximum upper and lower bounds of $x$:

$$v_{id}^{t} = \omega v_{id}^{t-1} + c_1\,\mathrm{rand}(0,1)\left(p_{id} - x_{id}^{t-1}\right) + c_2\,\mathrm{rand}(0,1)\left(p_{gd} - x_{id}^{t-1}\right)$$ (11)

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Table 7: Confusion matrix.

                          Predicted
Actual       Normal                   Attack
Normal       True Negative (TN)      False Positive (FP)
Attack       False Negative (FN)     True Positive (TP)

True Positive (TP): the number of attacks that are correctly identified. True Negative (TN): the number of normal records that are correctly classified. False Positive (FP): the number of normal records incorrectly classified. False Negative (FN): the number of attacks incorrectly classified.

5.1. Performance Metrics. The performance metrics like accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset on applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms; hence sensitivity and specificity are chosen as the parametric indices for carrying out the classification task. In the intrusion detection problem, sensitivity can also be called detection rate. The number of instances predicted correctly or incorrectly by a classification model is summarized in a confusion matrix, shown in Table 7.

The classification accuracy is the percentage of the overall number of connections correctly classified:

$$\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$ (12)

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$ (13)

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

$$\text{Specificity} = \frac{TN}{TN + FP}$$ (14)

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training datasets:

$$\text{False Alarm Rate (FAR)} = \frac{FP}{TN + FP}$$ (15)
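Equations (12)–(15) follow directly from the four counts of Table 7. A minimal Python sketch (the helper name is ours, not the authors'):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (12)-(15) from the confusion-matrix counts of Table 7."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)  # Eq. (12)
    sensitivity = tp / (tp + fn)                   # Eq. (13), the detection rate
    specificity = tn / (tn + fp)                   # Eq. (14)
    far         = fp / (tn + fp)                   # Eq. (15)
    return accuracy, sensitivity, specificity, far
```

For example, 90 detected attacks, 95 correctly classified normal records, 5 false positives, and 10 missed attacks give an accuracy of 0.925, a detection rate of 0.90, a specificity of 0.95, and a FAR of 0.05.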

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used for improving the classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].

Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm   Average accuracy (%)   Feature selection method
C4.5 [6]                   99.11                  All features
                           98.69                  Genetic algorithm
                           98.84                  Best-first
                           99.41                  Correlation feature selection
BayesNet [6]               99.53                  All features
                           99.52                  Genetic algorithm
                           98.91                  Best-first
                           98.92                  Correlation feature selection
ABC-SVM [7]                92.768                 Binary ABC
PSO-SVM [7]                83.88
GA-SVM [7]                 80.73
KNN [8]                    98.24                  All features
                           98.11                  Fast feature selection
Bayes Classifier [8]       76.09                  All features
                           71.94                  Fast feature selection
ANN [9]                    81.57                  Feature reduction
SSO-RF [10, 11]            92.7                   SSO
Hybrid SSO [12]            97.67                  SSO
RSDT [13]                  97.88                  Rough set
ID3 [13]                   97.665                 All features
C4.5 [13]                  97.582
FC-ANN [14]                96.71                  All features
Proposed MABC-EPSO         88.59                  All features
                           99.32                  Single feature selection method
                           99.82                  Random feature selection method
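The 10-fold procedure described above can be sketched as follows. This is a minimal illustration with our own helper names, not the authors' implementation; `evaluate` stands for any routine that trains on one index set and returns the accuracy on the other.

```python
import random

def ten_fold_indices(n, k=10, seed=0):
    """Randomly partition n sample indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, evaluate, k=10):
    """Average the score of evaluate(train_idx, test_idx) over k runs,
    using each fold as the test partition exactly once."""
    folds = ten_fold_indices(n, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(train, test))
    return sum(scores) / k
```

Each index appears in exactly one test fold, so the averaged score uses every record for testing once, as described above.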

5.2. Results and Discussion. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38], such as Naïve Bayes (NB), which is a statistical classifier, decision tree (J48), radial basis function (RBF) network, Support Vector Machine (SVM), which is based on statistical learning theory, and basic ABC, are tested on the KDDCup'99 dataset. For each classification algorithm, the default control parameters are used. Table 8 reports the accuracy rates obtained by the various classification algorithms using different feature selection methods.

The performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rate of ABC and the proposed MABC-EPSO

Figure 3: Accuracy comparison of classifiers for DoS dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO with the All, SFSM, and RFSM feature sets).

is increased significantly to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It


Table 9: Accuracy rates of classifiers using SFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     82.57 (6)   87.11 (4)   87.96 (3)   84.7 (5)    90.82 (2)   99.50 (1)
Probe + 10% normal   82.68 (5)   82.6 (6)    83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal     86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal     84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.8 (1)
Average rank         5.25        4.75        4.25        3.75        2           1

Figure 4: Accuracy comparison of classifiers for probe dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO with the All, SFSM, and RFSM feature sets).

Figure 5: Accuracy comparison of classifiers for R2L dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO with the All, SFSM, and RFSM feature sets).

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rate of the NB, J48, and RBF classifiers is comparatively high with the RFSM method compared to SFSM and the full feature set.

Figure 6: Accuracy comparison of classifiers for U2R dataset (accuracy (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO with the All, SFSM, and RFSM feature sets).

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned are considered over the four datasets, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and the ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform in the same way and hence their ranks should be equal. The Friedman test ranks the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best getting the rank of 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the least performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J48 are the least performing algorithms. The Friedman statistics $\chi^2 = 15.716$ and $F_F = 11.005$ for SFSM and $\chi^2 = 15.712$ and $F_F = 10.992$ for RFSM are computed. With four datasets and six classification algorithms, the distribution of $F_F$ is based on the $F$ distribution with $6 - 1 = 5$ and $(6 - 1) \times (4 - 1) = 15$ degrees of freedom. The critical value of $F(5, 15)$ for $\alpha = 0.05$ is 2.9013 and the $P$ value $< 0.05$, so we reject the null hypothesis: the differences among classifiers are significant.
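The reported statistics can be reproduced from the average ranks alone. A small Python sketch (function name is ours; the $F_F$ expression is the standard Iman–Davenport correction of the Friedman chi-square):

```python
def friedman_stats(avg_ranks, n_datasets):
    """Friedman chi-square over k algorithms ranked on n_datasets datasets,
    plus the Iman-Davenport F_F statistic derived from it."""
    k = len(avg_ranks)
    chi2 = (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    f_f = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_f

# Average ranks from Table 9 (SFSM): NB, J48, RBF, SVM, ABC, MABC-EPSO
chi2, f_f = friedman_stats([5.25, 4.75, 4.25, 3.75, 2, 1], n_datasets=4)
```

With the SFSM ranks of Table 9 this yields $\chi^2 \approx 15.71$ and $F_F = 11.0$, matching the reported values up to rounding.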

The means of several groups are compared using the ANOVA test by estimating the variances among groups and within a group. Here the null hypothesis, which is set as all


Table 10: Accuracy rates using RFSM feature selection method and Friedman ranks.

Dataset              NB          J48         RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     83.04 (6)   90.05 (4)   88.83 (5)   94.02 (3)   96.43 (2)   99.81 (1)
Probe + 10% normal   84.01 (5)   82.72 (6)   85.94 (4)   95.87 (3)   97.31 (2)   99.86 (1)
R2L + 10% normal     86.32 (4)   83.10 (6)   86.11 (5)   97.04 (3)   98.96 (2)   99.80 (1)
U2R + 10% normal     85.15 (6)   88.42 (5)   88.98 (4)   95.91 (3)   98.96 (2)   99.80 (1)
Average rank         5.25        5.25        4.5         3           2           1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation   SS         df   MS         F         P value   F-crit
SFSM method
  Between groups      781.5143    5   156.3029   31.895    <0.05     2.7729
  Within groups        88.2099   18     4.9005
  Total               869.7241   23
RFSM method
  Between groups      879.4307    5   175.8861   48.5473   <0.05     2.7729
  Within groups        65.2138   18     3.6230
  Total               944.6444   23
*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. Also, the $P$ value and the value of $F$ are computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated $F(5, 18) = 31.895$ is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test states that there are significant differences of MABC-EPSO and ABC with the other classifiers, but not among NB, J48, RBF, and SVM. Also, there are significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated $F(5, 18) = 48.547$ is greater than F-critical (2.773) for the significance level of 5%. Tukey's post hoc test reveals that there is a statistically significant difference of SVM, ABC, and MABC-EPSO with the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
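The sums of squares behind this analysis follow from a standard one-way ANOVA over the per-dataset accuracies. A minimal sketch (our own function, not the authors' code); feeding it the RFSM accuracies of Table 10 recovers the RFSM row of Table 11 up to rounding:

```python
def one_way_anova(groups):
    """One-way ANOVA: between/within sums of squares and the F statistic."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n                 # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b, df_w = len(groups) - 1, n - len(groups)           # here 5 and 18
    f = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, f

# Per-classifier RFSM accuracies over the four datasets (columns of Table 10):
# NB, J48, RBF, SVM, ABC, MABC-EPSO
rfsm_accuracies = [
    [83.04, 84.01, 86.32, 85.15],
    [90.05, 82.72, 83.10, 88.42],
    [88.83, 85.94, 86.11, 88.98],
    [94.02, 95.87, 97.04, 95.91],
    [96.43, 97.31, 98.96, 98.96],
    [99.81, 99.86, 99.80, 99.80],
]
ss_between, ss_within, f_stat = one_way_anova(rfsm_accuracies)
```

The computed values agree with Table 11's RFSM row (SS between ≈ 879.43, SS within ≈ 65.21, $F \approx 48.55$), and $F$ exceeds the F-critical value of 2.773.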

In Table 12, the results are reported for the detection rate obtained by various classification algorithms using different feature selection methods. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rate of SVM, ABC, and the proposed MABC-EPSO is increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using SFSM method (sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets).

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a comparable performance to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated $F(5, 18) = 52.535$ and $F(5, 18) = 23.540$ are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                   Average detection rate (%)   Feature selection method
Naïve Bayes [15]                           92.27                        Genetic algorithm
C4.5 [15]                                  92.1
Random forest [15]                         89.21
Random tree [15]                           88.98
REP tree [15]                              89.11
Neurotree [15]                             98.38
GMDH based neural network [16]             93.7                         Information gain
                                           97.5                         Gain ratio
                                           95.3                         GMDH
Neural network [17]                        81.57                        Feature reduction
Hybrid evolutionary neural network [18]    91.51                        Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]        97.75                        PCA
Ensemble Bayesian combination [20]         93.35                        All features
Voting + J48 + Rule [21]                   97.47                        All features
Voting + AdaBoost + J48 [21]               97.38
Rough set neural network algorithm [22]    90                           All features
PSO based fuzzy system [23]                93.7                         All features
Proposed MABC-EPSO                         87.16                        All features
                                           98.09                        Single feature selection method
                                           98.67                        Random feature selection method

Figure 8: Comparison on sensitivity using RFSM method (sensitivity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets).

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the classification algorithms at the 0.05 ($P = 0.05$) significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using SFSM method (specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets).

for all datasets. Also, the proposed hybrid approach using SFSM shows a comparable performance with the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach has been significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably more when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation   SS         df   MS         F         P value   F-crit
SFSM
  Between groups      659.6518    5   131.9304   52.5347   <0.05     2.7729
  Within groups        45.2034   18     2.5113
  Total               704.8551   23
RFSM
  Between groups      617.8180    5   123.5636   23.5396   <0.05     2.7729
  Within groups        94.4854   18     5.2492
  Total               712.3033   23
*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time of classification algorithms using SFSM and RFSM feature selection methods.

                     SFSM                                              RFSM
Dataset              NB      J48    RBF    SVM    ABC    MABC-EPSO    NB     J48    RBF    SVM    ABC    MABC-EPSO
DoS + 10% normal     10.20   4.7    3.8    2.86   2.78   2.22         9.95   3.95   3.28   2.59   2.07   1.5
Probe + 10% normal    5.33   3.12   3.05   2.36   2.24   1.87         4.15   3.01   3.19   2.11   1.97   1.69
U2R + 10% normal      4.75   3.81   3.08   2.21   2.16   1.98         4.01   3.46   2.79   1.80   1.78   0.65
R2L + 10% normal      3.98   4.97   3.01   2.46   2.23   2.0          3.12   3.23   2.55   1.42   1.37   1.46

Figure 10: Comparison on specificity using RFSM method (specificity (%) of Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets).

is comparatively less than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers (false alarm rate of SVM, ABC, and MABC-EPSO under SFSM and RFSM on the DoS, Probe, U2R, and R2L + 10% normal datasets).

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time of MABC-EPSO (training time in ms on the DoS, Probe, U2R, and R2L + 10% normal datasets with the All, SFSM, and RFSM feature sets).

the ANOVA test, the null hypotheses were rejected if they were significant. Post hoc analysis using Tukey's test was applied to select which classification algorithm was significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

The Scientific World Journal 9

Table 8 Performance comparison of classification algorithms on accuracy rate

Classification Algorithms Average accuracy () Feature selection method

C45 [6]

9911 All features9869 Genetic algorithm9884 Best-first9941 Correlation feature selection

BayesNet [6]

9953 All features9952 Genetic algorithm9891 Best-first9892 Correlation feature selection

ABC-SVM [7] 92768Binary ABCPSO-SVM [7] 8388

GA-SVM [7] 8073

KNN [8] 9824 All features9811 Fast feature selection

Bayes Classifier [8] 7609 All features7194 Fast feature selection

ANN [9] 8157 Feature reductionSSO-RF [10 11] 927 SSOHybrid SSO [12] 9767 SSORSDT [13] 9788 Rough setID3 [13] 97665 All featuresC45 [13] 97582FC-ANN [14] 9671 All features

Proposed MABC-EPSO8859 All features9932 Single feature selection method9982 Random feature selection method

nine-tenths are used for training This process is repeated10 times so that each partition is used for training exactlyonceThe average of the results from the 10-fold gives the testaccuracy of the algorithm [37]

52 Results and Discussions Themain motivation is to showthat the proposed hybrid method has the advantage ofbecoming an efficient classification algorithm based on ABCand PSO To further prove the robustness of the proposedmethod other popular machine learning algorithms [38]such asNaives Bayes (NB)which is a statistical classifier deci-sion tree (j48) radial basis function (RBF) network SupportVectorMachine (SVM) that is based on the statistical learningtheory and basic ABC are tested on KDDCuprsquo99 dataset Foreach classification algorithm their default control parametersare used In Table 8 the results are reported for accuracy rateobtained by various classification algorithms using differentfeature selection methods

The performance comparison of the classifiers on accu-racy rate is given in Figures 3ndash6 The results show thaton classifiying the dataset with all features the averageaccuracy rate of 855 845 and 8859 is obtained forSVM ABC and proposed hybrid approaches When SFSMis applied accuracy rate of ABC and proposed MABC-EPSO

AllSFSMRFSM

100

96

92

88

84

80

Accu

racy

()

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-E

PSO

Classification algorithms

Figure 3 Accuracy comparison of classifiers for DoS dataset

is increased significantly to 9436 and 9932 The highestaccuracy (9982) is reported when the proposed MABC-EPSO with random feature selection method is employed It

10 The Scientific World Journal

Table 9 Accuracy rates of classifiers using SFSM feature selection method and Friedman ranks

Dataset NB J48 RBF SVM ABC MABC-EPSODoS + 10 normal 8257 (6) 8711 (4) 8796 (3) 847 (5) 9082 (2) 9950 (1)Probe + 10 normal 8268 (5) 826 (6) 8372 (4) 8567 (3) 9658 (2) 9927 (1)R2L + 10 normal 8615 (4) 8255 (6) 8516 (5) 9061 (3) 9272 (2) 9924 (1)U2R + 10 normal 8406 (6) 8716 (3) 8554 (5) 8597 (4) 9731 (2) 998 (1)Average rank 525 475 425 375 2 1

Figure 4: Accuracy comparison of classifiers for the probe dataset (bars: All, SFSM, RFSM).

Figure 5: Accuracy comparison of classifiers for the R2L dataset (bars: All, SFSM, RFSM).

is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC is increased to 95.71% and 97.92%. The accuracy rates of the NB, J48, and RBF classifiers are comparatively higher with the RFSM method than with SFSM or the full feature set.

Figure 6: Accuracy comparison of classifiers for the U2R dataset (bars: All, SFSM, RFSM).

In order to test the significance of the differences among classifiers, the six classification algorithms previously mentioned are considered over the four datasets, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and the ranks computed through the Friedman test (ranking is given in parentheses). The null hypothesis states that all the classifiers perform equivalently and hence their ranks should be equal. The Friedman test ranked the algorithms for each dataset, with the best performing algorithm getting the rank of 1, the second best getting the rank of 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm whereas Naïve Bayes is the worst performing algorithm, and Table 10 shows that MABC-EPSO is the best performing algorithm whereas Naïve Bayes and J48 are the worst performing algorithms. The Friedman statistics χ² = 15.716 and F_F = 11.005 for SFSM, and χ² = 15.712 and F_F = 10.992 for RFSM, are computed. With four datasets and six classification algorithms, F_F follows the F distribution with 6 − 1 = 5 and (6 − 1) × (4 − 1) = 15 degrees of freedom. The critical value of F(5, 15) for α = 0.05 is 2.9013 and the P value < 0.05, so we reject the null hypothesis: the differences among the classifiers are significant.
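The Friedman statistic can be recomputed from the rank table alone. The sketch below (plain Python) uses the SFSM ranks from Table 9; the resulting values, 15.714 and 11.0, agree with the reported 15.716 and 11.005 up to rounding in the published figures.

```python
# Friedman test from per-dataset ranks, plus the Iman-Davenport F_F correction.
# Ranks of the 6 classifiers (NB, J48, RBF, SVM, ABC, MABC-EPSO) on each of the
# 4 datasets, taken from Table 9 (SFSM feature selection).
ranks = [
    [6, 4, 3, 5, 2, 1],  # DoS + 10% normal
    [5, 6, 4, 3, 2, 1],  # Probe + 10% normal
    [4, 6, 5, 3, 2, 1],  # R2L + 10% normal
    [6, 3, 5, 4, 2, 1],  # U2R + 10% normal
]
N = len(ranks)        # number of datasets
k = len(ranks[0])     # number of classifiers

# Average rank of each classifier over the N datasets
R = [sum(row[j] for row in ranks) / N for j in range(k)]

# Friedman chi-square statistic
chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in R) - k * (k + 1) ** 2 / 4)

# Iman-Davenport correction, distributed as F with (k-1, (k-1)(N-1)) d.o.f.
F_F = (N - 1) * chi2 / (N * (k - 1) - chi2)

print(round(chi2, 3), round(F_F, 3))  # 15.714 11.0
```

Since F_F = 11.0 exceeds the critical value F(5, 15) = 2.9013, the null hypothesis of equal classifier performance is rejected, matching the conclusion in the text.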

The ANOVA test compares the means of several groups by estimating the variance among groups and within each group. Here, the null hypothesis, which is set as all


Table 10: Accuracy rates using RFSM feature selection method and Friedman ranks.

Dataset             NB         J48        RBF        SVM        ABC        MABC-EPSO
DoS + 10% normal    83.04 (6)  90.05 (4)  88.83 (5)  94.02 (3)  96.43 (2)  99.81 (1)
Probe + 10% normal  84.01 (5)  82.72 (6)  85.94 (4)  95.87 (3)  97.31 (2)  99.86 (1)
R2L + 10% normal    86.32 (4)  83.10 (6)  86.11 (5)  97.04 (3)  98.96 (2)  99.80 (1)
U2R + 10% normal    85.15 (6)  88.42 (5)  88.98 (4)  95.91 (3)  98.96 (2)  99.80 (1)
Average rank        5.25       5.25       4.5        3          2          1

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation  SS         df  MS         F         P value  F-crit
SFSM method
  Between groups     781.5143    5  156.3029   31.89498  <0.05    2.772853
  Within groups       88.20985  18    4.900547
  Total              869.7241   23
RFSM method
  Between groups     879.4307    5  175.8861   48.54728  <0.05    2.772853
  Within groups       65.21375  18    3.622986
  Total              944.6444   23
*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal, is tested. The P value and the value of F are also computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than the F-critical value (2.773) at the 5% significance level. Tukey's post hoc test shows that there are significant differences between MABC-EPSO and ABC on the one hand and the other classifiers on the other, but not among NB, J48, RBF, and SVM. There are also significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than the F-critical value (2.773) at the 5% significance level. Tukey's post hoc test reveals statistically significant differences between SVM, ABC, and MABC-EPSO and the other classifiers, but not among NB, J48, and RBF. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
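The F statistic in Table 11 comes from the standard one-way ANOVA decomposition of total variation into between-group and within-group sums of squares. A minimal sketch of that computation follows; the group values are the per-dataset SFSM accuracies from Table 9, used only for illustration, so the resulting F lands near (not exactly on) the published 31.895 because the table accuracies are themselves rounded.

```python
# One-way ANOVA by hand: between-group and within-group sums of squares,
# then F = MS_between / MS_within, as summarized in Table 11.
# One group per classifier; values are SFSM accuracies (DoS, Probe, R2L, U2R).
groups = {
    "NB":        [82.57, 82.68, 86.15, 84.06],
    "J48":       [87.11, 82.60, 82.55, 87.16],
    "RBF":       [87.96, 83.72, 85.16, 85.54],
    "SVM":       [84.70, 85.67, 90.61, 85.97],
    "ABC":       [90.82, 96.58, 92.72, 97.31],
    "MABC-EPSO": [99.50, 99.27, 99.24, 99.80],
}
values = [v for g in groups.values() for v in g]
n, k = len(values), len(groups)
grand = sum(values) / n  # grand mean over all observations

ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

df_between, df_within = k - 1, n - k  # 5 and 18 here, as in Table 11
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))  # near the 31.895 of Table 11; F >> F-crit = 2.773
```

Because F exceeds the critical value, the null hypothesis of equal group means is rejected, and a multiple-comparison procedure such as Tukey's HSD is then used to locate which pairs differ.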

In Table 12, the detection rates obtained by various classification algorithms using different feature selection methods are reported. The comparison results on sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rates of SVM, ABC, and the proposed MABC-EPSO are increased significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

Figure 7: Comparison on sensitivity using the SFSM method (datasets: DoS, Probe, U2R, and R2L + 10% normal).

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows a performance comparable to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method compared to the SFSM method.
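The metrics reported in this section all derive from the binary confusion matrix of attack versus normal connections. The sketch below shows the standard definitions; the counts are hypothetical, chosen only to make the arithmetic visible, and are not taken from the experiments.

```python
# Detection metrics from binary confusion-matrix counts.
# Hypothetical counts for illustration: 1000 attack and 1000 normal records.
tp, fn = 980, 20   # attacks correctly flagged / attacks missed
tn, fp = 990, 10   # normals correctly passed / normals falsely flagged

sensitivity = tp / (tp + fn)               # detection rate (true positive rate)
specificity = tn / (tn + fp)               # true negative rate
false_alarm = fp / (fp + tn)               # false alarm rate = 1 - specificity
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(sensitivity, specificity, false_alarm, accuracy)
# 0.98 0.99 0.01 0.985
```

A classifier like MABC-EPSO that scores well on both sensitivity and specificity simultaneously keeps the detection rate high while holding the false alarm rate low, which is exactly the trade-off Figures 7–11 examine.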

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than the F-critical value (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                   Average detection rate (%)  Feature selection method
Naïve Bayes [15]                           92.27                       Genetic algorithm
C4.5 [15]                                  92.1                        Genetic algorithm
Random forest [15]                         89.21                       Genetic algorithm
Random tree [15]                           88.98                       Genetic algorithm
REP tree [15]                              89.11                       Genetic algorithm
Neurotree [15]                             98.38                       Genetic algorithm
GMDH based neural network [16]             93.7                        Information gain
                                           97.5                        Gain ratio
                                           95.3                        GMDH
Neural network [17]                        81.57                       Feature reduction
Hybrid evolutionary neural network [18]    91.51                       Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]        97.75                       PCA
Ensemble Bayesian combination [20]         93.35                       All features
Voting + J48 + Rule [21]                   97.47                       All features
Voting + AdaBoost + J48 [21]               97.38                       All features
Rough set neural network algorithm [22]    90                          All features
PSO based fuzzy system [23]                93.7                        All features
Proposed MABC-EPSO                         87.16                       All features
                                           98.09                       Single feature selection method
                                           98.67                       Random feature selection method

Figure 8: Comparison on sensitivity using the RFSM method (datasets: DoS, Probe, U2R, and R2L + 10% normal).

Finally, the multiple comparison test concluded that MABC-EPSO has significant differences with all the other classification algorithms at the 0.05 significance level. However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM

Figure 9: Comparison on specificity using the SFSM method (datasets: DoS, Probe, U2R, and R2L + 10% normal).

for all datasets. Also, the proposed hybrid approach using SFSM shows a performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach is significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably higher when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

Source of variation  SS         df  MS         F         P value  F-crit
SFSM
  Between groups     659.6518    5  131.9304   52.5347   <0.05    2.772853
  Within groups       45.20339  18    2.511299
  Total              704.8551   23
RFSM
  Between groups     617.818     5  123.5636   23.53957  <0.05    2.772853
  Within groups       94.48535  18    5.249186
  Total              712.3033   23
*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

Table 14: Training time of classification algorithms using SFSM and RFSM feature selection methods.

                    SFSM                                      RFSM
Dataset             NB     J48   RBF   SVM   ABC   MABC-EPSO  NB    J48   RBF   SVM   ABC   MABC-EPSO
DoS + 10% normal    10.20  4.7   3.8   2.86  2.78  2.22       9.95  3.95  3.28  2.59  2.07  1.5
Probe + 10% normal   5.33  3.12  3.05  2.36  2.24  1.87       4.15  3.01  3.19  2.11  1.97  1.69
U2R + 10% normal     4.75  3.81  3.08  2.21  2.16  1.98       4.01  3.46  2.79  1.80  1.78  0.65
R2L + 10% normal     3.98  4.97  3.01  2.46  2.23  2.0        3.12  3.23  2.55  1.42  1.37  1.46

Figure 10: Comparison on specificity using the RFSM method (datasets: DoS, Probe, U2R, and R2L + 10% normal).

is comparatively lower than with the SFSM method. Based on the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers (SVM, ABC, and MABC-EPSO under SFSM and RFSM; datasets: DoS, Probe, U2R, and R2L + 10% normal).

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time of MABC-EPSO (All, SFSM, and RFSM features) on the DoS, Probe, U2R, and R2L + 10% normal datasets.

the ANOVA test, the null hypotheses were rejected when the differences were significant. Post hoc analysis using Tukey's test was applied to determine which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.


Page 10: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

10 The Scientific World Journal

Table 9 Accuracy rates of classifiers using SFSM feature selection method and Friedman ranks

Dataset NB J48 RBF SVM ABC MABC-EPSODoS + 10 normal 8257 (6) 8711 (4) 8796 (3) 847 (5) 9082 (2) 9950 (1)Probe + 10 normal 8268 (5) 826 (6) 8372 (4) 8567 (3) 9658 (2) 9927 (1)R2L + 10 normal 8615 (4) 8255 (6) 8516 (5) 9061 (3) 9272 (2) 9924 (1)U2R + 10 normal 8406 (6) 8716 (3) 8554 (5) 8597 (4) 9731 (2) 998 (1)Average rank 525 475 425 375 2 1

AllSFSMRFSM

100

96

92

88

84

80

Accu

racy

()

Na iuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-E

PSO

Classification algorithms

Figure 4 Accuracy comparison of classifiers for probe dataset

AllSFSMRFSM

100

95

90

85

80

75

Accu

racy

()

Na iuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-E

PSO

Classification algorithms

Figure 5 Accuracy comparison of classifiers for R2L dataset

is also observed that on applying random feature selectionmethod the accuracy of SVMandABC is increased to 9571and 9792The accuracy rate of NB j48 and RBF classifiersis comparatively high with RFSMmethod compared to SFSMand full feature set

AllSFSMRFSM

100

95

90

85

80

Accu

racy

()

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-E

PSO

Classification algorithms

Figure 6 Accuracy comparison of classifiers for U2R dataset

In order to test the significance of the differences amongclassifiers six classification algorithms previously mentionedover four datasets are considered and performed experimentsusing Friedman test and ANOVA Tables 9 and 10 depict theclassification accuracy using two feature selection methodsand their ranks computed through Friedman test (ranking isgiven in parenthesis) The null hypothesis states that all theclassifiers perform in the same way and hence their ranksshould be equal The Friedman test ranked the algorithmsfor each dataset with the best performing algorithm gettingthe rank of 1 the second best algorithm getting the rank2 As seen in Table 9 MABC-EPSO is the best performingalgorithm whereas Naıve Bayes is the least performingalgorithm and Table 10 shows that MABC-EPSO is the bestperforming algorithm whereas Naıve Bayes and j48 are theleast performing algorithms Friedman statistic 1205942 = 15716

and 119865119865

= 11005 for SFSM and 1205942= 15712 and 119865

119865=

10992 for RFSM are computed Having four datasets andsix classification algorithms distribution of 119865

119865is based on 119865

distribution with 6minus1 = 5 and (6minus1)lowast(4minus1) = 15 degrees offreedom The critical value of 119865(5 15) for 120572 = 005 is 29013and 119875 value lt 005 So we reject the null hypothesis and thedifferences among classifiers are significant

The means of several groups by estimating the variancesamong groups and within a group are compared using theANOVA test Here the null hypothesis which is set as all

The Scientific World Journal 11

Table 10 Accuracy rates using RFSM feature selection method and Friedman ranks

Dataset NB J48 RBF SVM ABC MABC-EPSODoS + 10 normal 8304 (6) 9005 (4) 8883 (5) 9402 (3) 9643 (2) 9981 (1)Probe + 10 normal 8401 (5) 8272 (6) 8594 (4) 9587 (3) 9731 (2) 9986 (1)R2L + 10 normal 8632 (4) 8310 (6) 8611 (5) 9704 (3) 9896 (2) 9980 (1)U2R + 10 normal 8515 (6) 8842 (5) 8898 (4) 9591 (3) 9896 (2) 9980 (1)Average rank 525 525 45 3 2 1

Table 11 ANOVA results for accuracy rate of classifiers

Source of variation SS df MS 119865 119875 value 119865-critSFSM method

Between groups 7815143 5 1563029 3189498 lt005 2772853Within groups 8820985 18 4900547Total 8697241 23

RFSMmethodBetween groups 8794307 5 1758861 4854728 lt005 2772853Within groups 6521375 18 3622986Total 9446444 23lowastSS sum of squared deviations about mean df degrees of freedom MS variance

population means are equal is tested Also 119875 value and thevalue of 119865 are computed If the null hypothesis is rejectedTukeyrsquos post hoc analysis method is applied to performa multiple comparison which tests all means pairwise todetermine which ones are significantly different Table 11shows the results determined by ANOVA In SFSM methodthe ANOVA test rejected the null hypothesis as calculated119865(5 18) = 31895 is greater than F-critical (2773) for thesignificance level of 5 Tukeyrsquos post hoc test is performedwhich states that significantly there are differences amongMABC-EPSO and ABC with other classifiers but not amongNB j48 RBF and SVMAlso there are significant differencesbetween ABC and MABC-EPSO so ABC and MABC-EPSOare the best classifiers in this case In RFSM method therewere statistically significant differences between algorithmsand hence null hypothesis was rejected as the calculated119865(5 18) = 48547 is greater than F-critical (2773) for thesignificance level of 5 Tukeyrsquos posthoc test is performedand it reveals that there is a statistically significant differenceamong SVM ABC and MABC-EPSO with other classifiersbut not among NB j48 and RBF However there is no sta-tistically significant difference between the ABC andMABC-EPSO algorithms

In Table 12 the results are reported for detection rateobtained by various classification algorithms using differentfeature selection methods The comparison results of sen-sitivity and specificity obtained by proposed method usingthe two feature selection methods are given in Figures 7ndash10 The results show that on classifying the dataset withall features detection rate of 875 8364 and 8716is obtained for SVM ABC and proposed MABC-EPSOapproaches On applying the single feature selection methoddetection rate of SVM ABC and proposed MABC-EPSOis increased significantly to 8897 8990 and 9809respectively The highest detection rate (9867) is reported

100

95

90

85

80

75

70

Sens

itivi

ty (

)

SFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 7 Comparison on sensitivity using SFSM method

when the proposedMABC-EPSOwith random feature selec-tion method is employed MABC-EPSO with SFSM alsoshows a comparable performance than other classifier combi-nations The performance of NB j48 and RBF is better interms of specificity and sensitivity using RFSMmethod com-pared to SFSMmethod

Table 13 shows the ANOVA results of analyzing the per-formance of the classifiers based on specificity In both SFSMand RFSM methods ANOVA test determined that there aresignificant differences among the classification algorithmsand rejected null hypothesis as calculated 119865(5 18 = 52535)and 119865(5 18 = 23539) are greater than F-critical (2773)

12 The Scientific World Journal

Table 12 Performance comparison of classification algorithms on detection rate

Classification Algorithm Average detection rate () Feature selection methodNaıve Bayes [15] 9227

Genetic algorithm

C45 [15] 921Random forest [15] 8921Random tree [15] 8898REP tree [15] 8911Neurotree [15] 9838

GMDH Based neural network [16]937 Information gain975 Gain ratio953 GMDH

Neural network [17] 8157 Feature reductionHybrid evolutionary neural network [18] 9151 Genetic algorithmImproved SVM (PSO + SVM + PCA) [19] 9775 PCAEnsemble Bayesian combination [20] 9335 All featuresVoting + j48 + Rule [21] 9747 All featuresVoting + AdaBoost + j48 [21] 9738Rough set neural network algorithm [22] 90 All featuresPSO based fuzzy system [23] 937 All features

Proposed MABC-EPSO8716 All features9809 Single feature selection method9867 Random feature selection method

100

95

90

85

80

75

70

Sens

itivi

ty (

)

RFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 8 Comparison on sensitivity using RFSMmethod

Finally multiple comaprison test concluded that MABC-EPSO has significant differences with all the classificationalgorithms with 005 (119875 = 005) as significance level How-ever there is no statistically significant difference between theSVM and ABC algorithms

Experiment was conducted to analyze the false alarm rateand training time of each classifier using SFSM and RFSMmethods Figure 11 indicates that MABC-EPSO produceslowest FAR (ranging from 0004 to 0005) using RFSM

100

96

92

88

84

80

76

Spec

ifici

ty (

)

SFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 9 Comparison on specificity using SFSM method

for all datasets Also the proposed hybrid approach usingSFSM shows a comparable performance with SVM andABC classifiers using RFSM method Table 14 shows thatthe training time of proposed approach has been signif-icantly reduced for both feature selection methods whencompared to other classification algorithms Training time ofthe proposed hybrid classifier considering all features is alsorecorded in Figure 12The results indicate that the time takenby proposed approach is considerably more when all featuresare employed It is also observed that the time consumed bythe proposed classifier using the features of RFSM method

The Scientific World Journal 13

Table 13 ANOVA results for specificity of classifiers

Source of variation SS df MS 119865 119875 value 119865-critSFSM

Between groups 6596518 5 1319304 525347 lt005 2772853Within groups 4520339 18 2511299Total 7048551 23

RFSMBetween groups 617818 5 1235636 2353957 lt005 2772853Within groups 9448535 18 5249186Total 7123033 23lowastSS sum of squared deviations about mean df degrees of freedom MS variance

Table 14 Training time of classification algorithms using SFSM and RFSM feature selection methods

Dataset SFSM RFSMNaıve Bayes J48 RBF SVM ABC MABC-EPSO Naıve Bayes J48 RBF SVM ABC MABC-EPSO

DoS + 10 normal 1020 47 38 286 278 222 995 395 328 259 207 15Probe + 10 normal 533 312 305 236 224 187 415 301 319 211 197 169U2R + 10 normal 475 381 308 221 216 198 401 346 279 180 178 065R2L + 10 normal 398 497 301 246 223 20 312 323 255 142 137 146

100

95

90

85

80

75

Spec

ifici

ty (

)

RFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 10 Comparison on specificity using RFSMmethod

is comparatively lower than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed method can be used to solve intrusion detection as a classification problem.
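The false alarm rate, sensitivity (detection rate), and specificity compared throughout this section all derive from the per-class confusion matrix. A minimal sketch of these computations follows; the counts in the usage example are hypothetical, not taken from the experiments:

```python
def ids_rates(tp, fn, fp, tn):
    """Standard IDS evaluation rates from a binary confusion matrix:
    tp/fn count attack records, fp/tn count normal records."""
    sensitivity = tp / (tp + fn)   # detection rate: attacks correctly flagged
    specificity = tn / (tn + fp)   # normal traffic correctly accepted
    far = fp / (fp + tn)           # false alarm rate: normal records flagged
    return sensitivity, specificity, far

# Hypothetical counts, for illustration only
sens, spec, far = ids_rates(tp=990, fn=10, fp=4, tn=996)
print(sens, spec, far)  # 0.99 0.996 0.004
```

Note that specificity and FAR are complementary (specificity = 1 − FAR), which is why the low FAR of MABC-EPSO in Figure 11 mirrors its high specificity in Figures 9 and 10.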

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

[Figure 11: Performance comparison on false alarm rate of classifiers (SVM, ABC, and MABC-EPSO) under the SFSM and RFSM methods on the DoS, Probe, U2R, and R2L + 10% normal datasets.]

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


[Figure 12: Training time of MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets, using all features, SFSM, and RFSM.]

the ANOVA test, the null hypotheses were rejected when significant. Post hoc analysis using Tukey's test was applied to identify which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to that of the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.
[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.
[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.
[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.
[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.
[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.
[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.
[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.
[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.
[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.
[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.
[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.
[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.
[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.
[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.
[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.
[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.
[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.
[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.
[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.
[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.
[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.
[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.
[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.
[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.
[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.
[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.
[30] KDD Cup 1999 data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz.
[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.
[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.
[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.
[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.
[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.
[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.
[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.
[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.




Table 10: Accuracy rates (%) using the RFSM feature selection method, with Friedman ranks in parentheses.

Dataset | NB | J48 | RBF | SVM | ABC | MABC-EPSO
DoS + 10% normal | 83.04 (6) | 90.05 (4) | 88.83 (5) | 94.02 (3) | 96.43 (2) | 99.81 (1)
Probe + 10% normal | 84.01 (5) | 82.72 (6) | 85.94 (4) | 95.87 (3) | 97.31 (2) | 99.86 (1)
R2L + 10% normal | 86.32 (4) | 83.10 (6) | 86.11 (5) | 97.04 (3) | 98.96 (2) | 99.80 (1)
U2R + 10% normal | 85.15 (6) | 88.42 (5) | 88.98 (4) | 95.91 (3) | 98.96 (2) | 99.80 (1)
Average rank | 5.25 | 5.25 | 4.5 | 3 | 2 | 1
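The average ranks in Table 10 are the first step of the Friedman test: rank the classifiers on each dataset (1 = highest accuracy) and average each classifier's ranks over the datasets. A small sketch reproducing the table's ranks from its RFSM accuracies (ties, which do not occur in this data, would require average ranks):

```python
def friedman_average_ranks(scores):
    """scores maps classifier -> list of accuracies, one per dataset.
    Returns classifier -> average rank, where rank 1 is the highest
    accuracy on a dataset. No tie handling: Table 10 has no ties."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    total = {name: 0 for name in names}
    for d in range(n_datasets):
        # Best accuracy on dataset d gets rank 1, worst gets rank len(names)
        ordered = sorted(names, key=lambda n: scores[n][d], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            total[name] += rank
    return {name: total[name] / n_datasets for name in names}

# Accuracy rates (%) from Table 10, RFSM method (DoS, Probe, R2L, U2R)
rfsm = {
    "NB":        [83.04, 84.01, 86.32, 85.15],
    "J48":       [90.05, 82.72, 83.10, 88.42],
    "RBF":       [88.83, 85.94, 86.11, 88.98],
    "SVM":       [94.02, 95.87, 97.04, 95.91],
    "ABC":       [96.43, 97.31, 98.96, 98.96],
    "MABC-EPSO": [99.81, 99.86, 99.80, 99.80],
}
print(friedman_average_ranks(rfsm))
# {'NB': 5.25, 'J48': 5.25, 'RBF': 4.5, 'SVM': 3.0, 'ABC': 2.0, 'MABC-EPSO': 1.0}
```

The computed average ranks match the last row of Table 10.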

Table 11: ANOVA results for accuracy rate of classifiers.

Source of variation | SS | df | MS | F | P value | F-crit
SFSM method
Between groups | 781.5143 | 5 | 156.3029 | 31.89498 | <0.05 | 2.772853
Within groups | 88.20985 | 18 | 4.900547
Total | 869.7241 | 23
RFSM method
Between groups | 879.4307 | 5 | 175.8861 | 48.54728 | <0.05 | 2.772853
Within groups | 65.21375 | 18 | 3.622986
Total | 944.6444 | 23
*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.

population means are equal is tested. The P value and the value of F are also computed. If the null hypothesis is rejected, Tukey's post hoc analysis method is applied to perform a multiple comparison, which tests all means pairwise to determine which ones are significantly different. Table 11 shows the results determined by ANOVA. In the SFSM method, the ANOVA test rejected the null hypothesis, as the calculated F(5, 18) = 31.895 is greater than F-critical (2.773) at the 5% significance level. Tukey's post hoc test shows that MABC-EPSO and ABC differ significantly from the other classifiers, whereas NB, J48, RBF, and SVM do not differ among themselves. There are also significant differences between ABC and MABC-EPSO, so ABC and MABC-EPSO are the best classifiers in this case. In the RFSM method, there were statistically significant differences between the algorithms, and hence the null hypothesis was rejected, as the calculated F(5, 18) = 48.547 is greater than F-critical (2.773) at the 5% significance level. Tukey's post hoc test reveals that SVM, ABC, and MABC-EPSO differ significantly from the other classifiers, whereas NB, J48, and RBF do not differ among themselves. However, there is no statistically significant difference between the ABC and MABC-EPSO algorithms.
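The F values in Table 11 follow from the standard one-way ANOVA decomposition over the k = 6 classifiers and 4 datasets: F is the ratio of the between-group to the within-group mean squares. A minimal pure-Python sketch, checked against the RFSM accuracies of Table 10:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic. Each group is one classifier's
    accuracies over the datasets. F = MS_between / MS_within,
    with degrees of freedom (k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Accuracy rates (%) per classifier under RFSM (rows of Table 10)
rfsm_groups = [
    [83.04, 84.01, 86.32, 85.15],  # NB
    [90.05, 82.72, 83.10, 88.42],  # J48
    [88.83, 85.94, 86.11, 88.98],  # RBF
    [94.02, 95.87, 97.04, 95.91],  # SVM
    [96.43, 97.31, 98.96, 98.96],  # ABC
    [99.81, 99.86, 99.80, 99.80],  # MABC-EPSO
]
print(round(one_way_anova_f(rfsm_groups), 3))  # 48.547, as in Table 11
```

Whether this F is significant is then decided against the F-critical value for (5, 18) degrees of freedom at the 5% level, 2.773, as reported in Table 11.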

In Table 12, the detection rates obtained by various classification algorithms using different feature selection methods are reported. The comparison results of sensitivity and specificity obtained by the proposed method using the two feature selection methods are given in Figures 7–10. The results show that, on classifying the dataset with all features, detection rates of 87.5%, 83.64%, and 87.16% are obtained for the SVM, ABC, and proposed MABC-EPSO approaches. On applying the single feature selection method, the detection rates of SVM, ABC, and the proposed MABC-EPSO increase significantly to 88.97%, 89.90%, and 98.09%, respectively. The highest detection rate (98.67%) is reported

[Figure 7: Comparison on sensitivity (%) using the SFSM method. Classifiers: Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO; datasets: DoS, Probe, U2R, and R2L + 10% normal.]

when the proposed MABC-EPSO with the random feature selection method is employed. MABC-EPSO with SFSM also shows performance comparable to the other classifier combinations. The performance of NB, J48, and RBF is better in terms of specificity and sensitivity using the RFSM method than using the SFSM method.

Table 13 shows the ANOVA results of analyzing the performance of the classifiers based on specificity. In both the SFSM and RFSM methods, the ANOVA test determined that there are significant differences among the classification algorithms and rejected the null hypothesis, as the calculated F(5, 18) = 52.535 and F(5, 18) = 23.539 are greater than F-critical (2.773).


Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm | Average detection rate (%) | Feature selection method
Naïve Bayes [15] | 92.27 | Genetic algorithm
C4.5 [15] | 92.1 | Genetic algorithm
Random forest [15] | 89.21 | Genetic algorithm
Random tree [15] | 88.98 | Genetic algorithm
REP tree [15] | 89.11 | Genetic algorithm
Neurotree [15] | 98.38 | Genetic algorithm
GMDH based neural network [16] | 93.7 | Information gain
GMDH based neural network [16] | 97.5 | Gain ratio
GMDH based neural network [16] | 95.3 | GMDH
Neural network [17] | 81.57 | Feature reduction
Hybrid evolutionary neural network [18] | 91.51 | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19] | 97.75 | PCA
Ensemble Bayesian combination [20] | 93.35 | All features
Voting + J48 + Rule [21] | 97.47 | All features
Voting + AdaBoost + J48 [21] | 97.38 | All features
Rough set neural network algorithm [22] | 90 | All features
PSO based fuzzy system [23] | 93.7 | All features
Proposed MABC-EPSO | 87.16 | All features
Proposed MABC-EPSO | 98.09 | Single feature selection method
Proposed MABC-EPSO | 98.67 | Random feature selection method

[Figure 8: Comparison on sensitivity (%) using the RFSM method. Classifiers: Naïve Bayes, J48, RBF, SVM, ABC, MABC-EPSO; datasets: DoS, Probe, U2R, and R2L + 10% normal.]

Finally, the multiple comparison test concluded that MABC-EPSO differs significantly from all the other classification algorithms at the 0.05 significance level (P = 0.05). However, there is no statistically significant difference between the SVM and ABC algorithms.

Experiment was conducted to analyze the false alarm rateand training time of each classifier using SFSM and RFSMmethods Figure 11 indicates that MABC-EPSO produceslowest FAR (ranging from 0004 to 0005) using RFSM

100

96

92

88

84

80

76

Spec

ifici

ty (

)

SFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 9 Comparison on specificity using SFSM method

for all datasets Also the proposed hybrid approach usingSFSM shows a comparable performance with SVM andABC classifiers using RFSM method Table 14 shows thatthe training time of proposed approach has been signif-icantly reduced for both feature selection methods whencompared to other classification algorithms Training time ofthe proposed hybrid classifier considering all features is alsorecorded in Figure 12The results indicate that the time takenby proposed approach is considerably more when all featuresare employed It is also observed that the time consumed bythe proposed classifier using the features of RFSM method

The Scientific World Journal 13

Table 13 ANOVA results for specificity of classifiers

Source of variation SS df MS 119865 119875 value 119865-critSFSM

Between groups 6596518 5 1319304 525347 lt005 2772853Within groups 4520339 18 2511299Total 7048551 23

RFSMBetween groups 617818 5 1235636 2353957 lt005 2772853Within groups 9448535 18 5249186Total 7123033 23lowastSS sum of squared deviations about mean df degrees of freedom MS variance

Table 14 Training time of classification algorithms using SFSM and RFSM feature selection methods

Dataset SFSM RFSMNaıve Bayes J48 RBF SVM ABC MABC-EPSO Naıve Bayes J48 RBF SVM ABC MABC-EPSO

DoS + 10 normal 1020 47 38 286 278 222 995 395 328 259 207 15Probe + 10 normal 533 312 305 236 224 187 415 301 319 211 197 169U2R + 10 normal 475 381 308 221 216 198 401 346 279 180 178 065R2L + 10 normal 398 497 301 246 223 20 312 323 255 142 137 146

100

95

90

85

80

75

Spec

ifici

ty (

)

RFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 10 Comparison on specificity using RFSMmethod

is comparatively lesser than SFSM method According to theperformance of MABC-EPSO with random feature selectionmethod the proposed method can be used to solve intrusiondetection as classification problem

6 Conclusion

In this work a hybrid algorithm based on ABC and PSOwas proposed to classify the benchmark intrusion detectiondataset using the two feature selection methods SFSM and

0005

01015

02025

03035

04045

SVM

ABC

MA

BC-E

PSO

SVM

ABC

MA

BC-E

PSO

SFSM RFSM

False

alar

m ra

te

Classification algorithmsDoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 11 Performance comparison on false alarm rate of classifiers

RFSM A study of different machine learning algorithms wasalso presented Performance comparisons amongst differentclassifiers were made to understand the effectiveness of theproposed method in terms of various performance metricsThe main goal of this paper was to show that the classifierswere significantly different and the proposed hybrid methodoutperforms other classifiers Friedman test and ANOVA testwas applied to check whether the classification algorithmswere significantly different Based on the conclusion of

14 The Scientific World Journal

0

3

6

9

12

15

18

21

24

normal normal normal normal

Trai

ning

tim

e (m

s)

DatasetAllSFSMRFSM

DoS + 10 Probe + 10 U2R + 10 R2L + 10

Figure 12 Training time of MABC-EPSO

ANOVA test the null hypotheses were rejected if they weresignificant Post hoc analysis using Tukeyrsquos test was appliedto select which classification algorithm was significantly dif-ferent from the others The experiments also showed that theeffectiveness of ABC is comparable to the proposed hybridalgorithm In general the proposed hybrid classifier pro-duced best results using the features of both SFSM andRFSM methods and is also significantly different from otherclassification algorithms Hence MABC-EPSO can be con-sidered as a preferable method for intrusion detection thatoutperforms its counterpart methods In the future we willfurther improve feature selection algorithm and investigatethe use of bioinspired approaches as classification algorithmin the area of intrusion detection

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] S XWu andWBanzhaf ldquoTheuse of computational intelligencein intrusion detection systems a reviewrdquo Applied Soft Comput-ing Journal vol 10 no 1 pp 1ndash35 2010

[2] E Bonabeau M Dorigo and G Theraulaz Swarm IntelligenceFrom Natural to Artificial Intelligence Oxford University PressOxford UK 1999

[3] G Zhu and S Kwong ldquoGbest-guided artificial bee colony algo-rithm for numerical function optimizationrdquoAppliedMathemat-ics and Computation vol 217 no 7 pp 3166ndash3173 2010

[4] R Kohavi and G H John ldquoWrappers for feature subset selec-tionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash324 1997

[5] W Lee and S J Stolfo ldquoA framework for constructing featuresandmodels for intrusion detection systemsrdquoACMTransactionson Information and System Security vol 3 no 4 pp 227ndash261

[6] H Nguyen K Franke and S Petrovic ldquoImproving effectivenessof intrusion detection by correlation feature selectionrdquo inProceedings of the 5th International Conference on AvailabilityReliability and Security (ARES rsquo10) pp 17ndash24 February 2010

[7] J Wang T Li and R Ren ldquoA real time IDSs based on artificialbee colony-support vector machine algorithmrdquo in Proceedingsof the 3rd International Workshop on Advanced ComputationalIntelligence (IWACI rsquo10) pp 91ndash96 IEEE Suzhou ChinaAugust 2010

[8] S Parsazad E Saboori and A Allahyar ldquoFast feature reductionin intrusion detection datasetsrdquo in Proceedings of the 35thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics (MIPRO rsquo12) pp1023ndash1029 May 2012

[9] A H Sung and S Mukkamala ldquoIdentifying important featuresfor intrusion detection using support vector machines andneural networksrdquo in Proceedings of the International Symposiumon Applications and the Internet pp 209ndash216 IEEE OrlandoFla USA January 2003

[10] S Revathi and A Malathi ldquoOptimization of KDD Cup 99dataset for intrusion detection using hybrid swarm intelligencewith random forest classifierrdquo International Journal of AdvancedResearch in Computer Science and Software Engineering vol 3no 7 pp 1382ndash1387 2013

[11] S Revathi and A Malathi ldquoData preprocessing for intrusiondetection system using swarm intelligence techniquesrdquo Interna-tional Journal of Computer Applications vol 75 no 6 pp 22ndash272013

[12] Y Y Chung and N Wahid ldquoA hybrid network intrusion detec-tion system using simplified swarm optimization (SSO)rdquo Ap-plied Soft Computing vol 12 no 9 pp 3014ndash3022 2012

[13] L Zhou and F Jiang ldquoA rough set based decision tree algorithmand its application in intrusion detectionrdquo in Pattern Recogni-tion and Machine Intelligence S O Kuznetsov D P MandalM K Kundu and S K Pal Eds vol 6744 of Lecture Notes inComputer Science pp 333ndash338 Springer Berlin Germany 2011

[14] G Wang J Hao J Mab and L Huang ldquoA new approach tointrusion detection using Artificial Neural Networks and fuzzyclusteringrdquo Expert Systems with Applications vol 37 no 9 pp6225ndash6232 2010

[15] S S Sivatha Sindhu S Geetha and A Kannan ldquoDecisiontree based light weight intrusion detection using a wrapperapproachrdquo Expert Systems with Applications vol 39 no 1 pp129ndash141 2012

[16] Z A Baig S M Sait and A Shaheen ldquoGMDH-based networksfor intelligent intrusion detectionrdquo Engineering Applications ofArtificial Intelligence vol 26 no 7 pp 1731ndash1740 2013

[17] S Mukkamala G Janoski and A Sung ldquoIntrusion detectionusing neural networks and support vector machinesrdquo in Pro-ceedings of the International Joint Conference onNeuralNetworks(IJCNN rsquo02) pp 1702ndash1707 May 2002

[18] F Li ldquoHybrid neural network intrusion detection system usinggenetic algorithmrdquo in Proceedings of the International Confer-ence on Multimedia Technology pp 1ndash4 October 2010

[19] H Wang G Zhang E Mingjie and N Sun ldquoA novel intrusiondetection method based on improved SVM by combining PCAand PSOrdquoWuhan University Journal of Natural Sciences vol 16no 5 pp 409ndash413 2011

[20] T-S Chou J Fan S Fan and K Makki ldquoEnsemble of machinelearning algorithms for intrusion detectionrdquo in Proceedings ofthe IEEE International Conference on Systems Man and Cyber-netics (SMC rsquo09) pp 3976ndash3980 IEEE San Antonio TX USAOctober 2009

[21] M Panda and M Ranjan Patra ldquoEnsemble voting systemfor anomaly based network intrusion detectionrdquo International

The Scientific World Journal 15

Journal of Recent Trends in Engineering vol 2 no 5 pp 8ndash132009

[22] N I Ghali ldquoFeature selection for effective anomaly-basedintrusion detectionrdquo International Journal of Computer Scienceand Network Security vol 9 no 3 pp 285ndash289 2009

[23] A Einipour ldquoIntelligent intrusion detection in computer net-works using fuzzy systemsrdquo Global Journal of Computer Scienceand Technology vol 12 no 11 pp 19ndash29 2012

[24] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[25] K Satpute S Agrawal J Agrawal and S Sharma ldquoA survey onanomaly detection in network intrusion detection system usingparticle swarm optimization based machine learning tech-niquesrdquo in Proceedings of the International Conference on Fron-tiers of Intelligent Computing Theory and Applications (FICTA)vol 199 of Advances in Intelligent Systems and Computing pp441ndash452 Springer Berlin Germany 2013

[26] Y Y Chung and N Wahid ldquoA hybrid network intrusiondetection system using simplified swarm optimization (SSO)rdquoApplied Soft Computing Journal vol 12 no 9 pp 3014ndash30222012

[27] D Karaboga and B Basturk ldquoOn the performance of artificialbee colony (ABC) algorithmrdquo Applied Soft Computing Journalvol 8 no 1 pp 687ndash697 2008

[28] D Karaboga and B Akay ldquoA comparative study of artificial Beecolony algorithmrdquo Applied Mathematics and Computation vol214 no 1 pp 108ndash132 2009

[29] D D Kumar and B Kumar ldquoOptimization of benchmarkfunctions using artificial bee colony (ABC) algorithmrdquo IOSRJournal of Engineering vol 3 no 10 pp 9ndash14 2013

[30] httpkddicsuciedudatabaseskddcup99kddcupdata 10percentgz

[31] C B D Newman and C Merz ldquoUCI repository of machinelearning databasesrdquo Tech Rep Department of Information andComputer Science University of California Irvine Calif USA1998 httpwwwicsuciedusimmlearnMLRepository

[32] M Tavallaee E BagheriW Lu and A A Ghorbani ldquoA detailedanalysis of the KDD CUP 99 data setrdquo in IEEE Symposium onComputational Intelligence for Security and Defense Applications(CISDA rsquo09) July 2009

[33] P Amudha and H Abdul Rauf ldquoPerformance analysis of datamining approaches in intrusion detectionrdquo in Proceedings of theInternational Conference on Process Automation Control andComputing (PACC rsquo11) pp 9ndash16 July 2011

[34] R AThakker M S Baghini andM B Patil ldquoAutomatic designof low-power low-voltage analog circuits using particle swarmoptimization with re-initializationrdquo Journal of Low PowerElectronics vol 5 no 3 pp 291ndash302 2009

[35] D Karaboga and B Basturk ldquoA powerful and efficient algo-rithm for numerical function optimization artificial bee colony(ABC) algorithmrdquo Journal of Global Optimization vol 39 no 3pp 459ndash471 2007

[36] Y Shi and R C Eberhart ldquoA modified particle swarm opti-mizerrdquo in Proceedings of the IEEE World Congress on Compu-tational Intelligence pp 69ndash73 IEEE Anchorage Alaska USAMay 1998

[37] N A Diamantidis D Karlis and E A Giakoumakis ldquoUnsuper-vised stratification of cross-validation for accuracy estimationrdquoArtificial Intelligence vol 116 no 1-2 pp 1ndash16 2000

[38] D T Larose Discovering Knowledge in DatamdashAn Introductionto Data Mining John Wiley amp Sons 2005

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

12 The Scientific World Journal

Table 12: Performance comparison of classification algorithms on detection rate.

Classification algorithm                 | Average detection rate (%) | Feature selection method
Naïve Bayes [15]                         | 92.27                      | Genetic algorithm
C4.5 [15]                                | 92.1                       | Genetic algorithm
Random forest [15]                       | 89.21                      | Genetic algorithm
Random tree [15]                         | 88.98                      | Genetic algorithm
REP tree [15]                            | 89.11                      | Genetic algorithm
Neurotree [15]                           | 98.38                      | Genetic algorithm
GMDH based neural network [16]           | 93.7                       | Information gain
                                         | 97.5                       | Gain ratio
                                         | 95.3                       | GMDH
Neural network [17]                      | 81.57                      | Feature reduction
Hybrid evolutionary neural network [18]  | 91.51                      | Genetic algorithm
Improved SVM (PSO + SVM + PCA) [19]      | 97.75                      | PCA
Ensemble Bayesian combination [20]       | 93.35                      | All features
Voting + J48 + Rule [21]                 | 97.47                      | All features
Voting + AdaBoost + J48 [21]             | 97.38                      | All features
Rough set neural network algorithm [22]  | 90                         | All features
PSO based fuzzy system [23]              | 93.7                       | All features
Proposed MABC-EPSO                       | 87.16                      | All features
                                         | 98.09                      | Single feature selection method
                                         | 98.67                      | Random feature selection method
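The detection rates in Table 12 follow the usual convention for intrusion detection: the fraction of actual attack records that the classifier flags. A minimal sketch (the confusion counts below are hypothetical, chosen only to reproduce one table value):

```python
def detection_rate(tp: int, fn: int) -> float:
    """Fraction of actual intrusions flagged by the classifier: TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical example: 9867 attacks detected out of 10000 gives 98.67%,
# the rate Table 12 reports for MABC-EPSO with random feature selection.
print(round(100 * detection_rate(9867, 133), 2))  # 98.67
```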

Figure 8: Comparison on sensitivity using the RFSM method. (Bar chart of sensitivity (%), 70–100, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets.)
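Sensitivity (Figure 8) and specificity (Figures 9 and 10) are the standard true-positive and true-negative rates; a short sketch of how such values are computed from confusion counts (the counts here are made up for illustration, not taken from the paper):

```python
def sensitivity(tp: int, fn: int) -> float:
    # True positive rate: share of attacks correctly detected.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # True negative rate: share of normal traffic correctly passed.
    return tn / (tn + fp)

print(round(100 * sensitivity(950, 50), 1))   # 95.0
print(round(100 * specificity(980, 20), 1))   # 98.0
```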

Finally, the multiple comparison test concluded that MABC-EPSO differs significantly from all the classification algorithms at the 0.05 significance level (P = 0.05). However, there is no statistically significant difference between the SVM and ABC algorithms.

An experiment was conducted to analyze the false alarm rate and training time of each classifier using the SFSM and RFSM methods. Figure 11 indicates that MABC-EPSO produces the lowest FAR (ranging from 0.004 to 0.005) using RFSM
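The false alarm rate plotted in Figure 11 is conventionally the fraction of normal connections misclassified as attacks. A sketch with hypothetical counts:

```python
def false_alarm_rate(fp: int, tn: int) -> float:
    # Fraction of normal connections misclassified as attacks: FP / (FP + TN).
    return fp / (fp + tn)

# Hypothetical: 4 false alarms among 1000 normal records gives 0.004,
# the lower bound reported for MABC-EPSO under RFSM.
print(false_alarm_rate(4, 996))  # 0.004
```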

Figure 9: Comparison on specificity using the SFSM method. (Bar chart of specificity (%), 76–100, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets.)

for all datasets. Also, the proposed hybrid approach using SFSM shows performance comparable to the SVM and ABC classifiers using the RFSM method. Table 14 shows that the training time of the proposed approach is significantly reduced for both feature selection methods when compared to the other classification algorithms. The training time of the proposed hybrid classifier considering all features is also recorded in Figure 12. The results indicate that the time taken by the proposed approach is considerably higher when all features are employed. It is also observed that the time consumed by the proposed classifier using the features of the RFSM method


Table 13: ANOVA results for specificity of classifiers.

SFSM
Source of variation | SS       | df | MS       | F       | P value | F-crit
Between groups      | 659.6518 | 5  | 131.9304 | 52.5347 | <0.05   | 2.772853
Within groups       | 45.20339 | 18 | 2.511299 |         |         |
Total               | 704.8551 | 23 |          |         |         |

RFSM
Source of variation | SS       | df | MS       | F        | P value | F-crit
Between groups      | 617.818  | 5  | 123.5636 | 23.53957 | <0.05   | 2.772853
Within groups       | 94.48535 | 18 | 5.249186 |          |         |
Total               | 712.3033 | 23 |          |          |         |

*SS: sum of squared deviations about the mean; df: degrees of freedom; MS: variance.
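The quantities in Table 13 are related by the usual one-way ANOVA identities, MS = SS / df and F = MS_between / MS_within; the SFSM row can be reproduced directly from its sums of squares:

```python
# One-way ANOVA arithmetic for the SFSM block of Table 13.
ss_between, df_between = 659.6518, 5
ss_within, df_within = 45.20339, 18

ms_between = ss_between / df_between   # ~131.9304 (variance between groups)
ms_within = ss_within / df_within      # ~2.511299 (variance within groups)
f_stat = ms_between / ms_within        # ~52.5347, well above F-crit = 2.772853

print(round(ms_between, 4), round(ms_within, 6), round(f_stat, 4))
```

Since the F statistic exceeds the critical value, the null hypothesis of equal mean specificity across classifiers is rejected, which is what motivates the post hoc Tukey analysis discussed in the conclusion.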

Table 14: Training time of classification algorithms using SFSM and RFSM feature selection methods.

                    | SFSM                                             | RFSM
Dataset             | Naïve Bayes | J48  | RBF  | SVM  | ABC  | MABC-EPSO | Naïve Bayes | J48  | RBF  | SVM  | ABC  | MABC-EPSO
DoS + 10% normal    | 10.20       | 4.7  | 3.8  | 2.86 | 2.78 | 2.22      | 9.95        | 3.95 | 3.28 | 2.59 | 2.07 | 1.5
Probe + 10% normal  | 5.33        | 3.12 | 3.05 | 2.36 | 2.24 | 1.87      | 4.15        | 3.01 | 3.19 | 2.11 | 1.97 | 1.69
U2R + 10% normal    | 4.75        | 3.81 | 3.08 | 2.21 | 2.16 | 1.98      | 4.01        | 3.46 | 2.79 | 1.80 | 1.78 | 0.65
R2L + 10% normal    | 3.98        | 4.97 | 3.01 | 2.46 | 2.23 | 2.0       | 3.12        | 3.23 | 2.55 | 1.42 | 1.37 | 1.46
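Training times like those in Table 14 are typically gathered by wrapping each classifier's training routine in a wall-clock timer; a minimal sketch (the `fit` callable is a stand-in for any classifier's training function, not an API from the paper):

```python
import time

def timed_fit(fit, *args, **kwargs):
    """Return the wall-clock duration of one training run, in seconds."""
    start = time.perf_counter()
    fit(*args, **kwargs)
    return time.perf_counter() - start

# Dummy workload standing in for a real training call.
elapsed = timed_fit(lambda: sum(i * i for i in range(100_000)))
print(elapsed >= 0.0)  # True
```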

Figure 10: Comparison on specificity using the RFSM method. (Bar chart of specificity (%), 75–100, for Naïve Bayes, J48, RBF, SVM, ABC, and MABC-EPSO on the DoS, Probe, U2R, and R2L + 10% normal datasets.)

is comparatively lower than with the SFSM method. Given the performance of MABC-EPSO with the random feature selection method, the proposed approach can be used to solve intrusion detection as a classification problem.

6. Conclusion

In this work, a hybrid algorithm based on ABC and PSO was proposed to classify the benchmark intrusion detection dataset using the two feature selection methods, SFSM and

Figure 11: Performance comparison on false alarm rate of classifiers. (Bar chart of false alarm rate, 0–0.045, for SVM, ABC, and MABC-EPSO under SFSM and RFSM on the DoS, Probe, U2R, and R2L + 10% normal datasets.)

RFSM. A study of different machine learning algorithms was also presented. Performance comparisons among the different classifiers were made to understand the effectiveness of the proposed method in terms of various performance metrics. The main goal of this paper was to show that the classifiers were significantly different and that the proposed hybrid method outperforms the other classifiers. The Friedman test and the ANOVA test were applied to check whether the classification algorithms were significantly different. Based on the conclusion of


Figure 12: Training time of MABC-EPSO. (Bar chart of training time (ms), 0–24, on the DoS, Probe, U2R, and R2L + 10% normal datasets using all features, SFSM, and RFSM.)

the ANOVA test, the null hypotheses were rejected when they were significant. Post hoc analysis using Tukey's test was applied to identify which classification algorithms were significantly different from the others. The experiments also showed that the effectiveness of ABC is comparable to that of the proposed hybrid algorithm. In general, the proposed hybrid classifier produced the best results using the features of both the SFSM and RFSM methods and is also significantly different from the other classification algorithms. Hence, MABC-EPSO can be considered a preferable method for intrusion detection that outperforms its counterpart methods. In the future, we will further improve the feature selection algorithm and investigate the use of bioinspired approaches as classification algorithms in the area of intrusion detection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing, vol. 10, no. 1, pp. 1–35, 2010.
[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Intelligence, Oxford University Press, Oxford, UK, 1999.
[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.
[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.
[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.
[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.
[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.
[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.
[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.
[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.
[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.
[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.
[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.
[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.
[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.
[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.
[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.
[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.
[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.
[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.
[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.
[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.
[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.
[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.
[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing, vol. 8, no. 1, pp. 687–697, 2008.
[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.
[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.
[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.html
[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.
[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.
[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.
[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.
[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.
[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.
[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

The Scientific World Journal 13

Table 13 ANOVA results for specificity of classifiers

Source of variation SS df MS 119865 119875 value 119865-critSFSM

Between groups 6596518 5 1319304 525347 lt005 2772853Within groups 4520339 18 2511299Total 7048551 23

RFSMBetween groups 617818 5 1235636 2353957 lt005 2772853Within groups 9448535 18 5249186Total 7123033 23lowastSS sum of squared deviations about mean df degrees of freedom MS variance

Table 14 Training time of classification algorithms using SFSM and RFSM feature selection methods

Dataset SFSM RFSMNaıve Bayes J48 RBF SVM ABC MABC-EPSO Naıve Bayes J48 RBF SVM ABC MABC-EPSO

DoS + 10 normal 1020 47 38 286 278 222 995 395 328 259 207 15Probe + 10 normal 533 312 305 236 224 187 415 301 319 211 197 169U2R + 10 normal 475 381 308 221 216 198 401 346 279 180 178 065R2L + 10 normal 398 497 301 246 223 20 312 323 255 142 137 146

100

95

90

85

80

75

Spec

ifici

ty (

)

RFSM

Naiuml

veBa

yes

J48

RBF

SVM

ABC

MA

BC-P

SO

DoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 10 Comparison on specificity using RFSMmethod

is comparatively lesser than SFSM method According to theperformance of MABC-EPSO with random feature selectionmethod the proposed method can be used to solve intrusiondetection as classification problem

6 Conclusion

In this work a hybrid algorithm based on ABC and PSOwas proposed to classify the benchmark intrusion detectiondataset using the two feature selection methods SFSM and

0005

01015

02025

03035

04045

SVM

ABC

MA

BC-E

PSO

SVM

ABC

MA

BC-E

PSO

SFSM RFSM

False

alar

m ra

te

Classification algorithmsDoS + 10 normalProbe + 10 normal

U2R + 10 normalR2L + 10 normal

Figure 11 Performance comparison on false alarm rate of classifiers

RFSM A study of different machine learning algorithms wasalso presented Performance comparisons amongst differentclassifiers were made to understand the effectiveness of theproposed method in terms of various performance metricsThe main goal of this paper was to show that the classifierswere significantly different and the proposed hybrid methodoutperforms other classifiers Friedman test and ANOVA testwas applied to check whether the classification algorithmswere significantly different Based on the conclusion of

14 The Scientific World Journal

0

3

6

9

12

15

18

21

24

normal normal normal normal

Trai

ning

tim

e (m

s)

DatasetAllSFSMRFSM

DoS + 10 Probe + 10 U2R + 10 R2L + 10

Figure 12 Training time of MABC-EPSO

ANOVA test the null hypotheses were rejected if they weresignificant Post hoc analysis using Tukeyrsquos test was appliedto select which classification algorithm was significantly dif-ferent from the others The experiments also showed that theeffectiveness of ABC is comparable to the proposed hybridalgorithm In general the proposed hybrid classifier pro-duced best results using the features of both SFSM andRFSM methods and is also significantly different from otherclassification algorithms Hence MABC-EPSO can be con-sidered as a preferable method for intrusion detection thatoutperforms its counterpart methods In the future we willfurther improve feature selection algorithm and investigatethe use of bioinspired approaches as classification algorithmin the area of intrusion detection

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] S XWu andWBanzhaf ldquoTheuse of computational intelligencein intrusion detection systems a reviewrdquo Applied Soft Comput-ing Journal vol 10 no 1 pp 1ndash35 2010

[2] E Bonabeau M Dorigo and G Theraulaz Swarm IntelligenceFrom Natural to Artificial Intelligence Oxford University PressOxford UK 1999

[3] G Zhu and S Kwong ldquoGbest-guided artificial bee colony algo-rithm for numerical function optimizationrdquoAppliedMathemat-ics and Computation vol 217 no 7 pp 3166ndash3173 2010

[4] R Kohavi and G H John ldquoWrappers for feature subset selec-tionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash324 1997

[5] W Lee and S J Stolfo ldquoA framework for constructing featuresandmodels for intrusion detection systemsrdquoACMTransactionson Information and System Security vol 3 no 4 pp 227ndash261

[6] H Nguyen K Franke and S Petrovic ldquoImproving effectivenessof intrusion detection by correlation feature selectionrdquo inProceedings of the 5th International Conference on AvailabilityReliability and Security (ARES rsquo10) pp 17ndash24 February 2010

[7] J Wang T Li and R Ren ldquoA real time IDSs based on artificialbee colony-support vector machine algorithmrdquo in Proceedingsof the 3rd International Workshop on Advanced ComputationalIntelligence (IWACI rsquo10) pp 91ndash96 IEEE Suzhou ChinaAugust 2010

[8] S Parsazad E Saboori and A Allahyar ldquoFast feature reductionin intrusion detection datasetsrdquo in Proceedings of the 35thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics (MIPRO rsquo12) pp1023ndash1029 May 2012

[9] A H Sung and S Mukkamala ldquoIdentifying important featuresfor intrusion detection using support vector machines andneural networksrdquo in Proceedings of the International Symposiumon Applications and the Internet pp 209ndash216 IEEE OrlandoFla USA January 2003

[10] S Revathi and A Malathi ldquoOptimization of KDD Cup 99dataset for intrusion detection using hybrid swarm intelligencewith random forest classifierrdquo International Journal of AdvancedResearch in Computer Science and Software Engineering vol 3no 7 pp 1382ndash1387 2013

[11] S Revathi and A Malathi ldquoData preprocessing for intrusiondetection system using swarm intelligence techniquesrdquo Interna-tional Journal of Computer Applications vol 75 no 6 pp 22ndash272013

[12] Y Y Chung and N Wahid ldquoA hybrid network intrusion detec-tion system using simplified swarm optimization (SSO)rdquo Ap-plied Soft Computing vol 12 no 9 pp 3014ndash3022 2012

[13] L Zhou and F Jiang ldquoA rough set based decision tree algorithmand its application in intrusion detectionrdquo in Pattern Recogni-tion and Machine Intelligence S O Kuznetsov D P MandalM K Kundu and S K Pal Eds vol 6744 of Lecture Notes inComputer Science pp 333ndash338 Springer Berlin Germany 2011

[14] G Wang J Hao J Mab and L Huang ldquoA new approach tointrusion detection using Artificial Neural Networks and fuzzyclusteringrdquo Expert Systems with Applications vol 37 no 9 pp6225ndash6232 2010

[15] S S Sivatha Sindhu S Geetha and A Kannan ldquoDecisiontree based light weight intrusion detection using a wrapperapproachrdquo Expert Systems with Applications vol 39 no 1 pp129ndash141 2012

[16] Z A Baig S M Sait and A Shaheen ldquoGMDH-based networksfor intelligent intrusion detectionrdquo Engineering Applications ofArtificial Intelligence vol 26 no 7 pp 1731ndash1740 2013

[17] S Mukkamala G Janoski and A Sung ldquoIntrusion detectionusing neural networks and support vector machinesrdquo in Pro-ceedings of the International Joint Conference onNeuralNetworks(IJCNN rsquo02) pp 1702ndash1707 May 2002

[18] F Li ldquoHybrid neural network intrusion detection system usinggenetic algorithmrdquo in Proceedings of the International Confer-ence on Multimedia Technology pp 1ndash4 October 2010

[19] H Wang G Zhang E Mingjie and N Sun ldquoA novel intrusiondetection method based on improved SVM by combining PCAand PSOrdquoWuhan University Journal of Natural Sciences vol 16no 5 pp 409ndash413 2011

[20] T-S Chou J Fan S Fan and K Makki ldquoEnsemble of machinelearning algorithms for intrusion detectionrdquo in Proceedings ofthe IEEE International Conference on Systems Man and Cyber-netics (SMC rsquo09) pp 3976ndash3980 IEEE San Antonio TX USAOctober 2009

[21] M Panda and M Ranjan Patra ldquoEnsemble voting systemfor anomaly based network intrusion detectionrdquo International

The Scientific World Journal 15

Journal of Recent Trends in Engineering vol 2 no 5 pp 8ndash132009

[22] N I Ghali ldquoFeature selection for effective anomaly-basedintrusion detectionrdquo International Journal of Computer Scienceand Network Security vol 9 no 3 pp 285ndash289 2009

[23] A Einipour ldquoIntelligent intrusion detection in computer net-works using fuzzy systemsrdquo Global Journal of Computer Scienceand Technology vol 12 no 11 pp 19ndash29 2012

[24] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 1995

[25] K Satpute S Agrawal J Agrawal and S Sharma ldquoA survey onanomaly detection in network intrusion detection system usingparticle swarm optimization based machine learning tech-niquesrdquo in Proceedings of the International Conference on Fron-tiers of Intelligent Computing Theory and Applications (FICTA)vol 199 of Advances in Intelligent Systems and Computing pp441ndash452 Springer Berlin Germany 2013

[26] Y Y Chung and N Wahid ldquoA hybrid network intrusiondetection system using simplified swarm optimization (SSO)rdquoApplied Soft Computing Journal vol 12 no 9 pp 3014ndash30222012

[27] D Karaboga and B Basturk ldquoOn the performance of artificialbee colony (ABC) algorithmrdquo Applied Soft Computing Journalvol 8 no 1 pp 687ndash697 2008

[28] D Karaboga and B Akay ldquoA comparative study of artificial Beecolony algorithmrdquo Applied Mathematics and Computation vol214 no 1 pp 108ndash132 2009

[29] D D Kumar and B Kumar ldquoOptimization of benchmarkfunctions using artificial bee colony (ABC) algorithmrdquo IOSRJournal of Engineering vol 3 no 10 pp 9ndash14 2013

[30] httpkddicsuciedudatabaseskddcup99kddcupdata 10percentgz

[31] C B D Newman and C Merz ldquoUCI repository of machinelearning databasesrdquo Tech Rep Department of Information andComputer Science University of California Irvine Calif USA1998 httpwwwicsuciedusimmlearnMLRepository

[32] M Tavallaee E BagheriW Lu and A A Ghorbani ldquoA detailedanalysis of the KDD CUP 99 data setrdquo in IEEE Symposium onComputational Intelligence for Security and Defense Applications(CISDA rsquo09) July 2009

[33] P Amudha and H Abdul Rauf ldquoPerformance analysis of datamining approaches in intrusion detectionrdquo in Proceedings of theInternational Conference on Process Automation Control andComputing (PACC rsquo11) pp 9ndash16 July 2011

[34] R AThakker M S Baghini andM B Patil ldquoAutomatic designof low-power low-voltage analog circuits using particle swarmoptimization with re-initializationrdquo Journal of Low PowerElectronics vol 5 no 3 pp 291ndash302 2009

[35] D Karaboga and B Basturk ldquoA powerful and efficient algo-rithm for numerical function optimization artificial bee colony(ABC) algorithmrdquo Journal of Global Optimization vol 39 no 3pp 459ndash471 2007

[36] Y Shi and R C Eberhart ldquoA modified particle swarm opti-mizerrdquo in Proceedings of the IEEE World Congress on Compu-tational Intelligence pp 69ndash73 IEEE Anchorage Alaska USAMay 1998

[37] N A Diamantidis D Karlis and E A Giakoumakis ldquoUnsuper-vised stratification of cross-validation for accuracy estimationrdquoArtificial Intelligence vol 116 no 1-2 pp 1ndash16 2000

[38] D T Larose Discovering Knowledge in DatamdashAn Introductionto Data Mining John Wiley amp Sons 2005

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 14: Research Article A Hybrid Swarm Intelligence Algorithm for ...downloads.hindawi.com/journals/tswj/2015/574589.pdfA Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using

14 The Scientific World Journal

0

3

6

9

12

15

18

21

24

normal normal normal normal

Trai

ning

tim

e (m

s)

DatasetAllSFSMRFSM

DoS + 10 Probe + 10 U2R + 10 R2L + 10

Figure 12 Training time of MABC-EPSO

ANOVA test the null hypotheses were rejected if they weresignificant Post hoc analysis using Tukeyrsquos test was appliedto select which classification algorithm was significantly dif-ferent from the others The experiments also showed that theeffectiveness of ABC is comparable to the proposed hybridalgorithm In general the proposed hybrid classifier pro-duced best results using the features of both SFSM andRFSM methods and is also significantly different from otherclassification algorithms Hence MABC-EPSO can be con-sidered as a preferable method for intrusion detection thatoutperforms its counterpart methods In the future we willfurther improve feature selection algorithm and investigatethe use of bioinspired approaches as classification algorithmin the area of intrusion detection

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: a review," Applied Soft Computing Journal, vol. 10, no. 1, pp. 1–35, 2010.

[2] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[3] G. Zhu and S. Kwong, "Gbest-guided artificial bee colony algorithm for numerical function optimization," Applied Mathematics and Computation, vol. 217, no. 7, pp. 3166–3173, 2010.

[4] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[5] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 227–261, 2000.

[6] H. Nguyen, K. Franke, and S. Petrovic, "Improving effectiveness of intrusion detection by correlation feature selection," in Proceedings of the 5th International Conference on Availability, Reliability and Security (ARES '10), pp. 17–24, February 2010.

[7] J. Wang, T. Li, and R. Ren, "A real time IDSs based on artificial bee colony-support vector machine algorithm," in Proceedings of the 3rd International Workshop on Advanced Computational Intelligence (IWACI '10), pp. 91–96, IEEE, Suzhou, China, August 2010.

[8] S. Parsazad, E. Saboori, and A. Allahyar, "Fast feature reduction in intrusion detection datasets," in Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO '12), pp. 1023–1029, May 2012.

[9] A. H. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the International Symposium on Applications and the Internet, pp. 209–216, IEEE, Orlando, Fla, USA, January 2003.

[10] S. Revathi and A. Malathi, "Optimization of KDD Cup 99 dataset for intrusion detection using hybrid swarm intelligence with random forest classifier," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 7, pp. 1382–1387, 2013.

[11] S. Revathi and A. Malathi, "Data preprocessing for intrusion detection system using swarm intelligence techniques," International Journal of Computer Applications, vol. 75, no. 6, pp. 22–27, 2013.

[12] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing, vol. 12, no. 9, pp. 3014–3022, 2012.

[13] L. Zhou and F. Jiang, "A rough set based decision tree algorithm and its application in intrusion detection," in Pattern Recognition and Machine Intelligence, S. O. Kuznetsov, D. P. Mandal, M. K. Kundu, and S. K. Pal, Eds., vol. 6744 of Lecture Notes in Computer Science, pp. 333–338, Springer, Berlin, Germany, 2011.

[14] G. Wang, J. Hao, J. Ma, and L. Huang, "A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering," Expert Systems with Applications, vol. 37, no. 9, pp. 6225–6232, 2010.

[15] S. S. Sivatha Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.

[16] Z. A. Baig, S. M. Sait, and A. Shaheen, "GMDH-based networks for intelligent intrusion detection," Engineering Applications of Artificial Intelligence, vol. 26, no. 7, pp. 1731–1740, 2013.

[17] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '02), pp. 1702–1707, May 2002.

[18] F. Li, "Hybrid neural network intrusion detection system using genetic algorithm," in Proceedings of the International Conference on Multimedia Technology, pp. 1–4, October 2010.

[19] H. Wang, G. Zhang, E. Mingjie, and N. Sun, "A novel intrusion detection method based on improved SVM by combining PCA and PSO," Wuhan University Journal of Natural Sciences, vol. 16, no. 5, pp. 409–413, 2011.

[20] T.-S. Chou, J. Fan, S. Fan, and K. Makki, "Ensemble of machine learning algorithms for intrusion detection," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 3976–3980, IEEE, San Antonio, TX, USA, October 2009.

[21] M. Panda and M. Ranjan Patra, "Ensemble voting system for anomaly based network intrusion detection," International Journal of Recent Trends in Engineering, vol. 2, no. 5, pp. 8–13, 2009.

[22] N. I. Ghali, "Feature selection for effective anomaly-based intrusion detection," International Journal of Computer Science and Network Security, vol. 9, no. 3, pp. 285–289, 2009.

[23] A. Einipour, "Intelligent intrusion detection in computer networks using fuzzy systems," Global Journal of Computer Science and Technology, vol. 12, no. 11, pp. 19–29, 2012.

[24] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.

[25] K. Satpute, S. Agrawal, J. Agrawal, and S. Sharma, "A survey on anomaly detection in network intrusion detection system using particle swarm optimization based machine learning techniques," in Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), vol. 199 of Advances in Intelligent Systems and Computing, pp. 441–452, Springer, Berlin, Germany, 2013.

[26] Y. Y. Chung and N. Wahid, "A hybrid network intrusion detection system using simplified swarm optimization (SSO)," Applied Soft Computing Journal, vol. 12, no. 9, pp. 3014–3022, 2012.

[27] D. Karaboga and B. Basturk, "On the performance of artificial bee colony (ABC) algorithm," Applied Soft Computing Journal, vol. 8, no. 1, pp. 687–697, 2008.

[28] D. Karaboga and B. Akay, "A comparative study of artificial bee colony algorithm," Applied Mathematics and Computation, vol. 214, no. 1, pp. 108–132, 2009.

[29] D. D. Kumar and B. Kumar, "Optimization of benchmark functions using artificial bee colony (ABC) algorithm," IOSR Journal of Engineering, vol. 3, no. 10, pp. 9–14, 2013.

[30] http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz.

[31] C. B. D. Newman and C. Merz, "UCI repository of machine learning databases," Tech. Rep., Department of Information and Computer Science, University of California, Irvine, Calif, USA, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.

[32] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA '09), July 2009.

[33] P. Amudha and H. Abdul Rauf, "Performance analysis of data mining approaches in intrusion detection," in Proceedings of the International Conference on Process Automation, Control and Computing (PACC '11), pp. 9–16, July 2011.

[34] R. A. Thakker, M. S. Baghini, and M. B. Patil, "Automatic design of low-power low-voltage analog circuits using particle swarm optimization with re-initialization," Journal of Low Power Electronics, vol. 5, no. 3, pp. 291–302, 2009.

[35] D. Karaboga and B. Basturk, "A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm," Journal of Global Optimization, vol. 39, no. 3, pp. 459–471, 2007.

[36] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE, Anchorage, Alaska, USA, May 1998.

[37] N. A. Diamantidis, D. Karlis, and E. A. Giakoumakis, "Unsupervised stratification of cross-validation for accuracy estimation," Artificial Intelligence, vol. 116, no. 1-2, pp. 1–16, 2000.

[38] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.
