

A Professional Comparison of C4.5, MLP, SVM for Network Intrusion Detection based Feature Analysis

Alaa F. Sheta¹, Amneh Alamleh²

¹Computers and Systems Department, Electronics Research Institute, Giza, Egypt
²Computer Science Department, Zarqa University, Zarqa, Jordan

[email protected], [email protected]

Abstract

The volume of targeted network attacks is increasing continuously over time, causing great financial loss. Intrusion Detection Systems (IDSs) are one of the main solutions for computer and network security. An IDS is needed to identify unauthorized access that attempts to compromise the confidentiality, integrity, or availability of a computer or computer network. In this paper, we attempt to provide new models for the intrusion detection (ID) problem using the Decision Tree (DT) based C4.5 algorithm, the Multi-Layer Perceptron (MLP), and the Support Vector Machine (SVM). A number of attacks were classified using the three methods. Training and testing data proposed by DARPA are used to develop and evaluate the proposed models. To enhance the performance of the proposed models and speed up the detection process, a set of features is selected using Best First Search (BFS) and Genetic Search (GS). A comparison between the models developed in each case is provided. The proposed models were capable of reducing complexity while keeping acceptable detection accuracy.

Keywords: Network Security, Intrusion Detection, Classification, C4.5, Artificial Neural Networks, Support Vector Machine, KDD, NSL-KDD

1 Introduction

Computer security is defined as the protection of computing systems against threats to confidentiality, integrity, and availability [1]. Information confidentiality means that information is revealed only to authorized people with predefined rights. Information integrity means protecting information from being destroyed or corrupted under any condition. Information availability means that the system is capable of providing its services at any given time.

IDSs have long played a very important role in system security. They help protect our computers and network systems by detecting any new attempt at system abuse or attack. IDSs can be classified along multiple dimensions based on detection method, architecture, and post-detection action. Because there are many types of intruder attacks, traditional IDSs require a huge amount of human effort to maintain and improve their performance [2]. Two types of ID systems are defined in the literature: misuse detection and anomaly detection. Detecting intrusions that match predefined patterns is termed misuse detection, while identifying deviations from normal network behavior is called anomaly detection [3]. The ideal IDS is expected to discover new types of attacks in minimum time and trigger the required action. Since it is almost impossible to reach 100% IDS accuracy, research effort focuses on raising IDS accuracy as much as possible. The classification of IDSs presented in [4] is shown in Figure 1. In this research, we are interested in network anomaly detection methods.

In the present study, an offline intrusion detection system is implemented using three algorithms: the Decision Tree based C4.5 algorithm, the Multi-Layer Perceptron, and the Support Vector Machine. Three intrusion detection models are implemented using the NSL-KDD data set. The proposed models have 41 features (i.e., inputs) and six classes (i.e., outputs). Because of the complexity of these models, we propose a number of order-reduction techniques using Best First Search and Genetic Search. These techniques are used to reduce the number of features used in the learning process.

This paper is organized as follows. Section 2 provides a literature review covering various techniques used to solve the ID problem. Section 3 briefly explains the nature of the data sets used in our experiments, with a discussion of the KDD data set, the NSL-KDD data set, and the modified data adopted in this study. The three proposed algorithms, DT, ANN, and SVM, are described in Sections 4, 5, and 6. Feature selection methods based on BFS and GS are presented in Section 7. The methods of evaluating the developed models are presented in Section 8. Experimental results are presented in Section 9. Finally, we present the conclusions of this work.

Figure 1: Classification of IDSs [4].

2 Related Research

Soft Computing (SC) techniques have characteristics different from those of conventional (i.e., hard) computing: they can handle fuzziness, uncertainty, partial truth, and approximation. Most soft computing techniques are inspired by the way the human mind works and by natural biological systems. These advantages make soft computing techniques useful for solving the intrusion detection problem. Furthermore, soft computing may be considered a foundation of the growing field of conceptual intelligence. Some well-known branches of SC are Fuzzy Logic (FL), Artificial Neural Networks (ANNs), Evolutionary Computation (EC), Machine Learning (ML), and Probabilistic Reasoning (PR).

2.1 Applications of MLP

ANNs were used to solve the intrusion detection problem in [5]; the proposed model was able to identify three classes: Normal and two attack types. The developed ANN model achieved high accuracy. The authors suggested including more attack scenarios in the data set, and also suggested reducing the number of records in an attempt to minimize the complexity of the system. Another ANN model was proposed in [6]. The authors defined the output of the ANN to be either 1 or 0, depending on whether or not the packet is infected with an intrusion. They explored reducing the domain of the feature set using rough set theory, applied to just one type of attack. The authors claimed that their model was 20.5 times faster than previous ones, and suggested applying their method to other classes of attack as future work.

In [7], the authors presented four different algorithms to solve the intrusion detection problem using NSL-KDD data: the Multilayer Perceptron (MLP), the Radial Basis Function (RBF) network, Logistic Regression (LR), and the Voted Perceptron (VP). All these algorithms were implemented in Weka [8], a data mining software package, to evaluate their performance. To enhance the results, feature reduction techniques were applied. The results showed that the MLP neural network provided more accurate results than the other algorithms.

2.2 Applications of ML

Various aspects of anomaly-based intrusion detection in computer security using Machine Learning (ML) were explored in [9]. A review of intrusion detection solutions using machine learning was presented in [10]; this work reviewed 55 related research studies published between 2000 and 2007, focusing on the development of single, hybrid, and ensemble classifiers. Recently, ten ML approaches were applied to detect network intrusions using the NSL-KDD data set [11], including Decision Tree J48, Bayesian Belief Network, Hybrid Naïve Bayes with Decision Tree, Rotation Forest, Hybrid J48 with Lazy Locally Weighted Learning, Discriminative Multinomial Naïve Bayes, Combining Random Forest with Naïve Bayes, and an ensemble of classifiers using J48 and NB with AdaBoost (AB). Intrusion detection on mobile ad hoc networks (MANETs) is a challenging process because of their dynamic nature and their highly resource-constrained nodes. In [12], the author explored the use of Evolutionary Computation (EC) techniques, specifically Genetic Programming (GP) and Grammatical Evolution (GE), to evolve intrusion detection programs.

2.3 Applications of DT

Unsupervised and supervised ML classification techniques for detecting intrusions using network audit trails were presented in [13]. The authors investigated well-known techniques such as Frequent Pattern tree mining (FP-tree), Classification and Regression Trees (CART), Multivariate Adaptive Regression Splines (MARS), and TreeNet for solving the ID problem. Classification accuracy based on ROC curve analysis was used to measure the performance of each developed model. The results show that classification accuracies are better in the cases of SVM and ANN. Farid et al. [14] proposed a new learning algorithm for anomaly-based IDS using DT. Their method modified the splitting weights of the data set by changing the weights relative to posterior probabilities. Their results illustrate better performance than the traditional DT algorithm. An ensemble neural decision tree was used in [15] for feature selection and model reduction. The proposed model was compared to six types of decision trees, using specificity and sensitivity as evaluation metrics. The results showed that the proposed model performed better than the other methods. In [2], three types of decision trees, ID3, C4.5, and BFS, were tested on the NSL-KDD network intrusion data set. Feature selection was performed using the Consistency Subset Evaluator (CSE). The NSL-KDD data set and 10-fold cross-validation test mode were used to train and test the three DT algorithms. The analysis of the results concluded that C4.5 performs better than BFS and ID3 in terms of prediction accuracy. The authors also used the ROC curve as an evaluation criterion.

2.4 Applications of SVM

Yao et al. [16] proposed an enhanced SVM model for intrusion detection, using rough set theory to reduce the number of features by removing the less-weighted ones. They evaluated the proposed model on the KDD99 and UMN data sets using precision, recall, false positive, and false negative criteria. The results showed that their model was more accurate and needed less time.

Chen et al. [17] proposed an IDS model using an SVM-based system built on Rough Set Theory (RST). RST was used to reduce the number of features from 41 to 29. The authors compared the RST-based SVM with an SVM using the full feature set and with an entropy-based approach. Their proposed RST-SVM model achieved better accuracy than the other two methods.

An integrated SVM and DT model for multiclass classification was proposed in [18]. The classes were first separated by a binary tree structure; each class was then fed to a number of SVMs equal to the number of classes. The authors assumed that combining the two models would make the results more accurate and the classification process faster than with the individual models; however, they did not prove or simulate their model.

A comparison between three types of Support Vector Machine (SVM) kernel functions, the Gaussian kernel (Radial Basis Function, RBF), the polynomial kernel, and the sigmoid kernel, was implemented in [19]. Using cross-validation and proper data set pre-processing, the authors showed that the RBF kernel function can achieve better performance than the other two kernel functions.

3 Data Set

3.1 KDD Data Set

KDD Cup 1999 is the most widely used data set in ID research. The data are accessible from [20]. The data set contains about 4,900,000 connection records, each consisting of 41 features. A statistical analysis of this data set was presented in [21]. There are four major categories of attacks in the KDD data set. They are:

• Denial of Service (DoS): An attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, thus denying legitimate users access to a machine.

• Surveillance and Other Probing (Probing): An attack in which the attacker scans a network to identify vulnerabilities; the gathered information can then be used to look for exploits.

• Unauthorized Access from a Remote Machine (R2L): A remote-to-local (R2L) attack is one in which an attacker sends packets to a machine over a network and then exploits the machine's weaknesses to gain unauthorized access as a regular user.

• Unauthorized Access to Local Super User (U2R): A user-to-root (U2R) attack is one in which an attacker accesses the system as a regular user and then exploits its vulnerabilities to gain root access.

Many ML and pattern classification algorithms applied to the KDD data set failed to identify most of the user-to-root and remote-to-local attacks. In [22], the authors described the deficiencies and limitations of the KDD data set and argued that it should not be used to train pattern recognition or ML algorithms for misuse detection of these two attack categories.

3.2 NSL-KDD Data Set

It was reported that the KDD data set has many problems. For example, it contains a very large number of redundant records, and the difficulty level of the different records was not inversely proportional to the percentage of records in the original KDD data set. These deficiencies result in a very poor evaluation of the different proposed ID techniques. The NSL-KDD data set was suggested to solve some of the inherent problems of the KDD Cup 1999 data set. This new data set consists of selected records of the complete KDD data set and does not suffer from these problems [21]. Table 1 shows the NSL-KDD data variables [23], and Table 2 shows the distribution of attack records per attack category. The following are some of the advantages of NSL-KDD over the original KDD data set, as presented in [21]:

1. Redundant records are excluded from the training set, so the bias towards more frequent records is eliminated.

2. The number of records selected from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set.


3. The numbers of records in the training and test sets are reasonable, so experiments can be run affordably on the complete set without the need to randomly select a small portion.

Table 1: NSL-KDD Data Set Features [23]

| Variable No. | Description | Type |
|---|---|---|
| 1 | duration | continuous |
| 2 | protocol_type | symbolic |
| 3 | service | symbolic |
| 4 | flag | symbolic |
| 5 | src_bytes | continuous |
| 6 | dst_bytes | continuous |
| 7 | land | symbolic |
| 8 | wrong_fragment | continuous |
| 9 | urgent | continuous |
| 10 | hot | continuous |
| 11 | num_failed_logins | continuous |
| 12 | logged_in | symbolic |
| 13 | num_compromised | continuous |
| 14 | root_shell | continuous |
| 15 | su_attempted | continuous |
| 16 | num_root | continuous |
| 17 | num_file_creations | continuous |
| 18 | num_shells | continuous |
| 19 | num_access_files | continuous |
| 20 | num_outbound_cmds | continuous |
| 21 | is_host_login | symbolic |
| 22 | is_guest_login | symbolic |
| 23 | count | continuous |
| 24 | srv_count | continuous |
| 25 | serror_rate | continuous |
| 26 | srv_serror_rate | continuous |
| 27 | rerror_rate | continuous |
| 28 | srv_rerror_rate | continuous |
| 29 | same_srv_rate | continuous |
| 30 | diff_srv_rate | continuous |
| 31 | srv_diff_host_rate | continuous |
| 32 | dst_host_count | continuous |
| 33 | dst_host_srv_count | continuous |
| 34 | dst_host_same_srv_rate | continuous |
| 35 | dst_host_diff_srv_rate | continuous |
| 36 | dst_host_same_src_port_rate | continuous |
| 37 | dst_host_srv_diff_host_rate | continuous |
| 38 | dst_host_serror_rate | continuous |
| 39 | dst_host_srv_serror_rate | continuous |
| 40 | dst_host_rerror_rate | continuous |
| 41 | dst_host_srv_rerror_rate | continuous |
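As a minimal sketch (in Python, our choice; the paper does not specify an implementation), the grouping of individual attack names into the four attack categories plus Normal, with the record counts of Table 2, can be held in a lookup table and used to relabel raw records into category-level classes. The dictionary and helper names are hypothetical, and the lowercase spellings follow Table 2; the raw label strings in the distributed NSL-KDD files may be spelled differently (for example, for Guess_Password), so they should be checked before use.

```python
from collections import Counter

# Record counts per attack name, copied from Table 2 and grouped by category.
TABLE_2 = {
    "DoS": {"back": 956, "land": 18, "neptune": 41214, "pod": 201,
            "smurf": 2646, "teardrop": 892},
    "Probe": {"satan": 3633, "ipsweep": 3599, "nmap": 1493, "portsweep": 2931},
    "R2L": {"guess_password": 53, "ftp_write": 8, "imap": 11, "phf": 4,
            "multihop": 7, "warezmaster": 20, "warezclient": 890, "spy": 2},
    "U2R": {"buffer_overflow": 30, "loadmodule": 9, "rootkit": 10, "perl": 3},
    "Normal": {"normal": 67343},
}

# Inverted lookup: attack name -> category (hypothetical helper for relabeling).
CATEGORY_OF = {name: cat for cat, names in TABLE_2.items() for name in names}


def to_category(label: str) -> str:
    """Map a raw attack label to its category (case-insensitive, trimmed)."""
    return CATEGORY_OF[label.strip().lower()]


def category_totals() -> Counter:
    """Sum the per-attack counts into per-category totals."""
    totals = Counter()
    for cat, names in TABLE_2.items():
        totals[cat] = sum(names.values())
    return totals


if __name__ == "__main__":
    totals = category_totals()
    print(dict(totals))
    print("total:", sum(totals.values()))
    print(to_category("Neptune"), to_category("Rootkit"))
```

Summing the per-attack counts reproduces the category subtotals and the grand total of Table 2, which is a quick consistency check on the table itself.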

4 Decision Tree

Decision trees are among the most well-known and widely used classification algorithms. The decision tree algorithm known as ID3 dates back to the 1970s, and the C4.5 algorithm was presented later by Quinlan [24]. C4.5 became a benchmark to which newer supervised learning algorithms are often compared. Classification and Regression Trees (CART), presented in [25], generate binary decision trees. ID3, C4.5, and CART adopt a greedy approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer way [24]. Unlike ID3, C4.5 can deal with continuous attributes and handle missing values, but it is a little slower than the other DT algorithms [2].

Table 2: Distribution of attack records per attack category of the NSL-KDD

| Attack Category | Attack Name | Number of Records |
|---|---|---|
| DoS | Back | 956 |
| | Land | 18 |
| | Neptune | 41214 |
| | Pod | 201 |
| | Smurf | 2646 |
| | Teardrop | 892 |
| | DoS subtotal | 45927 |
| Probe | Satan | 3633 |
| | Ipsweep | 3599 |
| | Nmap | 1493 |
| | Portsweep | 2931 |
| | Probe subtotal | 11656 |
| R2L | Guess_Password | 53 |
| | Ftp_write | 8 |
| | Imap | 11 |
| | Phf | 4 |
| | Multihop | 7 |
| | Warezmaster | 20 |
| | Warezclient | 890 |
| | Spy | 2 |
| | R2L subtotal | 995 |
| U2R | Buffer_overflow | 30 |
| | Loadmodule | 9 |
| | Rootkit | 10 |
| | Perl | 3 |
| | U2R subtotal | 52 |
| Normal | Normal | 67343 |
| Total | | 125973 |

4.1 How to develop a DT?

A decision tree is a directed tree that builds its structure by recursively partitioning the set of observations. It consists of a root with no incoming edges, internal (test) nodes, each with exactly one incoming edge and at least one outgoing edge, and leaves, which represent the decision nodes and have no outgoing edges [26]. The decision tree development algorithm is a greedy, top-down, recursive divide-and-conquer algorithm; its basic steps are summarized in Algorithm 1.

To reduce tree complexity, pruning algorithms have been proposed. Pruning is a general technique for combating overfitting; it has a large effect on tree size and only a slight effect on accuracy, and it can result in better accuracy, as reported in [27]. Using a decision tree, network connections can be classified as normal, anomalous, or one of the other predefined types of attack.
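Algorithm 1 can be sketched in Python as follows, assuming categorical attributes and using the information gain of Equations 1 and 2 (Section 4.2) to choose the test attribute. This is only the basic ID3-style procedure: C4.5's handling of continuous attributes, missing values, and pruning is not included, and all function names are our own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels (Equation 2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Parent entropy minus the size-weighted entropy of the children (Equation 1)."""
    n = len(labels)
    children = {}
    for row, y in zip(rows, labels):
        children.setdefault(row[attr], []).append(y)
    avg = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - avg


def build_tree(rows, labels, attrs, domains):
    """Algorithm 1: returns a class label (leaf) or (attr, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:           # all samples in one class -> leaf
        return labels[0]
    if not attrs:                       # attribute list empty -> majority leaf
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in domains[best]:         # each known value a_i of test-attribute
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        if not idx:                     # s_i empty -> leaf with parent's majority
            branches[value] = majority
        else:                           # recurse without the used attribute
            branches[value] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx],
                                         [a for a in attrs if a != best],
                                         domains)
    return (best, branches)


rows = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]   # (f1, f2, f3)
labels = ["A", "A", "B", "B"]
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
print(build_tree(rows, labels, [0, 1, 2], domains))
```

On the four-sample table of Section 4.2 this returns `(1, {0: 'B', 1: 'A'})`, i.e., a single split on f2 (attribute index 1) with two pure leaves.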

4.2 How to Select Tree Root?

Algorithm 1: Basic steps of DT generation

generate_decision_tree(samples, attribute-list):
    Create a node N
    If samples are all of the same class C then
        return N as a leaf node labeled with class C
    If attribute-list is empty then
        return N as a leaf node labeled with the most common class in samples
    Select test-attribute, the attribute in attribute-list with the highest information gain based on entropy
    Label node N with test-attribute
    For each known value a_i of test-attribute do
        Let s_i be the set of samples for which test-attribute = a_i
        If s_i is empty then
            Attach a leaf labeled with the most common class in samples
        Else
            Attach the node returned by generate_decision_tree(s_i, attribute-list minus test-attribute)
        End if
    End for
    Return N

We want to determine which attribute should serve as the root of a tree, given a set of training feature vectors. Information gain (IG) measures how important a given attribute of the feature vectors is, and it helps decide the ordering of attributes in the nodes of a decision tree. Equations 1 and 2 show how information gain and entropy are calculated [24].

$$ IG = E(\text{Parent}) - AE(\text{Children}) \tag{1} $$

$$ \text{Entropy} = -\sum_{i} p_i \log_2 p_i \tag{2} $$

Here, E and AE are the entropy and the (size-weighted) average entropy of the children, respectively, and $p_i$ is the probability of class $i$. Entropy comes from information theory: the higher the entropy, the more the information content. For example, consider the training data set in the following table, which has three features f1, f2, and f3 and two classes, A and B. If f1 were chosen as the split attribute, the child node with f1 = 1 would contain both classes and would need to be split further.

| f1 | f2 | f3 | Class |
|---|---|---|---|
| 1 | 1 | 1 | A |
| 1 | 1 | 0 | A |
| 0 | 0 | 1 | B |
| 1 | 0 | 0 | B |

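Splitting on f1, the entropies of the children and the gain are computed as in Equations 3 and 4, where child 1 holds the three samples with f1 = 1 and child 2 holds the single sample with f1 = 0.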

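$$
\begin{aligned}
E_{\text{child}_1} &= -\tfrac{1}{3}\log_2\!\left(\tfrac{1}{3}\right) - \tfrac{2}{3}\log_2\!\left(\tfrac{2}{3}\right) \approx 0.5283 + 0.3900 = 0.9183 \\
E_{\text{child}_2} &= 0
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
E_{\text{parent}} &= 1 \\
IG &= 1 - \tfrac{3}{4} \times (0.9183) - \tfrac{1}{4} \times (0) \approx 0.3113
\end{aligned}
\tag{4}
$$

Figure 2: Simple Tree Structure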

If we split using feature f2, we obtain Equations 5 and 6.

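$$ E_{\text{child}_1} = 0, \qquad E_{\text{child}_2} = 0 \tag{5} $$

$$
\begin{aligned}
E_{\text{parent}} &= 1 \\
IG &= 1 - \tfrac{1}{2} \times (0) - \tfrac{1}{2} \times (0) = 1
\end{aligned}
\tag{6}
$$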

Splitting on feature f2 therefore produces the best gain. The tree developed in this case is presented in Figure 2. This tree was developed using the Weka software [28].
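The worked example can be checked numerically with the short Python sketch below, which recomputes Equations 1 and 2 for the four-sample table; the helper names are illustrative only.

```python
import math

# Toy training set from the table in Section 4.2: (f1, f2, f3, class).
DATA = [(1, 1, 1, "A"), (1, 1, 0, "A"), (0, 0, 1, "B"), (1, 0, 0, "B")]


def entropy(labels):
    """Equation 2: -sum_i p_i log2 p_i over the class proportions."""
    n = len(labels)
    probs = (labels.count(c) / n for c in set(labels))
    return -sum(p * math.log2(p) for p in probs)


def gain(data, feature):
    """Equation 1: E(parent) minus the size-weighted entropy of the children."""
    labels = [row[-1] for row in data]
    avg = 0.0
    for value in {row[feature] for row in data}:
        child = [row[-1] for row in data if row[feature] == value]
        avg += len(child) / len(data) * entropy(child)
    return entropy(labels) - avg


for f, name in enumerate(["f1", "f2", "f3"]):
    print(name, round(gain(DATA, f), 4))
```

Running it prints gains of about 0.3113 for f1, 1.0 for f2, and 0.0 for f3, so f2 is selected as the root, consistent with the hand calculation above up to rounding.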

5 Artificial Neural Network

Classification is one of the most active research and application areas of neural networks. A classification problem arises when an object needs to be assigned to a predefined group or class based on a number of observed attributes of that object. ANNs have been used successfully for multi-class pattern classification [29,30], medical diagnosis [31], bankruptcy prediction [32], handwritten character recognition [33,34], and speech recognition [35].

An ANN usually consists of many (often hundreds of) simple processing units connected together in a complex communication network. Each unit, or node, is a simplified model of a real neuron, which fires (sends off a new signal) if it receives a sufficiently strong input signal from the other nodes to which it is connected. The strengths of these connections may be varied so that the network performs different tasks, corresponding to different patterns of node firing activity. An ANN model consists of a set of synapses, each characterized by its own weight, or strength.


5.1 Perceptron

The neuron is the basic processing unit of an ANN. Each neuron has a number of inputs and a single output, and each input has an assigned factor, or parameter, called the weight. A neuron works as follows: each input signal is multiplied by the corresponding weight, the products are summed, and the sum is passed through a transfer function. This transfer function is most often a sigmoid function (see Equation 7). The simplest neural network unit is called the "perceptron": if the result of the summation exceeds a certain threshold, the neuron's output is activated; otherwise it is not.

$$ f(x) = \frac{1}{1 + e^{-x}} \tag{7} $$

For example, given a set of inputs $x_j$ and corresponding weights $w_j$ between the input and hidden neurons, the output of each neuron in the hidden layer is calculated by applying the transfer function to the weighted sum (see Equation 8).

$$ y_i = f\left(\sum_{j=1}^{n} w_j x_j + w_0\right) \tag{8} $$

5.2 Multilayer Perceptron

An ANN typically consists of three layers: an input layer, a hidden layer, and an output layer. Neurons are usually fully connected, and each connection is associated with a weight computed by a learning algorithm. Neurons are grouped together to form layers.

An MLP is a fully connected network: all inputs/units in one layer are connected to all units in the following layer. The input layer receives the initial data, and the hidden layer calculates several interim values, which are used to calculate the output values in the output layer. The MLP can be represented mathematically as given in Equation 9 [36,37].

Figure 3: Proposed MLP architecture for the NSL-KDD data classification

$$
y_i = g_i[\Phi, \theta] = F_i\left( \sum_{j=1}^{n_h} W_{i,j}\, f_j\!\left( \sum_{l=1}^{n_\Phi} w_{j,l}\, \Phi_l + w_{j,0} \right) + W_{i,0} \right) \tag{9}
$$

where $y_i$ is the output signal, $g_i$ is the function realized by the neural network, $\Phi$ is the input vector, $n_h$ and $n_\Phi$ are the numbers of hidden neurons and inputs, and $\theta$ is the parameter vector containing all adjustable parameters of the network (the weights $w_{j,l}$ and $W_{i,j}$ and the biases $w_{j,0}$ and $W_{i,0}$). The MLP is trained using the Backpropagation (BP) learning algorithm. Training means adjusting the network weights so that the objective criterion is minimized, i.e., minimizing the error between the network output and the desired output.

The ANN achieves a good match when the Mean Square Error (MSE) is minimized (see Equation 10). Figure 3 shows the architecture of the MLP, with 41 inputs, which are the features of NSL-KDD, and six outputs, which are the types of attacks. We used the MLP to detect the six types of attacks available in our data samples.

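$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(d_i - y_i\right)^2 \tag{10} $$

where $d_i$ is the desired output and $y_i$ is the network output for the $i$-th of $n$ samples.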
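To make Equations 9 and 10 concrete, here is a minimal NumPy sketch of a one-hidden-layer MLP with 41 inputs and six outputs, trained by plain backpropagation on the MSE. The hidden-layer size (20), the sigmoid output activation $F_i$, the weight initialization, the learning rate, and the synthetic data are our assumptions for illustration; they are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MLP:
    """One hidden layer, as in Equation 9; sigmoid used for f_j and F_i (assumed)."""

    def __init__(self, n_in=41, n_hidden=20, n_out=6, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, (n_hidden, n_in))    # w_{j,l}
        self.w0 = np.zeros(n_hidden)                        # w_{j,0}
        self.W = rng.normal(0.0, 0.1, (n_out, n_hidden))    # W_{i,j}
        self.W0 = np.zeros(n_out)                           # W_{i,0}

    def forward(self, phi):
        """phi has shape (batch, n_in); returns hidden activations h and outputs y."""
        h = sigmoid(phi @ self.w.T + self.w0)
        y = sigmoid(h @ self.W.T + self.W0)
        return h, y

    def train_step(self, phi, d, lr=0.5):
        """One backpropagation step on the MSE (Equation 10, averaged over samples
        and outputs); returns the MSE measured before the update."""
        h, y = self.forward(phi)
        err = y - d
        mse = float(np.mean(err ** 2))
        # Error signals at the output and hidden pre-activations (chain rule).
        delta_out = 2.0 * err * y * (1.0 - y) / err.size
        delta_hid = (delta_out @ self.W) * h * (1.0 - h)
        # Gradient-descent updates of all parameters in theta.
        self.W -= lr * delta_out.T @ h
        self.W0 -= lr * delta_out.sum(axis=0)
        self.w -= lr * delta_hid.T @ phi
        self.w0 -= lr * delta_hid.sum(axis=0)
        return mse


rng = np.random.default_rng(1)
phi = rng.random((8, 41))              # 8 synthetic records with 41 features
d = np.eye(6)[np.arange(8) % 6]        # one-hot desired outputs for 6 classes
net = MLP()
for epoch in range(200):
    loss = net.train_step(phi, d)
print(loss)
```

In practice `phi` would hold normalized NSL-KDD features (symbolic features such as protocol_type must first be encoded numerically), and the returned MSE should decrease as `train_step` is called repeatedly.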

6 Support Vector Machines

Support Vector Machines (SVMs) are one of the more recent developments in supervised machine learning [38]. Surveys of SVMs can be found in [39, 40]. Although SVMs have been known since the late seventies [41, 42], they started to receive attention in the late nineties [41]. They were applied mainly to pattern recognition, and also to pattern classification problems such as image recognition, text recognition, and face detection [43]. Much research has also applied SVMs to the intrusion detection problem, for example [44]–[47]. SVMs work mainly by deriving a hyperplane that maximizes the separating margin between two classes [48]. The feature vectors that lie on the boundary of the margin are called support vectors [48]. SVMs are attractive because they are very resilient to overfitting [27].

6.1 How SVM works

To see how SVM works, assume we have a set of training examples in pair format $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y \in \{1, -1\}^l$. Our objective is to learn a classifier:

$$ f(x) = w^T \phi(x) + b \tag{11} $$

The classifier's output for a new $x$ is $\operatorname{sign}(f(x))$. If the training data are linearly separable in the feature space of $\phi(x)$ (see Figure 4), the two classes of training examples are sufficiently well separated in that space that one can draw a hyperplane between them. We need to maximize the margin, i.e., the distance from the hyperplane to the closest data point in either class, which gives the classifier the largest tolerance to error. Many data sets are not linearly separable, which means there is no solution that satisfies all the constraints. One way to handle this problem is to relax some of the constraints by introducing slack variables, which permit certain constraints to be violated; that is, certain training points may lie within the margin.

Figure 4: Optimal hyperplane in Support Vector Machine [49]

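SVM maps each training vector $x_i$ into a higher-dimensional space using the function $\phi$ and finds a linear separating hyperplane with the maximum margin in that space. $C > 0$ is the penalty parameter of the error term, and $\zeta_i \ge 0$ are the slack variables. Our objective is to keep the number of points within the margin as small as possible. In this case, the SVM [50, 51] requires the solution of the following optimization problem:

$$
\begin{aligned}
\min_{w,\, b,\, \zeta} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{l} \zeta_i \\
\text{subject to} \quad & y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \zeta_i, \\
& \zeta_i \ge 0, \qquad i = 1, \dots, l
\end{aligned}
\tag{12}
$$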
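For a fixed $(w, b)$, the smallest slack values that satisfy the constraints of Equation 12 are $\zeta_i = \max(0, 1 - y_i(w^T\phi(x_i) + b))$, so the objective can be evaluated and minimized directly. The Python sketch below assumes a linear feature map ($\phi(x) = x$) and uses plain subgradient descent, one simple way to minimize this objective; practical SVM tools such as LIBSVM solve the dual problem instead. The learning rate, epoch count, and toy data are illustrative.

```python
import numpy as np


def slacks(w, b, X, y):
    """Smallest zeta_i with y_i (w^T x_i + b) >= 1 - zeta_i and zeta_i >= 0."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))


def primal_objective(w, b, X, y, C=1.0):
    """Objective of Equation 12 with phi(x) = x: 0.5 w^T w + C * sum_i zeta_i."""
    return 0.5 * (w @ w) + C * slacks(w, b, X, y).sum()


def fit_subgradient(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimize the primal objective by subgradient descent (linear kernel only)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0        # points with zeta_i > 0
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [0.5, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_subgradient(X, y)
print(w, b, primal_objective(w, b, X, y))
```

Only points with positive slack contribute to the subgradient, which mirrors the role of the slack variables described above: points outside the margin on the correct side do not affect the update.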

$K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is called the kernel function. Many kernels have been proposed for the SVM; some are listed below:

• linear: $K(x_i, x_j) = x_i^T x_j$

• polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d, \ \gamma > 0$

• sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

where $\gamma$, $r$, and $d$ are kernel parameters. Figure 5 illustrates slack variables taking various values.

Figure 5: Optimal hyperplane with slack variables [49]
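The kernels above can be written directly in NumPy, as sketched below. The sketch also checks the identity $K(x_i, x_j) = \phi(x_i)^T\phi(x_j)$ for the degree-2 polynomial kernel with $\gamma = 1$, $r = 0$, whose explicit 2-D feature map is $(x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$, and evaluates the classifier of Equation 11 in its kernel form $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, which follows from writing $w = \sum_i \alpha_i y_i \phi(x_i)$. The RBF kernel is added for comparison with Section 2.4, and the default parameter values are arbitrary.

```python
import numpy as np


def linear(xi, xj):
    return xi @ xj


def polynomial(xi, xj, gamma=1.0, r=0.0, d=2):
    return (gamma * (xi @ xj) + r) ** d


def sigmoid_kernel(xi, xj, gamma=0.5, r=0.0):
    return np.tanh(gamma * (xi @ xj) + r)


def rbf(xi, xj, gamma=0.5):
    # Gaussian (RBF) kernel, discussed in Section 2.4.
    return np.exp(-gamma * np.sum((xi - xj) ** 2))


def phi_poly2(x):
    # Explicit feature map for polynomial(gamma=1, r=0, d=2) on 2-D inputs.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])


def decision(x, support_vectors, alpha_y, b, kernel):
    # Kernel form of Equation 11: sign(sum_i alpha_i y_i K(x_i, x) + b).
    f = sum(ay * kernel(sv, x) for sv, ay in zip(support_vectors, alpha_y)) + b
    return np.sign(f)


xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(polynomial(xi, xj), phi_poly2(xi) @ phi_poly2(xj))   # equal up to rounding
```

For example, with the two support vectors [2, 2] (label +1) and [0.5, 0.5] (label −1), the hard-margin solution is $\alpha_i = 4/9$ for both and $b = -5/3$, so `decision(np.array([3.0, 3.0]), svs, [4/9, -4/9], -5/3, linear)` returns 1.0, where `svs` is the list of the two support vectors.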

7 Feature Selection

Feature selection is defined as the process of selecting a subset of the originally defined features based on a predefined evaluation criterion. Feature selection has been used successfully to enhance the modeling of input-output systems. In many modeling cases, various attributes are gathered during data collection even though they might not be significant. Such irrelevant data can increase model complexity and the time needed to converge to the best model structure. Feature selection offers a number of advantages, including:

• Feature selection is frequently used to reduce model dimensionality.

• Feature selection reduces the feature domain and removes redundant features, which helps speed up the learning/modeling process [24,52].

The relevance of the 41 features to the attack types was studied in [23]. The authors concluded that not all 41 features are needed to classify the attack types, and they recommended further studies based on machine learning algorithms.


Figure 6: Summary of feature selection methods [55]

In [53], a performance analysis of different feature selection methods in intrusion detection was presented. Several feature-selection algorithms were compared, including the SVM-wrapper, Markov blanket and Classification And Regression Trees (CART) algorithms and the generic-feature-selection (GeFS) method. Experimental results on the KDD CUP'99 data set show that the GeFS method outperforms the existing approaches for intrusion detection, removing more than 30% of redundant features from the original data set while achieving better classification accuracy. A summary of feature selection methods [54] is presented in Figure 6.

7.1 Process of Feature Selection

A typical feature selection process comprises four basic steps [53]–[55]: 1) a generation procedure that produces the next candidate subset, 2) an evaluation function that scores the subset, 3) a stopping criterion that decides when to stop, and 4) a validation procedure that checks whether the selected subset is valid. Different methods for attribute search and evaluation were analyzed in [55, 56].
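The four steps can be sketched as a single loop. In this illustration the generation procedure is sequential forward selection, a simpler search than the Best First and Genetic Search methods used in this paper, and the scoring function is a made-up stand-in for a real subset evaluator such as CFS. All names are hypothetical.

```python
from typing import Callable, FrozenSet, List

Subset = FrozenSet[str]


def forward_selection(features: List[str],
                      evaluate: Callable[[Subset], float],
                      validate: Callable[[Subset], bool]) -> Subset:
    current: Subset = frozenset()
    current_score = evaluate(current)
    while True:
        # 1) Generation: every subset that adds one unused feature.
        candidates = [current | {f} for f in features if f not in current]
        if not candidates:
            break
        # 2) Evaluation: score each candidate and keep the best one.
        scored = [(evaluate(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        # 3) Stopping criterion: stop when no candidate improves the score.
        if best_score <= current_score:
            break
        current, current_score = best, best_score
    # 4) Validation: a final check on the chosen subset (e.g. on held-out data).
    if not validate(current):
        raise ValueError("selected subset failed validation")
    return current


# Hypothetical evaluation: +1 per relevant feature, -0.5 per irrelevant one.
RELEVANT = {"src_bytes", "count"}


def toy_score(subset: Subset) -> float:
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


selected = forward_selection(["service", "src_bytes", "count", "dst_bytes"],
                             toy_score, validate=lambda s: len(s) > 0)
print(sorted(selected))  # ['count', 'src_bytes']
```

Forward selection never revisits a decision once a feature is added; Best First Search relaxes exactly this limitation by keeping earlier subsets available for backtracking.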

7.2 Best First Search

The best first search strategy allows backtracking along the search path. It moves through the search space by making local changes to the current feature subset. If the path being explored begins to look less promising, best first search can backtrack to a more promising previous subset and continue the search from there. The best first search algorithm is given in Algorithm 2.

Algorithm 2: Best first search algorithm [52]

1. Begin with the OPEN list containing the start state, the CLOSED list empty, and BEST ← start state.
2. Let s = arg max_{x ∈ OPEN} e(x) (get the state from OPEN with the highest evaluation).
3. Remove s from OPEN and add it to CLOSED.
4. If e(s) ≥ e(BEST), then BEST ← s.
5. For each child t of s that is not in the OPEN or CLOSED list, evaluate t and add it to OPEN.
6. If BEST changed in the last set of expansions, go to step 2.
7. Return BEST.
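A minimal Python sketch of Algorithm 2 for forward feature-subset search, using a heap as the OPEN list. Two readings are assumptions rather than details from the paper: "the last set of expansions" is taken to mean `max_stale` consecutive expansions without improvement, and BEST is replaced only on strict improvement so that ties do not reset that counter. The toy scoring function is hypothetical.

```python
import heapq
import itertools
from typing import Callable, FrozenSet, List

Subset = FrozenSet[str]


def best_first_search(features: List[str],
                      evaluate: Callable[[Subset], float],
                      max_stale: int = 5) -> Subset:
    start: Subset = frozenset()
    tie = itertools.count()                   # tie-breaker so heapq never compares sets
    open_heap = [(-evaluate(start), next(tie), start)]  # OPEN as a max-heap via negation
    seen = {start}                            # states already in OPEN or CLOSED
    best, best_score = start, float("-inf")
    stale = 0                                 # expansions since BEST last improved
    while open_heap and stale < max_stale:
        neg, _, s = heapq.heappop(open_heap)  # steps 2-3: s = argmax e(x); s moves to CLOSED
        if -neg > best_score:                 # step 4 (strict improvement)
            best, best_score, stale = s, -neg, 0
        else:
            stale += 1
        for f in features:                    # step 5: children add one more feature
            t = s | {f}
            if t not in seen:
                seen.add(t)
                heapq.heappush(open_heap, (-evaluate(t), next(tie), t))
    return best                               # step 7


# Hypothetical evaluation: +1 per relevant feature, -0.5 per irrelevant one.
RELEVANT = {"src_bytes", "count"}


def toy_score(subset: Subset) -> float:
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


selected = best_first_search(["service", "src_bytes", "count", "dst_bytes"], toy_score)
print(sorted(selected))  # ['count', 'src_bytes']
```

Because OPEN keeps every generated but unexpanded subset, the search can return to an earlier, more promising branch after a run of poor expansions, which is the backtracking behaviour described above.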

7.3 Genetic Search

Genetic Algorithms (GAs) are search algorithms based on the principle of natural selection [52] [57]. GAs can be used to develop robust and adaptable systems [57] [58]. A GA operates on a population of individuals called chromosomes, where each chromosome represents a possible solution to the problem [59] [57]. The initial population is a set of randomly created chromosomes, and the generated solutions evolve over successive iterations toward an optimal solution.

In the feature selection problem, a solution is usually a fixed-length binary string representing a feature subset: each position in the string indicates the presence or absence of a particular feature [52]. The initial subsets are generated randomly from the full feature set. Successive generations are produced by applying the genetic operators crossover and mutation to the current population. The newly generated subsets are evaluated with a fitness function according to the defined fitness criterion, and fitter subsets have a greater chance of being selected to form the next generation. In this way, the evolved subsets tend to have higher quality. The genetic search strategy generally works as given in Algorithm 3.

Algorithm 3: Genetic search algorithm [52].

1. Begin by randomly generating an initial population P.
2. Calculate e(x) for each member x ∈ P.
3. Define a probability distribution p over the members of P where p(x) ∝ e(x).
4. Select two population members x and y with respect to p.
5. Apply crossover to x and y to produce new population members x′ and y′.
6. Apply mutation to x′ and y′.
7. Insert x′ and y′ into P′ (the next generation).
8. If |P′| < |P|, go to step 4.
9. Let P ← P′.
10. If there are more generations to process, go to step 2.
11. Return x ∈ P for which e(x) is highest.
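A compact Python sketch of Algorithm 3 for feature selection, with chromosomes stored as bit lists. Several choices are assumptions, not details from the paper: roulette-wheel selection through `random.Random.choices`, one-point crossover, bit-flip mutation, and illustrative parameter values. The sketch also remembers the fittest chromosome seen in any generation, a small deviation from step 11, which returns the best member of the final population only. The toy fitness function is hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # bit i == 1 means feature i is in the subset


def genetic_search(n_features: int,
                   fitness: Callable[[Chromosome], float],
                   pop_size: int = 20, generations: int = 20,
                   p_crossover: float = 0.6, p_mutation: float = 0.033,
                   seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    # Step 1: random initial population P of fixed-length bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Steps 2-3: evaluate e(x); selection probability is proportional to e(x).
        weights = [max(fitness(x), 0.0) + 1e-9 for x in pop]  # keep weights positive
        nxt: List[Chromosome] = []
        while len(nxt) < pop_size:                      # step 8: fill P'
            # Step 4: pick two parents with respect to p (copies, so P is untouched).
            x, y = (c[:] for c in rng.choices(pop, weights=weights, k=2))
            # Step 5: one-point crossover.
            if n_features > 1 and rng.random() < p_crossover:
                cut = rng.randrange(1, n_features)
                x, y = x[:cut] + y[cut:], y[:cut] + x[cut:]
            # Step 6: bit-flip mutation.
            for child in (x, y):
                for i in range(n_features):
                    if rng.random() < p_mutation:
                        child[i] = 1 - child[i]
            nxt.extend([x, y])                          # step 7: insert into P'
        pop = nxt[:pop_size]                            # step 9: P <- P'
        best = max(pop + [best], key=fitness)           # fittest seen so far
    return best                                         # step 11


# Hypothetical target: features 1, 3 and 4 are the "relevant" ones.
target = [0, 1, 0, 1, 1, 0]


def toy_fitness(x: Chromosome) -> float:
    return sum(a == b for a, b in zip(x, target))  # number of matching bits


best = genetic_search(len(target), toy_fitness)
print(best, toy_fitness(best))
```

In a real feature-selection run, `toy_fitness` would be replaced by a subset evaluator (for example, a correlation-based merit score) applied to the features whose bits are set.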

8 Model Evaluation

To evaluate the performance of the developed models, we used a set of performance evaluation measures: Correctly Classified Instances (CCI), Incorrectly Classified Instances (ICI), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Relative Absolute Error (RAE). These measures quantify how closely the intrusion types predicted by the learned models match the actual intrusion types. They are computed as follows:

$$
CCI = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}
$$

$$
ICI = \frac{FP + FN}{TP + TN + FP + FN} \tag{14}
$$

where TP is the number of positive instances correctly classified as positive, TN the number of negative instances correctly classified as negative, FP the number of negative instances incorrectly classified as positive, and FN the number of positive instances incorrectly classified as negative. The confusion matrix shown in Table 3 is used to evaluate the performance of the classification system.

Table 3: Confusion matrix.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
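For more than two classes, Eqs. (13) and (14) generalize to the diagonal of the confusion matrix divided by the total number of instances. The sketch below computes both from a matrix laid out as in Table 3 (rows are actual classes). The binary counts are hypothetical; the six-class matrix is Table 5 converted to counts under the assumption of 1000 records per class (Table 4), so each percentage multiplied by 10 gives a count. Because the classes are balanced, the overall CCI equals the mean of the diagonal percentages.

```python
from typing import List, Sequence, Tuple


def cci_ici(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Rows are actual classes, columns are predicted classes (layout of Table 3)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal
    return correct / total, (total - correct) / total


# Binary case laid out as [[TP, FN], [FP, TN]] with hypothetical counts.
cci, ici = cci_ici([[90, 10], [5, 95]])
print(cci, ici)  # 0.925 0.075

# Table 5 (C4.5) as counts, assuming 1000 records per class.
c45_counts: List[List[int]] = [
    [992, 0, 3, 3, 2, 0],      # ipsweep
    [0, 998, 0, 2, 0, 0],      # neptune
    [7, 0, 990, 2, 1, 0],      # nmap
    [6, 1, 6, 974, 13, 0],     # normal
    [1, 1, 0, 9, 989, 0],      # satan
    [0, 0, 0, 0, 0, 1000],     # smurf
]
print(round(cci_ici(c45_counts)[0] * 100, 2))  # 99.05
```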

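$$
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{15}
$$

$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^{2}} \tag{16}
$$

$$
RAE = \frac{\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|}{\sum_{i=1}^{n} \left| \bar{y} - y_i \right|} \tag{17}
$$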

where $y_i$ is the actual class of connection $i$, $\hat{y}_i$ is the predicted class, and $\bar{y}$ is the mean of the actual values $y_i$ over the $n$ instances.
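Eqs. (15) to (17) can be computed directly from paired actual and predicted values. This is a simplified sketch on scalar values: WEKA, which was used for the experiments, computes these measures from predicted class-probability distributions when the class is nominal, so its numbers are not produced by exactly this code. The data are hypothetical, and RAE is returned as a fraction (Table 11 reports it as a percentage).

```python
import math
from typing import Sequence, Tuple


def error_measures(actual: Sequence[float],
                   predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, RAE) following Eqs. (15)-(17); RAE is a fraction, not %."""
    n = len(actual)
    y_bar = sum(actual) / n                          # mean of the actual values
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    baseline = sum(abs(y_bar - a) for a in actual)   # error of always predicting y_bar
    if baseline == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    rae = sum(errors) / baseline
    return mae, rmse, rae


# Hypothetical 0/1 targets and predicted probabilities for one class.
actual = [1, 0, 1, 1]
predicted = [0.5, 0.0, 1.0, 1.0]
mae, rmse, rae = error_measures(actual, predicted)
print(mae, rmse, round(rae, 4))  # 0.125 0.25 0.3333
```

RAE compares the model's total error with that of a trivial predictor that always outputs the mean, so values below 1 (below 100%) indicate an improvement over that baseline.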

9 Experimental Results

In our experiments, we randomly selected 6000 records from the NSL-KDD data. The selected set contains five attack types and the normal type, with 1000 records for each type. Table 4 shows the types of data used and the number of samples for each type.

Table 4: Experimental data

| Attack type | No. of records |
|---|---|
| Normal | 1000 |
| ipsweep | 1000 |
| neptune | 1000 |
| nmap | 1000 |
| smurf | 1000 |
| satan | 1000 |
| Sum | 6000 |

In the following sections, we present our experiments with the three intrusion detection models developed using the MLP, C4.5, and SVM classifiers. The data mining software tool Waikato Environment for Knowledge Analysis (WEKA) [28] was used to obtain our results on the NSL-KDD sample described in Table 4. For all experiments we used 10-fold cross-validation, since it reduces the variance of the performance estimate. Block diagrams of the experimental setup in WEKA for the three proposed models are shown in Figure 7, Figure 8 and Figure 9. The experimental results are explained in detail in the following subsections.
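The sketch below illustrates stratified 10-fold splitting, which keeps the class balance of Table 4 in every fold. It is an illustration only: we assume, rather than state, that it resembles how WEKA partitioned the data, and the classifier training step is left out. The function name and seed are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int = 10,
                     seed: int = 1) -> List[List[int]]:
    """Assign record indices to k folds, keeping class proportions in every fold."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class, key=str):   # fixed class order for reproducibility
        idxs = by_class[label]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)          # deal records round-robin
    return folds


# Class sizes from Table 4: 1000 records for each of the six types.
classes = ["normal", "ipsweep", "neptune", "nmap", "smurf", "satan"]
labels = [c for c in classes for _ in range(1000)]
folds = stratified_folds(labels)
for test_idx in folds:
    test_set = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in test_set]
    # Train on train_idx and evaluate on test_idx here (classifier omitted).
print(len(folds), len(folds[0]), Counter(labels[j] for j in folds[0])["nmap"])  # 10 600 100
```

Each record is used for testing exactly once and for training nine times, and the ten test-fold results are pooled into the reported measures.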

Figure 7: Weka Setup for the C4.5 Classification Model


Figure 8: Weka Setup for the MLP Classification Model

Figure 9: Weka Setup for the SVM Classification Model

9.1 C4.5 Model

C4.5 (J48 in WEKA) is a very popular machine learning algorithm and an extension of the ID3 algorithm. Its output is an understandable decision tree. While building the tree, C4.5 selects split attributes using the gain ratio, a normalized form of information gain, and pruning can be applied to obtain a smaller tree. Without pruning, we obtained a tree with 456 nodes and 400 leaves and a classification accuracy of 99%. With pruning, using a confidence factor of 0.25, we obtained a tree with 229 nodes and 188 leaves and a classification accuracy of 99.05%. The confusion matrix of the C4.5 model is given in Table 5.

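Table 5: Confusion matrix for the C4.5 model (rows: actual class; columns: predicted class, %)

| Actual / Predicted | ipsweep | neptune | nmap | normal | satan | smurf |
|---|---|---|---|---|---|---|
| ipsweep | 99.20 | 0.00 | 0.30 | 0.30 | 0.20 | 0.00 |
| neptune | 0.00 | 99.80 | 0.00 | 0.20 | 0.00 | 0.00 |
| nmap | 0.70 | 0.00 | 99.00 | 0.20 | 0.10 | 0.00 |
| normal | 0.60 | 0.10 | 0.60 | 97.40 | 1.30 | 0.00 |
| satan | 0.10 | 0.10 | 0.00 | 0.90 | 98.90 | 0.00 |
| smurf | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 |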

Average of correctly classified instances = 99.05 %
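To illustrate the split criterion mentioned above, the sketch computes entropy, information gain, and gain ratio for a nominal attribute. It is a simplified illustration, not WEKA's J48 code: C4.5 also handles continuous attributes through thresholds, treats missing values, and restricts candidate splits to those with at least average information gain, all of which are omitted. The attribute values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of splitting `labels` on a nominal attribute with `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)   # entropy of the partition itself
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["smurf", "smurf", "normal", "normal"]
protocol = ["icmp", "icmp", "tcp", "tcp"]   # hypothetical protocol_type values
record_id = ["a", "b", "c", "d"]            # an ID-like attribute, one value per record
print(gain_ratio(protocol, labels))   # 1.0
print(gain_ratio(record_id, labels))  # 0.5
```

Both attributes have the same information gain (1 bit), but the ID-like attribute splits the data into more branches, so its split information is larger and its gain ratio is lower. This penalty on many-valued attributes is the main reason C4.5 uses the gain ratio instead of raw information gain.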

9.2 ANN Model

An MLP was used as a classification model to solve the intrusion detection problem. The settings of the developed MLP model are given in Table 6. We used WEKA's default number of neurons in the hidden layer, which is computed as (number of inputs + number of classes) / 2. In our case, this number is 60.

Table 6: The Settings of the ANN

| Parameter | Value |
|---|---|
| Maximum number of epochs | 500 |
| Number of hidden layers | 1 |
| Number of neurons in hidden layer | 60 |
| Learning rate | 0.3 |
| Momentum | 0.2 |
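The learning rate (0.3) and momentum (0.2) in Table 6 enter the standard back-propagation weight update with momentum, $\Delta w(t) = -\eta \, \partial E / \partial w + \alpha \, \Delta w(t-1)$. The sketch applies this textbook rule to a single weight with a made-up gradient sequence; WEKA's MultilayerPerceptron may differ in implementation details, and the function name is hypothetical.

```python
from typing import List


def momentum_steps(w0: float, grads: List[float],
                   learning_rate: float = 0.3, momentum: float = 0.2) -> List[float]:
    """Weight after each update: delta(t) = -lr * grad(t) + momentum * delta(t-1)."""
    w, delta = w0, 0.0
    history = []
    for g in grads:
        delta = -learning_rate * g + momentum * delta  # reuse part of the previous step
        w += delta
        history.append(w)
    return history


# Hypothetical constant gradient of 1.0 for two steps.
print([round(w, 4) for w in momentum_steps(0.0, [1.0, 1.0])])  # [-0.3, -0.66]
```

With a consistent gradient, the second step (-0.36) is larger than the first (-0.3) because 20% of the previous step is carried over, which speeds up movement along directions where the gradient keeps the same sign.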

The structure of the proposed MLP was presented in Figure 3. The confusion matrix of the MLP model is given in Table 7.

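Table 7: Confusion matrix for the MLP model (rows: actual class; columns: predicted class, %)

| Actual / Predicted | ipsweep | neptune | nmap | normal | satan | smurf |
|---|---|---|---|---|---|---|
| ipsweep | 98.70 | 0.00 | 1.10 | 0.10 | 0.10 | 0.00 |
| neptune | 0.00 | 99.90 | 0.00 | 0.10 | 0.00 | 0.00 |
| nmap | 1.60 | 0.00 | 98.00 | 0.20 | 0.20 | 0.00 |
| normal | 0.60 | 0.00 | 0.30 | 98.50 | 0.50 | 0.10 |
| satan | 0.30 | 0.00 | 0.80 | 1.30 | 97.40 | 0.20 |
| smurf | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 | 99.80 |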

Average of correctly classified instances = 98.72 %

9.3 SVM Model

In this section, we provide the results of the developed SVM classification model. We explored various kernel types: the Gaussian radial basis function (RBF) kernel, the polynomial kernel, and the sigmoid kernel. The SVM with the RBF kernel achieved the best accuracy among these kernels. The RBF kernel is given in Equation 18. The confusion matrix of the SVM model is given in Table 8.

$$
K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^{2} \right), \quad \gamma > 0 \tag{18}
$$

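Table 8: Confusion matrix for the SVM model (rows: actual class; columns: predicted class, %)

| Actual / Predicted | ipsweep | neptune | nmap | normal | satan | smurf |
|---|---|---|---|---|---|---|
| ipsweep | 84.30 | 0.00 | 15.00 | 0.70 | 0.00 | 0.00 |
| neptune | 0.10 | 92.90 | 0.30 | 3.70 | 3.00 | 0.00 |
| nmap | 25.60 | 0.00 | 73.90 | 0.50 | 0.00 | 0.00 |
| normal | 0.50 | 0.00 | 0.10 | 99.30 | 0.00 | 0.10 |
| satan | 0.40 | 3.40 | 0.50 | 4.20 | 91.50 | 0.00 |
| smurf | 0.00 | 0.00 | 0.00 | 4.40 | 0.00 | 95.60 |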

Average of correctly classified instances = 89.58 %
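Equation 18 can be evaluated directly. The sketch below shows that the RBF kernel equals 1 for identical points and decays toward 0 as the squared distance grows, at a rate set by $\gamma$. The paper does not report the $\gamma$ value used, so the values here are illustrative only.

```python
import math
from typing import Sequence


def rbf_kernel(xi: Sequence[float], xj: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel of Eq. (18): exp(-gamma * ||xi - xj||^2), gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))   # 1.0 (identical points)
print(rbf_kernel([3.0, 4.0], [0.0, 0.0], gamma=0.04))  # about 0.3679, i.e. exp(-1)
```

A large $\gamma$ makes similarity fall off quickly, so the decision boundary can bend around individual training points; a small $\gamma$ gives a smoother boundary.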

10 Best Attributes Selected

Feature selection was implemented using the BFS and GS algorithms for attribute subset selection. Correlation-based Feature Selection (CFS) [52] was used to evaluate the candidate attribute subsets. The selected feature subsets were then used to develop a new set of models based on C4.5, MLP and SVM.
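CFS scores a subset $S$ of $k$ features with Hall's heuristic merit [52], $\text{Merit}_S = k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}}$, where $\bar{r}_{cf}$ is the mean feature-class correlation and $\bar{r}_{ff}$ the mean feature-feature correlation. The sketch below evaluates only this formula; how the average correlations are estimated from the data is left out, and the input values are hypothetical.

```python
import math


def cfs_merit(k: int, mean_rcf: float, mean_rff: float) -> float:
    """Merit of a k-feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    if k < 1:
        raise ValueError("subset must contain at least one feature")
    return (k * mean_rcf) / math.sqrt(k + k * (k - 1) * mean_rff)


# Hypothetical averages for a 4-feature subset.
print(cfs_merit(4, mean_rcf=0.5, mean_rff=0.0))  # 1.0 (features uncorrelated with each other)
print(cfs_merit(4, mean_rcf=0.5, mean_rff=1.0))  # 0.5 (fully redundant features)
```

The merit rewards features that correlate with the class and penalizes features that correlate with each other; with fully redundant features, the four-feature subset scores no better than a single feature with the same class correlation.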


The flow diagram of this process is shown in Figure 10. Tables 9 and 10 list the best features selected by the BFS and GS algorithms, respectively.

Table 9: BFS Selected Features

| No. | Description | Type |
|---|---|---|
| 3 | service | symbolic |
| 5 | src_bytes | continuous |
| 6 | dst_bytes | continuous |
| 23 | count | continuous |
| 30 | diff_srv_rate | continuous |
| 37 | dst_host_srv_diff_host_rate | continuous |
| 38 | dst_host_serror_rate | continuous |

Table 10: GS Selected Features

| No. | Description | Type |
|---|---|---|
| 2 | protocol_type | symbolic |
| 3 | service | symbolic |
| 5 | src_bytes | continuous |
| 6 | dst_bytes | continuous |
| 23 | count | continuous |
| 24 | srv_count | continuous |
| 25 | serror_rate | continuous |
| 30 | diff_srv_rate | continuous |
| 36 | dst_host_same_src_port_rate | continuous |
| 37 | dst_host_srv_diff_host_rate | continuous |

10.1 Some Observations

The performance of each of the three models built using C4.5, MLP, and SVM was tested, and the obtained results are shown in Table 11. Figure 11 shows that C4.5 achieved the highest classification accuracy compared to the other techniques.

• The results showed that C4.5 was the best method in terms of detection accuracy and training time, achieving the top accuracy rate (99.05%).

• After applying feature selection with the Best First and Genetic Search methods, C4.5 still achieved the highest accuracy.

• C4.5 and MLP perform better with the Genetic Search attribute selection method, whereas SVM is the only algorithm that performs better with the Best First method. We conclude that feature selection can reduce model complexity by reducing the number of attributes and the model-building time.

11 Conclusions

In this paper, we developed three models to solve theintrusion detection problem using decision tree based

Figure 10: Feature Selection Process Flow Diagram

[Bar chart: Correctly Classified Instances (%, axis 0 to 100) for C4.5, MLP and SVM, grouped by Original Data, BF Search and Genetic Search]

Figure 11: Correctly Classified Instances for C4.5, MLP and SVM


Table 11: Performance evaluation for C4.5, MLP and SVM models

| Algorithm | CCI | ICI | MAE | RMSE | RAE | Time taken (s) |
|---|---|---|---|---|---|---|
| C4.5 (J48) | 99.05% | 0.95% | 0.0039 | 0.0534 | 1.39% | 0.47 |
| C4.5 + Best First | 97.35% | 2.65% | 0.0122 | 0.0903 | 4.41% | 0.06 |
| C4.5 + Genetic Search | 98.80% | 1.20% | 0.005 | 0.0573 | 1.80% | 0.11 |
| MLP | 98.72% | 1.28% | 0.0061 | 0.0619 | 2.18% | 485.68 |
| MLP + Best First | 93.05% | 6.95% | 0.0302 | 0.1299 | 10.86% | 218.2 |
| MLP + Genetic Search | 94.77% | 5.23% | 0.0218 | 0.1151 | 7.83% | 235.3 |
| SVM | 89.58% | 10.42% | 0.0347 | 0.1863 | 12.50% | 14.66 |
| SVM + Best First | 93.80% | 6.20% | 0.0207 | 0.1438 | 7.44% | 4.08 |
| SVM + Genetic Search | 86.77% | 13.23% | 0.0441 | 0.21 | 15.88% | 5.32 |

C4.5 algorithm, Multi-Layer Perceptron, and Support Vector Machine. Several attack types were classified using the three methods. To enhance the performance of the proposed models and speed up the detection process, a set of features was selected using the Best First Search and Genetic Search methods. A comparison between the developed models before and after feature selection was provided. The developed models were capable of reducing complexity while keeping acceptable detection accuracy. The decision-tree-based C4.5 algorithm achieved the highest classification accuracy among the classification techniques explored in this work.

References

[1] R. C. Summers. Secure computing: Threats and safeguards. McGraw Hill, New York, 2010.

[2] Shih Yin Ooi, Yew Meng Leong, Meng Foh Lim, Hong Kuan Tiew, and Ying Han Pang. Network intrusion data analysis via consistency subset evaluator with ID3, C4.5 and best-first trees. IJCSNS, 13(2):7, 2013.

[3] M. M. Pillai, Jan H. P. Eloff, and H. S. Venter. An approach to implement a network intrusion detection system using genetic algorithms. In Proceedings of the 2004 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, pages 221–221, Republic of South Africa, 2004. South African Institute for Computer Scientists and Information Technologists.

[4] Al-Sakib Khan Pathan, editor. The State of the Art in Intrusion Prevention and Detection. CRC Press, 2014.

[5] Sammany Mohammed, Sharaw Marwa, El-beltagy Mohammed, and Saroit Imane. Artificial neural networks architecture for intrusion detection systems and classification of attacks. In Faculty of Computers and Information, Cairo University, 2007.

[6] Dilip Kumar Barman and Guruprasad Khataniar. Design of intrusion detection system based on artificial neural network and application of rough set. International Journal of Computer Science and Communication Networks, pages 548–552, 2012.

[7] Singh Sahilpreet and Bansal Meenakshi. Improvement of intrusion detection system in data mining using neural network. International Journal of Advanced Research in Computer Science and Software Engineering, 2013.

[8] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, November 2009.

[9] Yihua Liao. Machine Learning in Intrusion Detection. PhD thesis, University of California at Davis, Davis, CA, USA, 2005.

[10] Chih-Fong Tsai, Yu-Feng Hsu, Chia-Ying Lin, and Wei-Yang Lin. Intrusion detection by machine learning: A review. Expert Systems with Applications, 36(10):11994–12000, December 2009.

[11] Mrutyunjaya Panda, Ajith Abraham, Swagatam Das, and Manas Ranjan Patra. Network intrusion detection system: A machine learning approach. Int. Dec. Tech., 5(4):347–356, October 2011.

[12] Sevil Sen and John A. Clark. Evolutionary computation techniques for intrusion detection in mobile ad hoc networks. Comput. Netw., 55(15):3441–3457, October 2011.

[13] Srinivas Mukkamala, Dennis Xu, and Andrew H. Sung. Intrusion detection based on behavior mining and machine learning techniques. In Proceedings of the 19th International Conference on Advances in Applied Artificial Intelligence: Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE'06, pages 619–628, Berlin, Heidelberg, 2006. Springer-Verlag.

[14] Dewan Md Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, and Chowdhury Mofizur Rahman. Attacks classification in adaptive intrusion detection using decision tree. In International Conference on Computer Science (ICCS'10), Rio De Janeiro, Brazil, March 2010.

[15] Siva S. Sivatha Sindhu, S. Geetha, and A. Kannan. Decision tree based light weight intrusion detection using a wrapper approach. Expert Syst. Appl., 39(1):129–141, 2012.

[16] JingTao Yao, Songlun Zhao, and Lisa Fan. An enhanced support vector machine model for intrusion detection. In Proceedings of the First International Conference on Rough Sets and Knowledge Technology, pages 538–543. Springer-Verlag, 2006.

[17] Rung-Ching Chen, Kai-Fan Cheng, Ying-Hao Chen, and Chia-Fen Hsieh. Using rough set and support vector machine for network intrusion detection system. In Intelligent Information and Database Systems, 2009. ACIIDS 2009. First Asian Conference on, pages 465–470, April 2009.

[18] Snehal A. Mulay, P. R. Devale, and G. V. Garje. Intrusion detection system using support vector machine and decision tree. International Journal of Computer Applications, 3(3):40–43, June 2010. Published by Foundation of Computer Science.

[19] Yogita B Bhavsar and Kalyani C Waghmare. Intrusion detection system using data mining technique: Support vector machine. International Journal of Emerging Technology and Advanced Engineering, 3(3):581–586, 2013.

[20] KDD Cup 1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed: 2015-04-24.

[21] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA'09, pages 53–58, Piscataway, NJ, USA, 2009. IEEE Press.

[22] Maheshkumar Sabhnani and Gursel Serpen. Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set. Intelligent Data Analysis, 8(4):403–415, September 2004.

[23] N. Kayacik and M. Heywood. Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets. In The 3rd Annual Conference on Privacy, Security and Trust (PST), 2005.

[24] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2012.

[25] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. The Wadsworth and Brooks-Cole statistics-probability series. Taylor & Francis, 1984.

[26] Oded Maimon and Lior Rokach, editors. Data Mining and Knowledge Discovery Handbook, 2nd ed. Springer, 2010.

[27] Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., 3rd edition, 2011.

[28] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11:10–18, 2009.

[29] G. P. Zhang. Neural networks for classification: A survey. Trans. Sys. Man Cyber Part C, 30(4):451–462, November 2000.

[30] Guobin Ou and Yi Lu Murphey. Multi-class pattern classification using neural networks. Pattern Recogn., 40(1):4–18, January 2007.

[31] Rüdiger W. Brause. Medical analysis and diagnosis by neural networks. In Proceedings of the Second International Symposium on Medical Data Analysis, ISMDA '01, pages 1–13, London, UK, 2001. Springer-Verlag.

[32] Philippe du Jardin. Predicting bankruptcy using neural networks and other classification methods: The influence of variable selection techniques on model accuracy. Neurocomputing, 73(10-12):2047–2060, June 2010.

[33] Dayashankar Singh, Maitreyee Dutta, and Sarvpal H. Singh. Neural network based handwritten Hindi character recognition system. In Proceedings of the 2nd Bangalore Annual Compute Conference, New York, NY, USA, 2009. ACM.

[34] Soni Chaturvedi, Rutika N. Titre, and Neha Sondhiya. Review of handwritten pattern recognition of digits and special characters using feed forward neural network and Izhikevich neural model. In Proceedings of the 2014 International Conference on Electronic Systems, Signal Processing and Computing Technologies, pages 425–428, Washington, DC, USA, 2014. IEEE Computer Society.

[35] Dariusz Król and Boguslaw Szlachetko. Automatic image and speech recognition based on neural network. Journal of Information Technology Research (JITR), 3(2):1–17, April 2010.

[36] M. Norgaard, O. Ravn, Poulsen, and L. K. Hansen. Neural Networks for Modelling and Control of Dynamic Systems. Springer, London, 2000.

[37] Heba Al-Hiary, Alaa Sheta, and Aladdin Ayesh. Identification of a chemical process reactor using soft computing techniques. In Proceedings of the 2008 International Conference on Fuzzy Systems (FUZZ2008) within the 2008 IEEE World Congress on Computational Intelligence (WCCI2008), Hong Kong, 1-6 June, pages 845–653, 2008.

[38] Andrew Ng. CS229 lecture notes, Autumn 2014.

[39] Christopher Burges. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov., 2(2), 1998.

[40] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods. Cambridge University Press, New York, NY, USA, 2000.

[41] Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, June 1998.

[42] Vladimir Vapnik. Estimation of Dependences Based on Empirical Data: Springer Series in Statistics (Springer Series in Statistics). Springer-Verlag New York, Inc., 1982.

[43] Ashis Pradhan. Support vector machines - a survey. International Journal of Emerging Technology and Advanced Engineering, 2(8), 2012.

[44] Latifur Khan, Mamoun Awad, and Bhavani Thuraisingham. A new intrusion detection system using support vector machines and hierarchical clustering. The VLDB Journal, 16(4):507–521, 2007.

[45] Jiaqi Jiang, Ru Li, Tianhong Zheng, Feiqin Su, and Haicheng Li. A new intrusion detection system using class and sample weighted C-support vector machine. In Proceedings of the 2011 Third International Conference on Communications and Mobile Computing, CMC '11, pages 51–54, Washington, DC, USA, 2011. IEEE Computer Society.

[46] P. Kola Sujatha, C. Suba Priya, and A. Kannan. Network intrusion detection system using genetic network programming with support vector machine. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics, ICACCI '12, pages 645–649, New York, NY, USA, 2012. ACM.

[47] Jayshree Jha and Leena Ragha. Intrusion detection system using support vector machine. IJAIS Proceedings on International Conference and Workshop on Advanced Computing 2013, ICWAC(3):25–30, June 2013. Published by Foundation of Computer Science, New York, USA.

[48] Wenjie Hu, Yihua Liao, and Rao Vemuri. Robust anomaly detection using support vector machines. In Proceedings of the International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., 2003.

[49] Alaa Sheta, Sara Elsir M. Ahmed, and Hossam Faris. A comparison between regression, artificial neural networks and support vector machines for predicting stock market index. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 4(4):55–63, 2015.

[50] Bernhard E. Boser et al. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.

[51] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, September 1995.

[52] Mark Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, 1999.

[53] Hai Thanh Nguyen, Slobodan Petrović, and Katrin Franke. A comparison of feature-selection methods for intrusion detection. In Proceedings of the 5th International Conference on Mathematical Methods, Models and Architectures for Computer Network Security, MMM-ACNS'10, pages 242–255, Berlin, Heidelberg, 2010. Springer-Verlag.

[54] M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1:131–156, 1997.

[55] Aggarwal Megha and Amrita. Performance analysis of different feature selection methods in intrusion detection. International Journal of Scientific and Technology Research, 2(6), 2013.


[56] Huan Liu and Lei Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17:491–502, 2005.

[57] Swati Sharma, Santosh Kumar, and Mandeep Kaur. Recent trend in intrusion detection using fuzzy-genetic algorithm. International Journal of Advanced Research in Computer and Communication Engineering, 3(5), 2014.

[58] Kuldeep Kumar and Ramkala Punia. Improving the performance of IDS using genetic algorithm. International Journal of Computer Science and Communication, 4(2), 2013.

[59] Mohammad Sazzadul Hoque, Md Abdul Mukit, and Md. Bikas. An implementation of intrusion detection system using genetic algorithm. International Journal of Network Security & Its Applications, 4(2), 2012.

Biographies

Alaa F. Sheta is currently a Professor at the Computers and Systems Department, Electronics Research Institute (ERI), Egypt. He received his PhD degree from the Computer Science Department, George Mason University, Fairfax, VA, USA in 1997. He received his B.E. and M.Sc. degrees in Electronics and Communication Engineering from the Faculty of Engineering, Cairo University in 1988 and 1994, respectively. His main research area is Evolutionary Computation, with a focus on Genetic Algorithms, Genetic Programming and their applications. He is also interested in Particle Swarm Optimization, Differential Evolution, Cuckoo Search, etc. Alaa Sheta has authored or co-authored over 100 publications in peer-reviewed international journals, international conference proceedings and book chapters. He is co-author of two books in the fields of Landmine Detection and Classification and Image Reconstruction of a Manufacturing Process. He is the co-editor of the book Business Intelligence and Performance Management - Theory, Systems and Industrial Applications, published by Springer Verlag, United Kingdom, in March 2013.

Amneh Alamleh is currently a laboratory assistant with Zarqa University, Jordan. She received her B.Sc. and M.Sc. degrees in Computer Science from Zarqa University in 2003 and 2015, respectively. Amnah is an M.Sc. candidate with the Computer Science Department, College of Computer Science and Information Technology, Zarqa University, Jordan. Her research interests include Network Security, Artificial Neural Networks, Evolutionary Computation, Classification, Genetic Algorithms, Data Mining, and Software Engineering.