2008 IEEE International Conference on Industrial Technology (ICIT), Chengdu, China


Method for Anomaly Detection Based on Classifier With Time Function

LIU Tao, QI Ai-ling, HOU Yuan-bin, CHANG Xin-tan
Xi’an University of Science and Technology, Xi’an 710054, China

Abstract: This paper presents a network anomaly detection method that combines a Bayesian statistical model with a time-slicing function. The method uses the Bayesian statistical model with a time function to find and determine anomalies in a computer network. It combines the strength of Bayes' theorem in handling uncertainty with a function that describes how network traffic changes over time. The aim is to build an anomaly intrusion detection model of network activity that determines when a network anomaly occurs by discovering relationships among large numbers of events and classifying network system behavior. A simulation experiment shows that the Bayesian statistical model with time slicing analyzes anomalous behavior effectively.

Keywords: Network security; Anomaly detection; Bayesian classifier; Time slicing

I. INTRODUCTION

Alongside normal traffic, a network carries various kinds of abnormal traffic, which disrupt normal network operation and threaten the security of computer users. The primary problem in network security is therefore to monitor and manage the network in real time and to find both known and unknown anomalies, which is of great significance for improving network reliability and availability. In recent years, much work has been done on network anomaly detection. Since the appearance of the first network intrusion detection system, NSM [1], various methods for detecting network anomalies have been proposed, including probability statistics [2-4], data mining [5], neural networks [6], fuzzy mathematics [7], artificial immunity [8] and support vector machines (SVM) [9]. However, these methods have deficiencies, such as the difficulty of determining the range of the standard parameters, a lack of flexibility and a high false alarm rate. Furthermore, because they analyze the data alone, they fail to take the actual network environment into consideration. Network traffic detection analyzes traffic information collected over a long period to establish the parameter ranges of normal network behavior, and it reports a network anomaly when network activity departs from this normal baseline.

This work is partially supported by 863 Plan #

The false positive and false negative rates can be effectively reduced by employing different thresholds during different periods of time, and an intrusion detection system based on the Bayesian method can build good models that distinguish normal from abnormal behavior, raising the detection rate and reducing false alarms [10]. This paper therefore proposes a new method that finds and determines network anomalies by combining time slicing with a Bayesian statistical model, that is, a Bayesian statistical model with a time function.

II. BAYESIAN STATISTICAL MODEL

The Bayesian method is characterized by using probability to express uncertainty in all its forms; learning and other forms of reasoning are carried out by the rules of probability. Applications of Bayes' theorem in data mining mainly include classification and regression analysis, causal reasoning, the representation of uncertain knowledge and the discovery of cluster models. Bayesian classification finds the category that maximizes the posterior probability for a given data set. Successful models for this problem include naïve Bayes, Bayesian networks and Bayesian neural networks. Among them, naïve Bayes is a practical and simple classification algorithm that can handle medium-scale and large-scale training sets. It combines prior information with sample information for statistical inference. Bayes' formula integrates the prior probability with sample information to give a posterior probability, which serves as the prior probability in the next cycle and is integrated with new sample information to give the next posterior probability. As the process continues, the posterior information gets closer to the true value, so learning is iterative. Classification with this algorithm is efficient because only the category with the largest posterior probability is needed once the posterior probabilities of the different categories have been compared. When the classifier is at work, there is no need to calculate the exact posterior probability of each category for a given event. Thus, the calculation of posterior probability in the classifier can be greatly simplified.

978-1-4244-1706-3/08/$25.00 ©2008 IEEE.
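The iterative use of a posterior as the next prior can be illustrated with a minimal Python sketch. The function name `bayes_update` and all numbers are assumptions chosen for illustration; they are not taken from the paper.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over categories after one observation.

    prior      : dict category -> current probability
    likelihood : dict category -> p(observation | category)
    """
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())  # total probability of the observation
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical numbers: each observation is 0.8 likely under normality
# and 0.3 likely under anomaly.
belief = {"normality": 0.5, "anomaly": 0.5}
history = [belief]
for _ in range(3):
    # The posterior of one cycle becomes the prior of the next cycle.
    belief = bayes_update(belief, {"normality": 0.8, "anomaly": 0.3})
    history.append(belief)

for step, b in enumerate(history):
    print(step, round(b["normality"], 3))
```

With these made-up likelihoods, the belief in "normality" grows with every cycle, which is the convergence behavior the paragraph describes.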

In practice, the occurrence of a network anomaly is uncertain. However, network anomaly detection based on the Bayesian statistical model can find relationships between network events. It can also predict and classify data and analyze network anomalies to identify intrusion behavior. An event $X$ to be classified is represented by an attribute vector $(x_1, x_2, \ldots, x_n)$, where $x_i$ is the $i$-th attribute of $X$, $v_i$ is the value that event $X$ takes on $x_i$, and $n$ is the number of attributes. Let $p(X)$ be the total probability of $X$, $c_j$ the $j$-th category, and $p(c_j \mid X)$ the probability of $c_j$ given $X$. The probability that event $X$ belongs to category $c_j$ is then given by equation (1):

$$p(c_j \mid X) = \frac{p(X \mid c_j)\, p(c_j)}{p(X)} = \frac{p(c_j) \prod_{i=1}^{n} p(x_i = v_i \mid c_j)}{p(X)} \qquad (1)$$

After the probabilities of all categories have been calculated, event $X$ is assigned to the category with the greatest probability. When a Bayesian classifier serves as the intrusion detection device, all system states are divided into two categories: $c_1$ = "normality" and $c_2$ = "anomaly". In equation (1), $p(c_j)$ is the prior probability of a category of network event, so $p(c_1)$ is the prior probability of normality in the system and $p(c_2)$ is the prior probability of anomaly in the system; $p(X \mid c_j)$ is the conditional probability density of the attribute vector $X$ when the system is in state $c_j$.
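Equation (1) with the two categories can be sketched in Python as follows. The two attributes used (protocol type P and FLAG), the probability tables and the function name `posterior` are hypothetical values chosen for illustration, not numbers from the paper.

```python
from math import prod


def posterior(event, priors, cond):
    """Return p(c_j | X) for every category, following equation (1).

    event  : dict attribute -> observed value v_i
    priors : dict category -> p(c_j)
    cond   : dict category -> attribute -> value -> p(x_i = v_i | c_j)
    """
    joint = {
        c: priors[c] * prod(cond[c][a][v] for a, v in event.items())
        for c in priors
    }
    p_x = sum(joint.values())  # total probability p(X)
    return {c: j / p_x for c, j in joint.items()}


# Hypothetical priors and conditional tables for two attributes.
priors = {"normality": 0.9, "anomaly": 0.1}
cond = {
    "normality": {"P": {"tcp": 0.7, "icmp": 0.3}, "FLAG": {"SF": 0.9, "S0": 0.1}},
    "anomaly": {"P": {"tcp": 0.4, "icmp": 0.6}, "FLAG": {"SF": 0.2, "S0": 0.8}},
}

event = {"P": "icmp", "FLAG": "S0"}
post = posterior(event, priors, cond)
label = max(post, key=post.get)  # category with the greatest probability
print(label, post)
```

Because $p(X)$ is the same for every category, the final `max` would pick the same label from the unnormalized products; the division is kept only so the output reads as a probability.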

III. MODELING PROCESS

Network packets can be captured by bypass (passive) interception, which neither consumes network bandwidth nor affects network performance. As preprocessing for the next step, the captured packets are then broken down and classified by attribute to form a data matrix. In the DARPA1999 data set, every connection record is described by 41 attributes, 10 of which are related to network traffic. These 10 attributes are therefore selected to record each occurrence as follows:

R = (C, T, SIP, SPort, DIP, DPort, P, L, ARP, FLAG)

In this record format, C is the connection count, T is the time at which the connection begins, SIP is the occurrence probability of the source IP, SPort is the occurrence probability of the source port, DIP is the occurrence probability of the destination IP, DPort is the occurrence probability of the destination port, P is the protocol type, L is the packet length, ARP is the probability of broadcast packets, and FLAG is the TCP/IP connection status. All attributes in record set R are taken to be mutually independent. Using these attributes, the system records every connection as one record R. The structure of the Bayesian anomaly classifier is described next.
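The record format R can be sketched as a small Python data structure. The paper does not say how the occurrence probabilities are estimated, so the relative-frequency helper `occurrence_probability`, the field types, the time unit and the sample addresses are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    C: int        # connection count
    T: float      # time the connection begins (seconds since midnight, assumed unit)
    SIP: float    # occurrence probability of the source IP
    SPort: float  # occurrence probability of the source port
    DIP: float    # occurrence probability of the destination IP
    DPort: float  # occurrence probability of the destination port
    P: str        # protocol type
    L: int        # packet length
    ARP: float    # probability of broadcast packets
    FLAG: str     # TCP/IP connection status


def occurrence_probability(values):
    """Relative frequency of each distinct value in a capture window (assumed estimator)."""
    counts = Counter(values)
    total = len(values)
    return {v: n / total for v, n in counts.items()}


# Hypothetical capture window of four source addresses.
source_ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]
p_sip = occurrence_probability(source_ips)

record = ConnectionRecord(
    C=4, T=36000.0, SIP=p_sip["10.0.0.1"], SPort=0.25, DIP=0.5,
    DPort=0.5, P="tcp", L=60, ARP=0.0, FLAG="SF",
)
print(record)
```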

As equation (1) shows, in the traditional Bayesian classifier the values of $p(c_j)$ and $p(X \mid c_j)$ are learned from a pre-classified training set. From equation (1), the posterior probability of each category is calculated, and the category with the largest value of the discriminant function is chosen as the decision; that is, a given event is assigned to the category $c_j$ that attains $\max_j p(c_j \mid X)$. The classification process is shown in Fig. 1.

Fig. 1. Classification process of the traditional Bayesian classifier (TCP/IP connection record -> Bayesian classifier -> MAX($c_j$)).

A study of the relationship between time and network traffic shows that network traffic differs at different times and that some traffic does not occur at particular times. Time can therefore be sliced, and a time-slicing function can be added to the Bayesian classifier model as a weight in the classification process. Classifying events on this basis raises the detection rate and effectively reduces the false positive and false negative rates. The classification process of the Bayesian classifier based on time slicing is shown in Fig. 2.

Fig. 2. Classification process of the Bayesian classifier based on time slicing (blocks: function of time slicing, TCP/IP connection record, Bayesian classifier, MAX($c_j$)).

The improved Bayesian equation is as follows:

$$p(c_j \mid X) = \frac{p(X \mid c_j)\, p(c_j)}{p(X)\, f_{C_j}(T)} = \frac{p(c_j) \prod_{i=1}^{n} p(x_i = v_i \mid c_j)}{p(X)\, f_{C_j}(T)} \qquad (2)$$

where $f_{C_j}(T)$ is the time-slicing function, which represents the constraint threshold applied to $X$ in a given period of time.

The system first learns the time-slicing functions through the steps above and stores the trained functions in a database. The probabilities are then transformed according to equation (2): different thresholds are supplied to the Bayesian classifier at different times, which turns the traditional Bayesian classifier into one based on time slicing. The following experiment tests whether this classifier achieves a higher detection rate.
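A minimal Python sketch of time-sliced classification follows. It assumes that the slicing function divides the posterior, as equation (2) is read here, and that it is piecewise constant over two slices of the day. The slice boundaries, the table `F` (standing in for the database of trained functions) and all numbers are hypothetical.

```python
def slice_of(hour):
    """Map an hour of the day (0-23) to a hypothetical time slice."""
    return "busy" if 8 <= hour < 20 else "idle"


# Hypothetical trained values of f_{C_j}(T). Because the score is divided
# by f, a value below 1 strengthens a category and a value above 1 weakens it.
F = {
    "busy": {"normality": 1.5, "anomaly": 0.5},
    "idle": {"normality": 0.5, "anomaly": 1.5},
}


def classify(posteriors, hour):
    """Divide each p(c_j | X) by f_{C_j}(T) and return the top category and all scores."""
    f = F[slice_of(hour)]
    scores = {c: p / f[c] for c, p in posteriors.items()}
    return max(scores, key=scores.get), scores


# The same posterior from the traditional classifier, seen at two times of day.
post = {"normality": 0.6, "anomaly": 0.4}
print(classify(post, 14))  # busy period
print(classify(post, 3))   # idle period
```

With these made-up values the same event is labeled "anomaly" in the busy slice and "normality" in the idle slice, which shows how a time-dependent threshold can change the decision without retraining the underlying classifier. Setting every entry of `F` to 1 recovers the traditional classifier.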

IV. SIMULATION EXPERIMENT

For comparison, four time-slicing functions are chosen in this experiment. They are set manually according to the actual network application, and their classification performance is compared. The time-slicing functions, defined over one day, are shown in Figs. 3 to 6 (the dashed line indicates $f_{C_1}(T)$ and the solid line indicates $f_{C_2}(T)$):

Fig. 3. Normal. Fig. 4. The 1st time function. Fig. 5. The 2nd time function. Fig. 6. The 3rd time function.

With the 10 attributes above, the Bayesian classifier improved by adding the time-function threshold can be used to detect network attacks. The classifier was trained beforehand on the KDD CUP 10% dataset. In Fig. 3 the time function is $f_{C_j}(T) = 1$ (the normal, constant function, for which equation (2) reduces to equation (1)); Figs. 4 to 6 show functions that vary with time, chosen according to the actual network environment. In the simulation experiment, one day of data is collected and then classified with the improved Bayesian classifier. The network traffic in the simulation data consists of normal background traffic (that is, "pure" network traffic without attacks) and many attacks, including scan attacks, denial of service (DoS), ARP attacks, fragment attacks and comprehensive attacks. The detection results are given in Table I.

TABLE I
DETECTION RESULTS (%)

                    Scan    DoS     ARP     Fragment    Comprehensive
Normal              55      50      45      60          57.3
The 1st Function    52.1    73.5    88.8    85.6        75.8
The 2nd Function    82.2    61.9    75.4    54.1        74.6
The 3rd Function    90.5    92.1    87.1    82.3        84.8

The experimental results can be analyzed as follows.

(1) Different time functions give different average detection rates for different attacks. The third time function gives the most stable detection rates, all above 80%. Although the other time functions exceed the third on some individual items (the first function on ARP and fragment attacks), their detection rates for other attacks are clearly lower, which shows that a rational distribution of the slicing function is critical to detection success. The classifier works on the following sieving principle. With the third time-slicing function, when normal traffic is heavy, $p(c \mid X)$ for the normal category is weakened while $p(c \mid X)$ for the anomaly category is strengthened, so that anomalies are not submerged by the large number of normal packets. Conversely, in periods when the network is not busy, $p(c \mid X)$ for the anomaly category is weakened while $p(c \mid X)$ for the normal category is strengthened, so that the classifier does not classify normal events as anomalies. Both the false positive and the false negative rates are thereby reduced. For comprehensive attacks, the detection ratio of the third time function is about 10 percentage points higher than those of the other two time functions.

(2) The results for the third time-slicing function show that the detection ratios for scan attacks and denial of service are the highest, which indicates that these two attacks have a greater effect on the traffic attributes. The detection ratio for fragment attacks is the lowest, which shows that this kind of attack does not affect traffic greatly. In a real network, some devices (e.g., gateways) discard fragments or reassemble them, so fragment attacks produce little visible change in traffic.
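The comparisons in this analysis can be rechecked from the detection results table with a short Python sketch. The rates are copied from the table; averaging each row over the five attack columns is an assumption about how an "average detection rate" might be formed, not a procedure stated in the paper.

```python
from statistics import mean

ATTACKS = ["Scan", "DoS", "ARP", "Fragment", "Comprehensive"]

# Detection rates (%) copied from the detection results table.
RATES = {
    "Normal":       [55.0, 50.0, 45.0, 60.0, 57.3],
    "1st Function": [52.1, 73.5, 88.8, 85.6, 75.8],
    "2nd Function": [82.2, 61.9, 75.4, 54.1, 74.6],
    "3rd Function": [90.5, 92.1, 87.1, 82.3, 84.8],
}

# Spread and row mean for each time function.
for name, row in RATES.items():
    print(f"{name:13s} min={min(row):5.1f} max={max(row):5.1f} mean={mean(row):6.2f}")

# Gap between the third function and every other row on comprehensive attacks.
comp = ATTACKS.index("Comprehensive")
gaps = {
    name: round(RATES["3rd Function"][comp] - row[comp], 1)
    for name, row in RATES.items()
    if name != "3rd Function"
}
print(gaps)
```

On these numbers the third function is the only row whose minimum stays above 80%, and its lead on comprehensive attacks is 9.0 and 10.2 percentage points over the first and second functions and 27.5 points over the normal (constant) function.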

V. CONCLUSION

This paper investigates a method for network anomaly detection that combines Bayesian classification with a statistical model based on time slicing, and evaluates it by simulation. The simulation results show that more complicated slicing functions can reflect the actual operation of the network and increase the average detection ratio. The detection ratio of the improved Bayesian classifier is nearly 30 percentage points higher than that of the traditional one, and the method can also reveal how sensitive the attributes chosen for the model are to different attacks. With its flexibility, high level of intelligence and accurate judgment, the method is suitable for complicated network application environments.

REFERENCES

[1] Heberlein L, Dias GV, Levitt KN, Mukherjee B, Wood J, Wolber D. A network security monitor [A]. In: Proc. of the IEEE Computer Society Symp. Research in Security and Privacy. 1990. 296-304.

[2] Staniford S, Hoagland JA, McAlerney JM. Practical automated detection of stealthy portscans [J]. Journal of Computer Security, 2002, 10 (1/2):105-136.

[3] Mahoney MV. A machine learning approach to detecting attacks by identifying anomalies in network traffic [D]. Melbourne: Florida Institute of Technology, 2003.

[4] Wang K, Stolfo SJ. Anomalous payload-based network intrusion detection [A]. In: Jonsson E, Valdes A, Almgren M, eds. Proc. of the 7th Int’l Symp. on Recent Advances in Intrusion Detection (RAID 2004). LNCS 3224, Heidelberg: Springer-Verlag, 2004. 203-222.

[5] Lee W, Stolfo SJ. A framework for constructing features and models for intrusion detection systems [J]. ACM Trans. on Information and System Security, 2000, 3 (4):227-261.

[6] Manikopoulos C, Papavassiliou S. Network intrusion and fault detection: A statistical anomaly approach [J]. IEEE Communications Magazine, 2002, 40 (10):76-82.

[7] Zhang J, Gong J. An Anomaly Detection Method Based on Fuzzy Judgment [J]. Journal of Computer Research and Development, 2003, 40 (6):776-783.

[8] Aickelin U, Greensmith J, Twycross J. Immune system approaches to intrusion detection: A review [A]. In: Nicosia G, et al., eds. Proc. of the 3rd Int’l Conf. on Artificial Immune Systems. LNCS 3239, Heidelberg: Springer-Verlag, 2004. 316-329.

[9] Xiao Y, Han ChZh, Zheng QH, Wang Q. Network Intrusion Detection Method Based on Multi-Class Support Vector Machine [J]. Journal of Xi'an Jiaotong University, 2005, 39 (6):562-565.

[10] Bai YH, Chen M, Wang JQ. Naive Bayes Approaches for Anomaly Detection [J]. Computer Engineering and Applications, 2005 (34):131-132.

The author: LIU Tao (1972- ), male, born in Xi’an, Shaanxi, is a PhD student in network security and artificial intelligence at Xi’an University of Science and Technology. Postcode: 710054.