Applied Anomaly Based IDS
Craig Buchanan
University of Illinois at Urbana-Champaign
CS 598 MCC
4/30/13
Outline
• K-Nearest Neighbor
• Neural Networks
• Support Vector Machines
• Lightweight Network Intrusion Detection (LNID)
K-Nearest Neighbor
• “Use of K-Nearest Neighbor classifier for intrusion detection” [Liao, Computers and Security]
K-nearest neighbor on text
1. Categorize training documents into vector space model, A
  • Word-by-document matrix A
  • Rows = words
  • Columns = documents
  • Represents weight of each word in set of documents
2. Build vector for test document, X
3. Classify X into A using K-nearest neighbor
Text categorization
• Create vector space model A
• $a_{ij}$ – weight of word i in document j
• Useful variables
  • N – number of documents in the collection
  • M – number of distinct words in the collection
  • $f_{ij}$ – frequency of word i in document j
  • $n_i$ – total number of times word i occurs in the collection
Text categorization
• Frequency weighting
• Term frequency – inverse document frequency (tf*idf)
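The two weighting schemes can be sketched in Python. The exact formulas from the presentation are not reproduced here, so this sketch assumes the textbook forms: frequency weighting uses the raw count, and tf*idf multiplies the count by log(N / df), where `df` (an assumed name) is the number of documents containing the word. The function names and the sample documents are illustrative only, and the normalization used in the presentation may differ.

```python
import math
from collections import Counter

def frequency_weights(docs):
    """Return (vocab, A) where A[i][j] is the count of word i in document j."""
    vocab = sorted({word for doc in docs for word in doc})
    counts = [Counter(doc) for doc in docs]
    A = [[c[word] for c in counts] for word in vocab]
    return vocab, A

def tfidf_weights(docs):
    """Return (vocab, A) where A[i][j] = count * log(N / df) (textbook tf*idf)."""
    vocab, F = frequency_weights(docs)
    N = len(docs)
    A = []
    for row in F:
        df = sum(1 for f in row if f > 0)  # documents containing this word
        idf = math.log(N / df)
        A.append([f * idf for f in row])
    return vocab, A

# Hypothetical two-document collection.
docs = [["alpha", "beta", "gamma"], ["alpha", "alpha", "delta"]]
vocab, A = tfidf_weights(docs)
```

A word that appears in every document (here `alpha`) gets weight zero under tf*idf, which is the point of the inverse document frequency factor.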
Text categorization
• System call = “word”
• Program execution = “document”
• close, execve, open, mmap, open, mmap, munmap, mmap, mmap, close, …, exit
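As a rough illustration of this mapping, the sketch below turns one system call trace into a "document" and then into one column of the word-by-document matrix using raw counts. The function names are hypothetical, and the elided middle of the trace is simply left out rather than guessed.

```python
from collections import Counter

def trace_to_document(trace):
    """Split a comma-separated system call trace into lowercase tokens."""
    return [call.strip().lower() for call in trace.split(",") if call.strip()]

def column_vector(document, vocabulary):
    """Count how often each vocabulary entry (system call) occurs in the document."""
    counts = Counter(document)
    return [counts[call] for call in vocabulary]

# One program execution; the elided portion of the original trace is omitted.
trace = "close, execve, open, mmap, open, mmap, munmap, mmap, mmap, close, exit"
doc = trace_to_document(trace)
vocabulary = sorted(set(doc))
column = column_vector(doc, vocabulary)
print(dict(zip(vocabulary, column)))
```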
Document Classification
• Distance measured by Euclidean distance
• $X$ – test document
• $D_j$ – jth training document
• $t_i$ – word shared by $X$ and $D_j$
• $x_i$ – weight of word $t_i$ in $X$
• $d_{ij}$ – weight of word $t_i$ in $D_j$
Anomaly detection
• If X contains an unknown system call then abnormal
• If X is the same as any Dj then normal
• K-nearest neighbor
  • Calculate sim_avg for k-nearest neighbors
  • If sim_avg > threshold then normal
  • Else abnormal
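The decision rule above can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes cosine similarity as the similarity measure (any measure where larger means closer fits the threshold rule), represents each document as a `{system_call: weight}` dict, and uses hypothetical values for `k` and `threshold`.

```python
import math

def cosine_similarity(x, d):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    shared = set(x) & set(d)
    dot = sum(x[w] * d[w] for w in shared)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_x == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_x * norm_d)

def classify(x, training, k=3, threshold=0.8):
    """Apply the three rules in order: unknown call, exact match, k-NN average."""
    if not training:
        return "abnormal"
    known_calls = set().union(*training)
    if any(call not in known_calls for call in x):
        return "abnormal"          # X has a system call never seen in training
    if any(x == d for d in training):
        return "normal"            # X is identical to a training document
    sims = sorted((cosine_similarity(x, d) for d in training), reverse=True)
    top = sims[:k]
    sim_avg = sum(top) / len(top)
    return "normal" if sim_avg > threshold else "abnormal"

# Hypothetical training set of normal program executions.
training = [
    {"open": 2, "close": 1},
    {"open": 1, "close": 2},
    {"open": 1, "mmap": 3},
]
```

With this toy data, `classify({"mmap": 1}, training, k=1)` is judged normal because its single nearest neighbor is close, while the default `k=3` averages in two dissimilar neighbors and flags it, which shows how sensitive the rule is to the choice of `k` and `threshold`.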
Results
Neural Networks
• Intrusion Detection with Neural Networks [Ryan, AAAI Technical Report 1997]
• Learn user profiles (“prints”) to detect intrusion
NNID System
1. Collect training data
  • Audit logs from each user
2. Train the neural network
3. Obtain new command distribution vector
4. Compare to training data
  • Anomaly if:
    • Associated with a different user
    • Not clearly associated with any user
Collect training data
• Type of data
  • as, awk, bc, bibtex, calendar, cat, chmod, comsat, cp, cpp, cut, cvs, date, df, diff, du, dvips, egrep, elm, emacs, …, w, wc, whereis, xbiff++, xcalc, xdvi, xhost, xterm
• Type of platform
  • Audit trail logging
  • Small number of users
  • Not a large target
Train Neural Network
• Map frequency of command to nonlinear scale
  • 0.0 to 1.0 in 0.1 increments
  • 0.0 – never used
  • 0.1 – used once or twice
  • 1.0 – used more than 500 times
• Concatenate values to 100-dimensional command distribution vector
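A minimal sketch of building the command distribution vector follows. Only three points of the scale are given above (0.0, 0.1, and 1.0), so the intermediate bucket boundaries in `UPPER_BOUNDS` are an assumption chosen to be roughly logarithmic; the function names and the short `tracked` list (standing in for the full 100 commands) are also hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper bound of each bucket 0.0 .. 0.9; anything above 500 maps to 1.0.
# Only "0 -> 0.0", "1-2 -> 0.1" and ">500 -> 1.0" come from the slide.
UPPER_BOUNDS = [0, 2, 4, 8, 16, 32, 64, 128, 256, 500]

def scale(count):
    """Map a raw command count onto the 0.0-1.0 scale in 0.1 steps."""
    return bisect_left(UPPER_BOUNDS, count) / 10

def command_vector(commands, tracked):
    """Build one user's command distribution vector over the tracked commands."""
    counts = Counter(commands)
    return [scale(counts[name]) for name in tracked]

# Hypothetical audit log and a three-command stand-in for the 100 tracked commands.
tracked = ["cat", "emacs", "xterm"]
log = ["cat"] * 2 + ["emacs"] * 501
vector = command_vector(log, tracked)
```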
Neural Network
• 3-layer backpropagation architecture
• Input (x100) → Hidden (x30) → Output (x10)
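The shape of this network and the anomaly rule from the NNID steps can be sketched as below. The weights are random and untrained, so the outputs are meaningless; backpropagation training is omitted. The sigmoid activation, the bias handling, and the `min_activation` cutoff for "not clearly associated with any user" are assumptions for illustration.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 100, 30, 10  # commands, hidden units, users

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """One weight row per output unit; the extra column is the bias weight."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward_layer(weights, inputs):
    extended = inputs + [1.0]  # constant input for the bias weight
    return [sigmoid(sum(w * v for w, v in zip(row, extended))) for row in weights]

def predict(hidden_w, output_w, vector):
    """Return one activation per user for a command distribution vector."""
    return forward_layer(output_w, forward_layer(hidden_w, vector))

def is_anomaly(outputs, claimed_user, min_activation=0.5):
    """Anomalous if the best match is another user or no user matches clearly."""
    best = max(range(len(outputs)), key=lambda u: outputs[u])
    return best != claimed_user or outputs[best] < min_activation

rng = random.Random(0)
hidden_w = init_layer(N_INPUT, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUT, rng)
outputs = predict(hidden_w, output_w, [0.0] * N_INPUT)
```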
Results
• Rejected 63% of random user vectors
• Anomaly detection rate 96%
• Correctly identified user 93%
• False alarm rate 7%
Support Vector Machines
• Intrusion Detection Using Neural Networks and Support Vector Machines [Mukkamala, IEEE 2002]
SVM IDS
1. Preprocess randomly selected raw TCP/IP traffic
2. Train SVM
  • 41 input features
  • 1 – normal
  • -1 – attack
3. Classify new traffic as normal or anomaly
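To make the train-then-classify steps concrete, here is a small sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss. This is a stand-in, not the setup from the paper: the kernel and solver used there are not stated above, the two-feature toy data replaces the 41 real features, and `epochs`, `lr`, and `lam` are hypothetical settings. Labels follow the convention above (1 for normal, -1 for attack).

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Sample is misclassified or inside the margin: move toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    """Return 1 (normal) or -1 (attack) for a feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy, already-preprocessed feature vectors (two features instead of 41).
samples = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [classify(w, b, x) for x in samples]
```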
SVM IDS Features
Feature name     Description                                       Type
Duration         Length of the connection                          Continuous
Protocol type    TCP, UDP, etc.                                    Discrete
Service          HTTP, TELNET, etc.                                Discrete
Src_bytes        Number of data bytes from source to destination   Continuous
Dst_bytes        Number of data bytes to source from destination   Continuous
Flag             Normal or error status                            Discrete
Land             If connection is from/to the same host/port       Discrete
Wrong_fragment   Number of “wrong” fragments                       Continuous
…                …                                                 …
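Before an SVM can use these features, the discrete ones have to become numbers. The sketch below shows one common approach, one-hot encoding, for the eight features listed; it is an assumption, since the preprocessing actually used is not described here. The category lists, record keys, and function names are hypothetical and cover only a few example values.

```python
# Hypothetical category lists; real traffic would need the full sets of values.
PROTOCOLS = ["tcp", "udp", "icmp"]
SERVICES = ["http", "telnet", "ftp"]
FLAGS = ["normal", "error"]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1.0 at the position of `value`."""
    return [1.0 if value == category else 0.0 for category in categories]

def encode(record):
    """Turn one connection record (a dict) into a flat numeric feature vector."""
    return (
        [float(record["duration"])]
        + one_hot(record["protocol_type"], PROTOCOLS)
        + one_hot(record["service"], SERVICES)
        + [float(record["src_bytes"]), float(record["dst_bytes"])]
        + one_hot(record["flag"], FLAGS)
        + [float(record["land"]), float(record["wrong_fragment"])]
    )

# Hypothetical connection record.
record = {
    "duration": 12, "protocol_type": "tcp", "service": "http",
    "src_bytes": 230, "dst_bytes": 1024, "flag": "normal",
    "land": 0, "wrong_fragment": 0,
}
features = encode(record)
```

A value outside the category list encodes as all zeros, which is one simple way to handle unseen services without failing.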
Results
[Figure: SVM prediction vs. actual class label, y-axis from -1.5 to 1.5]
Recent Anomaly-based IDS
• An efficient network intrusion detection [Chen, Computer Communications 2010]
• Lightweight Network Intrusion Detection (LNID) system
LNID Approach
• Detect R2L (remote-to-local) and U2R (user-to-root) attacks
• Assume attack is in first few packets
• Calculate anomaly score of packets
LNID System Architecture
Anomaly Score
• Based on Mahoney’s network IDS [21-24]
  • M.V. Mahoney, P.K. Chan, PHAD: packet header anomaly detection for identifying hostile network traffic, Florida Institute of Technology Technical Report CS-2001-04, 2001.
  • M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002a, pp. 376-385.
  • M.V. Mahoney, P.K. Chan, Learning models of network traffic for detecting novel attacks, Florida Institute of Technology Technical Report CS-2002-08, 2002b.
  • M.V. Mahoney, Network traffic anomaly detection based on packet bytes, in: Proceedings of the 2003 ACM Symposium on Applied Computing, 2003, pp. 346-350.
Anomaly Score (Mahoney)
• $t$ = time elapsed since last time attribute was anomalous
• $n$ = number of training or observed instances
• $r$ = number of novel values of attribute
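In Mahoney's PHAD, a packet's score is the sum of t·n/r over the attributes whose value was never seen in training. The sketch below implements that per-attribute term under simplifying assumptions: the class and method names are hypothetical, timestamps are plain numbers, and n and r are frozen after training.

```python
class AttributeModel:
    """Tracks the values seen for one attribute and scores novel ones as t*n/r."""

    def __init__(self):
        self.seen = set()         # distinct values observed in training
        self.n = 0                # number of training instances
        self.last_anomaly = 0.0   # time the attribute was last anomalous

    @property
    def r(self):
        """Number of novel (distinct) values observed in training."""
        return len(self.seen)

    def train(self, value, now):
        self.n += 1
        if value not in self.seen:
            self.seen.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if self.n == 0:
            raise ValueError("model has not been trained")
        if value in self.seen:
            return 0.0
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / self.r

def packet_score(models, values, now):
    """Sum the per-attribute scores for one packet."""
    return sum(model.score(value, now) for model, value in zip(models, values))

# Hypothetical single attribute trained on three observations.
model = AttributeModel()
for time, value in [(0.0, 80), (1.0, 80), (2.0, 443)]:
    model.train(value, time)
```

The score is large when an attribute that rarely shows new values (small r relative to n) suddenly does so after a long quiet period (large t).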
Anomaly Score (revised)
• $n$ = number of training or observed instances
• $r$ = number of novel values of attribute
Anomaly Scoring Comparison
Attributes
• Attribute = packet byte
• 256 possible values
• 48 attributes (packet bytes)
  • 20 bytes of IP header
  • 20 bytes of TCP header
  • 8 bytes of payload
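Extracting these 48 attributes from a raw packet might look like the sketch below. It assumes the byte string starts at the IP header and that neither header carries options, so both are exactly 20 bytes; zero-padding short packets is also an assumption, as is the function name.

```python
IP_HEADER_LEN = 20
TCP_HEADER_LEN = 20
PAYLOAD_LEN = 8
N_ATTRIBUTES = IP_HEADER_LEN + TCP_HEADER_LEN + PAYLOAD_LEN  # 48

def packet_attributes(packet: bytes):
    """Return the first 48 bytes as integers 0-255, zero-padded if the packet is short."""
    head = packet[:N_ATTRIBUTES]
    return list(head) + [0] * (N_ATTRIBUTES - len(head))

# Hypothetical packets: one long enough, one with only 2 payload bytes.
full = packet_attributes(bytes(range(60)))
short = packet_attributes(bytes(42))
```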
Results
• Detection rate
             Total (%)   U2R (%)   R2L (%)   # FA/Day
LNID         73          70        77        2
NETAD        68          55        78        10
Lee et al.   78 18 10
• Workload
  • LNID – 0.3% of traffic
  • NETAD – 3.16% of traffic
  • Lee et al. – 100% of traffic
Results
• Hard-to-detect attacks
Attack name      Description                                       LNID          PHAD        DARPA
loadmodule       U2R, SunOS, set IFS to call trojan suid program   1/3           0/3         1/3
ncftp            R2L, FTP exploit                                  4/5           0/5         0/5
sechole          U2R, NT bug exploit                               3/3           1/3         1/3
perl             U2R, Linux exploit                                2/3           0/3         0/4
sqlattack        U2R, escape from SQL database shell               3/3           0/3         0/3
xterm            U2R, Linux buffer overflow in suid root prog.     3/3           0/3         1/3
Detection rate                                                     16/20 (80%)   1/20 (5%)   3/21 (14%)
Questions or Comments