Application of Machine Learning and Cognitive Computing in Intrusion Detection Systems

Application of Hardware-Based Machine Learning for Intrusion Detection Using Cognitive Processors

Mahdi Hosseini Moghaddam, Purdue University Calumet

Table of Contents

• Introduction

• Why a New IDS

• Significance of the Problem

• Definitions

• Literature Review

• Architecture

• Methodology

• Analysis

• Conclusion

• Future work

• Questions

• Cost

• Timeline

• References

Introduction

• New technologies come to market, and with them new vulnerabilities are added to our systems.

• Today, many devices connect to the Internet: not only computers but also TVs, refrigerators, cell phones, doors and even small sensors.

• Today's markets are less tolerant of downtime caused by security issues or attacks.

• Attacks such as Denial of Service (DoS) can cause serious problems by making a service unavailable and increasing downtime.

Why a New IDS

• Intrusion detection systems use two approaches to detect malicious traffic:

• Signature-based detection, which relies on a previously created list of known attacks

• Anomaly detection

• The signature-based approach cannot detect novel or zero-day attacks.

• Anomaly detection uses machine learning algorithms; however, most of them are resource intensive.

• Performance and response time are crucial: fast detection is a must.

Significance of the Problem

• Signature-based intrusion detection systems need to check traffic against thousands or even millions of patterns gathered from previously executed attacks.

• Novel attacks, or previous attacks with even minor changes, are almost impossible to detect at run time.

• Before an attack's signature can be added to the base system, the attack must first be detected and analyzed, and then its pattern must be created.
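To illustrate why even a minor change defeats exact matching, here is a minimal Python sketch of a naive signature check. The signature set and payloads are hypothetical examples; real signature-based IDSs use richer rule languages and multi-pattern matching, so this shows only the principle.

```python
# Hypothetical signature database: byte patterns from previously seen attacks.
KNOWN_SIGNATURES = [b"/etc/passwd", b"cmd.exe"]


def matches_signature(payload: bytes, signatures=KNOWN_SIGNATURES) -> bool:
    """Return True if any known signature occurs verbatim in the payload.

    Every packet is scanned against every signature, so the cost grows
    with the size of the signature database.
    """
    return any(sig in payload for sig in signatures)


original = b"GET /../../etc/passwd HTTP/1.1"
variant = b"GET /../../etc//passwd HTTP/1.1"  # same target path, one extra slash

print(matches_signature(original))  # True: exact pattern found
print(matches_signature(variant))   # False: the minor change evades the signature
```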

Definitions

• Machine learning (ML): a system that can learn from data.

• Embedded system: a kind of computer system, often with real-time computing constraints.

• Cognitive processor: a processing unit built on the idea of a neural network so that it works like the human brain. Like the brain, it consists of small units called neurons, and each neuron in this computational model has its own memory and logic for operating on that memory.

• IDS: intrusion detection system.

• RCE: Restricted Coulomb Energy, a hyperspherical classifier.

• KNN: k-nearest neighbors, a non-parametric method for classification and regression.

Definitions – KNN

• An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors.

• The neighbors are taken from a set of objects for which the class (for k-NN classification) is known.

• If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
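A minimal Python sketch of k-NN classification as defined above. Manhattan (L1) distance is assumed here as the distance measure, and the toy vectors are made up for illustration (they are not from the project's dataset); class 1 stands for normal and class 2 for anomalous traffic, as in the labeling used later.

```python
from collections import Counter


def l1_distance(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=3):
    """Assign `query` the most common class among its k nearest neighbors.

    `train` is a list of (vector, label) pairs whose classes are known.
    For equal vote counts, Counter.most_common returns the label that was
    encountered first, i.e. the one belonging to the closer neighbor.
    """
    neighbors = sorted(train, key=lambda item: l1_distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: class 1 = normal, class 2 = anomaly.
train = [([10, 20], 1), ([12, 22], 1), ([200, 180], 2), ([210, 190], 2)]

print(knn_classify(train, [11, 21], k=1))    # 1: the single nearest neighbor is class 1
print(knn_classify(train, [205, 185], k=3))  # 2: two of the three nearest are class 2
```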

Definitions – RCE

• The RCE network has two layers: a hidden layer and an output layer.

• The hidden layer is fully connected to all components of an input pattern.

• The output layer is sparsely connected to the hidden layer; each hidden unit projects its output to one and only one output unit.
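The two-layer structure can be sketched in Python as follows: each hidden unit stores a center and a radius (its hypersphere) and is wired to exactly one output class. This is a minimal, single-pass version of classic RCE training under stated assumptions: L1 distance, a hypothetical `max_radius` parameter, and no repeated training epochs (classic RCE repeats passes until no radius changes). The CM1K's on-chip variant may differ in its details.

```python
from dataclasses import dataclass


def l1_distance(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class Prototype:
    """One hidden unit: a hypersphere (center + radius) wired to one output class."""
    center: list
    radius: float
    label: int


def train_rce(samples, max_radius=100):
    """Single pass of RCE-style training over (vector, label) pairs."""
    prototypes = []
    for vector, label in samples:
        # Shrink every wrong-class hypersphere that wrongly covers this sample.
        for p in prototypes:
            d = l1_distance(p.center, vector)
            if p.label != label and d < p.radius:
                p.radius = d
        # Commit a new hidden unit only if no same-class hypersphere covers the sample.
        covered = any(
            p.label == label and l1_distance(p.center, vector) < p.radius
            for p in prototypes
        )
        if not covered:
            others = [l1_distance(p.center, vector) for p in prototypes if p.label != label]
            prototypes.append(Prototype(list(vector), min([max_radius] + others), label))
    return prototypes


def classify_rce(prototypes, vector):
    """Return the class of the firing units, or None if none fire or they disagree."""
    labels = {p.label for p in prototypes if l1_distance(p.center, vector) < p.radius}
    return labels.pop() if len(labels) == 1 else None


samples = [([10, 10], 1), ([200, 200], 2), ([20, 15], 1)]
model = train_rce(samples)
print(len(model))                       # 2: the third sample is already covered
print(classify_rce(model, [12, 12]))    # 1
print(classify_rce(model, [100, 100]))  # None: outside every hypersphere
```

Unlike KNN, an RCE network only commits a new hidden unit when a sample is not already covered, which is why it can need very few neurons.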

Literature Review

• A signature-based IDS monitors network packets and compares that traffic to a database of known attacks, called signatures. However, there is a time gap between the appearance of an attack and the time the system can detect it (Barman, 2012).

• In 2010, Stuxnet, a computer worm, affected a nuclear facility. It was designed to harm PLC systems (Falliere, 2011).

• In 2004, Baker and Prasanna proposed a methodology for building an efficient IDS using an FPGA. They showed that it computes 8 times faster than a shift-and-compare architecture. Although they achieved high throughput, the number of false-positive errors increased.

• In 2013, Yoon et al. proposed a multicore-based IDS that detects malicious behavior using statistical analysis. However, shared resources in multicore processors create many problems and add considerable complexity to developing systems on those processors.

Architecture

• Data Collector: Raspberry Pi board

• Interface Board: Arduino Due

• Cognitive Processor: CM1K (CogniMem)

Architecture (2) – CM1K

• It features 1024 neurons working in parallel, implementing two nonlinear classifiers.

• Learns and recognizes patterns of up to 256 bytes (1 byte per component)

• Classifies patterns into up to 32,768 categories

• Choice of Restricted Coulomb Energy (RCE) or K-Nearest Neighbor (KNN) classifiers

• Low cost, small footprint, low power consumption (0.5 W)

• Recognition time independent of the number of neurons

Methodology – Data Collection

• A small packet sniffer based on the libpcap library has been developed.

• The packet sniffer is installed on an embedded device, a Raspberry Pi.

• Once the sniffer reads a packet header, it stores the header fields in CSV format.


Methodology – Data Collection (2)

• To obtain the required samples, a small isolated LAN was set up.

• Normal packets, such as ping and traceroute packets and TCP streams, were generated in this network.

• Anomalous packets were gathered by running several network attacks with the Netwox toolset.

• The dataset has 10 features.

Methodology – Data Collection (3)

Features

• src_ip

• dst_ip

• tos

• len

• id

• off

• ttl

• prt

• src_p

• dst_p
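The project's sniffer is libpcap-based; the Python sketch below shows only the header-parsing and CSV step, working on raw packet bytes without libpcap. The mapping of the feature names to IPv4/TCP/UDP header fields is inferred from the names and is an assumption: `prt` is taken as the IP protocol number, `off` as the raw 16-bit flags/fragment-offset field, and ports are set to 0 for protocols other than TCP and UDP.

```python
import csv
import io
import socket
import struct

# Column order of the CSV file (the 10 features listed above).
FIELDS = ["src_ip", "dst_ip", "tos", "len", "id", "off", "ttl", "prt", "src_p", "dst_p"]


def parse_ipv4(packet: bytes) -> dict:
    """Extract the 10 features from a raw IPv4 packet (link-layer header already stripped)."""
    ver_ihl, tos, total_len, ident, flags_off, ttl, proto = struct.unpack("!BBHHHBB", packet[:10])
    ihl = (ver_ihl & 0x0F) * 4  # IP header length in bytes
    src_p = dst_p = 0
    if proto in (6, 17):  # TCP or UDP: ports are the first 4 bytes after the IP header
        src_p, dst_p = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {
        "src_ip": socket.inet_ntoa(packet[12:16]),
        "dst_ip": socket.inet_ntoa(packet[16:20]),
        "tos": tos, "len": total_len, "id": ident,
        "off": flags_off,  # assumption: raw 16-bit flags + fragment-offset field
        "ttl": ttl, "prt": proto,
        "src_p": src_p, "dst_p": dst_p,
    }


def to_csv_row(features: dict) -> str:
    """Serialize one packet's features as a CSV line."""
    buf = io.StringIO()
    csv.DictWriter(buf, fieldnames=FIELDS).writerow(features)
    return buf.getvalue()


# Hand-built example: 20-byte IPv4 header (TCP, DF flag set) + TCP source/destination ports.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 54321, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.168.1.10"), socket.inet_aton("192.168.1.20"))
packet = header + struct.pack("!HH", 12345, 80)
print(to_csv_row(parse_ipv4(packet)), end="")
# 192.168.1.10,192.168.1.20,0,40,54321,16384,64,6,12345,80
```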

Methodology - Data Normalization

• Only 1 byte is available for each feature, and 1 byte cannot store numbers higher than 255.

• The CM1K chip only accepts integer values, so the values were rounded.

• The collected data had to be normalized to fit in this range. This was achieved using the following formula:
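The slide's normalization formula is not reproduced in this text, so the following Python sketch assumes a common choice, min-max scaling followed by rounding: x' = round(255 * (x - min) / (max - min)). The function name and the clamping of out-of-range test values are illustrative assumptions, not the project's documented procedure.

```python
import ipaddress


def normalize_column(values, low=None, high=None):
    """Min-max scale a feature column into 0..255 and round to integers.

    `low`/`high` default to the column's own min/max; pass the training-set
    bounds when normalizing test data so both use the same scale. Values that
    fall outside the bounds are clamped to 0 or 255.
    """
    lo = min(values) if low is None else low
    hi = max(values) if high is None else high
    if hi == lo:  # constant column: nothing to scale
        return [0] * len(values)
    scaled = (round((v - lo) * 255 / (hi - lo)) for v in values)
    return [min(255, max(0, s)) for s in scaled]


# IP addresses must become numbers before scaling; one option is their 32-bit value.
ips = [int(ipaddress.IPv4Address(a)) for a in ["10.0.0.1", "10.0.0.2", "10.0.0.255"]]
print(normalize_column(ips))                       # [0, 1, 255]
print(normalize_column([0, 750, 3000]))            # [0, 64, 255]
print(normalize_column([4000], low=0, high=3000))  # [255] (clamped)
```

Reusing the training bounds for the test file matters: normalizing each file with its own min/max would map the same raw value to different bytes.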

Methodology - Classification and Training

• A class column was added to the dataset: the class is ‘1’ for normal data and ‘2’ for data gathered from anomalous traffic.

• 10 pairs of train/test files were prepared. Each file contained 512 samples of normal traffic and 512 of anomalous traffic.

• The training data must be sent from the Arduino board to the CM1K.

• After the CM1K was trained, the Arduino board loaded the test file into the chip.

• For each test sample, the chip sends back the distances between the sample and the trained model, starting from the shortest distance.
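The readout and the accuracy calculation used in the Analysis section can be simulated in software as below. This is a minimal sketch, not the Arduino or CM1K interface: the function names and toy data are hypothetical, the neurons are plain stored vectors compared with an assumed L1 distance, and the predicted class is simply the class of the closest match (an RCE-mode chip would additionally apply each neuron's influence field).

```python
def l1_distance(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ranked_matches(neurons, vector):
    """(distance, class) pairs for every trained neuron, shortest distance first."""
    return sorted((l1_distance(center, vector), label) for center, label in neurons)


def accuracy(neurons, test_set):
    """Fraction of test samples whose closest neuron carries the actual class."""
    hits = sum(
        1 for vector, actual in test_set
        if ranked_matches(neurons, vector)[0][1] == actual
    )
    return hits / len(test_set)


# Hypothetical trained neurons and test file rows: class 1 = normal, 2 = anomaly.
neurons = [([10, 10], 1), ([200, 200], 2)]
test_set = [([12, 12], 1), ([190, 190], 2), ([150, 150], 1)]
print(ranked_matches(neurons, [150, 150]))              # [(100, 2), (280, 1)]
print(f"accuracy = {accuracy(neurons, test_set):.2%}")  # accuracy = 66.67%
```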

Methodology - Classification and Training Using CM1K

• The algorithm can be chosen before the training phase: RCE or KNN is selected by changing a data register on the Arduino board.

Methodology - Classification and Training Using Software SDK

• This SDK simulates the hardware algorithms and provides reporting and testing functionality.

Methodology - Classification and Training Using NSL-KDD Dataset

• The KDD Cup '99 dataset was created by processing the tcpdump portions of the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset.

• NSL-KDD was proposed to solve some of the problems of the KDD'99 dataset (Tavallaee, 2009).

• The NSL-KDD dataset has 41 features and provides thousands of data samples for both training and testing.

• Using the same method as before, the CM1K was trained and then tested with both the KNN and RCE algorithms.

• From the training and test samples, 10 pairs of identically sized files were created. Each file has 1024 samples.


Methodology - Classification and Training Using NSL-KDD Dataset (2)

NSL-KDD Dataset features:

• protocol_type

• service

• flag

• src_bytes

• dst_bytes

• wrong_fragment

• num_failed_logins

• num_shells

• srv_count

• rerror_rate

• dst_host_count

• dst_host_same_srv_rate

• dst_host_diff_srv_rate

• su_attempted
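Some of these features (protocol_type, service and flag) are text values in NSL-KDD, while the CM1K accepts only integers. The slides do not say how these columns were encoded; one simple option, assumed in this Python sketch, is ordinal encoding in order of first appearance, with the resulting table reused for the test file. Numeric columns such as src_bytes could then go through the same 0 to 255 normalization as the project's own dataset.

```python
def encode_categorical(column):
    """Map each distinct string to a small integer code in order of first appearance.

    Returns the encoded column and the value -> code table, so the same
    table can be reused for the test file.
    """
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            if len(codes) == 256:
                raise ValueError("more than 256 distinct values do not fit in one byte")
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


protocol_type = ["tcp", "udp", "tcp", "icmp"]
encoded, table = encode_categorical(protocol_type)
print(encoded)  # [0, 1, 0, 2]
print(table)    # {'tcp': 0, 'udp': 1, 'icmp': 2}
```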

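Analysis - CM1K Result

• Accuracy is calculated simply by comparing the actual class with the predicted class for each test sample.

• In the table, RCE and KNN give the accuracy, RCE_N and KNN_N the number of neurons used, and RCE_TIME and KNN_TIME the measured time.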

Sample # RCE RCE_N RCE_TIME KNN KNN_N KNN_TIME

1 76.76% 3 110249 71.68% 1024 110163

2 82.03% 3 110112 80.66% 1024 110261

3 83.59% 3 110207 85.35% 1024 110453

4 63.78% 5 110003 30.37% 1024 110446

5 85.44% 4 110075 87.21% 1024 110463

6 77.54% 3 110136 87.40% 1024 110240

7 61.82% 3 111890 87.79% 1024 110335

8 58.89% 3 111331 77.25% 1024 110322

9 66.31% 3 110177 32.91% 1024 110486

10 76.17% 3 110259 69.14% 1024 110256

Analysis - CM1K Result (2)

• Although the average accuracies of RCE and KNN are close, RCE showed less variation and hence more consistent accuracy.

RCE KNN

Average 73.23% 70.98%

Variance 0.00940922 0.04730602

Standard Deviation 0.09700113 0.21749947
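The summary statistics can be recomputed from the per-sample accuracies with Python's standard `statistics` module. This sketch simply re-derives the figures above from the first result table; it is a check, not part of the project's tooling.

```python
from statistics import mean, stdev, variance

# Per-sample accuracies (%) from the CM1K result table above.
rce = [76.76, 82.03, 83.59, 63.78, 85.44, 77.54, 61.82, 58.89, 66.31, 76.17]
knn = [71.68, 80.66, 85.35, 30.37, 87.21, 87.40, 87.79, 77.25, 32.91, 69.14]

for name, acc in (("RCE", rce), ("KNN", knn)):
    fractions = [a / 100 for a in acc]  # the table reports variance on a 0..1 scale
    # statistics.variance/stdev use the sample (n - 1) formula.
    print(f"{name}: mean={mean(fractions):.2%} "
          f"variance={variance(fractions):.6f} stdev={stdev(fractions):.4f}")
```

This reproduces the averages, and the variances agree with the table to within the rounding of the per-sample percentages. The population formula (dividing by n) would give about 0.0085 for RCE instead, so the table appears to use the sample formula.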

Analysis - Software Result

• The RCE results obtained from hardware and software are gathered in the table below. In terms of accuracy both are practically the same (they differ only slightly on samples 4 and 5); surprisingly, however, the software solution was much faster than the hardware one.

Sample # HW_Accuracy HW_Neurons HW_TIME SW_Accuracy SW_Neurons SW_TIME

1 76.76% 3 110249 76.76% 3 1230

2 82.03% 3 110112 82.03% 3 910

3 83.59% 3 110207 83.59% 3 890

4 63.78% 5 110003 64.06% 5 1170

5 85.44% 4 110075 85.45% 4 880

6 77.54% 3 110136 77.54% 3 780

7 61.82% 3 111890 61.82% 3 800

8 58.89% 3 111331 58.89% 3 900

9 66.31% 3 110177 66.31% 3 820

10 76.17% 3 110259 76.17% 3 880
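To put "much faster" into numbers, this Python sketch computes the per-sample ratio of hardware to software time from the table above. The time unit is not stated, so the sketch assumes both TIME columns use the same unit; under that assumption the software run is roughly 90 to 141 times faster. The Conclusion points to the communication bus as a bottleneck, which may account for part of the hardware time.

```python
# TIME columns from the hardware/software comparison table above (RCE, samples 1-10).
hw_time = [110249, 110112, 110207, 110003, 110075, 110136, 111890, 111331, 110177, 110259]
sw_time = [1230, 910, 890, 1170, 880, 780, 800, 900, 820, 880]

# Assumes both columns are measured in the same (unstated) unit.
ratios = [hw / sw for hw, sw in zip(hw_time, sw_time)]
print(f"hardware/software time ratio: {min(ratios):.0f}x to {max(ratios):.0f}x")
# hardware/software time ratio: 90x to 141x
```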

Analysis - CM1K Result (NSL-KDD Dataset)

• Because the same amount of data was used, the results have the same structure as those for the dataset created in this project.

Sample # RCE RCE_N RCE_TIME KNN KNN_N KNN_TIME

1 79.39% 2 123728 87.01% 1024 123195

2 58.40% 3 123522 59.67% 1024 123500

3 79.59% 2 123853 87.40% 1024 123146

4 50.88% 7 123188 84.86% 1024 123430

5 57.91% 2 123662 86.72% 1024 123505

6 80.57% 2 123824 84.47% 1024 123338

7 79.88% 5 123362 88.77% 1024 123691

8 80.66% 3 123611 81.05% 1024 123448

9 58.30% 2 123678 83.40% 1024 123426

10 78.81% 2 123974 87.30% 1024 123286

Conclusion

• The CM1K provides parallelism at low cost and with low energy consumption.

• The CM1K provides classification algorithms at the hardware level.

• KNN showed higher accuracy on the NSL-KDD dataset, but RCE used far fewer neurons (2 to 7, versus 1024 for KNN).

• Obtaining good data is a big challenge.

• This approach can be applied to other classification problems.

• is not a good communication bus, as it creates a bottleneck.

Future Work

• Using more features of network packets

• Using a chain of chips

• Using USB instead of

• Developing an alerting method

• Creating a general classifier

Questions

Cost

Cost for required equipment

Item Price

Arduino Due $40

Raspberry Pi Model B $40

CogniMem CM1K chip $150

Breadboard $20

SD memory card (8 GB) $12

Wires, resistors & oscillator $5

AC adapter (5.0 V output) $20

USB cable (A male to B male) $7

Soldering kit $90

Timeline

120 days are dedicated to completing the project.

• Developing the packet sniffer

• Getting the components

• Designing the system

• Installing the packet sniffer on the Raspberry Pi

• Soldering completed and approved by advisor

• Gathering samples from the network

• Developing classifier code on Arduino

• Training the chip

• Testing the IDS with random data

• Post-testing modification

[Figure: Gantt chart of the tasks above over days 0 to 140; series: Start, Days, Completed]


References

• Cheng, A. M. K. (2006). On-Time and Scalable Intrusion Detection in Embedded Systems. Real-Time Systems Laboratory, Department of Computer Science, University of Houston.

• Axelsson (1999). Research in intrusion-detection systems: A survey. Technical Report 98-17, Department of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden, December 1998; revised August 19, 1999.

• Kerschbaum, F., Spafford, E. H., & Zamboni, D. (2001). Using internal sensors and embedded detectors for intrusion detection. Center for Education and Research in Information Assurance and Security, Purdue University.

• Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A Detailed Analysis of the KDD CUP 99 Data Set.

• Hripcsak, G., & Rothschild, A. (2005). Agreement, the F-Measure, and the Reliability in Information Retrieval. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/pdf/296.pdf