Network attack analysis via k-means clustering

Dept. of Computer Science

B. Thomas Golisano College of Computing and Information Sciences

By Team Cinderella

Chandni Pakalapati

Priyanka Samanta

10/27/2015

CONTENTS

Recap of project overview

Analysis of Research Paper 1

Analysis of Research Paper 2

Analysis of Research Paper 3

PROJECT OVERVIEW : SUMMARY

Problem Statement: Detection of network intrusion. The network is prone to the following kinds of attacks:
• DoS (Denial of Service)
• Probing or surveillance attacks
• User-to-root
• Remote-to-local

Our approach: K-Means clustering algorithm, to find the hidden pattern in the data. K-Means clustering is an NP-hard problem.

Our goal: To parallelize the K-Means algorithm.

Data to be used: NSL-KDD Cup 1999 Dataset. Link: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

RESEARCH PAPER 1

Title: K-Means Clustering Approach to Analyze NSL-KDD Intrusion Detection Dataset

Authors: Vipin Kumar, Himadri Chauhan, Dheeraj Panwar

Journal Name: International Journal of Soft Computing and Engineering

ISSN: 2231-2307, Volume 3, Issue 4

Date: 09/01/2013

Page Number: 1-4

URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.413.589&rep=rep1&type=pdf

RESEARCH PAPER 1 : PROBLEM STATEMENT

• Detection of network intrusion by applying clustering, an unsupervised data mining technique, to unlabeled data.

• Analysis of the NSL-KDD dataset using K-Means clustering.

• Clustering the dataset into normal traffic and the following network attack types: DoS (Denial of Service), Probe, R2L, and U2R.

RESEARCH PAPER 1 : APPROACH TO THE PROBLEM

• K-Means is one of the simplest unsupervised clustering algorithms that can cluster unlabeled data.

• K-Means algorithm [1]:

Initialize k prototypes (w1, …, wk), where k is the number of clusters
Each cluster Cj is associated with prototype wj
Repeat
    for each input data record di, where i ∈ {1, …, n}, do
        Assign di to the cluster Cj* with the nearest prototype wj*
    for each cluster Cj, where j ∈ {1, …, k}, do
        Update the prototype wj to be the centroid of all samples currently in Cj, so that $w_j = \frac{1}{|C_j|} \sum_{d_i \in C_j} d_i$
until cluster membership no longer changes
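The assign-and-update loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `euclidean` and `k_means`, the random choice of initial prototypes, and the `max_iters` and `seed` parameters are all assumptions made for the example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(records, k, max_iters=100, seed=0):
    """Cluster `records` (a list of numeric tuples) into k clusters.

    Returns (prototypes, assignments). Picking k random records as the
    initial prototypes is an assumption of this sketch.
    """
    rng = random.Random(seed)
    prototypes = [tuple(r) for r in rng.sample(records, k)]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: attach each record to its nearest prototype.
        new_assignments = [
            min(range(k), key=lambda j: euclidean(rec, prototypes[j]))
            for rec in records
        ]
        if new_assignments == assignments:
            break  # no record changed cluster, so the prototypes are stable
        assignments = new_assignments
        # Update step: move each prototype to the centroid of its cluster.
        for j in range(k):
            members = [rec for rec, a in zip(records, assignments) if a == j]
            if members:  # keep the old prototype if a cluster went empty
                prototypes[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return prototypes, assignments
```

Calling `k_means(points, k=2)` on two well-separated groups of points should return two prototypes, one near the centre of each group, together with one cluster index per record.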

RESEARCH PAPER 1 : CONTRIBUTION

• Understanding the multi-dimensional NSL-KDD dataset. Each record in the dataset contains the following 41 attributes and a class label. [1]

duration, src_bytes, dst_bytes, land, wrong_fragment, urgent,
num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root,
num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count,
serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate,
dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate, protocol_type, service, flag,
hot, num_file_creations, srv_count, srv_diff_host_rate, dst_host_serror_rate, class

The class attribute specifies the class of the network traffic. It may be normal or one of the following 37 different kinds of attacks.

• Categorization of the 37 different attack types into 4 attack categories. [1]

Category  Attack types
DoS       Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm
Probe     Satan, Ipsweep, Nmap, Portsweep, Mscan, Saint
R2L       Guess_Password, Ftp_write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named
U2R       Buffer_overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps
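The category table above can be turned into a lookup for relabeling records. The sketch below copies the attack names from the slide; the names `ATTACK_CATEGORIES`, `CATEGORY_OF` and `categorize`, the lower-casing of labels, and the `"unknown"` fallback are assumptions for illustration, and the label spellings in the actual dataset files may differ from the slide.

```python
# Attack names grouped by category, copied from the slide's table.
ATTACK_CATEGORIES = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop", "mailbomb",
            "processtable", "udpstorm", "apache2", "worm"],
    "Probe": ["satan", "ipsweep", "nmap", "portsweep", "mscan", "saint"],
    "R2L": ["guess_password", "ftp_write", "imap", "phf", "multihop",
            "warezmaster", "xlock", "xsnoop", "snmpguess", "snmpgetattack",
            "httptunnel", "sendmail", "named"],
    "U2R": ["buffer_overflow", "loadmodule", "rootkit", "perl", "sqlattack",
            "xterm", "ps"],
}

# Inverted lookup: attack name -> category.
CATEGORY_OF = {
    name: category
    for category, names in ATTACK_CATEGORIES.items()
    for name in names
}

def categorize(label):
    """Map a raw class label to 'normal' or one of the four attack categories.

    Labels are lower-cased before lookup (an assumption of this sketch);
    anything not in the table is reported as 'unknown'.
    """
    key = label.strip().lower()
    if key == "normal":
        return "normal"
    return CATEGORY_OF.get(key, "unknown")
```

For example, `categorize("Neptune")` yields `"DoS"` and `categorize("Rootkit")` yields `"U2R"`.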

RESEARCH PAPER 1 : DRAWBACKS

• It requires the entire dataset to be stored in main memory.

• It involves high computational complexity: O(n*k*i*d), where n = number of data records, k = number of clusters, i = number of iterations, and d = dimensionality of a record.

RESEARCH PAPER 1 : TAKEAWAY FOR OUR IMPLEMENTATION

• Knowing which attributes to work with.

• Understanding the types of attacks and which of the 4 attack categories each one belongs to.

• The prototype to be considered is the centroid of the cluster.

• With the number of clusters set to 4, the optimal result can be achieved.

• Using the Euclidean distance metric to compute the dissimilarity between data records.

RESEARCH PAPER 2

Title: Parallelizing k-means Algorithm for 1-d Data Using MPI

Authors: Ilias K. Savvas and Georgia N. Sofianidou

Journal Name: 2014 IEEE 23rd International WETICE Conference

Date: 2014

Page Number: 179-184

URL: http://ieeexplore.ieee.org.ezproxy.rit.edu/stamp/stamp.jsp?tp=&arnumber=6927046

RESEARCH PAPER 2 : PROBLEM STATEMENT

• Reducing the computational complexity of K-Means by implementing a parallel K-Means.

RESEARCH PAPER 2 : APPROACH TO THE PROBLEM

• It follows the Single Instruction Multiple Data (SIMD) paradigm.

• It works on the concept of master and worker nodes.

• Communication between the master and worker nodes is performed using MPI (Message Passing Interface).

RESEARCH PAPER 2 : CONTRIBUTION

• Parallel K-Means Algorithm [2]

Master node M finds the number of available worker nodes; assume there are (N − 1)
M: splits the input dataset D into (N − 1) subsets, each of size |D|/(N − 1)
M: transfers data to the worker nodes
for all worker nodes wi, where i ∈ {1, 2, 3, ..., N − 1}, do in parallel
    Receive the data chunk from the master
    Perform K-Means clustering
    Send the local centroids and the number of data records assigned to each one of them to M
end for
M receives the centroids and the number of data records assigned to each of the centroids
M sorts the centroid list
M calculates the global centroids by applying the weighted arithmetic mean and produces the final clustering
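The split, cluster, and merge steps above can be sketched for 1-d data as follows. This is a hedged illustration rather than the paper's code: a `ThreadPoolExecutor` stands in for the MPI worker nodes, and the function names, the evenly spread initial centroids, and the rule of cutting the sorted centroid list into k equal groups (which assumes every worker returns exactly k centroids) are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def local_kmeans_1d(values, k, max_iters=100):
    """Plain k-means on a list of numbers; returns [(centroid, count), ...]."""
    ordered = sorted(values)
    # Assumed initialisation: k values spread evenly across the sorted chunk.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return [(centroids[j], len(clusters[j])) for j in range(k)]

def merge_centroids(local_results, k):
    """Master step: sort local centroids, cut into k groups, take weighted means."""
    ordered = sorted(local_results)     # sort by centroid value
    group_size = len(ordered) // k      # assumes each worker returned k centroids
    merged = []
    for g in range(k):
        group = ordered[g * group_size:(g + 1) * group_size]
        total = sum(count for _, count in group)
        merged.append(sum(c * count for c, count in group) / total)
    return merged

def parallel_kmeans_1d(data, k, n_workers):
    """Split the data, cluster each chunk concurrently, then merge on the master."""
    chunk = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_worker = list(pool.map(lambda part: local_kmeans_1d(part, k), chunks))
    local_results = [pair for result in per_worker for pair in result]
    return merge_centroids(local_results, k)
```

Only the centroids and their record counts travel back to the master, which is what keeps the merge step cheap compared with shipping the raw records.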

RESEARCH PAPER 2 : CONTRIBUTION (contd..)

[Figure: master node M, worker nodes W1 to W4, and their local centroids C1 to C4, which are collected by M. Label: Collecting local centroids C]

RESEARCH PAPER 2 : UNDERSTANDING K-MEANS WITH AN EXAMPLE

Input: Dataset with 40 data records, k=2

Creation of the chunks:

Subset1  Subset2  Subset3  Subset4
12       13       39       20
3        7        19       21
30       31       32       26
8        2        16       27
18       4        17       28
15       25       22       29
23       10       38       34
24       9        33       35
5        14       40       36
6        1        11       37

Centroids with the number of data records assigned to each centroid:

Centroids  Data records assigned
15.5       7
22.5       3
8.2        6
16.7       4
35.4       5
27         5
23.5       6
36         4

Sorting the centroids:

Centroids  Data records assigned
8.2        6
15.5       7
16.7       4
22.5       3
23.5       6
27         5
35.4       5
36         4

Calculating global centroid:

$$C_1 = \frac{8.2 \cdot 6 + 15.5 \cdot 7 + 16.7 \cdot 4 + 22.5 \cdot 3}{6 + 7 + 4 + 3} = 14.6$$

$$C_2 = \frac{23.5 \cdot 6 + 27 \cdot 5 + 35.4 \cdot 5 + 36 \cdot 4}{6 + 5 + 5 + 4} \approx 29.9$$
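The global-centroid arithmetic on this slide can be checked with a short script. The (centroid, count) pairs are taken from the slide; the helper name `weighted_mean` and the decision to split the sorted list into its four smallest and four largest centroids are assumptions that mirror the slide's grouping.

```python
# (centroid, number of records) pairs reported by the workers on the slide.
local = [(15.5, 7), (22.5, 3), (8.2, 6), (16.7, 4),
         (35.4, 5), (27, 5), (23.5, 6), (36, 4)]

def weighted_mean(pairs):
    """Weighted arithmetic mean of centroids, weighted by record count."""
    total = sum(n for _, n in pairs)
    return sum(c * n for c, n in pairs) / total

ordered = sorted(local)             # sort by centroid value
c1 = weighted_mean(ordered[:4])     # four smallest centroids
c2 = weighted_mean(ordered[4:])     # four largest centroids
print(round(c1, 2), round(c2, 2))
```

The numerators work out to 292 and 597, each divided by 20 records, giving 14.6 and 29.85; the slide rounds the second value to 29.9.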

RESEARCH PAPER 2 : RESULTS

• Parallel K-Means efficiency increases with an increasing number of clusters.

• Parallel K-Means is highly scalable.

Fig 1: Increasing K and number of nodes [2]

Fig 2: With K=2 increasing number of nodes [2]

RESEARCH PAPER 2 : DRAWBACK

• Message-passing overhead is incurred in sending and receiving information between the master and worker nodes.

RESEARCH PAPER 2 : TAKEAWAY FOR OUR IMPLEMENTATION

• Improved performance of K-Means clustering through a divide-and-merge mechanism (parallelizing).

• The results obtained are the same as the results generated by the original K-Means clustering.

RESEARCH PAPER 3

Title: High performance parallel k-means clustering for disk-resident datasets on multi-core CPUs

Authors: Ali Hadian, Saeed Shahrivari

Journal Name: The Journal of Supercomputing

ISSN: 0920-8542

Date: 08/2014

Page Number: 845-863

URL: http://link.springer.com.ezproxy.rit.edu/article/10.1007%2Fs11227-014-1185-y

RESEARCH PAPER 3 : PROBLEM STATEMENT

• Implementation of parallel K-Means clustering that utilizes the full capabilities of the multiple cores of a computer.

RESEARCH PAPER 3 : APPROACH TO THE PROBLEM

• The algorithm takes a parallel processing approach utilizing the available cores in the CPU.

• Divide the dataset into multiple chunks.

• Cluster each of the chunks.

• Aggregate the chunk clusterings and produce the final clustering.

• The algorithm makes only a single pass over the dataset.

RESEARCH PAPER 3 : CONTRIBUTION

• The concepts of a master thread and chunk-clustering threads are an integral part of the algorithm.

Algorithm: Master thread [3]

Input: K, chunk_size, dataset

Initialize chunks_queue [queue shared between threads]
Initialize centroids[] [centroids shared between threads]
while Not End_Of_Dataset do
    new_chunk ← load_next_chunk(dataset, chunk_size)
    enqueue(new_chunk, chunks_queue)
    if (chunks_queue is full) then
        wait(chunks_queue) [wait until a consumer thread dequeues a chunk]
    end if
end while
while chunks_queue is not empty do
    wait(chunks_queue) [wait for all chunks in queue to be clustered]
end while
C ← Cluster [using K-means++ clustering]

RESEARCH PAPER 3 : CONTRIBUTION (contd..)

Algorithm: Chunk clustering thread [3]

Input: chunks_queue, centroids[], K (number of clusters for each chunk)

while queue is not empty & main thread is alive do
    chunk_instances ← Dequeue(queue)
    C ← Cluster
    centroids[] ← centroids[] ∪ C
    if (queue is empty) then
        wait(queue) [wait until the master thread enqueues a chunk]
    end if
end while

• With the clustered chunks, the master thread loads the centroid list.

• With the centroid list, the master thread performs k-means++ clustering to find the final clusters.
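The master thread and chunk-clustering threads form a producer/consumer pair around a bounded queue. The sketch below shows that coordination with Python's standard `queue` and `threading` modules, under several assumptions: the data is 1-d, a plain k-means (`kmeans_1d`) stands in for both the per-chunk clustering and the final k-means++ pass, and a sentinel object replaces the "main thread is alive" check. All function and parameter names are hypothetical. Python threads do not run CPU-bound work on several cores at once, so this illustrates the control flow only, not the speedup.

```python
import queue
import threading

def kmeans_1d(values, k, iters=50):
    """Plain 1-d k-means used as a stand-in clusterer (not k-means++)."""
    ordered = sorted(values)
    cents = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda j: abs(v - cents[j]))].append(v)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents

_DONE = object()  # sentinel telling a worker that no more chunks will arrive

def chunk_worker(chunks_queue, centroids, lock, k):
    """Chunk-clustering thread: dequeue a chunk, cluster it, publish its centroids."""
    while True:
        chunk = chunks_queue.get()   # blocks while the queue is empty
        if chunk is _DONE:
            break
        local = kmeans_1d(chunk, k)
        with lock:                   # centroids[] is shared between threads
            centroids.extend(local)

def cluster_stream(dataset, k, chunk_size, n_workers=2, queue_size=4):
    """Master thread: feed chunks to the workers, then cluster the centroid list."""
    chunks_queue = queue.Queue(maxsize=queue_size)
    centroids, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=chunk_worker, args=(chunks_queue, centroids, lock, k))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for start in range(0, len(dataset), chunk_size):
        # put() blocks while the queue is full, like wait(chunks_queue) above
        chunks_queue.put(dataset[start:start + chunk_size])
    for _ in workers:
        chunks_queue.put(_DONE)
    for w in workers:
        w.join()                     # wait until every chunk has been clustered
    return kmeans_1d(centroids, k)   # final pass over the collected centroids
```

Because each chunk is read once, handed to a worker, and then discarded, the dataset is traversed in a single pass, and only the much smaller centroid list is clustered at the end.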

RESEARCH PAPER 3 : RESULTS

• The algorithm achieves higher speedups when the data size is large enough.

Fig 3: Speedup of the algorithm [3]

RESEARCH PAPER 3 : TAKEAWAY FOR OUR IMPLEMENTATION

• Utilizing multiple cores in the machine to cluster the dataset as fast as possible.

REFERENCE

[1] Vipin Kumar, Himadri Chauhan, Dheeraj Panwar. K-Means Clustering Approach to Analyze NSL-KDD Intrusion Detection Dataset. International Journal of Soft Computing and Engineering, ISSN: 2231-2307, Volume 3, Issue 4, 09/01/2013, pp. 1-4. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.413.589&rep=rep1&type=pdf

[2] Ilias K. Savvas and Georgia N. Sofianidou. Parallelizing k-means Algorithm for 1-d Data Using MPI. IEEE 23rd International WETICE Conference, 2014, pp. 179-184. URL: http://ieeexplore.ieee.org.ezproxy.rit.edu/stamp/stamp.jsp?tp=&arnumber=6927046

[3] Ali Hadian, Saeed Shahrivari. High performance parallel k-means clustering for disk-resident datasets on multi-core CPUs. The Journal of Supercomputing, ISSN: 0920-8542, 08/2014, pp. 845-863. URL: http://link.springer.com.ezproxy.rit.edu/article/10.1007%2Fs11227-014-1185-y
