
Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Benchmark

H. Güneş Kayacık, Nur Zincir-Heywood, Malcolm I. Heywood


Motivation

• Machine learning in detection.

• Raw data → high-level events.

• Need a set of features.

• Not “any” feature, “good” features.

• How do we quantify “good”?


The Data

• DARPA 98 and 99 datasets.

• Simulated activity.

• Network traffic connection records

• 41 features per connection.

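[Figure: pie chart of connection counts for the three largest classes. Labels: DoS1, DoS2, Normal. Values as extracted: 107201, 97277, 280790.]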


The Data

• 494,000 connections in dataset.

• 23 class labels: 22 attacks (DoS, probe, content based) and “Normal”.

• 41 features (a few examples): duration, service, protocol, data transfer, failed login attempts, FTP commands, root shells, “su” attempts.


Previous IDS Work

• Decision trees, neural nets, clustering, SVM, EC

• High detection rate (98%), low false-positive rate (0.5%).

• Some attacks are detected better than others.

• Our task: substantiate the performance of detectors.


Information Gain

• Used in decision trees.

• Which feature leads to the purest branching?

Gain(“Temperature”) = 0.571

Gain(“Windy”) = 0.02

Gain(“Humidity”) = 0.971

From Data Mining Course at KDNuggets site [http://www.kdnuggets.com/dmcourse/data_mining_course]
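
As a minimal sketch of how such gains can be computed, the Python below evaluates entropy-based information gain for categorical features. The five rows are an assumption: the slide does not list its data, but the quoted values match the "sunny" subset of the classic weather example, so that subset is used here. The function names `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / total * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

# Assumed data: the "sunny" rows of the classic weather example
# (not listed on the slide itself).
rows = [
    # temperature, humidity, windy, play
    ("hot",  "high",   False, "no"),
    ("hot",  "high",   True,  "no"),
    ("mild", "high",   False, "no"),
    ("cool", "normal", False, "yes"),
    ("mild", "normal", True,  "yes"),
]
play = [r[3] for r in rows]
gains = {
    name: information_gain([r[i] for r in rows], play)
    for i, name in enumerate(["Temperature", "Humidity", "Windy"])
}
for name, g in gains.items():
    print(f"Gain({name!r}) = {g:.3f}")
```

Under that assumption the gains round to 0.571, 0.971 and 0.020, so splitting on humidity gives the purest branches.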


Methodology

• Classes: 22 Attacks + 1 Normal

• Binary classification (why?)

• 23 info. gains per feature (vs. 1 info. gain per feature)

For Class A:

1, 0.5, 90, 8    Class A → 1

3, 0.01, 7, 9    Class B → 0

2, 0.1, 7, 10    Class A → 1

5, 0.2, 10, 1    Class C → 0
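
The relabelling above can be sketched in Python as follows. This is an illustration only: the four records are the toy rows from the slide (with the third row read as four values), each distinct feature value is treated as its own category, which is an assumption since the slides do not say how continuous features were handled, and `one_vs_rest` is a hypothetical helper name.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Label entropy minus the weighted entropy after splitting on values."""
    total = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(b) / total * entropy(b) for b in branches.values())

def one_vs_rest(labels, target):
    """Relabel a multi-class problem as binary: 1 for the target class, 0 otherwise."""
    return [1 if y == target else 0 for y in labels]

# Toy records from the slide (third row read as four values).
features = [
    (1, 0.5, 90, 8),
    (3, 0.01, 7, 9),
    (2, 0.1, 7, 10),
    (5, 0.2, 10, 1),
]
classes = ["A", "B", "A", "C"]

# One information gain per (class, feature) pair instead of one per feature.
gains = {}
for target in sorted(set(classes)):
    binary = one_vs_rest(classes, target)
    gains[target] = [
        information_gain([row[j] for row in features], binary)
        for j in range(len(features[0]))
    ]

print(one_vs_rest(classes, "A"))  # [1, 0, 1, 0]
```

With 23 classes and 41 features the same loop would yield 23 gains per feature, one for each "this class vs. everything else" problem.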


Max. Information Gain

• Some features are relevant, some are not.

• Features 20 and 21
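
One way to reduce the per-class gains to a single score per feature is to take the maximum over classes and record which class achieves it. The sketch below does this in Python; the gain values, the class names `A` to `C` and the feature names `f1` to `f3` are made up for illustration and are not results from the slides.

```python
# Hypothetical per-class information gains: gains[class][feature].
# The numbers are invented for illustration only.
gains = {
    "A": {"f1": 0.40, "f2": 0.05, "f3": 0.00},
    "B": {"f1": 0.10, "f2": 0.30, "f3": 0.00},
    "C": {"f1": 0.25, "f2": 0.02, "f3": 0.00},
}

def max_gain_per_feature(gains):
    """For each feature, return (highest gain over classes, class achieving it)."""
    features = next(iter(gains.values())).keys()
    result = {}
    for f in features:
        best_class = max(gains, key=lambda c: gains[c][f])
        result[f] = (gains[best_class][f], best_class)
    return result

summary = max_gain_per_feature(gains)
for f, (g, c) in summary.items():
    # A feature whose maximum gain is zero separates no class from the rest.
    status = "no gain for any class" if g == 0 else f"most relevant for class {c}"
    print(f"{f}: max gain {g:.2f} ({status})")
```

A feature whose maximum gain is zero, like the hypothetical `f3`, carries no information for any of the binary problems.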


For each class…

[Figure: bar chart of information gain (axis from 0 to 1) for each class: back, buffer_overflow, ftp_write, guess_passwd, imap, ipsweep, land, loadmodule, multihop, neptune, nmap, normal, perl, phf, pod, portsweep, rootkit, satan, smurf, spy, teardrop, warezclient, warezmaster.]

• Neptune (DoS) + smurf (DoS) + normal = 98% of the connections.
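
The 98% figure can be checked with a few lines of Python. This assumes the three counts shown on the "The Data" slide (labelled DoS1, DoS2 and Normal) are the neptune, smurf and normal classes, and it uses the rounded dataset size of 494,000 given in the slides, so the result is approximate.

```python
# Counts shown on "The Data" slide for the three largest classes
# (assumed here to be the two DoS attacks and normal traffic).
major_class_counts = [280790, 107201, 97277]

# The slides give the dataset size only as a rounded figure.
total_connections = 494_000

share = sum(major_class_counts) / total_connections
print(f"{share:.1%}")
```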


Relevant Classes

[Figure: chart of relevant classes. Legend: normal, smurf, neptune, land, teardrop, ftp_write, back, guess_pwd, buffer_overflow, warezclient. Data labels as extracted: 11, 10, 10, 1, 1, 12, 1, 1, 1.]

• 31 of the 41 features are most relevant for the 3 major classes.

• 9 features contributed very little.

• Relevant features: connection size, diff. service rate, connection state.


Conclusions

• Relevance analysis on KDD 99 dataset.

• Relevance measured by information gain.

• Key points: easy to classify the 3 major classes; few features are highly useful; few features are completely useless.

• New measures and extended analysis.


Thank You!

• You can find more information about our research at: www.cs.dal.ca/projectx.