

Page 1

Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory

By John McHugh

Presented by Hongyu Gao

Feb. 5, 2009

Page 2

Outline

- Lincoln Lab’s evaluation in 1998
- Critique of data generation
- Critique of taxonomy
- Critique of evaluation process
- Brief discussion on 1999 evaluation
- Conclusion

Page 3

The 1998 evaluation

The most comprehensive evaluation of research on intrusion detection systems that has been performed to date

Page 4

The 1998 evaluation, cont’d

Objectives:
- “To provide unbiased measurement of current performance levels.”
- “To provide a common shared corpus of experimental data that is available to a wide range of researchers”

Page 5

The 1998 evaluation, cont’d

Simulated a typical Air Force base network

Page 6

The 1998 evaluation, cont’d

Collected synthetic traffic data

Page 7

The 1998 evaluation, cont’d

Researchers tested their systems against this traffic

Receiver operating characteristic (ROC) curves were used to present the results

Page 8

1. Critique of data generation

Both background (normal) and attack data are synthesized.

Said to represent traffic to and from a typical Air Force base.

For the results to be meaningful, such synthesized data must reflect system performance in realistic scenarios.

Page 9

Critique of background data

Counterpoint 1
- Real traffic is not well-behaved, e.g. spontaneous packet storms that are indistinguishable from malicious attempts at flooding.
- Such anomalies are not considered in the background traffic.

Page 10

Critique of background data, cont’d

Counterpoint 2
- Low average data rate

Page 11

Critique of background data, cont’d

Possible negative consequences
- The system may produce a larger number of false positives (FP) in a realistic scenario.
- The system may drop packets in a realistic scenario.

Page 12

Critique of attack data

The distribution of attacks is not realistic
- The numbers of attacks in the four categories (U2R, R2L, DoS, Probing) are of the same order of magnitude

U2R    R2L    DoS    Probing
114    34     99     64

Page 13

Critique of attack data, cont’d

Possible negative consequences
- The aggregate detection rate does not reflect the detection rate in real traffic
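
A minimal sketch of why the attack mix matters for the aggregate detection rate. The evaluation counts are the ones on the previous slide; the per-category detection rates and the "real traffic" mix are made-up assumptions, chosen only to illustrate the effect.

```python
# Attack counts per category in the evaluation data (from the slide).
EVAL_COUNTS = {"U2R": 114, "R2L": 34, "DoS": 99, "Probing": 64}

# ASSUMPTION: hypothetical per-category detection rates of some IDS.
DETECTION_RATE = {"U2R": 0.80, "R2L": 0.70, "DoS": 0.30, "Probing": 0.20}

# ASSUMPTION: hypothetical attack mix in real traffic, dominated by probes.
ASSUMED_REAL_COUNTS = {"U2R": 1, "R2L": 4, "DoS": 15, "Probing": 80}


def aggregate_detection_rate(counts, rates):
    """Detected attacks divided by all attacks, over every category."""
    total = sum(counts.values())
    detected = sum(counts[cat] * rates[cat] for cat in counts)
    return detected / total


if __name__ == "__main__":
    print("evaluation mix:", aggregate_detection_rate(EVAL_COUNTS, DETECTION_RATE))
    print("assumed real mix:", aggregate_detection_rate(ASSUMED_REAL_COUNTS, DETECTION_RATE))
```

With the same per-category rates, the aggregate figure changes as soon as the mix changes, so a single aggregate number from the evaluation says little about a site with a different mix.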

Page 14

Critique of simulated AFB network

Not likely to be realistic
- 4 real machines
- 3 fixed attack targets
- Flat architecture

Possible negative consequences
- An IDS can be tuned to look only at traffic targeting certain hosts
- Precludes the execution of “smurf” or ICMP echo attacks

Page 15

2. Critique of taxonomy

Based on the attacker’s point of view
- Denial of service (DoS)
- Remote to user (R2L)
- User to root (U2R)
- Probing

Not useful for describing what an IDS might see

Page 16

Critique of taxonomy, cont’d

Alternative taxonomies
- Classify by protocol layer
- Classify by whether a completed protocol handshake is necessary
- Classify by severity of attack
- Many others…
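
A rough sketch of how one attack could carry several classifications at once, so that results can be grouped by whichever taxonomy suits the IDS being tested. The class names, field names, and the three example entries are illustrative assumptions, not labels taken from the evaluation data.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    NETWORK = "network"
    TRANSPORT = "transport"
    APPLICATION = "application"


@dataclass(frozen=True)
class AttackLabel:
    name: str
    attacker_class: str    # attacker-centric class: DoS, R2L, U2R, Probing
    layer: Layer           # protocol layer at which the attack is visible
    needs_handshake: bool  # does it require a completed protocol handshake?


# ASSUMPTION: hand-labelled examples for illustration only.
LABELS = [
    AttackLabel("smurf", "DoS", Layer.NETWORK, False),
    AttackLabel("SYN flood", "DoS", Layer.TRANSPORT, False),
    AttackLabel("password guessing over telnet", "R2L", Layer.APPLICATION, True),
]


def group_by(labels, key):
    """Group attack names by the value that `key` extracts from each label."""
    groups = defaultdict(list)
    for label in labels:
        groups[key(label)].append(label.name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by(LABELS, lambda a: a.attacker_class))
    print(group_by(LABELS, lambda a: a.layer.value))
    print(group_by(LABELS, lambda a: a.needs_handshake))
```

Two attacks that share an attacker-centric class (both DoS here) land in different groups once they are classified by protocol layer, which is closer to what a network IDS actually observes.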

Page 17

3. Critique of evaluation

The unit of evaluation
- The session is used
- Some traffic (e.g. messages originating from Ethernet hubs) is not part of any session
- Is “session” an appropriate unit?

Page 18

Critique of evaluation, cont’d

Scoring and ROC
- Denominator?

Page 19

Critique of evaluation, cont’d

A non-standard variation of ROC
- Replaces the x-axis with false alarms per day

Possible problem
- The number of false alarms per unit time may increase significantly as the data rate increases

Suggested alternatives
- The total number of alerts (both TP and FP)
- Use the standard ROC
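
A small sketch of the difference between the two x-axes. All counts below are invented for illustration: the same detector sees the same attacks, once with a low volume of normal traffic and once with ten times as much.

```python
def roc_point(tp, fn, fp, tn):
    """Standard ROC point: (false positive rate, true positive rate)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return fpr, tpr


def false_alarms_per_day(fp, days):
    """The non-standard x-axis: raw false alarm count per day."""
    return fp / days


# ASSUMPTION: hypothetical one-day confusion counts at two traffic volumes.
LOW_RATE = {"tp": 40, "fn": 10, "fp": 20, "tn": 19_980}
HIGH_RATE = {"tp": 40, "fn": 10, "fp": 200, "tn": 199_800}

if __name__ == "__main__":
    for name, counts in (("low rate", LOW_RATE), ("high rate", HIGH_RATE)):
        fpr, tpr = roc_point(**counts)
        per_day = false_alarms_per_day(counts["fp"], days=1)
        print(name, "FPR:", fpr, "TPR:", tpr, "false alarms/day:", per_day)
```

Under these assumptions the standard ROC point is identical at both volumes, while the false-alarms-per-day value grows with the data rate, so a curve drawn on that axis depends on how much traffic the test bed happened to carry.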

Page 20

Evaluation of Snort

Page 21

Evaluation of Snort, cont’d

- Poor performance on DoS and Probing
- Good performance on R2L and U2R
- Conclusion on Snort: the results are not sufficient to draw any conclusion

Page 22

Critique of evaluation, cont’d

False alarm rate
- A crucial concern
- The designated maximum value (0.1%) is inconsistent with the maximum operator load set by Lincoln Lab (100/day)
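
A back-of-the-envelope sketch of how the two limits relate. The 0.1% and 100/day figures are from the slide; the daily traffic volumes are assumptions, and "unit" stands for whatever the rate is measured over (e.g. sessions).

```python
MAX_FALSE_ALARM_RATE = 0.001  # 0.1%, from the slide
MAX_ALARMS_PER_DAY = 100      # operator load, from the slide


def implied_daily_volume(max_rate, max_alarms_per_day):
    """Daily volume at which both limits allow the same number of false alarms."""
    return max_alarms_per_day / max_rate


def alarms_allowed_by_rate(max_rate, units_per_day):
    """False alarms per day that the percentage limit alone would permit."""
    return max_rate * units_per_day


if __name__ == "__main__":
    print("volume where the limits agree:",
          implied_daily_volume(MAX_FALSE_ALARM_RATE, MAX_ALARMS_PER_DAY))
    # ASSUMPTION: hypothetical daily volumes, for illustration only.
    for volume in (10_000, 1_000_000):
        print(volume, "units/day ->",
              alarms_allowed_by_rate(MAX_FALSE_ALARM_RATE, volume),
              "false alarms/day allowed by the 0.1% limit")
```

The two limits coincide only at one particular daily volume; at any other volume one of them is far stricter than the other, so they cannot both serve as the same threshold.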

Page 23

Critique of evaluation, cont’d

Does the evaluation result really mean something?
- The ROC curve reflects the ability to detect attacks against normal traffic
- What does a good IDS consist of?
  - Algorithm
  - Reliability
  - Good signatures
  - …

Page 24

Brief discussion on 1999 evaluation

Some superficial improvements
- Additional hosts and host types are added
- New attacks are added

None of these addresses the flaws listed above

Page 25

Brief discussion on 1999 evaluation, cont’d

Security policy is not clear
- What is an attack, and what is not?
- Scan, probe

Page 26

Conclusion

The Lincoln Lab evaluation is a major and impressive effort.

This paper criticizes the evaluation from several angles.

Page 27

Follow-up Work

DETER - Testbed for network security technology.
- Public facility for medium-scale repeatable experiments in computer security
- Located at USC ISI and UC Berkeley.
- 300 PC systems running Utah's Emulab software.
- Experimenters can access DETER remotely to develop, configure, and manipulate collections of nodes and links with arbitrary network topologies.
- The current problem is that the framework has no realistic attack module or background noise generator plugin. Attack distribution is also a problem.

PREDICT - A huge trace repository. It is not public, and there are several legal issues in working with it.

Page 28

Follow-up Work

KDD Cup
- Its goal is to provide datasets from real-world problems to demonstrate the applicability of different knowledge discovery and machine learning techniques.
- The 1999 KDD intrusion detection contest uses a labelled version of this 1998 DARPA dataset, annotated with connection features.
- There are several problems with KDD Cup. Recently, people have found average TCP packet size to be the best correlation metric for attacks, which clearly points to its inefficacy.

Page 29

Discussion

Can the aforementioned problems be addressed?
- Dataset
- Taxonomy
- Unit for analysis
- Approach to compare between IDSes
- …

Page 30

The End

Thank you