An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection
Matthew V. Mahoney and Philip K. Chan


TRANSCRIPT

Page 1: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Matthew V. Mahoney and Philip K. Chan

Page 2: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Data Mining for Computer Security Workshop at ICDM03

Melbourne, FL, Nov 19, 2003

www.cs.fit.edu/~pkc/dmsec03/

Page 3: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Outline
• DARPA/Lincoln Laboratory IDS evaluation (IDEVAL)

• Analyze IDEVAL with respect to network anomaly detection

• Propose a remedy for identified simulation artifacts

• Measure effects on anomaly detection algorithms

Page 4: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

1999 IDEVAL

[Testbed diagram: Solaris, SunOS, Linux, and NT hosts behind a router connected to a simulated Internet; inside and outside sniffers; 201 attacks; BSM audit logs, directory and file system dumps]

Page 5: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Importance of 1999 IDEVAL
• Comprehensive
  – signature or anomaly
  – host or network
• Widely used (KDD Cup, etc.)
• Produced at great effort
• No comparable benchmarks are available
• Scientific investigation
  – Reproducing results
  – Comparing methods

Page 6: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

1999 IDEVAL Results
Top 4 of 18 systems at 100 false alarms

  System      Attacks detected / in spec
  Expert 1    85/169 (50%)
  Expert 2    81/173 (47%)
  Dmine       41/102 (40%)
  Forensics   15/27  (55%)

Page 7: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Partially Simulated Net Traffic

• tcpdump records sniffed traffic on a testbed network

• Attacks are “real”—mostly from publicly available scripts/programs

• Normal user activity is generated from models designed to resemble military users

Page 8: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Related Work

• IDEVAL critique (McHugh, 2000): mostly based on the methodology of data generation and evaluation
  – Did not include “low-level” analysis of background traffic
• Anomaly detection algorithms
  – Network based: SPADE, ADAM, LERAD
  – Host based: t-stide, instance-based

Page 9: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Problem Statement

• Does IDEVAL have simulation artifacts?

• If so, can we “fix” IDEVAL?

• Do simulation artifacts affect the evaluation of anomaly detection algorithms?

Page 10: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Simulation Artifacts?

• Comparing two data sets:
  – IDEVAL: Week 3
  – FIT: 623 hours of traffic from a university departmental server
• Look for features with significant differences

Page 11: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

# of Unique Values & % of Traffic

  Inbound client packets   IDEVAL   FIT
  Client IP addresses      29       24,924
  HTTP user agents         5        807
  SSH client versions      1        32
  TCP SYN options          1        103
  TTL values               9        177
  Malformed SMTP           None     0.1%
  TCP checksum errors      None     0.02%
  IP fragmentation         None     0.45%
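As a rough illustration of how such per-feature counts could be gathered, the sketch below uses Scapy to tally distinct values in a capture. This is hypothetical code, not the authors' measurement tooling; the file names are placeholders, and filtering to inbound client packets is assumed to have been done when the capture was taken.

    # Hypothetical sketch: count distinct values of a few header features in a pcap.
    from collections import defaultdict
    from scapy.all import PcapReader, IP, TCP

    def unique_values(pcap_path):
        """Tally distinct values of selected packet-header features."""
        seen = defaultdict(set)
        with PcapReader(pcap_path) as reader:      # stream packets instead of loading everything
            for pkt in reader:
                if IP not in pkt:
                    continue
                seen["Client IP addresses"].add(pkt[IP].src)
                seen["TTL values"].add(pkt[IP].ttl)
                if TCP in pkt and pkt[TCP].flags & 0x02:          # TCP SYN packets
                    seen["TCP SYN options"].add(str(pkt[TCP].options))
        return {name: len(values) for name, values in seen.items()}

    # Compare, e.g., unique_values("ideval_week3_inside.pcap") with unique_values("fit_traffic.pcap").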

Page 12: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Growth Rate in Feature Values

[Plot: number of values observed over time, one curve for IDEVAL and one for FIT]

Page 13: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Conditions for Simulation Artifacts

1. Are attributes easier to model in simulation (fewer values, distribution fixed over time)?

• Yes (to be shown next).

2. Do simulated attacks have idiosyncratic differences in easily modeled attributes?

• Not examined here

Page 14: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Exploiting Simulation Artifacts

• SAD – Simple Anomaly Detector

• Examines only one byte of each inbound TCP SYN packet (e.g. TTL field)

• Training: record which of 256 possible values occur at least once

• Testing: any value never seen in training signals an attack (at most one alarm per minute); a sketch of this detector follows below
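A minimal sketch of this detector is shown below. It is illustrative code, not the authors' implementation; it assumes Scapy, pcap input files, and a HOME_NET prefix for the monitored network.

    # Hypothetical sketch of SAD: train on one byte of inbound TCP SYN packets, then
    # flag any value never seen in training, at most one alarm per minute.
    from scapy.all import PcapReader, IP, TCP

    HOME_NET = "172.16."          # assumed prefix of the monitored inside network

    def inbound_syns(pcap_path):
        """Yield inbound TCP SYN packets from a capture."""
        with PcapReader(pcap_path) as reader:
            for pkt in reader:
                if IP in pkt and TCP in pkt and (pkt[TCP].flags & 0x02) \
                        and pkt[IP].dst.startswith(HOME_NET):
                    yield pkt

    def train(pcap_path, feature=lambda p: p[IP].ttl):
        """Record which of the 256 possible values of the chosen byte occur in attack-free traffic."""
        return {feature(p) & 0xFF for p in inbound_syns(pcap_path)}

    def detect(pcap_path, seen, feature=lambda p: p[IP].ttl, min_gap=60.0):
        """Signal an attack for any value never seen in training, rate-limited to one alarm per minute."""
        alarms, last = [], float("-inf")
        for p in inbound_syns(pcap_path):
            value, t = feature(p) & 0xFF, float(p.time)
            if value not in seen and t - last >= min_gap:
                alarms.append((t, p[IP].src, value))
                last = t
        return alarms

    # seen = train("week3_inside.pcap")              # training: week 3 (no attacks)
    # alarms = detect("weeks4-5_inside.pcap", seen)  # testing: weeks 4-5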

Page 15: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

SAD IDEVAL Results
• Train on inside sniffer, week 3 (no attacks)
• Test on weeks 4-5 (177 in-spec attacks)
• SAD is competitive with the top 1999 results

  Packet byte examined     Attacks detected   False alarms
  IP source, third byte    79/177 (45%)       43
  IP source, fourth byte   71                 16
  TTL                      24                 4
  TCP header size          15                 2

Page 16: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Suspicious Detections

• Application-level attacks detected by low-level TCP anomalies (options, window size, header size)

• Detections by anomalous TTL (126 or 253 in hostile traffic, 127 or 254 in normal traffic)

Page 17: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Proposed Mitigation

1. Mix real background traffic into IDEVAL

2. Modify IDS or data so that real traffic cannot be modeled independently of IDEVAL traffic

Page 18: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Mixing Procedure

• Collect real traffic (preferably with similar protocols and traffic rate)
• Adjust timestamps to 1999 (IDEVAL) and interleave packets chronologically (see the sketch below)
• Map IP addresses of real local hosts to additional hosts on the LAN in IDEVAL (not necessary if higher-order bytes are not used in attributes)
• Caveat: no internal traffic between the IDEVAL hosts and the real hosts
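A rough sketch of the interleaving step follows. It is hypothetical Scapy code rather than the authors' tools; the address map, time offset, and file names are placeholder assumptions.

    # Hypothetical sketch: shift the real capture into the 1999 IDEVAL time range,
    # remap its local addresses onto unused IDEVAL LAN addresses, and merge by time.
    from scapy.all import rdpcap, wrpcap, IP, TCP

    ADDR_MAP = {"10.0.0.5": "172.16.113.201"}   # assumed: real local host -> spare IDEVAL LAN address
    TIME_SHIFT = -100000000.0                   # assumed offset (seconds) placing the real capture in 1999

    def adjust(pkt):
        """Retimestamp a real packet and remap its local addresses."""
        pkt.time = float(pkt.time) + TIME_SHIFT
        if IP in pkt:
            pkt[IP].src = ADDR_MAP.get(pkt[IP].src, pkt[IP].src)
            pkt[IP].dst = ADDR_MAP.get(pkt[IP].dst, pkt[IP].dst)
            del pkt[IP].chksum                  # let Scapy recompute checksums after rewriting
            if TCP in pkt:
                del pkt[TCP].chksum
        return pkt

    ideval = list(rdpcap("ideval_week3_inside.pcap"))
    real = [adjust(p) for p in rdpcap("real_traffic.pcap")]
    mixed = sorted(ideval + real, key=lambda p: float(p.time))   # interleave chronologically
    wrpcap("mixed_week3.pcap", mixed)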

Page 19: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

IDS/Data Modifications

• Necessary to prevent independent modeling of IDEVAL
  – PHAD: no modifications needed
  – ALAD: remove destination IP as a conditional attribute
  – LERAD: verify rules do not distinguish IDEVAL from FIT
  – NETAD: remove IDEVAL telnet and FTP rules
  – SPADE: disguise FIT addresses as IDEVAL

Page 20: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Evaluation Procedure

• 5 network anomaly detectors on IDEVAL and mixed (IDEVAL + FIT) traffic
• Training: Week 3
• Testing: Weeks 4 & 5 (177 “in-spec” attacks)
• Evaluation criteria:
  – Number of detections with at most 10 false alarms per day (a scoring sketch follows below)
  – Percentage of “legitimate” detections (anomalies correspond to the nature of the attacks)
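As a rough illustration of the first criterion (not the official evaluation scripts), detections under a false-alarm budget could be counted as below; the 60-second matching window and the record formats are assumptions.

    # Hypothetical scoring sketch: count distinct attacks detected before the
    # false-alarm budget (10 per day over the two test weeks, about 100 total) is spent.
    def count_detections(alarms, attacks, budget=100, window=60.0):
        """alarms: list of (score, time, target); attacks: list of (time, target)."""
        detected, false_alarms = set(), 0
        for score, t, target in sorted(alarms, reverse=True):    # highest-scoring alarms first
            hit = next((a for a in attacks
                        if a[1] == target and abs(a[0] - t) <= window), None)
            if hit is not None:
                detected.add(hit)                                # an attack counts only once
            else:
                false_alarms += 1
                if false_alarms >= budget:
                    break
        return len(detected), false_alarms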

Page 21: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Criteria for Legitimate Detection

• Anomalies correspond to the nature of attacks

• Source address anomaly: attack must be on a password protected service (POP3, IMAP, SSH, etc.)

• TCP/IP anomalies: attack on network or TCP/IP stack (not an application server)

• U2R and Data attacks: not legitimate (these criteria are sketched as code below)
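These criteria could be encoded roughly as follows; this is an illustrative sketch, not the authors' scoring procedure, and the attack record fields are assumed.

    # Hypothetical encoding of the legitimacy criteria above.
    PASSWORD_SERVICES = {"pop3", "imap", "ssh"}     # per the slide; the "etc." is left open

    def is_legitimate(anomaly_type, attack):
        """attack: dict with 'category', 'service', and 'targets_ip_stack' keys (assumed schema)."""
        if attack["category"] in {"U2R", "Data"}:
            return False                            # U2R and Data attacks are never legitimate
        if anomaly_type == "source_address":
            return attack["service"] in PASSWORD_SERVICES
        if anomaly_type == "tcp_ip":
            return attack["targets_ip_stack"]       # attack on the network or TCP/IP stack, not an application server
        return False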

Page 22: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Mixed Traffic: Fewer Detections, but More are Legitimate

[Bar chart: detections out of 177 attacks at 100 false alarms, total vs. legitimate, for PHAD, ALAD, LERAD, NETAD, and SPADE; y-axis 0 to 140]

Page 23: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Concluding Remarks

• Some IDEVAL attributes take values from small ranges that do not keep growing over time; IDEVAL also lacks the “crud” found in real traffic.

• Artifacts can be “masked/removed” by mixing in real traffic.

• Anomaly detection models from the mixed data achieved fewer detections, but a higher percentage of legitimate detections.

Page 24: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Limitations

• Traffic injection requires careful analysis and possible IDS modification to prevent independent modeling of the two sources.

• Mixed traffic becomes proprietary. Evaluations cannot be independently verified.

• Protocols have evolved since 1999.
• Our results do not apply to signature detection.
• Our results may not apply to the remaining IDEVAL data (BSM, logs, file system).

Page 25: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Future Work

• So far, one data set of real traffic from a university; next, analyze headers in publicly available data sets
• So far, analyzed features that affect the evaluated algorithms; next, more features for other anomaly detection (AD) algorithms

Page 26: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Final Thoughts

• Real data
  – Pros: real behavior in a real environment
  – Cons: cannot be released because of privacy concerns (i.e., results cannot be reproduced or compared)
• Simulated data
  – Pros: can be released as benchmarks
  – Cons: simulating real behavior correctly is very difficult
• Mixed data
  – A way to bridge the gap

Page 27: An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

Tough Questions from John & Josh?