Specification-based Anomaly Detection: A New Approach for Detecting Network Intrusions

R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, and S. Zhou

Stony Brook University

20080418, by Mike Hsiao

ACM Conference on Computer and Communications Security (CCS), 2002

Outline

• Introduction, Related Work
• Overview, Benefits
• State-Machine Language
• Specification Development
• Anomaly Detection
    • Sequence and statistical properties, Detecting Anomalies
• Experimental Results
    • 1999 Lincoln, Email Virus
• Conclusions and Comments

Introduction: IDS approaches

• Misuse Detection
    • detects an attack as an instance of a known attack signature
    • ineffective against unknown attacks
• Anomaly Detection
    • any deviation from normal system behavior (the profile) is flagged as a potential attack
    • legitimate but previously unseen behavior may exist
• Selecting an appropriate signature or profile is a hard problem.

Introduction: specification-based anomaly

• Detect attacks as deviations from a norm
    • Manually develop specifications that capture legitimate (rather than previously seen) system behaviors
    • Avoids flagging legitimate-but-unseen behavior
    • Time-consuming, but decreases false negatives
• This paper, “specification-based anomaly detection”: a combination of the two approaches.

Overview

• Develop specifications of hosts and routers in terms of the packets received or transmitted by them.
    • derived from RFCs or other descriptions of protocols such as IP, ARP, TCP and UDP.

(a specification characterizing the gateway behavior)

Overview: example

• No IP fragmentation is modeled, and only packets from the Internet (but not those sent to the Internet) are captured.
• These packets may be destined for the gateway itself, in which case the state machine makes a transition from the INIT state to the DONE state.
• Otherwise, a packet may be destined for an internal machine, in which case the gateway first receives it on its external network interface and makes a transition from the INIT state to the PKT RCVD state.
• Next, it relays the packet on its internal network interface, making a transition to the DONE state.
• Occasionally, the relay may not take place. We model such situations with a timeout transition from the PKT RCVD state to the DONE state. Possible causes:
    a) the gateway could not resolve the MAC address corresponding to the IP address of the target machine,
    b) the gateway machine is malfunctioning, etc.
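
The gateway example can be sketched in a few lines of Python. This is only an illustration of the three states and the timeout transition described on this slide; the class name, method names, transition labels, and the 5-second timeout are assumptions made for the sketch and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT = 5.0  # assumed relay deadline in seconds; the slides give no value


@dataclass
class GatewayIpMachine:
    """Tracks one packet that arrived from the Internet (hypothetical names)."""
    gateway_ip: str
    state: str = "INIT"
    dst: Optional[str] = None
    received_at: float = 0.0
    last_transition: Optional[str] = None

    def rx_external(self, dst: str, now: float) -> None:
        """Packet seen on the external interface."""
        if self.state != "INIT":
            return
        self.dst, self.received_at = dst, now
        if dst == self.gateway_ip:
            self._move("DONE", "to_gateway")    # destined for the gateway itself
        else:
            self._move("PKT_RCVD", "received")  # must still be relayed inside

    def relay_internal(self, dst: str) -> None:
        """Same packet seen leaving on the internal interface."""
        if self.state == "PKT_RCVD" and dst == self.dst:
            self._move("DONE", "relayed")

    def tick(self, now: float) -> None:
        """Fire the timeout transition if the relay never happened."""
        if self.state == "PKT_RCVD" and now - self.received_at >= TIMEOUT:
            self._move("DONE", "timeout")

    def _move(self, state: str, label: str) -> None:
        self.state, self.last_transition = state, label
```

A packet addressed to a nonexistent internal host would never be relayed, so its instance ends through the "timeout" label rather than "relayed".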

Overview: EFSA

• Extended Finite State Automata
    • transition events have arguments
    • state variables store values
        • e.g., src, dst
• For each IP packet received on the external network interface, the system creates an instance of the IP state machine in the INIT state.
• Each instance that can make a transition on a given packet is permitted to do so.

Overview: statistical machine learning

• By learning the statistical properties associated with the IP state machine, the authors can detect several kinds of attacks.
    • the frequency with which a particular transition in the EFSA is taken
    • the most commonly encountered value of a particular control state of the EFSA
    • the distribution of values of a state variable

Overview: IP sweep

• Typically, an IP sweep can be specified with fair accuracy using
    • the number of different IP addresses for which packets were received in the last t seconds
• In this paper,
    • The attacker does not know the legitimate IP addresses in the target domain.
    • This implies that several packets will be sent by the attacker to nonexistent hosts, which results in a sudden spurt of timeout transitions being taken.
• Thus, the statistics on the frequency of timeout transitions from the PKT RCVD state can serve as a reliable indicator of the IP sweep attack.
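
A minimal sketch of how the timeout frequency could be tracked, assuming a sliding window over the last `window` seconds. The class `SlidingCounter` and its methods are hypothetical names for illustration; the paper's actual bookkeeping may differ.

```python
from collections import deque


class SlidingCounter:
    """Counts events (e.g. timeout transitions) seen in the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.times = deque()  # timestamps of recent events, oldest first

    def add(self, t: float) -> None:
        """Record one event at time t (timestamps must be non-decreasing)."""
        self.times.append(t)
        self._expire(t)

    def count(self, now: float) -> int:
        """Number of recorded events still inside the window ending at `now`."""
        self._expire(now)
        return len(self.times)

    def _expire(self, now: float) -> None:
        # Drop events that fell out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
```

During a sweep, many instances leave PKT RCVD through the timeout transition within a short time, so the count for that transition rises well above what was seen in normal traffic.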

Benefits

• provides accurate attack detection
    • detects known and unknown attacks
    • low false alarm rates
• simplifies feature selection
• employs redundancy to improve attack detection
• supports unsupervised learning

State-machine language

• EFSA M = (Σ, Q, s, f, V, D, δ)
    • Σ: the set of events
    • Q: a finite set of states
    • s: the start state
    • f: the final state
    • V: a finite tuple (v1, …, vn) of state variables
    • D: a finite tuple (D1, …, Dn), where Di denotes the domain of values for the state variable vi
    • δ: Q × D × Σ → (Q, D) is the transition relation

State Machine Specification

event(x1, …, xn) | cond -> action
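
One way to read the rule form `event(x1, …, xn) | cond -> action` is as a guarded transition: the event name and arguments are matched, the condition is tested against the state variables, and the action updates them. The sketch below illustrates that reading; `Transition`, `step`, and the dictionary-based variables are assumptions made for this example, not the paper's language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Transition:
    source: str                           # control state the rule applies in
    event: str                            # event name, e.g. "rx"
    cond: Callable[[Dict, Dict], bool]    # cond(state_vars, event_args)
    action: Callable[[Dict, Dict], None]  # action(state_vars, event_args), mutates vars
    target: str                           # control state after the rule fires


def step(state: str, variables: Dict, rules: List[Transition],
         event: str, args: Dict) -> List[Tuple[str, Dict]]:
    """Return (new_state, new_vars) for every rule enabled by this event."""
    results = []
    for rule in rules:
        if rule.source == state and rule.event == event and rule.cond(variables, args):
            new_vars = dict(variables)  # copy so each outcome is independent
            rule.action(new_vars, args)
            results.append((rule.target, new_vars))
    return results
```

For example, a rule from INIT to PKT RCVD could use `cond=lambda v, a: a["ifc"] == "ext"` and `action=lambda v, a: v.update(dst=a["dst"])`.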

State-machine: non-deterministic

• In general, protocol state machines are non-deterministic
    • A machine may be able to make any one of k different transitions.
    • They clone k copies of the state machine whenever it can make one of k different transitions.
• They delete the state machine instances that reach their final state.
• Final states are somewhat different from the “accepting states” of an FSA; they are similar to sink states.
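
The cloning and deletion rules can be sketched as follows, with each instance reduced to its control state. The function `advance`, the triple format for transitions, and the choice to leave an instance untouched when no transition is enabled are assumptions made for this illustration.

```python
from typing import FrozenSet, List, Tuple


def advance(instances: List[str],
            transitions: List[Tuple[str, str, str]],
            event: str,
            final_states: FrozenSet[str] = frozenset({"DONE"})) -> List[str]:
    """Apply one event to every live instance.

    transitions holds (source, event, target) triples. An instance with k
    enabled transitions is replaced by k clones, one per target. Clones that
    land in a final state are deleted instead of being kept.
    """
    survivors = []
    for state in instances:
        targets = [dst for src, ev, dst in transitions
                   if src == state and ev == event]
        if not targets:
            survivors.append(state)  # assumption: instance simply stays put
            continue
        survivors.extend(t for t in targets if t not in final_states)
    return survivors
```

With transitions A→B and A→C on the same event, one instance in A becomes two instances, one in B and one in C.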

State-machine: instances

• There can be many instances of a state machine at runtime.
    • For each incoming event, they may have to search through all of these instances.
• “Sessions”:
    • map event(eventArgs) when condition
    • condition is a conjunction of tests
    • left-hand side is event arguments, right-hand side is state variables
• map rx(ifc, pkt) when (ifc == ext)
    • The condition can be evaluated with a hash-table lookup of the state machine instance ID.
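
Because the mapping condition only compares event arguments with state variables, the matching instance can be found by hashing the relevant arguments instead of scanning every instance. A minimal sketch, in which `SessionTable`, `key_of`, and `new_instance` are hypothetical names:

```python
from typing import Callable, Dict, Hashable


class SessionTable:
    """Maps each event to its state machine instance through a dictionary."""

    def __init__(self,
                 key_of: Callable[[dict], Hashable],
                 new_instance: Callable[[dict], dict]) -> None:
        self.key_of = key_of              # extracts the session key from event args
        self.new_instance = new_instance  # builds a fresh instance for a new session
        self.instances: Dict[Hashable, dict] = {}

    def dispatch(self, event_args: dict) -> dict:
        """Return the instance for this event, creating it on first sight."""
        key = self.key_of(event_args)
        if key not in self.instances:
            self.instances[key] = self.new_instance(event_args)
        return self.instances[key]
```

For TCP, the key could be the tuple of external and internal address and port, so each connection gets its own instance in constant expected time.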

Specification Development

• They capture only the essential details of most protocols.
    • developing precise specifications would entail more effort.
    • there might be minor differences among implementations.
• Fig. 3: a specification of the TCP state machine, as observed on a gateway connecting an organization’s internal network to the Internet.

Figure 3: TCP Protocol State Machine. (Certain abnormal transitions are not shown.)

Anomaly Detection: mapping packet sequence properties to properties of transitions

• Traces: each corresponds to a path in the state machine
    • rx(ext, pkt)
    • rx(ext, pkt1) rx(int, pkt2)
    • rx(ext, pkt1)
• A trace has fewer properties than long packet sequences.
• A trace provides concrete clues
    • unexpected packets, absence of expected packets, timeout events.

Anomaly Detection: Two categories of properties

• Type 1: whether a particular transition on the state machine is taken by a trace.
    • Example: is the timeout transition taken by a trace?
• Type 2: the value of a particular state variable or a packet field when a transition is traversed by a trace.
    • Example: what is the size of the IP packet when the transition from the INIT state to the PKT RCVD state is taken?
• More complex properties that involve multiple transitions
    • e.g., whether a trace traverses a particular combination of transitions
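
The two property types can be illustrated on a trace stored as a list of (transition label, observed values) pairs. The representation and the function names below are assumptions made for this sketch.

```python
from typing import Any, Dict, List, Tuple

Trace = List[Tuple[str, Dict[str, Any]]]  # (transition label, values seen on it)


def takes_transition(trace: Trace, label: str) -> bool:
    """Type 1: is the given transition taken anywhere in the trace?"""
    return any(step_label == label for step_label, _ in trace)


def values_on_transition(trace: Trace, label: str, field: str) -> List[Any]:
    """Type 2: values of a variable or packet field observed on a transition."""
    return [values[field] for step_label, values in trace
            if step_label == label and field in values]
```

For a trace `[("INIT->PKT_RCVD", {"size": 1500}), ("timeout", {})]`, the first function answers the timeout question and the second returns the packet size seen on the INIT to PKT RCVD transition.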

Anomaly Detection: learning statistical properties

• how frequently a transition is taken (for type 1), or the values of state variables encountered on a transition (for type 2)
    • use distributions rather than averages
    • use recent traces rather than those from long ago, or
    • use traces from a host of interest and/or to a particular host, or all fragmented packets, e.g.,
        • on all frequency timescale (0.001, 0.002, 0.5, 10, 100, 1000)
        • on all frequency wrt (src) size 100 timescale (0.001, 0.002, 0.5, 10, 100, 1000)
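
A rough sketch of what a statement such as `on all frequency wrt (src) size 100 timescale (...)` might compute: per-key event counts over several window lengths. Two points are assumptions of this sketch rather than facts from the slides: that `size` caps the number of distinct keys tracked, and that the least recently seen key is evicted when the cap is reached. `KeyedFrequency` is a hypothetical name.

```python
from collections import OrderedDict, deque
from typing import Dict, Hashable, Sequence


class KeyedFrequency:
    """Per-key event counts over several timescales, with a bounded key table."""

    def __init__(self, timescales: Sequence[float], max_keys: int) -> None:
        self.timescales = tuple(timescales)
        self.max_keys = max_keys       # assumed meaning of "size N"
        self.events = OrderedDict()    # key -> deque of timestamps

    def record(self, key: Hashable, t: float) -> None:
        q = self.events.pop(key, None)
        if q is None:
            q = deque()
            if len(self.events) >= self.max_keys:
                self.events.popitem(last=False)  # evict least recently seen key
        q.append(t)
        self.events[key] = q  # re-insert so this key is now the most recent

    def counts(self, key: Hashable, now: float) -> Dict[float, int]:
        """Events for `key` within each timescale ending at `now`."""
        q = self.events.get(key, ())
        horizon = max(self.timescales)
        while q and q[0] <= now - horizon:
            q.popleft()  # older than the longest window, never needed again
        return {ts: sum(1 for t in q if t > now - ts) for ts in self.timescales}
```

Keeping several timescales lets the same counter expose both short bursts and slow, sustained activity.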

Anomaly Detection: detecting anomalies

• If the statistics (in the detection phase) vary substantially from what was learnt, then an anomaly is raised.
    • They are currently investigating ways to precisely control what is considered a “substantial difference.”
• Meanwhile, their implementation uses a simple thresholding scheme.
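
The slides do not say how the threshold is chosen. As one possible illustration only, the sketch below sets the threshold to a multiple of the highest count seen during training; the function names and the `margin` parameter are assumptions, not the authors' scheme.

```python
from typing import Iterable


def learn_threshold(training_counts: Iterable[float], margin: float = 2.0) -> float:
    """Assumed rule: threshold = margin x the largest count seen in training."""
    return margin * max(training_counts)


def is_anomalous(observed: float, threshold: float) -> bool:
    """Flag an observation that exceeds the learnt threshold."""
    return observed > threshold
```

With training counts of 3, 5, and 4 timeouts per window, the threshold is 10.0, so a window with 40 timeouts is flagged while a window with 6 is not.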

Experimental Results: Lincoln 1999

• Their experiments have focused on attacks on lower layers of protocols such as IP and TCP, due to the fact that they have so far developed state machine models of only these two protocols.

• Since their approach recognizes anomalies based on repetition, at least two packets must be involved in an attack before the attack can be expected to be detected by their approach.
    • They exclude six attacks that need only one packet, and some instances without complete TCP traces.

Experimental Results: results

• Excellent attack detection
    • All of the attacks within the scope of the prototype were detected.
    • Their approach has no knowledge about sweeps encoded into it.
• Low false positives
    • 5.5 false alarms per day
• Adequate processing capacity
    • excluding I/O time, 700 MB of data is processed within ten minutes on a 700 MHz Pentium III with 1 GB of memory.

Experimental Results: Attacks detected by IP machine

• ts = (0.001, 0.01, 0.1, 1, 10, 100, 1000)
    • 1) on all frequency timescale ts
    • 2) on all frequency wrt (src) size 100 timescale ts
    • 3) on all frequency wrt (dst) size 100 timescale ts
    • 4) on all frequency wrt (src, dst) size 100 timescale ts
• IP sweep (using statistics 2, 1), Ping of Death (3, 2, 4), and Smurf (1, 3) can be detected.

Experimental Results: Attacks detected by TCP machine

• 5) on all frequency timescale ts
• 6) on all frequency wrt (ext_ip) size 1000 timescale ts
• 7) on all frequency wrt (int_ip) size 1000 timescale ts
• 8) on all frequency wrt (ext_ip, int_ip) size 1000 timescale ts
• 9) on all frequency wrt (int_ip, int_port) size 1000 timescale ts
• 10) on all frequency wrt (ext_ip, int_ip, int_port) size 1000 timescale ts
• 11) on all frequency wrt (ext_ip, ext_port, int_ip, int_port) size 1000 timescale ts

Experimental Results: Attacks detected by TCP machine (cont’d)

• Portsweep (7, 8)
• Queso (abnormal transition, LISTEN to LISTEN)
• Neptune (SYN flood, 6-11)
• Satan/Saint (similar to portsweep)
• Mscan (similar to portsweep)
• Mailbomb (sending a large number of emails to overflow the server's mail queue, 7-11)
• Apache2 (sending a large number of MIME headers, which increases the frequency of packets received in the ESTABLISHED state, 9, 10)
• Back (DoS, similar to Apache2)

Experimental Results: Email Virus Propagation in an intranet

• ts = (10, 30, 120, 500, 2000, 8000, 25000)
    • 1) on all frequency timescale ts
    • 2) on all frequency wrt (sender) timescale ts
• 400 email clients and one sendmail server; hundreds of runs with about 10 different viruses.
• Their approach detects the virus in every run, whereas another defense mechanism failed in 7 runs.

Conclusions

• Specification-based anomaly detection.
    • benefits from both approaches

• Simply monitoring the frequency distribution information associated with state machine transitions

• Specifications can be easily extended.

Comments

• The construction of the FSA is too vague
    • The IP protocol cannot be mapped directly to Figure 1. The model relies on knowledge of gateway operation, as does the model for email virus propagation.

• The learning mechanism is good, but still relies on the traditional concept of anomaly detection (frequency or distribution)

• We use the deviation of protocol behavior as a basis, and construct the FSM for all the temporal status leading to the abnormality.

• We focus on the exploiting phase, rather than probing, scanning, or propagation.

• We have an inference model for attack assessment.
