
1

Network-level Malware Detection

Mike McNett, Matthew Spear, Richard Barnes

CS-851 – Malware

23 October 2004

2

Outline

Introduction: Design of a System for Real-Time Worm Detection

Example 1: Detecting Early Worm Propagation through Packet Matching (DEWP)

Example 2: Fast Detection of Scanning Worm Infections

Example Application: Therminator

Conclusions

3

Introduction

Questions Being Considered:

1. Why network level detection?

2. What are the alternatives?

3. Are there reasonable solutions?

4. What are the limitations, advantages, disadvantages compared to the alternatives?

4

Introduction

1. Malware Detection Options?

a) Prevention vs. Treatment

b) Signature vs. Anomaly

c) Host-based containment

d) Network containment

e) Packet Header vs. Packet Payload

2. What are the advantages, disadvantages, and limitations of the above?

5

Network-level Detection

6

Design of a System for Real-Time Worm Detection

[Figure: block diagram of the detection system. Components: Hash, Count Vector, Character Filter, SRAM Analyzer, Alert Generator, Periodic Subtraction of Time Averages]

7

Design of a System for Real-Time Worm Detection

Scalable to high throughput

Solution depends on specialized hardware

Low false positive rate

What are the problems?

What are the advantages?

Are there other, simpler signatures?

Can similar attacks be detected at the host level?

8

Detecting Early Worm Propagation through Packet Matching

Xuan Chen and John Heidemann, ISI-TR-2004-585, February 2004

9

DEWP

Router-based system: automatically detects and quarantines Internet worm propagation

• Matches destination port numbers between incoming and outgoing connections (automated signature creation)

• Detects and suppresses worms based on unusual traffic patterns

• Detects worm propagation within about 4 seconds

• Protects > 99% of hosts from random-scanning worms

10

DEWP Thesis

Matches destination port numbers between incoming and outgoing connections. Two observations on worm traffic:

Worms usually exploit vulnerabilities related to specific network port numbers

Infected hosts will probe other vulnerable hosts exploiting the same vulnerability

So… high levels of bi-directional probing traffic with the same destination port number indicate a new worm

Scalable: Matching destination port numbers consumes low computational power
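The port-matching idea above can be illustrated with a minimal sketch. It is not the DEWP implementation: the function name `matched_ports`, the `(address, port)` tuple format, and the `min_count` threshold are assumptions made for illustration only.

```python
from collections import Counter

def matched_ports(incoming, outgoing, min_count=1):
    """Return destination ports that appear in both traffic directions.

    incoming / outgoing: iterables of (destination_address, destination_port)
    tuples (a hypothetical record format, not DEWP's).
    min_count: assumed minimum number of connections per direction.
    """
    in_ports = Counter(port for _, port in incoming)
    out_ports = Counter(port for _, port in outgoing)
    # A port is suspicious only if it is probed in BOTH directions.
    return {
        port
        for port in in_ports
        if in_ports[port] >= min_count and out_ports[port] >= min_count
    }

incoming = [("10.0.0.5", 445), ("10.0.0.6", 445), ("10.0.0.7", 80)]
outgoing = [("198.51.100.1", 445), ("198.51.100.2", 53)]
print(matched_ports(incoming, outgoing))
```

Only counters keyed by port number are kept, which is why the matching step is cheap; address counting is then applied to the matched ports.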

11

DEWP

Two components of DEWP: worm detector and packet filter

Two step detection: destination port matching and destination address counting

Uses packet filtering to suppress worm spreading

Can deploy at different levels of network

12

Worm Containment

DEWP uses traffic filtering – routers drop packets with the automatically discovered destination port

Worm containment: protect internal hosts from internal and external threats; notify other networks about attacks

13

Design

Maintains one port-list for each direction (incoming and outgoing): records the number of connections to different destination ports

Timer for each entry in the port-lists: if a port has not been accessed for a certain time interval, reset the corresponding list entry

Monitor outgoing destination addresses for ports with non-zero entries in both port-lists

Every T seconds, check the number of unique addresses observed within the last time interval

Worm traffic detected with the following condition: [equation not reproduced]

N is the number of unique addresses observed. The condition compares N against its long-term average, with a parameter that sets the system's sensitivity to changes
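The address-counting step can be sketched as follows. The exact detection condition is not reproduced in these slides, so the threshold form used here, N > (1 + sensitivity) × long-term average, and the exponentially weighted average are assumptions chosen for illustration. The class and parameter names are hypothetical.

```python
class AddressCounter:
    """Counts unique destination addresses per detection interval of T seconds."""

    def __init__(self, sensitivity=1.0, smoothing=0.9):
        self.sensitivity = sensitivity  # assumed form of the sensitivity parameter
        self.smoothing = smoothing      # assumed weight of the old average
        self.long_term_avg = None
        self.current = set()

    def observe(self, dst_addr):
        """Record one outgoing destination address for a matched port."""
        self.current.add(dst_addr)

    def end_interval(self):
        """Call every T seconds; returns True if the interval looks like a worm."""
        n = len(self.current)
        self.current = set()
        if self.long_term_avg is None:
            # First interval only seeds the average.
            self.long_term_avg = float(n)
            return False
        # ASSUMED condition, not taken from the slides.
        alarm = n > (1.0 + self.sensitivity) * self.long_term_avg
        if not alarm:
            # Only fold non-alarm intervals into the average.
            self.long_term_avg = (
                self.smoothing * self.long_term_avg + (1.0 - self.smoothing) * n
            )
        return alarm
```

A caller would feed `observe` from the packet path and call `end_interval` from a timer. A smaller T means more frequent checks, which is the trade-off the later slides on detection intervals examine.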

14

Effectiveness of Worm Detection and Quarantine

Random scanning worm: detects worm traffic in 4.8 seconds when fully deployed with a 1 second detection interval.

Always detects worm probing traffic in 4-5 seconds when deployed to different layers.

Number of infected hosts in the protected network – primarily determined by the number of probing packets received from outside

Can protect almost all hosts from infection when only deployed on the access router.

15

Local Scanning

Local scanning: can detect worm probing traffic in 3.87 seconds. But almost all vulnerable hosts in the protected network are compromised

Deployment has little impact on either detection delay or infection percentage.

The infection percentage increases as the number of layers with DEWP deployed is reduced: when deployed only on the access router, all vulnerable hosts are compromised within 10 seconds

More frequent detection reduces vulnerability to local-scanning worms

DEWP quickly detects worm attacks regardless of probing technique.

With full deployment, about 9% of vulnerable hosts are compromised in the protected network

Because local-scanning worms are difficult to quarantine effectively, a very small detection interval and wide deployment are critical to protect vulnerable hosts

16

Effect of Detection Intervals

Address-counting with an interval of T seconds.

Different detection intervals affect detection delay and infection percentage

Random-scanning worms: detection delay and the number of infected hosts increase with the detection interval.

Local-scanning worms: 1) no significant difference in detection delay; 2) infection percentage increases dramatically at larger intervals

So, an automatic system needs to react to worm traffic within small time intervals

17

False Detections

No false positives

Discovered ~10 suspicious destination ports including 21 (FTP), 53 (DNS), and 80 (Web)

Depends on address-counting to reduce false positives

Worm scan rate C affects false negatives: when a worm scans at a low rate, its probing traffic has less effect on overall traffic, so DEWP routers have more difficulty distinguishing it from normal traffic.

With C = 500, worm traffic stands out compared to regular traffic

DEWP is not able to detect worms with a scanning rate lower than C = 25.

18

Conclusions

Detects and quarantines propagation of Internet worms

Uses port-matching and address-counting as the signature.

Detects worm attack within 4-5 seconds

By automatically blocking worm traffic, it protects most vulnerable hosts from random-scanning worms.

Authors believe that an automatic worm detection and containment system should be widely deployed and have very small detection intervals

Not realistic to deploy DEWP on all routers; for random-scanning worms it is sufficient to put it on the access router.

19

Worm Detection

Fast Detection of Scanning Worm Infection

20

Detection Techniques

1. Reverse Sequential Hypothesis Testing (TH)

• Detects worms based upon number of failed connection attempts

• Uses probability to determine if a local host is scanning

• Designed to be tied into a containment system

2. Signature Based Analysis (Early Bird System (EBS))

• Detects worms based upon Rabin signatures of content/port

• Used in conjunction with a containment system

21

Definitions

l      Local host

d      Destination address

FCC    First-contact connection

Yi     Indicator variable of the i-th FCC (0 = success, 1 = failure)

H0     Hypothesis that l is not scanning

H1     Hypothesis that l is scanning

θk     Pr(Yi = 0 | Hk), k ∈ {0, 1}

η0     Upper bound on the likelihood ratio for accepting H0

η1     Lower bound on the likelihood ratio for accepting H1

22

Definitions

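PD     Probability of detecting an infected host

PF     Probability of flagging a host as infected when it is not

α      Upper bound on PF (α ≥ PF)

β      Lower bound on PD (β ≤ PD)

Cl     Credits for l

Likelihood ratio after n observations, where $\phi(Y_i) = \Pr(Y_i \mid H_1) / \Pr(Y_i \mid H_0)$:

$\Lambda(Y_n) = \prod_{i=1}^{n} \phi(Y_i)$

Reverse sequential form:

$\bar{\Lambda}(Y_n) = \max\left(1,\ \bar{\Lambda}(Y_{n-1})\,\phi(Y_n)\right)$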

23

Basic Algorithm

Maintain separate state information for each host (l) being monitored: the likelihood ratio Λ̄(Yn), the set of hosts that have been previously contacted (PCH), and an FCC queue (FCCQ) of first-contact attempts that have been issued but not yet recorded as observations.

1. When a packet from l to d is observed, check whether d is in the PCH of l; if not, add d to the PCH and add the attempt to FCCQ as PENDING.

2. When an incoming packet is sent to l and its source address exists in FCCQ, update the record to SUCCESS in the FCCQ unless the packet is a TCP RST.

3. When the head entry of FCCQ has status PENDING and has been in the queue for longer than a predefined time limit, set its status to FAILURE.

4. If the entry at the head of FCCQ has a status other than PENDING, update Λ̄(Yn) and compare it to η1.
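The likelihood-ratio update in the last step can be sketched numerically. The values of θ0, θ1, α, and β are the ones listed on the Experiment slide. The threshold η1 = β/α is the usual sequential-hypothesis-testing choice and is an assumption here, since the slides do not state it. The function names are hypothetical.

```python
THETA0 = 0.7   # Pr(Yi = 0 | H0): an FCC from a benign host succeeds
THETA1 = 0.1   # Pr(Yi = 0 | H1): an FCC from a scanning host succeeds
ALPHA = 5e-5   # upper bound on the false positive probability
BETA = 0.99    # lower bound on the detection probability
ETA1 = BETA / ALPHA  # ASSUMED threshold for accepting H1 (not on the slides)

def phi(failed):
    """Likelihood ratio Pr(Yi | H1) / Pr(Yi | H0) of one FCC outcome."""
    if failed:
        return (1.0 - THETA1) / (1.0 - THETA0)
    return THETA1 / THETA0

def update(ratio, failed):
    """Reverse sequential update: the ratio is never allowed below 1."""
    return max(1.0, ratio * phi(failed))

def failures_until_alarm():
    """Consecutive failed FCCs needed before the ratio reaches ETA1."""
    ratio, count = 1.0, 0
    while ratio < ETA1:
        ratio = update(ratio, failed=True)
        count += 1
    return count

print(failures_until_alarm())
```

Each failure multiplies the ratio by about 3 and each success multiplies it by about 1/7. Clamping at 1 means a long history of successes cannot be banked against a later burst of failures.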

24

Basic Algorithm

Credit-Based Connection Rate Limiting (CBCRL)

• Simple scheme to limit the number of connections l can make in a given slot of time by allotting each l a set number of credits (Cl) that is modified by events.

Event              Change to Cl
Initial            Cl ← 10
FCC issued by l    Cl ← Cl − 1
FCC succeeds       Cl ← Cl + 2
Every second       Cl ← max(10, (2/3) Cl) if Cl > 10
Allowance          Cl ← 1 if Cl = 0 for 4 seconds

• Used in conjunction with TH to limit the number of connections a host can make, allowing TH time to determine whether a host is infected.
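The credit rules in the table can be sketched as a small state machine. This is only an illustration: the class and method names are hypothetical, and treating "fewer than one credit" as "out of credits" is an assumption made so that the fractional values produced by the 2/3 decay are handled.

```python
class CreditLimiter:
    """Per-host credit counter following the CBCRL event table."""

    def __init__(self):
        self.credits = 10.0        # initial credits
        self.seconds_starved = 0   # consecutive seconds without a usable credit

    def try_issue_fcc(self):
        """Spend one credit for a first-contact connection, if one is available."""
        if self.credits < 1:       # assumption: < 1 counts as zero credits
            return False
        self.credits -= 1
        return True

    def fcc_succeeded(self):
        """A first-contact connection succeeded: earn two credits."""
        self.credits += 2

    def tick(self):
        """Call once per second."""
        if self.credits > 10:
            # Decay large balances toward 10.
            self.credits = max(10.0, (2.0 / 3.0) * self.credits)
        if self.credits < 1:
            self.seconds_starved += 1
            if self.seconds_starved >= 4:
                # Allowance: one credit after 4 seconds at zero.
                self.credits = 1.0
                self.seconds_starved = 0
        else:
            self.seconds_starved = 0
```

A host whose first contacts mostly succeed gains credits faster than it spends them, while a scanner whose contacts mostly fail is throttled to roughly one new first contact every 4 seconds.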

25

Experiment

                                      isp-03         isp-04
Date                                  Apr 10, 2003   Jan 28, 2004
Duration                              627 min        66 min
Total outbound connection attempts    1402178        178518
Total active local hosts              404            451
θ0                                    .7             .7
θ1                                    .1             .1
α                                     5E-5           5E-5
β                                     .99            .99

• Conducted two experiments, in 2003 (isp-03) and 2004 (isp-04).

• Worms identified by comparing traffic to known worm descriptions.

26

Results

                          isp-03   isp-04
Worms Detected (Total)    5        6
  CodeRed II              2        0
  Blaster                 0        1
  MyDoom                  0        3
  Minmail.j               0        1
  HTTP (other)            3        1
False Alarms (Total)      0        6
  HTTP                    0        3
  SMTP                    0        3
P2P Detected (Total)      6        11
Total Identified          11       23

      Alarms   Detection   Efficiency   Effectiveness
TH    34       11          .324         .917

27

Limitations, Future Work?

Are there any serious flaws in this algorithm?

Future work?

• Warhol-type scanning

• Network outages can cause TH to decide that a host is a worm

• Worms could conceivably collaborate to defy detection

• Worms could remember hosts that they can contact and defy detection through them

• Spoofing attack to get an uninfected host blocked

• Interleave scanning with benign activities (e.g., for every scan, visit a website that is known to be running)

• Can trivially modify to work with the containment strategies discussed earlier

28

THERMINATOR!!!

Science comes to the aid of network-level anomaly detection

29

Network behavior is complicated

How do we use “microscopic” packet-level data to make “macro” network-level decisions?

• Too broad, e.g. keeping track of global traffic patterns.

• Too refined, e.g. looking at individual packets.

Hmm… who else tries to make sense of the overall behavior of millions of single objects?

Physicists and Chemists!

30

Idea

Given a computer network with >1000 nodes,

Want to detect anomalous traffic, without any foreknowledge.

Idea of THERMINATOR:

• Take advantage of lots of packet-level data.

• Use physical techniques to distill information into relevant statistics: temperature, entropy, etc.

31

Data Reduction

1. Take the set of hosts and group them into “buckets” or “conversation groups”.

2. Observe communication among buckets.

3. Calculate physical statistics based on these higher-level communications.

By virtue of the mathematics, these are guaranteed to be the same as if we’d just looked at hosts.
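The three data-reduction steps can be sketched as follows. The slides do not give THERMINATOR's definitions of temperature or entropy, so this example uses Shannon entropy over bucket-to-bucket traffic as a stand-in. The bucketing rule (last IPv4 octet modulo the number of buckets) and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def bucket_of(host, num_buckets):
    """Hypothetical bucketing rule: last IPv4 octet modulo num_buckets."""
    return int(host.rsplit(".", 1)[1]) % num_buckets

def bucket_flow_entropy(packets, num_buckets=4):
    """Shannon entropy (in bits) of traffic across (source, destination) bucket pairs.

    packets: iterable of (source_host, destination_host) IPv4 address strings.
    """
    # Steps 1 and 2: map hosts to buckets and count communication among buckets.
    flows = Counter(
        (bucket_of(src, num_buckets), bucket_of(dst, num_buckets))
        for src, dst in packets
    )
    total = sum(flows.values())
    # Step 3: compute one summary statistic from the bucket-level counts.
    return -sum(
        (count / total) * math.log2(count / total) for count in flows.values()
    )

normal = [("10.0.0.1", "10.0.0.2")] * 8
scan = [("10.0.0.1", "10.0.0.%d" % i) for i in range(8)]
print(bucket_flow_entropy(normal), bucket_flow_entropy(scan))
```

In this toy example, traffic confined to one bucket pair has zero entropy, while a host that fans out across many buckets raises it. A change of that kind in a summary statistic is the sort of signal a visualization could surface.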

32

Physical Network Visualization

Based on reduced data, we know pseudo-physical statistics:

• Bucket size

• Temperature

• Entropy

• Heating rate

• Work rate

Visualizing these data shows network events.

Image courtesy of DISA

33

Network Event Detection

34

THERMINATOR Implementation

Jointly developed by DISA, NSA, and Lancope Inc.

Uses Lancope’s data-collection hardware to provide data to THERMINATOR.

THERMINATOR reduces data, computes stats, and provides visualization.

“Research tests validated that THERMINATOR detected anomalies that the intrusion detection systems did not capture.” -- NSA

35

Conclusion

Combined approaches (host-based, network-based, visualization)?

Can signatures be automatically generated?

Can attacks be visualized?

Potential impacts of false positives (is the medicine worse than the sickness) and automated containment?

Need different solutions for local-scanning vs. non-local scanning worms?

Are there other scientific areas that malware research can leverage?
