Automated Signature Extraction for High Volume Attacks
Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish
This work is part of the Kabarnit–Cyber Consortium (2012-2014) under the Magnet program, funded by the Chief Scientist in the Israeli Ministry of Industry, Trade and Labor. This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.
2
Current DDoS Attack
Zombies on innocent computers
Server-level DDoS attacks
Infrastructure-level DDoS attacks
Bandwidth-level DDoS attacks
3
High volume attacks - Current Defense
Many different types of attackers
Defense lines (Defense Line 1, Defense Line 2, Defense Line 3, …, Defense Line n):
  access control list filtering
  behavioral analysis
  SYN cookies, challenge-response
Remaining attacks:
  Botnets (millions of computers)
  Hard to identify behaviorally, under the radar
  Zero-day: no known signatures
… Call for HELP!!
4
Signature based DDoS Attack Detection
Unknown (zero-day) attacks:
  Some hope: attack tools usually leave some unique footprint (repeating pattern)
  Example in packet: Connection: KEEP-ALIVE
  Today: find signatures manually (human eye)
  Our goal: find them automatically
Signatures are used by anti-DDoS devices and firewalls to stop the attack
Mitigation in minutes, good enough for these types of attacks
5
Signatures are also used in:
  NIDS/IPS (Snort, Bro, etc.)
  Worm detection (automated extraction)
Previous work:
  Worm behavior (address dispersion, suspicious code, etc.)
  Fixed-length signatures
  Non-scalable
Notable works:
  Kephart et al ’94
  Honeycomb [Kreibich et al ’04]
  Earlybird [Singh et al ’04]
  Autograph [Kim et al ’04]
  Hancock [Griffin et al ’09]
6
System Overview
Our Challenge: Automatically find signatures that appear frequently only during attack
Where:
  In mitigation box (DDoS Guard/firewall/anti-DDoS etc.)
  In the cloud: collect data from several collectors
Input collection: attack time traffic sample and peace time traffic sample → Signature Extraction → Attack signatures, e.g. Connection: KEEP-ALIVE
7
Signature Extraction - High Level
Attack time traffic sample
Peace time traffic sample
Attack signatures, e.g. Connection: KEEP-ALIVE
Signature Extraction
Find frequent strings in attack time traffic
Find frequent strings in peace time traffic
Take only strings found in attack and not in peace
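The three steps above amount to a set difference. A minimal sketch, assuming the frequent strings of each sample have already been computed; the function name and the sample strings are hypothetical, not taken from the slides:

```python
def attack_only_strings(attack_frequent, peace_frequent):
    """Keep strings that are frequent in attack traffic but not in peace traffic."""
    return set(attack_frequent) - set(peace_frequent)


# Hypothetical frequent strings from each traffic sample.
attack = {"Connection: KEEP-ALIVE", "GET / HTTP/1.1", "Host: "}
peace = {"GET / HTTP/1.1", "Host: ", "Accept: "}
candidates = attack_only_strings(attack, peace)
```

This is only the unthresholded version of the idea; later slides refine it with frequency thresholds.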
8
Our Goal
Automatically find signatures that appear frequently only during attack
Requirements:
1. Find minimal set of signatures
   Some filtering devices have limited capacity
2. Allow signatures of varying lengths
3. Don’t include signatures found in legitimate traffic
   Minimum false positives
4. Minimize space and time usage
   Large amounts of data
   Quick response
9
Finding Frequent Strings in Traffic
Input: Sequence of packets
Output: Strings that appear frequently in packets
Common stringology solution: use suffix trees/arrays, but these take too much space
Our solution uses heavy hitters
Attack time traffic sample
Peace time traffic sample
Attack signatures, e.g. Connection: KEEP-ALIVE
Find frequent strings in attack time traffic
Find frequent strings in peace time traffic
Take only strings found in attack and not in peace
10
Heavy Hitters (Frequent Items)
Input: N values, integer v
Output: v values, each appearing at least N/v times
Approximate solution:
  Uses O(v) space!
  One pass over input!
Known counter-based HH algorithms:
  Misra & Gries 1982
  Lossy Counting – Manku and Motwani 2002
  Space Saving – Metwally et al 2005 (currently using)
11
Space saving Heavy Hitters [Metwally et al 2005]
Algorithm:
  Maintain v values, and their counters.
Input: 10 22 30 10 35 50
counter  value
1        10
1        22
1        30
12
Space saving Heavy Hitters [Metwally et al 2005]
Algorithm:
  Maintain v values, and their counters.
  If next value x is one of the v, increment its counter.
Input: 10 22 30 10 35 50
counter  value
2        10
1        22
1        30
13
Space saving Heavy Hitters [Metwally et al 2005]
Algorithm:
  Maintain v values, and their counters.
  If next value x is one of the v, increment its counter.
  Else take item with minimal counter c:
    Replace value with x
    New counter is c+1
Error rate: N/v
Input: 10 22 30 10 35 50
counter  value
2        10
2        35
1        30
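The update rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a plain dictionary of counters and breaks ties on the minimal counter by insertion order (the slides do not specify a tie rule); the function name is hypothetical.

```python
def space_saving(stream, v):
    """Approximate heavy hitters using at most v counters (Space Saving rule)."""
    counters = {}  # value -> estimated count
    for x in stream:
        if x in counters:
            counters[x] += 1                  # x is monitored: increment
        elif len(counters) < v:
            counters[x] = 1                   # free slot: start monitoring x
        else:
            # Evict the value with the minimal counter c; x takes over with c + 1.
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters


# State after reading 10 22 30 10 35, with v = 3, as in the slide's table.
state = space_saving([10, 22, 30, 10, 35], v=3)
```

A production version would typically keep the counters in a structure that finds the minimum in constant time; the linear `min` scan here is only for readability.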
14
Our Solution
Heavy hitters is usually done on numbers… how do we use it for text?
k-grams: strings of length exactly k
Trivial idea: for each packet:
  Take all k-grams (sliding window)
  Do heavy hitters on them
Example payload: abcabcadefgfsdghjghnfdghfgsdhfjs
  k-grams: b1 = abca, b2 = bcab, b3 = cabc
Fixed length not good enough:
  Either too short: cuts up longer signatures
    Substring pollution: too many heavy hitters for one signature
  Or too long: noisy signatures
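The sliding window can be sketched as follows, assuming the payload is handled as a Python string; the function name is hypothetical:

```python
def kgrams(payload, k):
    """Return every substring of length exactly k, in order (sliding window)."""
    return [payload[i:i + k] for i in range(len(payload) - k + 1)]


# First k-grams of the slide's example payload, with k = 4.
first_three = kgrams("abcabcadefgfsdghjghnfdghfgsdhfjs", 4)[:3]
```

A payload shorter than k yields no k-grams at all.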
15
Our Solution: Double Heavy Hitters
Double Heavy Hitters algorithm: two separate instances of heavy hitters
  Heavy Hitters 1: Find heavy hitters of k-grams
  Heavy Hitters 2: Find heavy hitters of varying-length strings created during run of Heavy Hitters 1
Input to Heavy Hitters 1: k-grams
Input to Heavy Hitters 2: strings
Output is output of Heavy Hitters 2
[Diagram: k-grams enter Heavy Hitters 1; strings built from them enter Heavy Hitters 2]
16
Double Heavy Hitters Algorithm
While processing k-grams in Heavy Hitters 1:
  Find max run of k-grams that are:
    Already in Heavy Hitters 1
    Counters of consecutive k-grams maintain predefined ratio
  Create string
  Insert into Heavy Hitters 2
[Diagram: for each k-gram (abca, bcab, cabc, abcd, bcda, cdab, dabc, abca) ask "Is already in Heavy Hitters 1?" (Y/N) and check the counter ratio; the run abca, bcab, cabc is merged into the string abcabc]
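A minimal sketch of the procedure above. Several details are assumptions, since the slides do not spell them out: the ratio test compares the updated counters of two consecutive k-grams as min/max against a hypothetical `ratio` parameter, a run that breaks on the ratio test restarts at the current k-gram, a run still open at the end of a packet is flushed into Heavy Hitters 2, and both instances use the same size v.

```python
class SpaceSaving:
    """Space Saving counters; ties on the minimum are broken by insertion order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def __contains__(self, item):
        return item in self.counts

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1
        return self.counts[item]


def double_heavy_hitters(packets, k, v, ratio=0.5):
    """Feed k-grams to HH1; feed maximal runs of known k-grams to HH2."""
    hh1, hh2 = SpaceSaving(v), SpaceSaving(v)
    for payload in packets:
        run, prev = "", 0
        for i in range(len(payload) - k + 1):
            gram = payload[i:i + k]
            seen = gram in hh1            # was it in HH1 before this update?
            count = hh1.add(gram)
            if seen and run and min(prev, count) / max(prev, count) >= ratio:
                run += gram[-1]           # consecutive k-grams overlap in k-1 chars
            else:
                if run:
                    hh2.add(run)          # run ended: insert the merged string
                run = gram if seen else ""
            prev = count
        if run:
            hh2.add(run)                  # assumption: flush at end of packet
    return hh1, hh2


hh1, hh2 = double_heavy_hitters(["abcabcabcd"], k=4, v=3)
```

On the input abcabcabcd with k = 4 and three counters, this leaves the single string abcabc in Heavy Hitters 2.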
17
Double Heavy Hitters Algorithm Example:
Input: abcabcabcd
k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd
Heavy Hitters 1
counter  k-gram
1        abca
1        bcab
1        cabc
Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL
18
Double Heavy Hitters Algorithm Example:
Input: abcabcabcd
k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd
Heavy Hitters 1
counter  k-gram
2        abca
1        bcab
1        cabc
String = abca
Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL
19
Double Heavy Hitters Algorithm Example:
Input: abcabcabcd
k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd
Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
1        cabc
String = abcab
Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL
20
Double Heavy Hitters Algorithm Example:
Input: abcabcabcd
k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd
Heavy Hitters 1
counter  k-gram
2        abca
2        bcab
2        cabc
String = abcabc
Heavy Hitters 2
counter  string
0        NULL
0        NULL
0        NULL
21
Double Heavy Hitters Algorithm Example:
Input: abcabcabcd
k-grams: b1 = abca, b2 = bcab, b3 = cabc, b4 = abca, b5 = bcab, b6 = cabc, b7 = abcd
Heavy Hitters 1
counter  k-gram
3        abcd
2        bcab
2        cabc
String = abcabc
Heavy Hitters 2
counter  string
1        abcabc
0        NULL
0        NULL
22
Heavy Hitters on text – improving the estimation
Problem: substrings in heavy hitters
  Only longest run is in input to HH2
Correct the count:
  After run of algorithm
  For all strings s in Heavy Hitters 2:
    Find other strings which contain s and add their counters to s’s counter
Heavy Hitters 2
counter  string
200      wonder
300      woman
100      wonderwoman
Heavy Hitters 2 (corrected)
real counter  counter  string
300           200      wonder
400           300      woman
100           100      wonderwoman
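The correction step can be sketched directly from the rule above, assuming the Heavy Hitters 2 table is available as a dictionary from string to counter; the function name is hypothetical:

```python
def correct_counts(hh2_counts):
    """Add to each string's counter the counters of other strings containing it."""
    corrected = {}
    for s, count in hh2_counts.items():
        extra = sum(c for t, c in hh2_counts.items() if t != s and s in t)
        corrected[s] = count + extra
    return corrected


real = correct_counts({"wonder": 200, "woman": 300, "wonderwoman": 100})
```

The pairwise scan is quadratic in the table size, which stays small because the table holds at most v strings.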
23
Double Heavy Hitters Algorithm Analysis
Input:
  Input to HH1: N k-grams
  Input to HH2: C consecutive grams
Error bounds:
  For HH1 with v items: N/v
  For HH2 with v items: C/v
We prove: C ≤ N/(k + 1)
Overall: Error bound of the Double Heavy Hitters algorithm
24
Signature Extraction - High Level
Formalize with thresholds
Attack time traffic sample
Peace time traffic sample
Attack signatures, e.g. Connection: KEEP-ALIVE
Signature Extraction
Find frequent strings in attack time traffic
Find frequent strings in peace time traffic
Take only strings found in attack and not in peace
25
Choose Signatures
Create signatures that never appear in legitimate traffic
Strings in attack with frequency > Attack-high
Thresholds: Attack-high, Peace-low, Peace-high, Delta
26
Choose Signatures
Create signatures that never appear in legitimate traffic
Strings in attack with frequency > Attack-high
Strings in peace time
Thresholds: Attack-high, Peace-low, Peace-high, Delta
[Diagram labels: Signatures; False positives]
27
Choose Signatures
Create signatures that rarely appear in legitimate traffic
Strings in attack with frequency > Attack-high
Strings in peace with frequency > Peace-low
Thresholds: Attack-high, Peace-low, Peace-high, Delta
[Diagram labels: Signatures; False positives]
28
Choose Signatures
Create signatures that may appear in legitimate traffic, but appear in attack traffic much more
Strings in attack with frequency > Attack-high
Strings in peace with frequency > Peace-low
Strings in peace with frequency > Peace-high
Signatures only if attack frequency is at least Delta more than peace frequency
Thresholds: Attack-high, Peace-low, Peace-high, Delta
[Diagram labels: Signatures; False positives]
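One way to read the four thresholds as a selection rule is sketched below. This is an interpretation, not the authors' code: it assumes frequencies are percentages, that strings with peace frequency above Peace-high are never used as signatures, and that the Delta test applies only to strings with peace frequency above Peace-low. The function name, sample strings, and threshold values are hypothetical.

```python
def choose_signatures(attack_freq, peace_freq,
                      attack_high, peace_low, peace_high, delta):
    """Select attack strings as signatures using the four thresholds."""
    signatures = set()
    for s, a in attack_freq.items():
        if a <= attack_high:
            continue                      # not frequent enough during attack
        p = peace_freq.get(s, 0.0)
        if p > peace_high:
            continue                      # assumption: too common in peace traffic
        if p > peace_low and a - p < delta:
            continue                      # not at least delta above peace frequency
        signatures.add(s)
    return signatures


# Hypothetical frequencies (percent of packets) and thresholds.
attack = {"KEEP-ALIVE": 60, "GET /": 90, "Accept: ": 55, "rare": 5}
peace = {"GET /": 80, "Accept: ": 20, "KEEP-ALIVE": 1}
chosen = choose_signatures(attack, peace, attack_high=50,
                           peace_low=3, peace_high=50, delta=30)
```

Raising Delta or lowering Peace-high trades missed signatures for fewer false positives.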
29
Use peace traffic to create filters
Use our Double Heavy Hitters algorithm on peace time traffic:
Peace time traffic packets payload: abcabcadefgfsdghjghnfdghfg......
  k-grams: b1 = abca, b2 = bcab, b3 = cabc, ……
Output values, by frequency (scale from 0% to 100%, with thresholds Peace-low and Peace-high):
  White list: frequency > Peace-high
  Maybe white list: frequency > Peace-low
  Not white list
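The split into the two filters can be sketched as below, assuming the peace-time frequencies are percentages keyed by string; the function name and sample values are hypothetical:

```python
def build_peace_filters(peace_freq, peace_low, peace_high):
    """Split peace-time strings into a white list and a maybe-white list."""
    white_list, maybe_white_list = set(), set()
    for s, freq in peace_freq.items():
        if freq > peace_high:
            white_list.add(s)
        elif freq > peace_low:
            maybe_white_list.add(s)
        # Strings at or below peace_low are not white-listed at all.
    return white_list, maybe_white_list


white, maybe = build_peace_filters(
    {"GET /": 80, "Accept: ": 20, "X-Odd: ": 1}, peace_low=3, peace_high=50)
```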
30
Extracting Attack Signatures
Now use Double Heavy Hitters algorithm on attack time traffic with filters
Attack traffic packets payload: hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas
  k-grams: b1 = hagd, b2 = agdh, b3 = gdhd, ……
Modified DHH (Heavy Hitters 1 → string → Heavy Hitters 2), with filters:
  White list: discard if contained in whitelist string
  Maybe white list:
Output values with frequency > Attack-high: Signatures
31
Evaluations
Overall eleven tests:
  Ten real attack captures
    5 captures of peacetime traffic
    5 synthetic peacetime captures
  One synthetic attack in real peace time traffic
Compare to human expert
32
Sample Signatures
Extra newline between header fields
Use of upper-case characters, where usually lower
Use of a rarely used HTTP field
Use of rare user agent
Could not be identified manually
33
Results – Accuracy of Double Heavy Hitters estimation
Graph of frequency of signatures
RED – Actual count (frequency) in attack traffic
BLUE – Algorithm (DHH) estimation of frequency of signatures
[Chart: x-axis Signatures (1 to 37), y-axis Percent (0 to 100); series: Algorithm (DHH), Actual Count (frequency)]
34
Results - Attack Rate Estimation
[Chart: x-axis Test Number (1 to 9), y-axis Attack rate (0 to 100); groups: Tests with real peace time traffic, Tests with synthetic peace time traffic; legend: Human Ex...]
35
Results – Recall and Precision Estimation
[Chart: x-axis Test Number (1 to 11), y-axis Percent (0 to 100); groups: Tests with real peace time traffic, Tests with synthetic peace time traffic; legend: Peacetime ba...]
Precision: relevant packets from all identified
Recall: identified packets from all relevant
Average: 99.96
Worst case: 99.8
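The two definitions can be computed as below, assuming packets are represented by hashable identifiers. Returning 0.0 when a denominator is empty is a convention chosen here, not something the slides state; the function name and packet ids are hypothetical.

```python
def precision_recall(identified, relevant):
    """Precision: share of identified packets that are relevant.
    Recall: share of relevant packets that were identified."""
    identified, relevant = set(identified), set(relevant)
    hits = identified & relevant
    precision = len(hits) / len(identified) if identified else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical packet ids: 4 packets flagged, 5 actually part of the attack.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```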
36
Future Work
Identify signatures always found in same packets
Good synthetic peace-time traffic, global white-list
Support regular expression signatures
37