Post on 19-Dec-2015


Polytechnic University,ECE Department 1

Detection of “Hot Spots”

Paper Title: Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations

Liang,Chao

Motivation

“Hot spots” in the Internet
– Super source (large fan-out)
  • Hosts infected by a worm (e.g. the Slammer worm)
– Super destination (large fan-in)
  • DDoS victim

Internet attacks are increasing in severity
– Network security monitoring

Challenges
• High packet arrival rate
• Speed requirement of RAM (DRAM vs. SRAM)
• Impractical per-flow state maintenance

How to find the needle in the haystack

IP flow
– Abstraction: set of packets identified with the same addresses, ports, etc.
– Flow label: source-destination pair <pkt.src,pkt.dst>

General problem: heavy distinct-hitters
– Given a stream of flow labels <pkt.src,pkt.dst>, find all the sources that are paired with a large number of distinct destinations.
– To detect super destinations, reverse the flow label

[Figure: flow 1, flow 2, flow 3]
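
The problem statement above can be made concrete with an exact baseline. The sketch below keeps one set of destinations per source, which is exactly the per-flow state the talk calls impractical at line speed; the function names and the toy stream are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def exact_super_sources(flow_labels, threshold):
    """Return {src: fan_out} for sources with more than `threshold` distinct destinations."""
    destinations = defaultdict(set)      # per-source set of distinct destinations
    for src, dst in flow_labels:
        destinations[src].add(dst)       # repeated packets of one flow are absorbed by the set
    return {src: len(dsts) for src, dsts in destinations.items() if len(dsts) > threshold}


def exact_super_destinations(flow_labels, threshold):
    """Same computation with the flow label reversed."""
    return exact_super_sources(((dst, src) for src, dst in flow_labels), threshold)


stream = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x"), ("a", "z"), ("c", "x")]
print(exact_super_sources(stream, threshold=2))        # {'a': 3}
print(exact_super_destinations(stream, threshold=2))   # {'x': 3}
```

Memory grows with the number of distinct flows, which is why the rest of the talk looks at sampling and streaming instead.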

Weapons

Previous techniques
– Flow state maintenance
– Probabilistic counting
– Bloom filters
– Multi-resolution bitmap
– ……

This paper
– Sampling
– Network data streaming

Paper

Qi Zhao, Abhishek Kumar, Jun Xu, “Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations”, IMC 2005

Outline of the rest of the talk

Introduction of one previous work
– Traditional hash-based flow sampling

Main approach
– Simple scheme
– Advanced scheme

Evaluation

Summary

Traditional hash-based flow sampling

Flow sampling
– Sample flows with a certain probability p
  • A hash function H maps the flow label to a value uniformly distributed in [0,1)
  • If H(flow label) < p, sample the flow

Hash tables
– HT1: detect and discard duplicates
  • Access the element indexed by the hash of the flow label
  • Element: list of flow labels
– HT2: count the number of flows
  • Access the element indexed by the hash of srcIP
  • Element: list of <srcIP,count> pairs
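
A minimal sketch of the scheme described above, assuming SHA-256 as the uniform hash and Python's built-in `set` and `dict` standing in for HT1 and HT2 (the slides do not name a hash function or table layout). The returned counts are scaled by 1/p to compensate for sampling.

```python
import hashlib
from collections import defaultdict


def uniform_hash(flow_label):
    """Map a flow label to a value in [0, 1); SHA-256 is an illustrative choice."""
    digest = hashlib.sha256(repr(flow_label).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sampled_fan_out(packets, p):
    """Estimate per-source fan-out from flows sampled with probability p."""
    seen_flows = set()               # role of HT1: detect and discard duplicates
    flow_count = defaultdict(int)    # role of HT2: srcIP -> number of sampled flows
    for src, dst in packets:
        label = (src, dst)
        if uniform_hash(label) >= p:     # flow is not sampled
            continue
        if label in seen_flows:          # later packet of an already counted flow
            continue
        seen_flows.add(label)
        flow_count[src] += 1
    return {src: count / p for src, count in flow_count.items()}


packets = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
print(sampled_fan_out(packets, p=1.0))   # {'a': 2.0, 'b': 1.0}
```

Because the sampling decision depends only on the flow label, every packet of a flow is either kept or dropped together, so a flow is never counted twice.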

Traditional hash-based flow sampling

Fan-out calculation
– Threshold test to report the super sources
– Estimation to compensate for sampling: Ē = E × (1/p)

Performance analysis
– Key reason for ineffectiveness: low sampling rate
  • Update cost of the hash tables (in DRAM)
  • Influence of elephant flows
– Performance bottleneck: query of the first hash table
– Result
  • The sampling rate p << Hs / Tr
    – Hs: operating speed of the hash table
    – Tr: arrival rate of traffic
  • Estimation error scales by 1/p

p is too low!

Contribution of this paper

Network data streaming
– Process each and every incoming packet in real time
– Employ a small and fast memory
– Maintain only the most pertinent information

Two schemes
– Simple scheme: filtering after sampling
– Advanced scheme: separation of counting and identity gathering

Include more information

Simple Scheme System

Filtering-after-sampling system

Data streaming module
– Replaces the hash table
– Final goal: improve the sampling rate

Simple Scheme – Data Streaming Module

How to realize
– Employ a bit array to label new flows
  • Bit array G: w bits
  • Hash function: maps the flow label to a value uniformly distributed in [1,w]
– Employ SRAM (static random access memory)

[Figure: the flow label of each packet is hashed by H( ) to an index i of the bit array]

Simple Scheme - Estimation

Hash collisions in data streaming
– Different flows can map to the same index of G
– The corresponding update of the hash table is then missed

Compensation for collisions
– When a new flow arrives
  • Variable u: keeps track of the number of “0” entries in G
  • Variable i: hash result of the new flow
  • P(G[i]=0) = u/w
– Compensate for the hash collisions by adding w/u to the count

Unbiased estimation of count
– Hash table updated by K flows
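
The estimator itself did not survive extraction. A reconstruction that follows from the definitions above, offered as an assumption rather than a copy of the original formula: let u_j be the number of “0” entries in G just before the j-th flow of source s is recorded, with K flows recorded in total. Each recorded flow contributes w/u_j, and since a new flow is recorded with probability u/w, its expected contribution is 1.

```latex
\hat{F}_s \;=\; \sum_{j=1}^{K} \frac{w}{u_j},
\qquad
\mathbb{E}\!\left[\frac{w}{u}\,\mathbf{1}\{G[i]=0\}\right]
  \;=\; \frac{w}{u}\cdot\frac{u}{w} \;=\; 1 .
```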

Simple Scheme - Algorithm

[Figure: algorithm listing; callout: compensation calculation]
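
Since the algorithm listing is not recoverable from the transcript, here is a minimal sketch of the simple scheme as the preceding slides describe it: sample the flow, filter it through bit array G, and add w/u to the source's count when the bit is still 0. The class name, the salted SHA-256 hash, the one-byte-per-bit array and the 0-based indices are illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def hash64(value, salt):
    """64-bit hash of `value`; salted SHA-256 is an illustrative choice."""
    data = f"{salt}|{value!r}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class SimpleScheme:
    def __init__(self, w, p=1.0):
        self.w = w                        # size of bit array G
        self.p = p                        # flow sampling rate
        self.G = bytearray(w)             # one byte per bit, for readability
        self.u = w                        # number of '0' entries in G
        self.count = defaultdict(float)   # hash table: src -> compensated count

    def update(self, src, dst):
        label = (src, dst)
        # Sampling: keep the flow only if its hash falls below p.
        if (hash64(label, "sample") >> 11) / 2**53 >= self.p:
            return
        # Filtering: a set bit means this flow (or a colliding one) was seen.
        i = hash64(label, "index") % self.w
        if self.G[i] == 0:
            self.count[src] += self.w / self.u   # add w/u instead of 1
            self.G[i] = 1
            self.u -= 1

    def estimate(self, src):
        """Estimated fan-out of `src`, scaled by 1/p to compensate for sampling."""
        return self.count.get(src, 0.0) / self.p


scheme = SimpleScheme(w=1 << 16)
for d in range(50):
    scheme.update("10.0.0.1", f"dst{d}")
    scheme.update("10.0.0.1", f"dst{d}")   # duplicate packet, filtered out by G
print(scheme.estimate("10.0.0.1"))         # near 50 when collisions are rare
```

Note that u is read before it is decremented, so the compensation uses the number of zeros the new flow actually faced.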

Simple Scheme - Analysis

Unbiased estimator of fan-out

Saturation avoidance
– The fewer ‘0’ elements remain, the lower the probability that a new flow is recorded
– Minimum number of ‘0’ elements typically set to around w/2 (half full)
– Two sets of arrays and hash tables operated alternately

Sampling rate improved
– Affordable SRAM
  • Little memory consumption to support high-speed links
– Streaming speed
  • Poisson-like update times of the hash table
  • Efficient hardware implementation of the hash function
  • All operations in the data streaming module can be finished in about 10 ns

Bottleneck!

Advanced Scheme - System

• Record source identity (e.g. source IP)
• Record flow information to the array in real time
• Use the source identity (2) to look up the array (1) to estimate offline

Advanced Scheme – Streaming algorithm

2D bit array A(m,n)

Four hash functions
– One to get the row number (range [1,m])
– Three to get the column numbers (range [1,n]); in this case k=3
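
A minimal sketch of the 2D bit array update. The slide does not say what each hash function is applied to; the sketch assumes the row hash takes the flow label and the k column hashes take the source, so that all flows of one source land in the same k columns. The class name, the salted SHA-256 hashes and the 0-based indices are also illustrative assumptions.

```python
import hashlib


def hash_mod(value, salt, modulus):
    """Hash `value` to an index in [0, modulus); salted SHA-256 is an illustrative choice."""
    data = f"{salt}|{value!r}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % modulus


class BitArray2D:
    def __init__(self, m, n, k=3):
        self.m, self.n, self.k = m, n, k
        self.rows = [bytearray(n) for _ in range(m)]   # A[row][col], one byte per bit

    def row_of(self, src, dst):
        # Assumption: the row is chosen by hashing the whole flow label.
        return hash_mod((src, dst), "row", self.m)

    def columns_of(self, src):
        # Assumption: the k columns are chosen by hashing the source only.
        return [hash_mod(src, f"col{j}", self.n) for j in range(self.k)]

    def update(self, src, dst):
        """Set one bit in each of the source's k columns, all in the flow's row."""
        row = self.row_of(src, dst)
        for col in self.columns_of(src):
            self.rows[row][col] = 1

    def column(self, col):
        """Return column `col` as a list of m bits, for offline estimation."""
        return [self.rows[r][col] for r in range(self.m)]


A = BitArray2D(m=64, n=16384, k=3)
A.update("10.0.0.1", "192.168.1.7")
print(A.columns_of("10.0.0.1"))   # the three columns shared by all flows of this source
```

Every packet triggers k bit writes and no reads, which is the property that lets the module run on every packet without sampling.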

Advanced Scheme – Streaming module

[Figure: row collision and column collision]

Why k=3?

The Linear-Time probabilistic counting algorithm

Idea from the database field: counting the number of unique values in the presence of duplicates

Estimation of the number of distinct flows
– m: column size
– n: total number of flows
– A_j: the jth element of the column
– U_n: the number of elements whose value is “0”
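
The estimator is missing from the transcript. The standard linear counting estimate is n̂ = −m · ln(U_n / m), which the sketch below computes in the equivalent form m · ln(m / U_n). The helper `bitmap_of` and its SHA-256 hash are illustrative assumptions used only to build a test bitmap.

```python
import hashlib
import math


def bitmap_of(values, m):
    """Hash each value to one of m positions and set that bit (illustrative helper)."""
    bitmap = [0] * m
    for v in values:
        h = int.from_bytes(hashlib.sha256(repr(v).encode()).digest()[:8], "big")
        bitmap[h % m] = 1
    return bitmap


def linear_count(bitmap):
    """Linear counting: estimate the number of distinct values hashed into `bitmap`."""
    m = len(bitmap)
    zeros = bitmap.count(0)              # U_n in the slide's notation
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return m * math.log(m / zeros)       # same value as -m * ln(U_n / m)


values = [d for d in range(500) for _ in range(3)]   # 500 distinct values, each seen 3 times
print(linear_count(bitmap_of(values, m=4096)))
```

Duplicates set the same bit again and so leave the estimate unchanged, which is what makes the method suitable for counting distinct flows.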

Joint relation calculation

The distinct values in the join of two relations
– |A∩B| = |A| + |B| − |A∪B|
– A → G1, B → G2

Estimate them by linear counting D based on G
– |A∩B| ≈ D(G1) + D(G2) − D(G1∪G2)

Note: |A∩B| cannot be estimated directly from G1∩G2, because the bitwise AND of the two bit arrays is not the bit array of A∩B, whereas G1∪G2 (bitwise OR) is the bit array of A∪B.
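
The relation |A∩B| ≈ D(G1) + D(G2) − D(G1∪G2) can be tried on two overlapping sets. This is a sketch under the assumption that both bit arrays have the same size and use the same hash function; the helper names and the SHA-256 hash are illustrative.

```python
import hashlib
import math


def bitmap_of(values, m):
    """Hash each value to one of m positions and set that bit (illustrative helper)."""
    bitmap = [0] * m
    for v in values:
        h = int.from_bytes(hashlib.sha256(repr(v).encode()).digest()[:8], "big")
        bitmap[h % m] = 1
    return bitmap


def linear_count(bitmap):
    """Linear counting estimate D(G) of the number of distinct values in `bitmap`."""
    m = len(bitmap)
    zeros = bitmap.count(0)
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return m * math.log(m / zeros)


def bitwise_or(g1, g2):
    """G1 ∪ G2: with a shared hash, this is exactly the bit array of A ∪ B."""
    return [a | b for a, b in zip(g1, g2)]


def intersection_estimate(g1, g2):
    """Estimate |A ∩ B| as D(G1) + D(G2) - D(G1 ∪ G2)."""
    return linear_count(g1) + linear_count(g2) - linear_count(bitwise_or(g1, g2))


m = 1 << 14
A = range(0, 600)        # 600 values
B = range(400, 1000)     # 600 values, 200 of them shared with A
print(intersection_estimate(bitmap_of(A, m), bitmap_of(B, m)))
```

The bitwise OR is exact while the bitwise AND is not, because a position set in both arrays may have been set by different values.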

Advanced Scheme – Estimation module

Computing the join selectivity over three columns (k=3)
– ∪: bitwise OR

Avoid two sources both being hashed to the same k columns
– S: total number of distinct sources
– n: number of columns
– The probability of collision drops to 0.002 when n=16,000, S=100,000, k=3

Advanced Scheme – Identity module

Purpose
– Capture the identities of potential super sources
– Write data into DRAM in real time

Identity collection
– Used as input data to estimate the corresponding fan-out

Why DRAM?
– Replaces expensive hash table operations
– Sequential writes can be very fast
  • 100% and 25% recording for OC-192 and OC-768, respectively

Evaluation

Real Internet traffic traces
– UNC (1 Gbps), USC, NLANR (IPKS+, IPKS-) (OC-192 link)

Evaluation - Simple Scheme

[UNC] Sampling rate: 1/4; bit array size: 128 Kb
– Area I: false positives; Area II: false negatives

Evaluation - Advanced Scheme

[UNC] 2D bit array A: 128 KB (64 × 16,384); sampling rate: 1

Estimation Accuracy

Summary

Monitoring at high speed is challenging

Network data streaming
– Keeps up with the line speed
– Includes more pertinent information

Employ achievements from other fields

Q&A