polytechnic university,ece department1 detection of “hot spots” paper title : joint data...
Post on 19-Dec-2015
221 views
TRANSCRIPT
Polytechnic University,ECE Department 1
Detection of “Hot Spots”
Paper Title: Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations
Liang,Chao
Polytechnic University,ECE Department 2
MotivationMotivation
“Hot spots” in the Internet – Super Source (large fan-out)
• Infected hosts by worm (Slammer worm)
– Super Destination (large fan-in)• DDoS victim
Internet attacks increasing in severity– Network security monitoring
Challenges • High packets arrival rate• Speed requirement of RAM (DRAM vs SRAM)• Impractical per-flow state maintenance
Polytechnic University,ECE Department 3
How to find the How to find the needleneedle in the in the haystackhaystack
IP Flow– Abstraction: set of packets identified with same
address, ports, etc. – Flow label: Source-destination pair <pkt.src,pkt.dst>
General Problem: Heavy distinct-hitters– Given a stream of flow label <pkt.src,pkt.dst> pairs,
find all the src that are paired with a large number of distinct destination.
– Detect super destination: Reverse the flow label
flow 1 flow 2 flow 3
Polytechnic University,ECE Department 4
WeaponsWeapons
Previous Techniques– Flow state maintenance– Probabilistic counting– Bloom Filters– Multi-resolute bitmap– ……
This paper Sampling Network Data streaming
Polytechnic University,ECE Department 5
PaperPaper Qi Zhao, Abhishek Kumar, Jun Xu, “Joint Data
Streaming and Sampling Techniques for Detection of Super Sources and Destinations”, IMC 2005
Polytechnic University,ECE Department 6
Outline of the rest of the talkOutline of the rest of the talk
Introduction of one previous work– Traditional hash-based flow sampling
Main approach– Simple scheme– Advanced scheme
Evaluation
Summary
Polytechnic University,ECE Department 7
Traditional hash-based flow samplingTraditional hash-based flow sampling
Flow sampling– Sample flows with a certain percentage p
• Hash function maps flow label to a value uniformly distributed in [0,1)
• H (flow label)<p, then sample the flow
Hash Table– HT1.Detect and discard duplicate ones
• Access the element with index by hashing flow label• Element: list of flow label pairs
– HT2.Count flow numbers• Access the element with index by hashing srcIP• Element: list of <srcIP,count> pairs
Polytechnic University,ECE Department 8
Traditional hash-based flow samplingTraditional hash-based flow sampling
Fan-out Calculation– Threshold Judge to report the super source – Estimation to compensate sampling
Ē=E*(1/p)
Performance Analysis– Key Ineffective Reason - Low sampling rate
• The update cost of hash tables (In DRAM)• Elephant flows influence
– Performance bottleneck: Query of the first hash table
– Result• The sampling rate p<< Hs / Tr
– Hs: operating speed of hash table– Tr: arrival rate of traffic
• Estimation error scale by 1/p
p is too slow!
Polytechnic University,ECE Department 9
Contribution of this paperContribution of this paper Network Data Streaming
– Process each and every incoming packet in real-time– Employ a small and fast memory– Maintain only the most pertinent information
Two schemes– Simple scheme : filtering after sampling– Advanced scheme : separation of counting and
identity gathering
Include more information
Polytechnic University,ECE Department 10
Simple Scheme SystemSimple Scheme System Filtering after sampling System
Data Streaming module – Replace the hash table– Final goal: improve the sampling rate
Polytechnic University,ECE Department 11
Simple Scheme – Data Streaming Simple Scheme – Data Streaming ModuleModule How to realize
– Employing bit array to label new flow• Bit array G: w bits• Hash function: maps to a value uniformly distributed in
[1,w]
– Employ SRAM (static random access memory)
packet H( )
0 1 2 i0
w-11
flow label
Polytechnic University,ECE Department 12
Simple Scheme - EstimationSimple Scheme - Estimation
Hash collision in data streaming – Different flows have same index of G– Miss the update of the hash table
Compensation of the collision– when the ith new flow arrival
• Variable u: to keep track of the number of “0” in G• Variable i : hash result of the new flow • P(G[i]=0) = u/w
– Compensate the hash collisions by adding w/u
Unbiased Estimation of count – Hash table updated by K flows
Polytechnic University,ECE Department 13
Simple Scheme - AlgorithmSimple Scheme - Algorithm
Compensation Calculation
Polytechnic University,ECE Department 14
Simple Scheme - AnalysisSimple Scheme - Analysis
Unbiased estimator of fan-out
Saturation Avoidance Number of ‘0’ element Probability to be recorded
– Minimum of ‘0’ element typically set around w/2 (half full)
– Two sets of arrays and hash tables operated alternatively
Sampling rate improved– Affordable SRAM
• Little memory consumption to support high speed links
– Streaming speed• Poisson alike update times of the hash table• Efficient hardware implementation of hash function• All operations in data streaming module can be finished in about
10ns
Bottleneck!
Polytechnic University,ECE Department 15
Advanced Scheme - SystemAdvanced Scheme - System
Record source identity
(e.g.. source IP)
Record flow information to array
in real-time
Use the source identity(2) to look up the array(1) to
estimate offline
Polytechnic University,ECE Department 16
Advanced Scheme – Streaming algorithmAdvanced Scheme – Streaming algorithm
2D bit array A(m,n) Four hash functions
– One to get row number (range [1,m])– Three to get column number (range [1,n])
this case k=3
Polytechnic University,ECE Department 17
Advanced Scheme – Streaming moduleAdvanced Scheme – Streaming module
Row collision
Column collisionWhy k=3?
Polytechnic University,ECE Department 18
The Linear-Time probabilistic counting The Linear-Time probabilistic counting algorithmalgorithm
Idea from Database field: counting the number of unique values in the presence of duplicates
Estimation of distinct flow number– m : column size– n : total number of flow– Aj : the jth element of column– Un: the number of element whose value is “0”
j
Polytechnic University,ECE Department 19
Joint relation calculationJoint relation calculation
The distinct values in the join of two relations– AB=A+B-AUB– A->G1 B->G2
Estimate them by linear counting D based on G– AB=D(G1)+D(G2)-
D(G1UG2)Note: Cannot directly
calculate G1G2 cause different space
AпB G1пG2
AUB G1UG2
Polytechnic University,ECE Department 20
Advanced Scheme – Estimation moduleAdvanced Scheme – Estimation module
Computing the join selectivity in three columns(k=3)– U: Bitwise-OR
Avoid two sources both hashed to the same k columns– S: total number distinct sources– n: column number
– The probability of collision drop to 0.002 – When n=16,000, S=100,000, k=3
Polytechnic University,ECE Department 21
AdvancedAdvanced SchemeScheme – Identity– Identity modulemodule
Purpose – Capture the identities of potential super sources– Write data into DRAM in real-time
Identity collection– Estimate the corresponding fan-out as input
data
Why DRAM?– Replace expensive hash table operation– Sequential writes can be very fast
• 100% and 25% recording for OC-192 and OC-768
Polytechnic University,ECE Department 22
EvaluationEvaluation
Real internet traffic traces– UNC(1 Gbps),USC,NLANR(IPKS+,IPKS-)(OC192
link)
Polytechnic University,ECE Department 23
Evaluation-Simple SchemeEvaluation-Simple Scheme
[UNC] Sampling rate:1/4 Bit array size:128Kb– Area1:false positives Area II: false negative
Polytechnic University,ECE Department 24
Evaluation-Advanced SchemeEvaluation-Advanced Scheme
[UNC]2D Bit array A: 128KB(64*16,384) sampling rate:1
Polytechnic University,ECE Department 26
SummarySummary
Monitoring at high speed is challenging
Network Data Streaming– Keep up with the line speed– Include more pertinent information
Employ other fields achievements