On Designing Fast Nonuniformly Distributed IP Address Lookup Hashing Algorithms
Author: Christopher J. Martinez, Devang K. Pandya, and Wei-Ming Lin
Publisher: IEEE/ACM Transactions on Networking, 2009
Presenter: Yuen-Shuo Li
Date: 2013/01/09
2
Outline
Introduction
Proposed Hashing Algorithm
Simulation Results
Implementation
3
Introduction(1/4)
Hashing has been widely used for fast IP address lookup, but the performance of known hashing schemes is far from optimal because actual IP addresses are nonuniformly distributed.
4
Introduction(2/4)
There exists a set of well-established hash algorithms, such as MD4, MD5, SHA-1, and SHA-2, which are used in the cryptography field.
These algorithms rely on a series of addition, bit rotation, and logic operations through many cycles.
Too slow!
5
Introduction(3/4)
CRC-based hash functions have proven to be an excellent means of hashing, but they have some potential shortcomings.
Compared to a simple XOR folding hash algorithm that can be implemented in a fast parallel circuit, the CRC-based hash function requires a sequential circuit and a much longer time to determine the hash value.
Cannot be implemented in parallel!
6
Introduction(4/4)
The goal of this paper is to develop a universal hashing methodology applicable to nonuniformly distributed data sets.
Our proposed designs allow the application of a standard XOR folding hashing to produce a significantly improved performance.
A new hash function (improves on XOR folding hashing)
[Figure: keys spread evenly over eight hash bins: {k1, k2, k3}, {k4, k5}, {k9, k10}, {k6, k11, k12}, {k13, k19}, {k7, k17}, {k8, k18}, {k14, k15, k16}]
Balanced!
7
Proposed Hashing Algorithm(1/13)
The hashing process is to hash each of the n-bit entries into an m-bit hash value.
[Figure: each n-bit entry passes through the hash to produce an m-bit hash value]
Proposed Hashing Algorithm(2/13)
Intuitively, using the bits with smaller d values for hashing would lead to a probabilistically better hash distribution.
d: the absolute difference between the number of 0's and 1's in a given bit position, counted over all entries
[Figure: example table of entries with the d value computed for each bit column]
Proposed Hashing Algorithm(3/13)
Employ a simple preprocessing step that rearranges the n bit vectors (one per bit position) in increasing order of their d values.
10
Proposed Hashing Algorithm(4/13)
A bit-extraction hashing is to simply extract m bits from the n-bit entry as its hash value.
[Figure: EXT extracts m bits directly from the n-bit entry; d-EXT first sorts the bit positions by d and then extracts m bits]
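The preprocessing and the d-EXT extraction can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the toy 4-bit database, and the convention that bit position 0 is the least significant bit are all assumptions made here.

```python
def d_values(entries, n):
    """Return d for each bit position: |#zeros - #ones| over all entries."""
    total = len(entries)
    d = []
    for pos in range(n):
        ones = sum((e >> pos) & 1 for e in entries)
        d.append(abs(total - 2 * ones))  # zeros - ones = total - 2 * ones
    return d


def sorted_positions(entries, n):
    """Bit positions ordered by increasing d (ties keep their original order)."""
    d = d_values(entries, n)
    return sorted(range(n), key=lambda pos: d[pos])


def d_ext(entry, order, m):
    """Build an m-bit hash from the m bit positions with the smallest d."""
    h = 0
    for i, pos in enumerate(order[:m]):
        h |= ((entry >> pos) & 1) << i
    return h


# Hypothetical toy database of 4-bit entries (assumption, not from the paper).
entries = [0b1010, 0b1000, 0b1011, 0b1001]
order = sorted_positions(entries, 4)
hashes = [d_ext(e, order, 2) for e in entries]
```

In this toy database the two high bits are identical in every entry (d = 4), so extracting them would put all four entries into one bin, while the two low bits (d = 0) separate all four entries.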
11
Proposed Hashing Algorithm(5/13)
MSL: the largest number of entries that are mapped into any hash bin.
ASL: the average maximum number of matching steps needed for any given record to match.
n = 32, m varied
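As a rough illustration of the two metrics, the sketch below computes them from a list of hash values. It assumes each bin is searched linearly, so the j-th entry in a bin costs j matching steps; that reading of ASL, and the function name `msl_asl`, are assumptions rather than definitions taken from the paper.

```python
from collections import Counter


def msl_asl(hash_values):
    """Return (MSL, ASL) for a list of hash values, one per stored entry."""
    bins = Counter(hash_values)
    # MSL: size of the fullest bin.
    msl = max(bins.values())
    # A bin holding k entries costs 1 + 2 + ... + k = k(k+1)/2 steps in total
    # if every entry in it is looked up once with a linear search.
    total_steps = sum(k * (k + 1) // 2 for k in bins.values())
    asl = total_steps / len(hash_values)
    return msl, asl


# Six entries: three in bin 0, one in bin 1, two in bin 2.
msl, asl = msl_asl([0, 0, 0, 1, 2, 2])
```

For this input the fullest bin holds three entries, and the total search cost is 6 + 1 + 3 = 10 steps over six entries.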
12
Proposed Hashing Algorithm(6/13)
Group-XOR is a commonly used hashing technique that folds the n-bit key into an m-bit hash result by XORing every n/m key bits into one final hash bit.
[Figure: the n-bit key is cut into m-bit segments, which are XORed (⨁) together into the m-bit hash]
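A minimal sketch of group-XOR folding, assuming the key is cut into consecutive m-bit segments starting from the least significant bit; the function name and the example address are illustrative choices, not taken from the slides.

```python
def group_xor(key, n, m):
    """Fold an n-bit key into m bits by XORing its consecutive m-bit segments."""
    mask = (1 << m) - 1
    h = 0
    for shift in range(0, n, m):
        # If m does not divide n, the last segment is simply shorter.
        h ^= (key >> shift) & mask
    return h


# 192.168.0.1 as a 32-bit integer, folded into an 8-bit hash.
h = group_xor(0xC0A80001, 32, 8)
```

Here the four bytes 0xC0, 0xA8, 0x00, and 0x01 are XORed together, so each hash bit is the XOR of n/m = 4 key bits.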
13
Proposed Hashing Algorithm(7/13)
The goal of this paper is to use the extracted information from the preprocessing (d values) to facilitate a better hash design with the XOR operator.
14
Proposed Hashing Algorithm(8/13)
In order not to degrade the hash performance, every intended XOR operation between two bits x and y should lead to a d value no larger than that of either input bit, i.e. d(x ⊕ y) ≤ min(d(x), d(y)).
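A small check of this idea, assuming the requirement is that the XORed bit column's d value must not exceed the smaller d value of the two input columns. The helper names and the two toy databases are hypothetical.

```python
def column(entries, pos):
    """Bit values at position pos (0 = least significant) across all entries."""
    return [(e >> pos) & 1 for e in entries]


def d_of(bits):
    """d value of a bit column: |#zeros - #ones|."""
    return abs(len(bits) - 2 * sum(bits))


def xor_is_safe(entries, i, j):
    """True if XORing columns i and j does not raise d above min(d_i, d_j)."""
    a, b = column(entries, i), column(entries, j)
    x = [p ^ q for p, q in zip(a, b)]
    return d_of(x) <= min(d_of(a), d_of(b))


# Two perfectly balanced but identical columns: their XOR is constant (d = 4).
correlated = [0b00, 0b11, 0b00, 0b11]
# Two balanced, independent columns: their XOR stays balanced (d = 0).
independent = [0b00, 0b01, 0b10, 0b11]

bad = xor_is_safe(correlated, 0, 1)
good = xor_is_safe(independent, 0, 1)
```

The first case shows why XORing two small-d columns can be harmful: both inputs have d = 0, yet the result carries no information at all.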
15
Proposed Hashing Algorithm(9/13)
Bit vectors with smaller d values are XORed with larger d-value bits in order to have a better chance for further reduction.
Bit vectors in the middle range are XORed together to provide the most reductions available.
16
Proposed Hashing Algorithm(10/13)
Two straightforward ways to exploit the d-value-based sorted sequence, d-IOX and d-SOX, perform XOR hashing directly on the preprocessed database.
17
Proposed Hashing Algorithm(11/13)
The traditional group-XOR process may easily lead to a detrimental effect, while both d-IOX and d-SOX avoid XORing two bits that both have small d values (the worst possible XORing) or both have large d values (the XORing leading to minimal gain).
18
Proposed Hashing Algorithm(12/13)
Natural-Fold XOR (d-NFX) folds the sorted bit sequence from both ends, XORing each matching pair of bits.
Natural-Fold with Duplication XOR (d-NFD) duplicates the middle subsegments to patch up the missing portion for uniformity.
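The folding idea can be sketched for the simplest case only, where n = 2m and a single fold pairs the smallest-d bit with the largest-d bit, the second smallest with the second largest, and so on. This is an assumption-laden illustration: the general d-NFX procedure for other ratios of n to m and the duplication step of d-NFD are not reproduced here, and the function name is hypothetical.

```python
def natural_fold_xor(entry, order):
    """Single-fold XOR for n = 2m.

    `order` lists the n bit positions sorted by increasing d value
    (position 0 is the least significant bit of `entry`).
    Hash bit i is the XOR of the i-th smallest-d and i-th largest-d bits.
    """
    n = len(order)
    m = n // 2
    h = 0
    for i in range(m):
        low_d_bit = (entry >> order[i]) & 1
        high_d_bit = (entry >> order[n - 1 - i]) & 1
        h |= (low_d_bit ^ high_d_bit) << i
    return h


# Toy 4-bit example with the positions already in sorted order.
h1 = natural_fold_xor(0b1010, [0, 1, 2, 3])
h2 = natural_fold_xor(0b1001, [0, 1, 2, 3])
```

With this pairing the two middle positions of the sorted sequence end up XORed with each other, which matches the earlier remark that middle-range bit vectors are combined together.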
19
Proposed Hashing Algorithm(13/13)
d-NFD may lead to overduplication or underduplication of the center subsegments.
A simple remedy is adopted: truncate the bits that overshoot, or duplicate more than once.
20
Simulation Results(1/12)
The data set used for our simulation is randomly generated such that the value for each bit position is uniformly distributed.
16384 (2^14) entries
21
Simulation Results(2/12)
The simulation results for n = 32 and are given in Fig. 12 in terms of MSL and ASL by taking an average of results from 1000 runs.
MSL: the largest number of entries that are mapped into any hash bin.
ASL: the average maximum number of matching steps needed for any given record to match.
RS hash
22
Simulation Results(3/12)
RS Hash(additional)
23
Simulation Results(4/12)
A summary of the performance gain in MSL of each of the three proposed techniques and the two reference techniques over group-XOR.
24
Simulation Results(5/12)
RS Hash: The RS hash is a multiplicative hash algorithm that requires two multiplications and one addition for every 8 bits of hash key to generate a hash value.
CRC-32 Hash: The CRC-32 requires 32 iterations to generate the final hash value for a given hash key, requiring additional control logic to properly maintain the sequential process.
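For reference, the sketch below shows the widely circulated RS hash attributed to Robert Sedgewick, on the assumption that this is the variant the slides refer to. The constants 378551 and 63689 come from that common implementation, not from the slides, and results are masked to 32 bits to mimic unsigned integer arithmetic.

```python
MASK32 = 0xFFFFFFFF


def rs_hash(data: bytes) -> int:
    """RS hash over a byte string, kept within 32 bits."""
    b = 378551
    a = 63689
    h = 0
    for byte in data:
        h = (h * a + byte) & MASK32  # first multiplication plus the addition
        a = (a * b) & MASK32         # second multiplication updates the multiplier
    return h


# Hash the four bytes of 192.168.0.1.
value = rs_hash(bytes([192, 168, 0, 1]))
```

The loop body makes the cost stated on the slide visible: two multiplications and one addition per 8 bits of key, executed sequentially because each step depends on the previous one.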
25
Simulation Results(6/12)
The average d value of each final hash bit for m = 14.
26
Simulation Results(7/12)
A collection of real IP addresses was gathered from three different sources: general IP traffic addresses, ad/spam IP addresses, and P2P IP addresses.
27
Simulation Results(8/12)
Performance comparison in terms of MSL and ASL on general IP traffic addresses.
28
Simulation Results(9/12)
Performance comparison in terms of MSL and ASL on AD/SPAM IP traffic addresses.
29
Simulation Results(10/12)
Performance comparison in terms of MSL and ASL on P2P IP traffic addresses.
30
Simulation Results(11/12)
To further analyze the potential performance difference between the d-value XOR folding algorithms and the well-established CRC and RS hashing algorithms, the χ² (chi-square) analysis is conducted.
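Assuming the analysis named on this slide is the standard chi-square goodness-of-fit statistic against a perfectly uniform bin occupancy, it can be sketched as follows. The function name and the way bins are numbered (0 to num_bins - 1) are assumptions made for this example.

```python
from collections import Counter


def chi_square(hash_values, num_bins):
    """Chi-square statistic of bin occupancy versus a uniform distribution."""
    expected = len(hash_values) / num_bins
    counts = Counter(hash_values)
    # Empty bins contribute too, so iterate over every bin index.
    return sum(
        (counts.get(b, 0) - expected) ** 2 / expected
        for b in range(num_bins)
    )


uniform = chi_square([0, 1, 2, 3], 4)    # one entry per bin
clustered = chi_square([0, 0, 0, 0], 4)  # everything in one bin
```

A value of zero means the entries are spread perfectly evenly; larger values indicate a more skewed hash distribution.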
31
Simulation Results(12/12)
The χ² analysis
32
Implementation(1/2)
The mapping from the original bit position to the sorted position and then through the d-SOX hashing.
33
Implementation(2/2)
d-NFD