

    Reconfigurable Accelerator for the

    Word-Matching Stage of BLASTN

    Abstract

    BLAST is one of the most popular sequence analysis tools used by molecular

    biologists. It is designed to efficiently find similar regions between two sequences that

    have biological significance. However, because the size of genomic databases is growing

    rapidly, the computation time of BLAST, when performing a complete genomic database

    search, is continuously increasing. Thus, there is a clear need to accelerate this process.

    In this paper, we present a new approach for genomic sequence database scanning

    utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order

    to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to

    accelerate the computation of the word-matching stage. The experimental results show

that the FPGA implementation achieves a speedup of around one order of magnitude compared to the NCBI BLASTN software running on a general-purpose computer.

    INTRODUCTION

    Scanning genomic sequence databases is a common and often repeated task in molecular

    biology. The need for speeding up these searches comes from the rapid growth of these

    gene banks: every year their size is scaled by a factor of 1.5 to 2. The aim of a scan

    operation is to find similarities between the query sequence and a particular genome

    sequence, which might indicate similar functionality from a biological point of view.

    Dynamic programming-based alignment algorithms can guarantee to find all important

    similarities. However, as the search space is the product of the two sequences, which

    could be several billion bases in size, it is generally not feasible to use a direct

implementation. One frequently used approach to speed up this time-consuming operation is to use heuristics in the search algorithm. One of the most widely used

    sequence analysis tools to use heuristics is the basic local alignment search tool (BLAST)

[2]. Although BLAST's algorithms are highly optimized for similarity search, the ever

    growing databases outpace the speed improvements that BLAST can provide on a general


    purpose PC. BLASTN, a version of BLAST specifically designed for DNA sequence

    searches, consists of a three-stage pipeline.

Stage 1: Word-Matching detects seeds (short exact matches of a certain length between the query sequence and the subject sequence). The inputs to this stage are strings of DNA bases, which typically use the alphabet {A, C, G, T}.

    Stage 2: Ungapped Extension extends each seed in both directions allowing substitutions

    only and outputs the resulting high-scoring segment pairs (HSPs). An HSP [3] indicates

two sequence fragments of equal length whose alignment score meets or exceeds an empirically set threshold (or cutoff score).

    Stage 3: Gapped Extension uses the Smith-Waterman dynamic programming algorithm

    to extend the HSPs allowing insertions and deletions.
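To make the word-matching stage concrete, the sketch below (plain Python with invented names such as find_seeds; it is illustrative only, not the NCBI implementation) indexes every length-w word of the query in a dictionary and then streams the subject sequence, reporting each exact w-mer match as a seed. This dictionary membership test is the operation that the Bloom filter architecture discussed later is meant to accelerate.

# Minimal sketch of BLASTN stage 1 (word matching); illustrative only, not the NCBI code.
# Every length-w word of the query is indexed, then the subject sequence is streamed and
# each exact w-mer match is reported as a seed (query offset, subject offset).
def find_seeds(query, subject, w=11):
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds

print(find_seeds("ACGTACGTACGTA", "TTACGTACGTACGG", w=8))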

    The basic idea underlying a BLASTN search is filtration. Although each stage in

    the BLASTN pipeline is becoming more sophisticated, the exponential increase in the

volume of data makes it important that measures are taken to reduce the amount of data

    that needs to be processed. Filtration discards irrelevant fractions as early as possible,

    thus reducing the overall computation time. Analysis of the various stages of the

BLASTN pipeline (see Table I) reveals that the word-matching stage is the most time-

    consuming part. Therefore, accelerating the computation of this stage will have the

greatest effect on the overall performance.

    EXISTING SYSTEM

    BASIC LOCAL ALIGNMENT SEARCH TOOL

    A new approach to rapid sequence comparison, basic local alignment search tool

    (BLAST), directly approximates alignments that optimize a measure of local similarity,

the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as

    the statistical significance of alignments it generates. The basic algorithm is simple and

    robust; it can be implemented in a number of ways and applied in a variety of contexts

    including straight-forward DNA and protein sequence database searches, motif searches,

    gene identification searches, and in the analysis of multiple regions of similarity in long


    DNA sequences. In addition to its flexibility and tractability to mathematical analysis,

    BLAST is an order of magnitude faster than existing sequence comparison tools of

    comparable sensitivity.

    A RECONFIGURABLE BLOOM FILTER ARCHITECTURE FOR BLASTN

    Efficient seed-based filtration methods exist for scanning genomic sequence

    databases. However, current solutions require a significant scan time on traditional

    computer architectures. These scan time requirements are likely to become even more

    severe due to the rapid growth in the size of databases. In this paper, we present a new

    approach to genomic sequence database scanning using reconfigurable field-

    programmable gate array (FPGA)-based hardware. To derive an efficient mapping onto

    this type of architecture, we propose a reconfigurable Bloom filter architecture. Our

    experimental results show that the FPGA implementation achieves an order of magnitude

    speedup compared to the NCBI BLASTN software running on a general purpose

    computer.

    EFFICIENT HARDWARE HASHING FUNCTIONS FOR HIGH

    PERFORMANCE COMPUTERS

    Hashing is critical for high performance computer architecture. Hashing is used

    extensively in hardware applications, such as page tables, for address translation. Bit

    extraction and exclusive ORing hashing methods are two commonly used hashing

    functions for hardware applications. There is no study of the performance of these

    functions and no mention anywhere of the practical performance of the hashing functions

    in comparison with the theoretical performance prediction of hashing schemes. In this

    paper, we show that, by choosing hashing functions at random from a particular class,

    called H3, of hashing functions, the analytical performance of hashing can be achieved in

    practice on real-life data. Our results about the expected worst case performance of

    hashing are of special significance, as they provide evidence for earlier theoretical

    predictions.

    AN APPROACH FOR MINIMAL PERFECT HASH


    FUNCTIONS FOR VERY LARGE DATABASES

    We propose a novel external memory based algorithm for constructing minimal

    perfect hash functions h for huge sets of keys. For a set of n keys, our algorithm outputs h

    in time O(n). The algorithm needs a small vector of one byte entries in main memory to

    construct h. The evaluation of h(x) requires three memory accesses for any key x. The

    description of h takes a constant number of up to 9 bits for each key, which is optimal

    and close to the theoretical lower bound, i.e., around 2 bits per key. In our experiments,

    we used a collection of 1 billion URLs collected from the web, each URL 64 characters

    long on average. For this collection, our algorithm (i) finds a minimal perfect hash

    function in approximately 3 hours using a commodity PC, (ii) needs just 5.45 megabytes

    of internal memory to generate h and (iii) takes 8.1 bits per key for the description of h.

    MERCURY BLAST DICTIONARIES: ANALYSIS AND PERFORMANCE

    MEASUREMENT

    This report describes a hashing scheme for a dictionary of short bit strings. The

    scheme, which we call near-perfect hashing, was designed as part of the construction of

    Mercury BLAST, an FPGA-based accelerator for the BLAST family of biosequence

    comparison algorithms.

    Near-perfect hashing is a heuristic variant of the well-known displacement

    hashing approach to building perfect hash functions. It uses a family of hash functions

    composed from linear transformations on bit vectors and lookups in small precomputed

    tables, both of which are especially appropriate for implementation in hardware logic. We

    show empirically that for inputs derived from genomic DNA sequences, our scheme

obtains a good tradeoff between the size of the hash table and the time required to compute

    it from a set of input strings, while generating few or no collisions between keys in the

    table.

    One of the building blocks of our scheme is the H_3 family of hash functions,

    which are linear transformations on bit vectors. We show that the uniformity of hashing

    performed with randomly chosen linear transformations depends critically on their rank,

    and that randomly chosen transformations have a high probability of having the

    maximum possible uniformity. A simple test is sufficient to ensure that a randomly


    chosen H_3 hash function will not cause an unexpectedly large number of collisions.

    Moreover, if two such functions are chosen independently at random, the second function

    is unlikely to hash together two keys that were hashed together by the first.

    Hashing schemes based on H_3 hash functions therefore tend to distribute their

    inputs more uniformly than would be expected under a simple uniform hashing model,

    and schemes using pairs of these functions are more uniform than would be assumed for

    a pair of independent hash functions.
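As a rough illustration of the H_3 idea (a sketch under the usual definition of H_3 as random linear transformations over GF(2); it is not the exact construction used in Mercury BLAST, and the helper name make_h3 is made up), the following Python code hashes a key by XOR-ing together one random row per set input bit:

import random

# Sketch of an H_3-style hash: a random linear transformation over GF(2) from
# key_bits-bit keys to index_bits-bit table indices.  Each set input bit selects
# one random row; the hash value is the XOR of the selected rows.
def make_h3(key_bits, index_bits, seed=0):
    rng = random.Random(seed)
    rows = [rng.getrandbits(index_bits) for _ in range(key_bits)]
    def h3(key):
        out = 0
        for b in range(key_bits):
            if (key >> b) & 1:
                out ^= rows[b]
        return out
    return h3

h = make_h3(key_bits=22, index_bits=16)   # e.g. an 11-base DNA word at 2 bits per base
print(h(0b1010011010101100110101))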

    PROPOSED SYSTEM

    In this paper, we propose a computationally efficient architecture to accelerate the

    data processing of the word-matching stage based on field programmable gate arrays

    (FPGA). FPGAs are suitable candidate platforms for high-performance computation due

    to their fine-grained parallelism and pipelining capabilities.

    BLOOM FILTERS

    Introduction

    Bloom filters [2] are compact data structures for probabilistic representation of a set in

order to support membership queries (i.e. queries that ask: Is element X in set Y?). This compact representation is the payoff for allowing a small rate of false positives in membership queries; that is, queries might incorrectly recognize an element as a member of the set.

We succinctly present Bloom filters' use to date in the next section. In Section 3 we describe Bloom filters in detail, and in Section 4 we give a hopefully precise picture of space/computing time/error rate tradeoffs.

    Usage

    Since their introduction in [2], Bloom filters have seen various uses:


Web cache sharing ([3]). Collaborating Web caches use Bloom filters (dubbed cache summaries) as compact representations for the local set of cached files. Each cache

    periodically broadcasts its summary to all other members of the distributed cache.

    Using all summaries received, a cache node has a (partially outdated, partially wrong)

    global image about the set of files stored in the aggregated cache. The Squid Web

    Proxy Cache [1] uses Cache Digests based on a similar idea.

Query filtering and routing ([4, 6, 7]). The Secure wide-area Discovery Service [6], a subsystem of the Ninja project [5], organizes service providers in a hierarchy. Bloom

    filters are used as summaries for the set of services offered by a node. Summaries are

    sent upwards in the hierarchy and aggregated. A query is a description for a specific

    service, also represented as a Bloom filter. Thus, when a member node of the hierarchy

    generates/receives a query, it has enough information at hand to decide where to forward

    the query: downward, to one of its descendants (if a solution to the query is present in the

    filter for the corresponding node), or upward, toward its parent (otherwise).

    The OceanStore [7] replica location service uses a two-tiered approach: first it initiates an

    inexpensive, probabilistic search (based on Bloom filters, similar to Ninja) to try and find

a replica. If this fails, the search falls back on an (expensive) deterministic algorithm (based on the Plaxton replica location algorithm). Alas, their description of the probabilistic search

    algorithm is laconic. (An unpublished text [11] from members of the same group gives

    some more details. But this does not seem to work well when resources are dynamic.)

Compact representation of a differential file ([9]). A differential file contains a batch of database records to be updated. For performance reasons the database is updated only periodically (e.g., at midnight) or when the differential file grows above a

    certain threshold. However, in order to preserve integrity, each reference/query to the

    database has to access the differential file to see if a particular record is scheduled to be

    updated. To speed-up this process, with little memory and computational overhead, the

    differential file is represented as a Bloom filter.

Free text searching ([10]). Basically, the set of words that appear in a text is succinctly represented using a Bloom filter.


    Constructing Bloom Filters

Consider a set A = {a1, a2, ..., an} of n elements. Bloom filters describe membership information of A using a bit vector V of length m. For this, k hash functions, h1, h2, ..., hk with hi: X -> {1..m}, are used as described below.

The following procedure builds an m-bit Bloom filter, corresponding to a set A and using the k hash functions h1, h2, ..., hk:

Procedure BloomFilter(set A, hash_functions, integer m)
returns filter
    filter = allocate m bits initialized to 0
    foreach ai in A:
        foreach hash function hj:
            filter[hj(ai)] = 1
        end foreach
    end foreach
    return filter

Therefore, if ai is a member of a set A, in the resulting Bloom filter V all bits corresponding to the hashed values of ai are set to 1. Testing for membership of an element elm is equivalent to testing that all corresponding bits of V are set:

Procedure MembershipTest(elm, filter, hash_functions)
returns yes/no
    foreach hash function hj:
        if filter[hj(elm)] != 1 return No
    end foreach
    return Yes

    Nice features: filters can be built incrementally: as new elements are added to a set the

    corresponding positions are computed through the hash functions and bits are set in the

filter. Moreover, the filter expressing the union of two sets is simply computed as the

    bit-wise OR applied over the two corresponding Bloom filters.
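The two procedures above, together with the union-by-OR property, translate almost directly into executable code. The following Python sketch is illustrative only; the helper make_hashes, which derives k salted hashes from SHA-256, is an assumption rather than part of the original description:

import hashlib

# Direct, illustrative translation of the BloomFilter / MembershipTest pseudocode.
def make_hashes(k, m):
    # k "different" hash functions derived by salting one SHA-256 hash (an assumption).
    def h(j, elem):
        return int(hashlib.sha256(f"{j}:{elem}".encode()).hexdigest(), 16) % m
    return [lambda e, j=j: h(j, e) for j in range(k)]

def bloom_build(elements, hashes, m):
    filt = [0] * m                       # allocate m bits initialized to 0
    for a in elements:
        for hj in hashes:
            filt[hj(a)] = 1
    return filt

def bloom_member(elem, filt, hashes):
    return all(filt[hj(elem)] == 1 for hj in hashes)

m, k = 1024, 4
hashes = make_hashes(k, m)
fa = bloom_build(["ACGT", "TTGA"], hashes, m)
fb = bloom_build(["GGCC"], hashes, m)
union = [x | y for x, y in zip(fa, fb)]  # union of the two sets = bit-wise OR
print(bloom_member("ACGT", union), bloom_member("AAAA", union))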

Bloom Filters: the Math (this follows the description in [3])

One prominent feature of Bloom filters is that there is a clear tradeoff between the size of the filter and the rate of false positives. Observe that after inserting n keys into a filter of size m using k hash functions, the probability that a particular bit is still 0 is:

p_0 = (1 - 1/m)^{kn} ≈ e^{-kn/m}.    (1)

    (Note that we assume perfect hash functions that spread the elements of A evenly

    throughout the space {1..m}. In practice, good results have been achieved using MD5

    and other hash functions [10].)

    Hence, the probability of a false positive (the probability that all k bits have been

    previously set) is:

p_err = (1 - p_0)^k = (1 - (1 - 1/m)^{kn})^k ≈ (1 - e^{-kn/m})^k.    (2)

In (2), p_err is minimized for k = (m/n) ln 2 hash functions. In practice, however, only a small number of hash functions is used. The reason is that the computational overhead of each additional hash function is constant, while the incremental benefit of adding a new hash function decreases after a certain threshold (see Figure 1).


Figure 1: False positive rate as a function of the number of hash functions used. The size of the Bloom filter is 32 bits per entry (m/n = 32). In this case using 22 hash functions minimizes the false positive rate. Note however that adding a hash function does not significantly decrease the error rate when more than 10 hashes are already used.

Figure 2: Size of the Bloom filter (bits/entry) as a function of the desired error rate. Different lines represent different numbers of hash keys used. Note that, for the error rates considered, using 32 keys does not bring significant benefits over using only 8 keys.

[Plots for Figures 1 and 2 omitted: Figure 1 plots the false positive rate (log scale) against the number of hash functions (1 to 31); Figure 2 plots bits per entry against the error rate (log scale) for k = 2, 4, 8, 16, 32.]


Formula (2) is the base formula for engineering Bloom filters. It allows, for example, computing the minimal memory requirements (filter size) and number of hash functions given the maximum acceptable false positive rate and the number of elements in the set (as we detail in Figure 2).

m/n = -k / ln(1 - e^{ln(p_err)/k})    (bits per entry)    (3)

    To summarize: Bloom filters are compact data structures for probabilistic representation of a set

    in order to support membership queries. The main design tradeoffs are the number of hash

    functions used (driving the computational overhead), the size of the filter and the error (collision)

    rate. Formula (2) is the main formula to tune parameters according to application requirements.
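For concreteness, the following short Python sketch evaluates formulas (1)-(3) for an illustrative configuration; the function names and the example numbers are made up:

import math

# Evaluate formulas (1)-(3) for an illustrative configuration.
def p_zero(m, n, k):                     # (1): probability that a given bit is still 0
    return (1 - 1 / m) ** (k * n)        # approximately exp(-k * n / m)

def p_err(m, n, k):                      # (2): false positive probability
    return (1 - p_zero(m, n, k)) ** k

def bits_per_entry(p, k):                # (3): m/n needed for error rate p with k hashes
    return -k / math.log(1 - math.exp(math.log(p) / k))

m, n, k = 32 * 1000, 1000, 8             # 32 bits per entry, 8 hash functions
print(p_err(m, n, k))                    # on the order of 1e-5 or smaller
print(bits_per_entry(1e-4, k=8))         # roughly 21 bits per entry for a 0.01% error rate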

    Compressed Bloom filters

    Some applications that use Bloom filters need to communicate these filters across the network.

    In this case, besides the three performance metrics we have seen so far: (1) the computational

    overhead to lookup a value (related to the number of hash functions used), (2) the size of the

    filter in memory, and (3) the error rate, a fourth metric can be used: the size of the filter

    transmitted across the network. M. Mitzenmacher shows in [8] that compressing Bloom filters

    might lead to significant bandwidth savings at the cost of higher memory requirements (larger

    uncompressed filters) and some additional computation time to compress the filter that is sent

    across the network. We do not detail here all theoretical and practical issues analyzed in [8].

A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of

    a set. False positive matches are possible, but false negatives are not; i.e. a query returns either

    "inside set (may be wrong)" or "definitely not in set". Elements can be added to the set, but not

    removed (though this can be addressed with a "counting" filter). The more elements that are

    added to the set, the larger the probability of false positives.

    Bloom proposed the technique for applications where the amount of source data would

    require an impracticably large hash area in memory if "conventional" error-free hashing

    techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of


    500,000 words, of which 90% could be hyphenated by following simple rules but all the

    remaining 50,000 words required expensive disk access to retrieve their specific patterns. With

    unlimited core memory, an error-free hash could be used to eliminate all the unnecessary disk

    access. But if core memory was insufficient, a smaller hash area could be used to eliminate most

    of the unnecessary access. For example, a hash area only 15% of the error-free size would still

    eliminate 85% of the disk accesses (Bloom (1970)).

    More generally, fewer than 10 bits per element are required for a 1% false positive probability,

    independent of the size or number of elements in the set (Bonomi et al. (2006)).

    Algorithm description

[Figure: An example of a Bloom filter representing the set {x, y, z}. The colored arrows show the positions in the bit array that each set element is mapped to. The element w is not in the set {x, y, z}, because it hashes to one bit-array position containing 0. For this figure, m = 18 and k = 3.]

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniform random distribution.

To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.

To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions is 0, the element is definitely not in the set; if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have by chance been set to 1 during the insertion of


other elements, resulting in a false positive. In a simple Bloom filter, there is no way to

    distinguish between the two cases, but more advanced techniques can address this problem.

The requirement of designing k different independent hash functions can be prohibitive for large k. For a good hash function with a wide output, there should be little if any correlation between different bit-fields of such a hash, so this type of hash can be used to generate multiple "different" hash functions by slicing its output into multiple bit fields. Alternatively, one can pass k different initial values (such as 0, 1, ..., k - 1) to a hash function that takes an initial value, or add (or append) these values to the key. For larger m and/or k, independence among the hash functions can be relaxed with negligible increase in false positive rate (Dillinger & Manolios (2004a), Kirsch & Mitzenmacher (2006)). Specifically, Dillinger & Manolios (2004b) show the effectiveness of deriving the k indices using enhanced double hashing or triple hashing, variants of double hashing that are effectively simple random number generators seeded with the two or three hash values.
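The double-hashing idea can be sketched in a few lines of Python. This is a simplified illustration in the spirit of the Kirsch & Mitzenmacher scheme, not the enhanced or triple variants from the cited papers; the helper name k_indices and the use of SHA-256 as the base hash are assumptions:

import hashlib

# Derive k array positions from two base hashes (simple double hashing):
# g_i(x) = (h1(x) + i * h2(x)) mod m.
def k_indices(key, k, m):
    digest = hashlib.sha256(key.encode()).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1   # keep the stride odd
    return [(h1 + i * h2) % m for i in range(k)]

print(k_indices("ACGTACGTACG", k=4, m=1024))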

    Removing an element from this simple Bloom filter is impossible because false negatives are not

permitted. An element maps to k bits, and although setting any one of those k bits to zero suffices

    to remove the element, it also results in removing any other elements that happen to map onto

    that bit. Since there is no way of determining whether any other elements have been added that

    affect the bits for an element to be removed, clearing any of the bits would introduce the

    possibility for false negatives.

    One-time removal of an element from a Bloom filter can be simulated by having a second Bloom

    filter that contains items that have been removed. However, false positives in the second filter

    become false negatives in the composite filter, which may be undesirable. In this approach re-

    adding a previously removed item is not possible, as one would have to remove it from the

    "removed" filter.

    It is often the case that all the keys are available but are expensive to enumerate (for example,

    requiring many disk reads). When the false positive rate gets too high, the filter can be

    regenerated; this should be a relatively rare event.

    Space and time advantages


[Figure: A Bloom filter used to speed up answers in a key-value storage system. Values are stored on a disk which has slow access times. Bloom filter decisions are much faster. However, some unnecessary disk accesses are made when the filter reports a positive (in order to weed out the false positives). Overall answer speed is better with the Bloom filter than without the Bloom filter. Use of a Bloom filter for this purpose, however, does increase memory usage.]
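A minimal sketch of that access pattern, with an in-memory dictionary standing in for the slow disk and all names invented for illustration:

import hashlib

# Sketch of a Bloom filter guarding a slow key-value store: the filter is consulted
# first, and the expensive lookup runs only when the filter reports a (possible) hit.
M, K = 1 << 16, 4
bits = bytearray(M)
disk = {}                                # stands in for slow on-disk storage

def positions(key):
    d = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % M for i in range(K)]

def put(key, value):
    disk[key] = value
    for p in positions(key):
        bits[p] = 1

def get(key):
    if any(bits[p] == 0 for p in positions(key)):
        return None                      # definite miss: the disk is never touched
    return disk.get(key)                 # may still miss (a false positive)

put("chr1:1042", "ACGT")
print(get("chr1:1042"), get("chr9:7"))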

    While risking false positives, Bloom filters have a strong space advantage over other data

    structures for representing sets, such as self-balancing binary search trees, tries, hash tables, or

simple arrays or linked lists of the entries. Most of these require storing at least the data items

    themselves, which can require anywhere from a small number of bits, for small integers, to an

    arbitrary number of bits, such as for strings (tries are an exception, since they can share storage

    between elements with equal prefixes). Linked structures incur an additional linear space

overhead for pointers. A Bloom filter with 1% error and an optimal value of k, in contrast, requires only about 9.6 bits per element, regardless of the size of the elements. This advantage

    comes partly from its compactness, inherited from arrays, and partly from its probabilistic nature.

    The 1% false-positive rate can be reduced by a factor of ten by adding only about 4.8 bits per

    element.

    However, if the number of potential values is small and many of them can be in the set, the

    Bloom filter is easily surpassed by the deterministic bit array, which requires only one bit for


    each potential element. Note also that hash tables gain a space and time advantage if they begin

    ignoring collisions and store only whether each bucket contains an entry; in this case, they have

effectively become Bloom filters with k = 1.[1]

Bloom filters also have the unusual property that the time needed either to add items or to check whether an item is in the set is a fixed constant, O(k), completely independent of the number of

    items already in the set. No other constant-space set data structure has this property, but the

    average access time of sparse hash tables can make them faster in practice than some Bloom

filters. In a hardware implementation, however, the Bloom filter shines because its k lookups are

    independent and can be parallelized.

    To understand its space efficiency, it is instructive to compare the general Bloom filter with its

special case when k = 1. If k = 1, then in order to keep the false positive rate sufficiently low, a

    small fraction of bits should be set, which means the array must be very large and contain long

    runs of zeros. The information content of the array relative to its size is low. The generalized

Bloom filter (k greater than 1) allows many more bits to be set while still maintaining a low false positive rate; if the parameters (k and m) are chosen well, about half of the bits will be set, and

    these will be apparently random, minimizing redundancy and maximizing information content.

    Probability of false positives


[Figure: The false positive probability p as a function of the number of elements n in the filter and the filter size m. An optimal number of hash functions has been assumed.]

Assume that a hash function selects each array position with equal probability. If m is the number of bits in the array, and k is the number of hash functions, then the probability that a certain bit is not set to 1 by a certain hash function during the insertion of an element is

1 - 1/m.

The probability that it is not set to 1 by any of the hash functions is

(1 - 1/m)^k.

If we have inserted n elements, the probability that a certain bit is still 0 is

(1 - 1/m)^{kn};

the probability that it is 1 is therefore

1 - (1 - 1/m)^{kn}.

Now test membership of an element that is not in the set. Each of the k array positions computed by the hash functions is 1 with a probability as above. The probability of all of them being 1, which would cause the algorithm to erroneously claim that the element is in the set, is often given as

(1 - (1 - 1/m)^{kn})^k ≈ (1 - e^{-kn/m})^k.

This is not strictly correct, as it assumes independence for the probabilities of each bit being set. However, assuming it is a close approximation, we have that the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is

k = (m/n) ln 2,

which gives

2^{-k} ≈ 0.6185^{m/n}.

The required number of bits m, given n (the number of inserted elements) and a desired false positive probability p (and assuming the optimal value of k is used) can be computed by substituting the optimal value of k in the probability expression above:

p = (1 - e^{-((m/n) ln 2)(n/m)})^{(m/n) ln 2},

which can be simplified to:

ln p = -(m/n) (ln 2)^2.

This results in:

m = -(n ln p) / (ln 2)^2.

This means that for a given false positive probability p, the length of a Bloom filter m is proportional to the number of elements being filtered n.[2] While the above formula is asymptotic (i.e. applicable as m, n -> infinity), the agreement with finite values of m, n is also quite good; the false positive probability for a finite Bloom filter with m bits, n elements, and k hash functions is at most

(1 - e^{-k(n + 0.5)/(m - 1)})^k.

So we can use the asymptotic formula if we pay a penalty for at most half an extra element and at most one fewer bit.[3]
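Plugging numbers into these sizing formulas is straightforward; the following sketch (illustrative names and values) computes the required filter size m and the corresponding optimal k for a target false positive probability p:

import math

# Size a Bloom filter for n elements and a target false positive probability p.
def size_filter(n, p):
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))   # bits required
    k = max(1, round((m / n) * math.log(2)))               # optimal number of hashes
    return m, k

m, k = size_filter(n=1_000_000, p=0.01)
print(m, k, m / 1_000_000)               # about 9.6 bits per element and k around 7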

    Approximating the number of items in a Bloom filter

Swamidass & Baldi (2007) showed that the number of items in a Bloom filter can be approximated with the following formula:

n* = -(N/k) ln(1 - X/N),


where n* is an estimate of the number of items in the filter, N is the length (in bits) of the filter, k is the number of hash functions per item, and X is the number of bits set to one.

    The union and intersection of sets

    Bloom filters are a way of compactly representing a set of items. It is common to try and

    compute the size of the intersection or union between two sets. Bloom filters can be used to

    approximate the size of the intersection and union of two sets. Swamidass & Baldi (2007)

showed that for two Bloom filters of length N, their counts can be estimated, respectively, as

n(A*) = -(N/k) ln(1 - |A|/N)   and   n(B*) = -(N/k) ln(1 - |B|/N),

where |A| and |B| are the numbers of bits set to one in each filter. The size of their union can be estimated as

n(A* ∪ B*) = -(N/k) ln(1 - |A ∪ B|/N),

where |A ∪ B| is the number of bits set to one in either of the two Bloom filters. The intersection can then be estimated as

n(A* ∩ B*) = n(A*) + n(B*) - n(A* ∪ B*),

using the three formulas together.
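These estimates can be coded directly. The sketch below assumes two equal-length filters represented as Python lists of 0/1 bits; the function names and the toy data are made up for illustration:

import math

# Cardinality, union and intersection estimates from the numbers of set bits.
def estimate_count(bits_set, N, k):
    return -(N / k) * math.log(1 - bits_set / N)

def estimates(a, b, k):
    N = len(a)                                   # both filters must have length N
    n_a = estimate_count(sum(a), N, k)
    n_b = estimate_count(sum(b), N, k)
    union_bits = sum(x | y for x, y in zip(a, b))
    n_union = estimate_count(union_bits, N, k)
    n_inter = n_a + n_b - n_union                # inclusion-exclusion
    return n_a, n_b, n_union, n_inter

a, b = [0] * 64, [0] * 64
for i in (3, 9, 17, 40):
    a[i] = 1
for i in (9, 17, 55):
    b[i] = 1
print(estimates(a, b, k=2))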

    Interesting properties

Unlike a standard hash table, a Bloom filter of a fixed size can represent a set with an arbitrarily

    large number of elements; adding an element never fails due to the data structure "filling up."

    However, the false positive rate increases steadily as elements are added until all bits in the filter

are set to 1, at which point all queries yield a positive result.

    Union and intersection of Bloom filters with the same size and set of hash functions can be

    implemented with bitwise OR and AND operations, respectively. The union operation on Bloom

    filters is lossless in the sense that the resulting Bloom filter is the same as the Bloom filter

    created from scratch using the union of the two sets. The intersect operation satisfies a weaker

    property: the false positive probability in the resulting Bloom filter is at most the false-positive


    probability in one of the constituent Bloom filters, but may be larger than the false positive

    probability in the Bloom filter created from scratch using the intersection of the two sets. There

are also more accurate estimates of intersection and union that are not biased in this way.

Some kinds of superimposed code can be seen as a Bloom filter implemented with physical edge-notched cards.

    Examples

    Google BigTable and Apache Cassandra use Bloom filters to reduce the disk lookups for non-

    existent rows or columns. Avoiding costly disk lookups considerably increases the performance

    of a database query operation.[4]

    The Google Chrome web browser uses a Bloom filter to identify malicious URLs. Any URL is

    first checked against a local Bloom filter and only upon a hit a full check of the URL is

    performed.[5]

The Squid Web Proxy Cache uses Bloom filters for cache digests.[6]

    Bitcoin uses Bloom filters to verify payments without running a full network node.[7][8]

    The Venti archival storage system uses Bloom filters to detect previously stored data.[9]

The SPIN model checker uses Bloom filters to track the reachable state space for large

    verification problems.[10]

The Cascading analytics framework uses Bloom filters to speed up asymmetric joins, where one of the joined data sets is significantly larger than the other (often called a Bloom join[11] in the database literature).[12]

    Alternatives

Classic Bloom filters use 1.44 log2(1/ε) bits of space per inserted key, where ε is the false positive rate of the Bloom filter. However, the space that is strictly necessary for any data structure playing the same role as a Bloom filter is only log2(1/ε) per key (Pagh, Pagh & Rao 2005). Hence Bloom filters use 44% more space than a hypothetical equivalent optimal data structure. The number of hash functions used to achieve a given false positive rate ε is


proportional to log(1/ε), which is not optimal, as it has been proved that an optimal data structure would need only a constant number of hash functions independent of the false positive rate.

    Stern & Dill (1996) describe a probabilistic structure based on hash tables, hash compaction,

    which Dillinger & Manolios (2004b) identify as significantly more accurate than a Bloom filter

    when each is configured optimally. Dillinger and Manolios, however, point out that the

    reasonable accuracy of any given Bloom filter over a wide range of numbers of additions makes

    it attractive for probabilistic enumeration of state spaces of unknown size. Hash compaction is,

    therefore, attractive when the number of additions can be predicted accurately; however, despite

    being very fast in software, hash compaction is poorly suited for hardware because of worst-case

    linear access time.

Putze, Sanders & Singler (2007) have studied some variants of Bloom filters that are either faster or use less space than classic Bloom filters. The basic idea of the fast variant is to locate the k hash values associated with each key into one or two blocks having the same size as the processor's memory cache blocks (usually 64 bytes). This will presumably improve performance by reducing the number of potential memory cache misses. The proposed variants have, however, the drawback of using about 32% more space than classic Bloom filters.

The space-efficient variant relies on using a single hash function that generates, for each key, a value in the range [0, n/ε], where ε is the requested false positive rate. The sequence of values is then sorted and compressed using Golomb coding (or some other compression technique) to occupy a space close to n log2(1/ε) bits. To query the Bloom filter for a given key, it suffices to check whether its corresponding value is stored in the Bloom filter. Decompressing the whole Bloom filter for each query would make this variant totally unusable. To overcome this problem, the sequence of values is divided into small blocks of equal size that are compressed separately. At query time only half a block will need to be decompressed on average. Because of the decompression overhead, this variant may be slower than classic Bloom filters, but this may be compensated by the fact that only a single hash function needs to be computed.

Another alternative to classic Bloom filters is one based on space-efficient variants of cuckoo hashing. In this case, once the hash table is constructed, the keys stored in the hash table are


    replaced with short signatures of the keys. Those signatures are strings of bits computed using a

    hash function applied on the keys.

    Extensions and applications

    Counting filters

    Counting filters provide a way to implement a delete operation on a Bloom filter without

    recreating the filter afresh. In a counting filter the array positions (buckets) are extended from

    being a single bit to being an n-bit counter. In fact, regular Bloom filters can be considered as

    counting filters with a bucket size of one bit. Counting filters were introduced by Fan et al.

    (1998).

The insert operation is extended to increment the value of the buckets and the lookup operation

    checks that each of the required buckets is non-zero. The delete operation, obviously, then

    consists of decrementing the value of each of the respective buckets.

    Arithmetic overflow of the buckets is a problem and the buckets should be sufficiently large to

    make this case rare. If it does occur then the increment and decrement operations must leave the

    bucket set to the maximum possible value in order to retain the properties of a Bloom filter.
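A counting filter with the saturating behavior described above can be sketched as follows (illustrative Python; the counter width, table size, and helper names are arbitrary choices):

import hashlib

# Sketch of a counting Bloom filter with saturating 4-bit counters.
M, K, MAX = 4096, 3, 15
counters = [0] * M

def positions(key):
    d = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % M for i in range(K)]

def insert(key):
    for p in positions(key):
        if counters[p] < MAX:
            counters[p] += 1             # saturate instead of overflowing

def lookup(key):
    return all(counters[p] > 0 for p in positions(key))

def delete(key):
    if lookup(key):
        for p in positions(key):
            if counters[p] < MAX:        # a saturated bucket must stay at the maximum
                counters[p] -= 1

insert("ACGT"); print(lookup("ACGT")); delete("ACGT"); print(lookup("ACGT"))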

    The size of counters is usually 3 or 4 bits. Hence counting Bloom filters use 3 to 4 times more

    space than static Bloom filters. In theory, an optimal data structure equivalent to a counting

    Bloom filter should not use more space than a static Bloom filter.

    Another issue with counting filters is limited scalability. Because the counting Bloom filter table

    cannot be expanded, the maximal number of keys to be stored simultaneously in the filter must

    be known in advance. Once the designed capacity of the table is exceeded, the false positive rate

    will grow rapidly as more keys are inserted.

    Bonomi et al. (2006) introduced a data structure based on d-left hashing that is functionally

equivalent but uses approximately half as much space as counting Bloom filters. The scalability issue does not occur in this data structure. Once the designed capacity is exceeded, the keys

    could be reinserted in a new hash table of double size.

    The space efficient variant by Putze, Sanders & Singler (2007) could also be used to implement

    counting filters by supporting insertions and deletions.


    Data synchronization

    Bloom filters can be used for approximate data synchronization as in Byers et al. (2004).

    Counting Bloom filters can be used to approximate the number of differences between two sets

    and this approach is described in Agarwal & Trachtenberg (2006).

    Bloomier filters

    Chazelle et al. (2004) designed a generalization of Bloom filters that could associate a value with

    each element that had been inserted, implementing an associative array. Like Bloom filters, these

    structures achieve a small space overhead by accepting a small probability of false positives. In

the case of "Bloomier filters", a false positive is defined as returning a result when the key is not

    in the map. The map will never return the wrong value for a key that is in the map.

    Compact approximators

    Boldi & Vigna (2005) proposed a lattice-based generalization of Bloom filters. A compact

    approximator associates to each key an element of a lattice (the standard Bloom filters being

    the case of the Boolean two-element lattice). Instead of a bit array, they have an array of lattice

    elements. When adding a new association between a key and an element of the lattice, they

compute the maximum of the current contents of the k array locations associated to the key with

    the lattice element. When reading the value associated to a key, they compute the minimum of

the values found in the k locations associated to the key. The resulting value approximates from

    above the original value.
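A toy sketch of a compact approximator, using the non-negative integers ordered by <= as the lattice (so the "maximum" and "minimum" above are ordinary max and min); all names and sizes here are illustrative:

import hashlib

# Toy compact approximator: writes take the maximum at each of the k locations,
# reads take the minimum over those locations.
M, K = 1024, 3
table = [0] * M

def positions(key):
    d = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % M for i in range(K)]

def associate(key, value):
    for p in positions(key):
        table[p] = max(table[p], value)

def read(key):
    return min(table[p] for p in positions(key))   # approximates the value from above

associate("geneA", 7)
associate("geneB", 3)
print(read("geneA"), read("geneB"), read("unknown"))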

    Stable Bloom filters

    Deng & Rafiei (2006) proposed Stable Bloom filters as a variant of Bloom filters for streaming

    data. The idea is that since there is no way to store the entire history of a stream (which can be

    infinite), Stable Bloom filters continuously evict stale information to make room for more recent

    elements. Since stale information is evicted, the Stable Bloom filter introduces false negatives,

    which do not appear in traditional bloom filters. The authors show that a tight upper bound of

    false positive rates is guaranteed, and the method is superior to standard bloom filters in terms of

    false positive rates and time efficiency when a small space and an acceptable false positive rate

    are given.


    Scalable Bloom filters

    Almeida et al. (2007) proposed a variant of Bloom filters that can adapt dynamically to the

    number of elements stored, while assuring a minimum false positive probability. The technique

    is based on sequences of standard bloom filters with increasing capacity and tighter false positive

    probabilities, so as to ensure that a maximum false positive probability can be set beforehand,

    regardless of the number of elements to be inserted.

    Attenuated Bloom filters

    An attenuated bloom filter of depth D can be viewed as an array of D normal bloom filters. In the

    context of service discovery in a network, each node stores regular and attenuated bloom filters

    locally. The regular or local bloom filter indicates which services are offered by the node itself.

The attenuated filter of level i indicates which services can be found on nodes that are i hops away from the current node. The i-th value is constructed by taking a union of the local Bloom filters of nodes i hops away from the node.

Let's take a small network shown on the graph below as an example. Say we are searching for a service A whose id hashes to bits 0, 1, and 3 (pattern 11010). Let node n1 be the starting point.

    First, we check whether service A is offered by n1 by checking its local filter. Since the patterns

    don't match, we check the attenuated bloom filter in order to determine which node should be the

    next hop. We see that n2 doesn't offer service A but lies on the path to nodes that do. Hence, we

    move to n2 and repeat the same procedure. We quickly find that n3 offers the service, and hence

    the destination is located.

    By using attenuated Bloom filters consisting of multiple layers, services at more than one hop

    distance can be discovered while avoiding saturation of the Bloom filter by attenuating (shifting

    out) bits set by sources further away.
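A toy sketch of that routing decision, with each filter modeled as a Python set of bit positions and a hypothetical two-neighbour topology (the node names and bit patterns below are invented, not the n1/n2/n3 example above):

# Toy sketch of attenuated-Bloom-filter routing.  Each filter is modelled as a set of
# bit positions; a query "matches" a filter when all of the query's bits are set in it.
def matches(query_bits, filter_bits):
    return query_bits <= filter_bits

# Per neighbour: [the neighbour's local filter, filter for nodes one hop behind it, ...]
neighbours = {
    "n2": [{2, 4}, {0, 1, 3}],
    "n4": [{5}, {2, 6}],
}
service_a = {0, 1, 3}                    # the service id hashes to bits 0, 1 and 3

def next_hop(query_bits):
    for name, levels in neighbours.items():
        if any(matches(query_bits, level) for level in levels):
            return name                  # forward toward the first promising branch
    return None

print(next_hop(service_a))               # "n2": the service lies behind n2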


    HASH TABLE

[Figure: A small phone book as a hash table.]

    In computing, a hash table (also hash map) is a data structure used to implement an associative

    array, a structure that can map keys to values. A hash table uses a hash function to compute

an index into an array of buckets or slots, from which the correct value can be found.

    Ideally, the hash function should assign each possible key to a unique bucket, but this ideal

    situation is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never

added to the table after it is created). Instead, most hash table designs assume that hash collisions (different keys that are assigned by the hash function to the same bucket) will occur and must be accommodated in some way.

In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at (amortized[2]) constant average cost per operation.[3][4]

In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

    Hashing



    The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given

    a key, the algorithm computes an index that suggests where the entry can be found:

    index = f(key, array_size)

    Often this is done in two steps:

    hash = hashfunc(key)

    index = hash % array_size

    In this method, the hash is independent of the array size, and it is then reduced to an index (a number between 0 and array_size - 1) using the modulus operator (%).

    In the case that the array size is a power of two, the remainder operation is reduced to masking, which improves speed, but can increase problems with a poor hash function.
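
    A minimal sketch of the two reduction strategies, using Python's built-in hash() purely as a stand-in for a real table hash function:

        def bucket_index(key, array_size):
            h = hash(key)                          # hash is independent of the table size
            return h % array_size                  # reduce to 0 .. array_size-1

        def bucket_index_pow2(key, array_size):
            assert array_size & (array_size - 1) == 0, "array_size must be a power of two"
            return hash(key) & (array_size - 1)    # masking replaces the modulo operation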

    Choosing a good hash function

    A good hash function and implementation algorithm are essential for good hash table

    performance, but may be difficult to achieve.

    A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g. a Pearson's chi-squared test for discrete uniform distributions.[5][6]
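
    For instance, a quick empirical check along these lines might hash a sample of keys into b buckets and compare the bucket counts against the uniform expectation with Pearson's statistic. This is illustrative only; a real test would also compare the result against the chi-squared critical value for b - 1 degrees of freedom.

        from collections import Counter

        def chi_squared_uniformity(keys, hashfunc, num_buckets):
            # Pearson's statistic for observed bucket counts vs. a uniform expectation.
            counts = Counter(hashfunc(k) % num_buckets for k in keys)
            expected = len(keys) / num_buckets
            return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(num_buckets))

        # Example: 10,000 synthetic keys into 64 buckets with Python's built-in hash().
        stat = chi_squared_uniformity([f"key{i}" for i in range(10_000)], hash, 64)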

    The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size s, the hash function needs to be uniform only when s is a power of two. On the other hand, some hashing algorithms provide uniform hashes only when s is a prime number.[7]

    For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even


    if the load factor is low and collisions are infrequent. The popular multiplicative hash[3] is claimed to have particularly poor clustering behavior.[7]

    Cryptographic hash functions are believed to provide good hash functions for any table sizes, either by modulo reduction or by bit masking. They may also be appropriate if there is a risk of malicious users trying to sabotage a network service by submitting requests designed to generate a large number of collisions in the server's hash tables. However, the risk of sabotage can also be avoided by cheaper methods (such as applying a secret salt to the data, or using a universal hash function).

    Some authors claim that good hash functions should have the avalanche effect; that is, a single-bit change in the input key should affect, on average, half the bits in the output. Some popular hash functions do not have this property.

    Perfect hash function

    If all keys are known ahead of time, a perfect hash function can be used to create a perfect hash table that has no collisions. If minimal perfect hashing is used, every location in the hash table can be used as well.

    Perfect hashing allows for constant time lookups in the worst case. This is in contrast to most chaining and open addressing methods, where the time for lookup is low on average, but may be very large (proportional to the number of entries) for some sets of keys.

    Key statistics

    A critical statistic for a hash table is called the load factor. This is simply the number of entries divided by the number of buckets, that is, n/k, where n is the number of entries and k is the number of buckets.

    If the load factor is kept reasonable, the hash table should perform well, provided the hashing is good. If the load factor grows too large, the hash table will become slow, or it may fail to work (depending on the method used). The expected constant time property of a hash table assumes that the load factor is kept below some bound. For a fixed number of buckets, the time for a lookup grows with the number of entries and so does not achieve the desired constant time.


    Second to that, one can examine the variance of number of entries per bucket. For example, two

    tables both have 1000 entries and 1000 buckets; one has exactly one entry in each bucket, the

    other has all entries in the same bucket. Clearly the hashing is not working in the second one.

    A low load factor is not especially beneficial. As the load factor approaches 0, the proportion of unused areas in the hash table increases, but there is not necessarily any reduction in search cost. This results in wasted memory.

    Collision resolution

    Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,500 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is a 95% chance of at least two of the keys being hashed to the same slot.
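
    The quoted figure can be checked with the standard birthday-problem approximation P ≈ 1 - exp(-n(n-1)/(2m)):

        import math

        n, m = 2500, 1_000_000
        p_collision = 1 - math.exp(-n * (n - 1) / (2 * m))
        print(round(p_collision, 3))   # about 0.956, i.e. the ~95% quoted above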

    Therefore, most hash table implementations have some collision resolution strategy to handle

    such events. Some common strategies are described below. All these methods require that the

    keys (or pointers to them) be stored in the table, together with the associated values.

    Separate chaining


    [Figure: Hash collision resolved by separate chaining.]

    In the method known as separate chaining, each bucket is independent, and has some sort of list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation. (The technique is also called open hashing or closed addressing.)

    In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely

    more than that. Therefore, structures that are efficient in time and space for these cases are

    preferred. Structures that are efficient for a fairly large number of entries are not needed or

    desirable. If these cases happen often, the hashing is not working well, and this needs to be fixed.

    Separate chaining with linked lists

    Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods. The cost of a table operation is that of scanning the entries of the selected bucket for the desired key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only on the average number of keys per bucket, that is, on the load factor.
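
    A minimal separate-chaining table might look like the following sketch (illustrative only: Python lists stand in for the linked lists, and resizing and deletion are omitted):

        class ChainedHashTable:
            def __init__(self, num_buckets=8):
                self.buckets = [[] for _ in range(num_buckets)]

            def _bucket(self, key):
                return self.buckets[hash(key) % len(self.buckets)]

            def put(self, key, value):
                bucket = self._bucket(key)
                for i, (k, _) in enumerate(bucket):
                    if k == key:                     # key already present: overwrite
                        bucket[i] = (key, value)
                        return
                bucket.append((key, value))          # otherwise chain a new entry

            def get(self, key):
                for k, v in self._bucket(key):       # scan only the selected bucket
                    if k == key:
                        return v
                raise KeyError(key)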

    Chained hash tables remain effective even when the number of table entries n is much higher

    than the number of slots. Their performance degrades more gracefully (linearly) with the load

    factor. For example, a chained hash table with 1000 slots and 10,000 stored keys (load factor 10)

    is five to ten times slower than a 10,000-slot table (load factor 1); but still 1000 times faster than

    a plain sequential list, and possibly even faster than a balanced search tree.

    For separate-chaining, the worst-case scenario is when all entries are inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number n of entries in the table.

    The bucket chains are often implemented as ordered lists, sorted by the key field; this choice approximately halves the average cost of unsuccessful lookups, compared to an unordered list. However, if some keys are much more likely to come up than others, an unordered list with a move-to-front heuristic may be more effective. More sophisticated data structures, such as balanced search trees, are worth considering only if the load factor is large

    (about 10 or more), or if the hash distribution is likely to be very non-uniform, or if one must

    guarantee good performance even in a worst-case scenario. However, using a larger table and/or

    a better hash function may be even more effective in those cases.

    Chained hash tables also inherit the disadvantages of linked lists. When storing small keys and values, the space overhead of the next pointer in each entry record can be significant. An additional disadvantage is that traversing a linked list has poor cache performance, making the processor cache ineffective.

    Separate chaining with list heads

    [Figure: Hash collision resolved by separate chaining with head records in the bucket array.]

    Some chaining implementations store the first record of each chain in the slot array itself.[4] The number of pointer traversals is decreased by one for most cases. The purpose is to increase cache efficiency of hash table access.

    The disadvantage is that an empty bucket takes the same space as a bucket with one entry. To

    save memory space, such hash tables often have about as many slots as stored entries, meaning

    that many slots have two or more entries.

    Separate chaining with other structures

    Instead of a list, one can use any other data structure that supports the required operations. For

    example, by using a self-balancing tree, the theoretical worst-case time of common hash table


    operations (insertion, deletion, lookup) can be brought down to O(log n) rather than O(n).

    However, this approach is only worth the trouble and extra memory cost if long delays must be

    avoided at all costs (e.g. in a real-time application), or if one must guard against many entries

    hashed to the same slot (e.g. if one expects extremely non-uniform distributions, or in the case of

    web sites or other publicly accessible services, which are vulnerable to malicious key

    distributions in requests).

    The variant called array hash table uses a dynamic array to store all the entries that hash to the same slot. Each newly inserted entry gets appended to the end of the dynamic array that is assigned to the slot. The dynamic array is resized in an exact-fit manner, meaning it is grown only by as many bytes as needed. Alternative techniques such as growing the array by block sizes or pages were found to improve insertion performance, but at a cost in space. This variation makes more efficient use of CPU caching and the translation lookaside buffer (TLB), because slot entries are stored in sequential memory positions. It also dispenses with the next pointers that are required by linked lists, which saves space. Despite frequent array resizing, space overheads incurred by the operating system, such as memory fragmentation, were found to be small.

    An elaboration on this approach is the so-called dynamic perfect hashing,[11] where a bucket that contains k entries is organized as a perfect hash table with k² slots. While it uses more memory (n² slots for n entries in the worst case, and n*k slots in the average case), this variant has guaranteed constant worst-case lookup time, and low amortized time for insertion.


    Open addressing

    [Figure: Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", that had previously collided with "John Smith".]

    In another strategy, called open addressing, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.[12] The name "open addressing" refers to the fact that the location ("address") of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with "open hashing" or "closed addressing", which usually mean separate chaining.)

    Well-known probe sequences include:

    Linear probing, in which the interval between probes is fixed (usually 1)


    Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation

    Double hashing, in which the interval between probes is computed by another hash function

    A drawback of all these open addressing schemes is that the number of stored entries cannot exceed the number of slots in the bucket array. In fact, even with good hash functions, their performance dramatically degrades when the load factor grows beyond 0.7 or so. Thus a more aggressive resize scheme is needed. Separate chaining works correctly with any load factor, although performance is likely to be reasonable only if it is kept below 2 or so. For many applications, these restrictions mandate the use of dynamic resizing, with its attendant costs.
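
    For concreteness, a minimal open-addressing table with linear probing might look like the sketch below (illustrative only: no deletion/tombstones, and the caller is expected to resize before the table fills up):

        class LinearProbingTable:
            EMPTY = object()                               # sentinel for a never-used slot

            def __init__(self, capacity=16):
                self.slots = [self.EMPTY] * capacity

            def _probe(self, key):
                start = hash(key) % len(self.slots)
                for step in range(len(self.slots)):        # at most one full pass, interval 1
                    yield (start + step) % len(self.slots)

            def put(self, key, value):
                for i in self._probe(key):
                    if self.slots[i] is self.EMPTY or self.slots[i][0] == key:
                        self.slots[i] = (key, value)
                        return
                raise RuntimeError("table full; resize before inserting more entries")

            def get(self, key):
                for i in self._probe(key):
                    if self.slots[i] is self.EMPTY:
                        raise KeyError(key)                # an unused slot ends the search
                    if self.slots[i][0] == key:
                        return self.slots[i][1]
                raise KeyError(key)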

    Open addressing schemes also put more stringent requirements on the hash function: besides distributing the keys more uniformly over the buckets, the function must also minimize the clustering of hash values that are consecutive in the probe order. Using separate chaining, the only concern is that too many objects map to the same hash value; whether they are adjacent or nearby is completely irrelevant.

    Open addressing only saves memory if the entries are small (less than four times the size of a

    pointer) and the load factor is not too small. If the load factor is close to zero (that is, there are

    far more buckets than stored entries), open addressing is wasteful even if each entry is just two

    words.


    [Figure: This graph compares the average number of cache misses required to look up elements in tables with chaining and linear probing. As the table passes the 80%-full mark, linear probing's performance drastically degrades.]

    Open addressing avoids the time overhead of allocating each new entry record, and can be implemented even in the absence of a memory allocator. It also avoids the extra indirection required to access the first entry of each bucket (that is, usually the only one). It also has better locality of reference, particularly with linear probing. With small record sizes, these factors can yield better performance than chaining, particularly for lookups.

    Hash tables with open addressing are also easier to serialize, because they do not use pointers. On the other hand, normal open addressing is a poor choice for large elements, because these elements fill entire CPU cache lines (negating the cache advantage), and a large amount of space is wasted on large empty table slots. If the open addressing table only stores references to elements (external storage), it uses space comparable to chaining even for large records but loses its speed advantage.

    Generally speaking, open addressing is better used for hash tables with small records that can be

    stored within the table (internal storage) and fit in a cache line. They are particularly suitable for

    elements of one word or less. If the table is expected to have a high load factor, the records are

    large, or the data is variable-sized, chained hash tables often perform as well or better.

    Ultimately, used sensibly, any kind of hash table algorithm is usually fast enough; and the percentage of a calculation spent in hash table code is low. Memory usage is rarely considered excessive. Therefore, in most cases the differences between these algorithms are marginal, and other considerations typically come into play.

    Coalesced hashing

    A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes within the table itself.[12] Like open addressing, it achieves space usage and (somewhat diminished) cache advantages over chaining. Like chaining, it does not exhibit clustering effects; in fact, the table can be efficiently filled to a high density. Unlike chaining, it cannot have more elements than table slots.


    Cuckoo hashing

    Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookup

    time in the worst case, and constant amortized time for insertions and deletions. It uses two or

    more hash functions, which means any key/value pair could be in two or more locations. For

    lookup, the first hash function is used; if the key/value is not found, then the second hash

    function is used, and so on. If a collision happens during insertion, then the key is re-hashed with

    the second hash function to map it to another bucket. If all hash functions are used and there is

    still a collision, then the key it collided with is removed to make space for the new key, and the

    old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that

    location also results in a collision, then the process repeats until there is no collision or the

    process traverses all the buckets, at which point the table is resized. By combining multiple hash

    functions with multiple cells per bucket, very high space utilisation can be achieved.
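
    A sketch of cuckoo insertion with two hash functions and one cell per bucket follows (illustrative assumptions: SHA-256-based hash functions, a fixed displacement limit, and no resizing/rehashing logic):

        import hashlib

        def h(seed, key, size):
            d = hashlib.sha256(f"{seed}|{key}".encode()).digest()
            return int.from_bytes(d[:8], "big") % size

        class CuckooTable:
            def __init__(self, size=16, max_kicks=32):
                self.size, self.max_kicks = size, max_kicks
                self.t1 = [None] * size
                self.t2 = [None] * size

            def get(self, key):
                # Look in the first table, then the second.
                for table, seed in ((self.t1, 1), (self.t2, 2)):
                    slot = table[h(seed, key, self.size)]
                    if slot is not None and slot[0] == key:
                        return slot[1]
                raise KeyError(key)

            def put(self, key, value):
                # Update in place if the key is already stored in either table.
                for table, seed in ((self.t1, 1), (self.t2, 2)):
                    i = h(seed, key, self.size)
                    if table[i] is not None and table[i][0] == key:
                        table[i] = (key, value)
                        return
                entry = (key, value)
                for _ in range(self.max_kicks):
                    i1 = h(1, entry[0], self.size)
                    if self.t1[i1] is None:
                        self.t1[i1] = entry
                        return
                    entry, self.t1[i1] = self.t1[i1], entry    # displace the resident of t1
                    i2 = h(2, entry[0], self.size)
                    if self.t2[i2] is None:
                        self.t2[i2] = entry
                        return
                    entry, self.t2[i2] = self.t2[i2], entry    # displace the resident of t2
                raise RuntimeError("too many displacements; rehash with new functions or resize")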

    Robin Hood hashing

    One interesting variation on double-hashing collision resolution is Robin Hood hashing.[13] The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table. This is similar to Knuth's ordered hash tables except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes are reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions.[14] External Robin Hood hashing is an extension of this algorithm where the table is stored in an external file and each table position corresponds to a fixed-sized page or bucket with B records.[15]
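
    The displacement rule can be sketched on top of linear probing as follows (illustrative only: each slot stores its entry's probe distance, and deletion, lookup, and resizing are omitted):

        class RobinHoodTable:
            def __init__(self, capacity=16):
                self.slots = [None] * capacity             # each slot: (key, value, probe_distance)

            def put(self, key, value):
                i = hash(key) % len(self.slots)
                entry = (key, value, 0)
                for _ in range(len(self.slots)):
                    if self.slots[i] is None or self.slots[i][0] == entry[0]:
                        self.slots[i] = entry
                        return
                    if self.slots[i][2] < entry[2]:        # resident is "richer": swap and keep probing
                        entry, self.slots[i] = self.slots[i], entry
                    i = (i + 1) % len(self.slots)
                    entry = (entry[0], entry[1], entry[2] + 1)
                raise RuntimeError("table full; resize needed")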

    2-choice hashing

    2-choice hashing employs two different hash functions, h1(x) and h2(x), for the hash table. Both hash functions are used to compute two table locations. When an object is inserted in the table, then it is placed in the table location that contains fewer objects (with the default being the h1(x) table location if there is equality in bucket size). 2-choice hashing employs the principle of the power of two choices.
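
    A sketch of the insertion rule (illustrative: chained buckets, SHA-256-based hash functions, and no deduplication of existing keys):

        import hashlib

        def h(seed, key, size):
            d = hashlib.sha256(f"{seed}|{key}".encode()).digest()
            return int.from_bytes(d[:8], "big") % size

        class TwoChoiceTable:
            def __init__(self, num_buckets=16):
                self.buckets = [[] for _ in range(num_buckets)]

            def put(self, key, value):
                i1 = h(1, key, len(self.buckets))
                i2 = h(2, key, len(self.buckets))
                # Place the entry in whichever candidate bucket is shorter (ties go to h1).
                target = i1 if len(self.buckets[i1]) <= len(self.buckets[i2]) else i2
                self.buckets[target].append((key, value))

            def get(self, key):
                for i in (h(1, key, len(self.buckets)), h(2, key, len(self.buckets))):
                    for k, v in self.buckets[i]:
                        if k == key:
                            return v
                raise KeyError(key)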


    Hopscotch hashing

    Another alternative open-addressing solution is hopscotch hashing,[16] which combines the approaches of cuckoo hashing and linear probing, yet seems in general to avoid their limitations. In particular it works well even when the load factor grows beyond 0.9. The algorithm is well suited for implementing a resizable concurrent hash table.

    The hopscotch hashing algorithm works by defining a neighborhood of buckets near the original

    hashed bucket, where a given entry is always found. Thus, search is limited to the number of

    entries in this neighborhood, which is logarithmic in the worst case, constant on average, and

    with proper alignment of the neighborhood typically requires one cache miss. When inserting an

    entry, one first attempts to add it to a bucket in the neighborhood. However, if all buckets in this

    neighborhood are occupied, the algorithm traverses buckets in sequence until an open slot (an

    unoccupied bucket) is found (as in linear probing). At that point, since the empty bucket is

    outside the neighborhood, items are repeatedly displaced in a sequence of hops. (This is similar

    to cuckoo hashing, but with the difference that in this case the empty slot is being moved into the

    neighborhood, instead of items being moved out with the hope of eventually finding an empty

    slot.) Each hop brings the open slot closer to the original neighborhood, without invalidating the