The Bloom Paradox: Ori Rottenstreich, joint work with Isaac Keslassy (Technion, Israel)

Post on 18-Jan-2016



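Problem Definition

• Requirement: A data structure at the user with a fast answer to the membership query "is x ∈ S?"
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives

[Figure: a user, a local cache S with access cost 1, and a central memory M with all elements and access cost 10. In the example, element x is fetched from the local cache at cost 1 and element y from the central memory at cost 10.]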

Two Possible Errors

• False Positive: x ∉ S, but the data structure answers x ∈ S.
o Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: x ∈ S, but the data structure answers x ∉ S.
o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10-1=9.

Bloom Filters (Bloom, 1970)

• Initialization: Array of m zero bits.
• Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.
• Query: Hashing the element and checking that all k corresponding bits are set.
• False positive rate (probability) of approximately (1 − e^(−kn/m))^k.
• No false negatives.

[Figure: a bit array, initially all zeros, with bits set by inserted elements and checked for queried elements (x, y, z, w).]
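The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the symbols `m` (number of bits) and `k` (number of hash functions), and the use of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an array of m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # Initialization: array of m zero bits.

    def _positions(self, item: str) -> list[int]:
        # Assumption: k hash functions are emulated by salting SHA-256
        # with the hash index; any k roughly independent hashes would do.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k bits are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1000, k=5)
for name in ("x", "y"):
    bf.insert(name)
print(bf.query("x"), bf.query("y"))  # inserted elements are always reported
```

A query for an element that was never inserted usually returns False, but it can return True if other insertions happened to set all of its bits; that is the false positive case.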

Bloom Filters are Widely Used

• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System

The Bloom Paradox

Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Outline

• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary

Bloom Paradox Example

• Parameters:
• Extreme case without locality: All elements have an equal probability of belonging to the cache.
o Toy example

• Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B, since the Bloom filter has no false negatives.
• Intuition:

[Figure: the user, the Bloom filter set B, the local cache S (cost = 1), and the central memory M with all elements (cost = 10).]

• Surprise: The Bloom filter indicates the membership of |B| elements. Only |S| of them are indeed in S.

• Probability that the Bloom filter is wrong when it states that x ∈ S:
• Average cost if we listen to the Bloom filter:
• Average cost if we don't:
• Listening costs more on average than not listening: don't listen to the Bloom filter. The Bloom filter is useless!
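Since the slide's own numbers did not survive, the arithmetic behind the example can be replayed with made-up parameters. The sketch below assumes that every element of the central memory is equally likely to be queried, that a cache access costs 1 and a memory access costs 10 (as in the problem definition), and that a cache miss is followed by a memory access. The function name and the values of `n`, `total` and `fpr` are hypothetical.

```python
def positive_answer_stats(n, total, fpr, cache_cost=1.0, memory_cost=10.0):
    """Expected outcome of trusting vs. ignoring a positive Bloom filter answer.

    Assumes every element of the central memory is equally likely to be queried.
    """
    false_positives = (total - n) * fpr  # expected elements in B but not in S
    size_b = n + false_positives         # no false negatives, so S is inside B
    p_wrong = false_positives / size_b   # Pr(x not in S | filter says "in S")
    # Listening: try the cache first, fall back to the memory on a miss.
    cost_listen = cache_cost + p_wrong * memory_cost
    # Ignoring: go straight to the central memory.
    cost_ignore = memory_cost
    return size_b, p_wrong, cost_listen, cost_ignore


# Hypothetical parameters, not the ones on the slide.
size_b, p_wrong, cost_listen, cost_ignore = positive_answer_stats(
    n=1_000, total=10_000_000, fpr=0.01
)
print(f"|B| ~ {size_b:.0f}, Pr(wrong) ~ {p_wrong:.3f}")
print(f"listen: {cost_listen:.2f}  ignore: {cost_ignore:.2f}")
```

With these illustrative numbers a positive answer is wrong about 99% of the time, so listening costs about 10.9 on average against a flat 10 for going straight to the central memory.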


Costs of the Two Possible Errors

• The cost of a false positive: 1
• The cost of a false negative:
• In the cache example: a false negative costs 10 − 1 = 9

Conditions for the Bloom Paradox

• Let Pr(x ∈ S) be the a priori membership probability of x
o i.e. before getting the answer of the Bloom filter
• Intuition: The Bloom paradox occurs more often when:
o The a priori membership probability is small
o The false positive rate is large (i.e. the number of bits per element is small)
o The cost of a false negative is small (because the Bloom filter implicitly assumes that it is infinite)
• Theorem 1: The Bloom paradox occurs if and only if [formula missing from the transcript]
• Boundaries of the Bloom Paradox: for given parameter values, the Bloom paradox occurs if [values missing from the transcript]

[Figure: local cache, Bloom filter, central memory.]
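The formula of Theorem 1 is not preserved here, but a condition of the same flavor follows from comparing the two expected error costs after a positive answer. The derivation below is a sketch under stated assumptions, not the authors' exact statement. The symbols p (a priori membership probability), f (false positive rate) and α (cost of a false negative, with the cost of a false positive normalized to 1) are introduced for this illustration.

```latex
% Assumed notation (not from the slides): p = Pr(x \in S) is the a priori
% membership probability, f the false positive rate, and \alpha the cost of a
% false negative relative to a false positive cost of 1.
\begin{align*}
\Pr(x \in S \mid \text{positive}) &= \frac{p}{p + (1-p)\,f}
  && \text{(Bayes, no false negatives)} \\
\text{expected extra cost of listening} &= 1 \cdot \Pr(x \notin S \mid \text{positive})
  = \frac{(1-p)\,f}{p + (1-p)\,f} \\
\text{expected extra cost of ignoring} &= \alpha \cdot \Pr(x \in S \mid \text{positive})
  = \frac{\alpha\,p}{p + (1-p)\,f} \\
\text{paradox} \iff (1-p)\,f > \alpha\,p
  &\iff p < \frac{f}{\alpha + f}
\end{align*}
```

The threshold matches the three intuitions: it is easier to fall below it when p is small, when f is large, or when α is small. For instance, with α = 9 as in the cache example and a hypothetical f = 0.01, the paradox would occur for p below about 0.0011.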

Bloom Filter Improvements

• Theorem 1: The Bloom paradox occurs if and only if [formula missing from the transcript]
• Use the formula to improve the Bloom filter
o Only insert into / query the Bloom filter if the formula expects it to be useful

[Figure: local cache, Bloom filter, central memory.]
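One way to picture this improvement is a per-element gate placed in front of the Bloom filter. The sketch below uses a plain expected-cost comparison as the decision rule, which may differ in form from the Theorem 1 formula. The function name `bloom_filter_is_useful`, the per-element priors, and the values of `fpr` and `alpha` are all hypothetical.

```python
def bloom_filter_is_useful(prior: float, fpr: float, alpha: float) -> bool:
    """True when trusting a positive answer is expected to be no worse than ignoring it.

    prior: a priori membership probability of the element
    fpr:   false positive rate of the Bloom filter
    alpha: cost of a false negative, with a false positive costing 1
    """
    # Listening risks a false positive (weight (1 - prior) * fpr, cost 1);
    # ignoring a positive answer risks a false negative (weight prior, cost alpha).
    return (1.0 - prior) * fpr <= alpha * prior


# Hypothetical priors for two kinds of elements.
priors = {"frequently cached": 0.2, "rarely cached": 1e-4}
for label, prior in priors.items():
    decision = bloom_filter_is_useful(prior, fpr=0.01, alpha=9.0)
    print(f"{label}: {'use' if decision else 'skip'} the Bloom filter")
```

Elements for which the gate returns False would be neither inserted into the filter nor looked up in it, which also frees bits for the elements that benefit from the filter.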



Counting Bloom Filters (CBFs)

• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
o Insertion: Incrementing counters by one.
o Deletion: Decrementing counters by one.
o Query: Checking that counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for [value missing from the transcript].

[Figure: a counter array; inserting elements x and y increments (+1) the counters they hash to.]
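The counter-based variant can be sketched the same way as a plain Bloom filter. As before, this is an illustration under assumptions: the class name, the salted SHA-256 hash family, and the unbounded Python integers used as counters (real CBFs use a few bits per counter) are choices made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: k hash functions emulated by salting SHA-256.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1  # Insertion: increment by one.

    def delete(self, item: str) -> None:
        # Only valid for items that were previously inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1  # Deletion: decrement by one.

    def query(self, item: str) -> bool:
        # Query: positive only if all k counters are positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1000, k=5)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")
print(cbf.query("x"), sum(cbf.counters))
```

Deleting y undoes exactly the increments made by its insertion, so x is still reported and the counters sum to k again.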

Counting Bloom Filter Query

• Query
o Checking that counters are positive.
o Question: Which positive answer is more likely to be correct: the one for y or the one for z?

[Figure: a counter array (0381052010 12) with two queried elements, y and z.]

The Bloom Paradox in the Counting Bloom Filter

• Theorem 2: Let [symbols missing from the transcript] denote the values of the counters pointed to by the set of hash functions. Then, [formula missing from the transcript]
• Only the product of the counters matters!
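Why only the product matters can be seen from a short likelihood-ratio argument. The derivation below is a sketch under a Poisson approximation with independent counters, and the exact statement of Theorem 2 may differ. The notation (c_1, …, c_k for the counters, p for the a priori membership probability, λ = nk/m for the mean counter value) is assumed for this illustration.

```latex
% Assumed notation: c_1,...,c_k are the counters pointed to by the k hash
% functions of x, p = Pr(x \in S) is the a priori probability, \lambda = nk/m.
\begin{align*}
\Pr(c_i \mid x \notin S) &\approx e^{-\lambda}\frac{\lambda^{c_i}}{c_i!}, \qquad
\Pr(c_i \mid x \in S) \approx e^{-\lambda}\frac{\lambda^{c_i-1}}{(c_i-1)!} \\
\frac{\Pr(c_1,\dots,c_k \mid x \in S)}{\Pr(c_1,\dots,c_k \mid x \notin S)}
  &\approx \prod_{i=1}^{k}\frac{c_i}{\lambda} = \frac{\prod_{i=1}^{k} c_i}{\lambda^{k}} \\
\Pr(x \in S \mid c_1,\dots,c_k)
  &\approx \frac{p\,\prod_{i} c_i}{p\,\prod_{i} c_i + (1-p)\,\lambda^{k}}
\end{align*}
```

If x is in S, it accounts for one unit of each of its counters, and the remaining c_i − 1 units come from the other elements. The counters enter the result only through their product, and a zero counter gives a zero membership probability, consistent with the absence of false negatives.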

CBF Based Membership Probability

• Parameters: n = 3328, m = 28485, k = 6
o Before checking the CBF: a priori membership probability ≈ 0.03
o The CBF indicates a counters product of 8: a posteriori membership probability ≈ 0.69
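These numbers can be roughly reproduced with a small Bayes computation. The sketch assumes that the counters are independent and approximately Poisson with mean nk/m, so that the likelihood ratio of "in S" versus "not in S" is the counters product divided by (nk/m)^k. The function name `cbf_posterior` is hypothetical, and the slide's exact formula may differ.

```python
def cbf_posterior(prior: float, counters_product: int, n: int, m: int, k: int) -> float:
    """A posteriori membership probability from the counters product.

    Poisson approximation: each counter has mean lam = n * k / m, and a member
    element accounts for one unit of each of its k counters.
    """
    lam = n * k / m
    likelihood_ratio = counters_product / lam ** k
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# Parameters from the slide; the prior is the rounded value 0.03.
posterior = cbf_posterior(prior=0.03, counters_product=8, n=3328, m=28485, k=6)
print(f"a posteriori membership probability ~ {posterior:.2f}")
```

With the rounded prior of 0.03 the approximation gives about 0.68, close to the ≈ 0.69 reported on the slide. A larger counters product raises the estimate, and a product of zero drives it to zero.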

Experimental Results

• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: n = 2^10, m / n = 30, k = 5, 2^20 queries

Concluding Remarks

• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the correctness of a positive indication of the CBF

Thank You
