The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel

Post on 18-Jan-2016

Page 1: The Bloom Paradox

Ori Rottenstreich

Joint work with Isaac Keslassy

Technion, Israel

Page 2: Problem Definition

• Requirement: A data structure at the user with a fast answer to the query: is x ∈ S?
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives

[Figure: a user querying elements x and y; the local cache S (access cost = 1); the central memory M with all elements (access cost = 10).]

Page 3: Two Possible Errors

• False Positive: x ∉ S, but the data structure answers x ∈ S.
o Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: x ∈ S, but the data structure answers x ∉ S.
o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10-1=9.

Page 4: Bloom Filters (Bloom, 1970)

• Initialization: Array of m zero bits.
• Insertion: Each of the n elements is hashed k times, the corresponding bits are set.
• Query: Hashing the element, checking that all k bits are set.
• False positive rate (probability) of (1 − (1 − 1/m)^(nk))^k ≈ (1 − e^(−nk/m))^k.
• No false negatives.

[Figure: example bit arrays illustrating insertions and queries for elements x, y, z and w.]
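
The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the class name, the salted SHA-256 hashing scheme, and the sizes m = 1024, k = 5 are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an array of m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str):
        # Assumption of this sketch: the k hash functions are obtained by
        # salting SHA-256 with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        # Insertion: set the bits at the k hashed positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: True means "possibly a member", False means "certainly not".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=5)
for element in ("x", "y"):
    bf.insert(element)
```

Inserted elements always answer True, which is the "no false negatives" property; an element that was never inserted may still answer True when all of its k bits happen to be set by others.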

Page 5: Bloom Filters are Widely Used

• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification

• Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System

Page 6: The Bloom Paradox

Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Page 7: Outline

• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary

Page 8: Bloom Paradox Example

• Parameters:
• Extreme case without locality: All elements with equal probability of belonging to the cache.
o Toy example

Page 9: Bloom Paradox Example

• Parameters:
• Let B be the set of elements that the Bloom filter indicates are in S
o In particular, no false negatives in the Bloom filter (so S ⊆ B)
• Intuition:

[Figure: the user consults the Bloom filter B before accessing the local cache S (cost = 1) or the central memory M with all elements (cost = 10).]

Page 11: Bloom Paradox Example

• Parameters:
• Let B be the set of elements that the Bloom filter indicates are in S
o In particular, no false negatives in the Bloom filter (so S ⊆ B)
• Surprise: The Bloom filter indicates the membership of |B| elements. Only |S| of them are indeed in S.

Page 12: Bloom Paradox Example

• When the Bloom filter states that x ∈ S, it is wrong with probability 1 − |S| / |B|, where B is the set of elements that the Bloom filter indicates are in S
• Average cost if we listen to the Bloom filter:
• Average cost if we don’t:

The Bloom filter is useless! Don’t listen to the Bloom filter.
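
The comparison on this slide can be reproduced with a short calculation. The sketch below uses the access costs from the earlier slides (1 for the local cache, 10 for the central memory) and assumes that a false positive costs a cache access followed by a memory access. The sizes n and N and the false positive rate are hypothetical values chosen for illustration, not the ones used in the talk.

```python
def average_costs(n, N, fpr, cache_cost=1.0, memory_cost=10.0):
    """Average access cost for a query that the Bloom filter answers positively.

    n: number of elements in the local cache S
    N: number of elements in the central memory M
    fpr: false positive rate of the Bloom filter
    Queries are assumed uniform over the N elements (no locality).
    """
    false_positives = (N - n) * fpr        # expected non-members reported as members
    positives = n + false_positives        # expected size of B
    p_wrong = false_positives / positives  # probability a positive answer is wrong
    # Listening to the filter: always pay the cache access, and additionally
    # the central memory access whenever the positive answer was wrong.
    listen = cache_cost + p_wrong * memory_cost
    # Ignoring the filter: go straight to the central memory.
    ignore = memory_cost
    return p_wrong, listen, ignore


# Hypothetical toy parameters (assumptions of this sketch).
p_wrong, listen, ignore = average_costs(n=10**4, N=10**10, fpr=10**-3)
print(f"wrong with probability {p_wrong:.4f}")
print(f"average cost if we listen: {listen:.2f}, if we don't: {ignore:.2f}")
```

With these hypothetical values almost every positive answer is a false positive, so listening to the filter costs more on average than going directly to the central memory. With strong locality (N close to n) the ordering flips and the filter is useful again.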

Page 14: Costs of the Two Possible Errors

• The cost of a false positive: 1
• The cost of a false negative: α
• In the cache example: α = (10 − 1) / 1 = 9

Page 18: Conditions for the Bloom Paradox

• Let Pr(x ∈ S) be the a priori membership probability of x
o i.e. before getting the answer of the Bloom filter
• Intuition: The Bloom paradox occurs more often when:
o the a priori membership probability is small
o the false positive rate is large (i.e. the number of bits per element is small)
o the cost of a false negative is small (because the Bloom filter, having no false negatives, implicitly assumes this cost is infinite)
• Theorem 1: The Bloom paradox occurs if and only if
• Boundaries of the Bloom Paradox: (for )

If and the Bloom paradox occurs if

[Figure: local cache, Bloom filter, central memory.]
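
A condition of this kind can be re-derived from the cost model of the earlier slides. The following is a hedged sketch, not necessarily the form in which the theorem is stated in the talk. The symbols are assumptions of this sketch: p is the a priori membership probability of x, f is the false positive rate, a false positive costs 1, and a false negative costs α. The first line uses the fact that a Bloom filter has no false negatives.

```latex
\begin{align*}
\Pr(x \in S \mid \text{positive answer})
  &= \frac{p}{p + (1-p)\,f} \\
\text{expected extra cost of trusting the answer}
  &= 1 \cdot \Pr(x \notin S \mid \text{positive answer})
   = \frac{(1-p)\,f}{p + (1-p)\,f} \\
\text{expected extra cost of ignoring the answer}
  &= \alpha \cdot \Pr(x \in S \mid \text{positive answer})
   = \frac{\alpha\,p}{p + (1-p)\,f} \\
\text{ignoring is cheaper}
  &\iff \alpha\,p < (1-p)\,f
   \iff p < \frac{f}{f + \alpha}
\end{align*}
```

The last line agrees with the intuition listed above: the inequality is easier to satisfy when p is small, when f is large, and when α is small.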

Page 19: Bloom Filter Improvements

• Theorem 1: The Bloom paradox occurs if and only if
• Use the formula to improve the Bloom filter
o Only insert / query Bloom filter if the formula expects it to be useful

[Figure: local cache, Bloom filter, central memory.]
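
One way to act on this idea is to test, per element, whether a positive answer would be worth trusting, and to skip the filter otherwise. The sketch below is an illustration under stated assumptions: the function names are hypothetical, p is the a priori membership probability, fpr the false positive rate, alpha the false negative cost relative to a false positive cost of 1, and the test alpha * p >= (1 - p) * fpr is a condition derived from that cost model rather than the exact formula of the talk.

```python
def should_use_bloom_filter(p: float, fpr: float, alpha: float) -> bool:
    """True when trusting a positive answer is expected to cost no more than ignoring it."""
    return alpha * p >= (1.0 - p) * fpr


def plan_lookup(p, fpr, alpha, bloom_says_member):
    """Decide where to look for an element.

    bloom_says_member is a zero-argument callable, so the Bloom filter is
    only queried when the condition above says it can be useful.
    """
    if not should_use_bloom_filter(p, fpr, alpha):
        return "central memory"  # paradox region: do not even query the filter
    return "local cache" if bloom_says_member() else "central memory"


# Hypothetical values: a very unlikely member versus a likely one.
print(plan_lookup(1e-6, 1e-3, 9.0, lambda: True))  # central memory
print(plan_lookup(0.5, 1e-3, 9.0, lambda: True))   # local cache
```

The same test can guard insertions: an element whose positive answers would be ignored anyway does not need to occupy bits in the filter.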

Page 22: Counting Bloom Filters (CBFs)

• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters - Storing an array of counters instead of bits.
o Insertion: Incrementing k counters by one.
o Deletion: Decrementing k counters by one.
o Query: Checking that all k counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for .

[Figure: example counter arrays; inserting x and y increments (+1) the counters at their hashed positions.]
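
The counter-based variant can be sketched the same way as a plain Bloom filter. This is a minimal illustration: the class name, the salted SHA-256 hashing, and the sizes are assumptions of the example, and counters are unbounded Python integers rather than fixed-width fields.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter sketch: m counters, k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption of this sketch: salted SHA-256 stands in for k hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1  # insertion: increment by one

    def delete(self, item: str) -> None:
        # Only valid for an item that was previously inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1  # deletion: decrement by one

    def query(self, item: str) -> bool:
        # Query: all hashed counters must be positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=5)
cbf.insert("x")
cbf.insert("y")
cbf.delete("x")
```

After deleting x, the element y is still reported as a member, which is what resetting bits in a plain Bloom filter cannot guarantee.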

Page 23: Counting Bloom Filter Query

• Query
o Checking that counters are positive.
o Question: Which is more likely to be correct? y or z?

[Figure: counter array 0 3 8 1 0 5 2 0 1 0 1 2 with the counters hashed by y and by z marked.]

Page 24: The Bloom Paradox in the Counting Bloom Filter

• Theorem 2: Let C_1, ..., C_k denote the values of the counters pointed to by the set of hash functions. Then,

Only the counters' product matters!
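
Why only the product matters can be seen from a short likelihood-ratio argument. This is a hedged sketch under simplifying assumptions, not the statement from the talk: the k counters are treated as independent, each of the nk hash insertions hits a given counter with probability 1/m, nk is much larger than the counter values, and p denotes the a priori membership probability. If x ∈ S, one unit of each of its counters comes from x itself and the rest from the other n − 1 elements.

```latex
\begin{align*}
\frac{\Pr(C_i = c_i \mid x \in S)}{\Pr(C_i = c_i \mid x \notin S)}
  &= \frac{\binom{(n-1)k}{c_i - 1}\, m^{-(c_i - 1)}\, \left(1 - \tfrac{1}{m}\right)^{(n-1)k - c_i + 1}}
          {\binom{nk}{c_i}\, m^{-c_i}\, \left(1 - \tfrac{1}{m}\right)^{nk - c_i}}
   \approx \frac{m}{nk}\, c_i \\
\frac{\Pr(x \in S \mid c_1, \dots, c_k)}{\Pr(x \notin S \mid c_1, \dots, c_k)}
  &\approx \frac{p}{1 - p} \left(\frac{m}{nk}\right)^{k} \prod_{i=1}^{k} c_i
\end{align*}
```

Under these assumptions the counter values enter the posterior odds only through their product, so two queries with the same product are equally trustworthy.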

Page 25: CBF Based Membership Probability

• Parameters: n=3328, m = 28485, k=6
- Before checking CBF, a priori membership probability ≈ 0.03
- CBF indicates counters product = 8: a posteriori membership probability ≈ 0.69
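
The numbers on this slide can be approximated with a few lines of Python. The sketch below is an approximation under stated assumptions, not the exact formula of the talk: counters are treated as independent, and each counter value c multiplies the membership odds by roughly c * m / (n * k). The function name and the particular counter values (any six values with product 8) are hypothetical.

```python
from math import prod


def posterior_membership(prior, counters, n, m, k):
    """Approximate a posteriori membership probability from the CBF counters.

    Assumption of this sketch: each counter value c contributes a likelihood
    ratio of about c * m / (n * k), so only the product of the counters matters.
    """
    ratio = prod(c * m / (n * k) for c in counters)
    odds = prior / (1.0 - prior) * ratio
    return odds / (1.0 + odds)


# Slide parameters, with hypothetical counter values whose product is 8.
post = posterior_membership(0.03, [1, 1, 1, 2, 2, 2], n=3328, m=28485, k=6)
print(f"a posteriori membership probability: {post:.2f}")
```

With the rounded prior of 0.03 this approximation gives about 0.68, in line with the ≈ 0.69 quoted on the slide. A zero counter gives a posterior of exactly 0, matching the absence of false negatives.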

Page 26: Experimental Results

• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: n = 2^10, m / n = 30, k = 5, 2^20 queries

Page 27: Concluding Remarks

• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the correctness of a positive indication of the CBF

Page 28: Thank You