Randomized Algorithms CS648, Lecture 12: Hashing - II

Upload: terris

Post on 08-Feb-2016


Page 1: Randomized Algorithms CS648

Randomized Algorithms CS648

Lecture 12: Hashing - II

Page 2: Randomized Algorithms CS648

RECAP OF LAST LECTURE

Page 3: Randomized Algorithms CS648

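Problem Definition
• $U = \{0, 1, \dots, m-1\}$, called the universe
• $S \subseteq U$ and $s = |S|$
• Examples: [...]

Aim: Given a set $S \subseteq U$, build a data structure storing $S$ such that we can answer in O(1) time:

"Does $i \in S$?" for any given $i \in U$.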

Page 4: Randomized Algorithms CS648

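Hashing
• Hash table $T$: an array of size $n$.
• Hash function $h : U \to \{0, 1, \dots, n-1\}$.

Answering a query "Does $i \in S$?":
1. $k \leftarrow h(i)$;
2. Search the list stored at $T[k]$.

Properties of $h$:
• $h(i)$ is computable in O(1) time.
• Space required by $h$: O(1).

[Figure: hash table $T$ with slots $0, 1, \dots, n-1$, each pointing to the list of elements of $S$ hashed to it.]

How many bits are needed to encode $h$?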

Page 5: Randomized Algorithms CS648

Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.

Worst case time complexity of searching an item $i$: the number of elements in $S$ colliding with $i$.
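The query procedure and the cost of a search can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the class name `ChainedHashTable`, the sample set, and the choice $h(i) = i \bmod n$ are assumptions made only for the example.

```python
class ChainedHashTable:
    """Hash table T of size n; T[k] holds the list of elements i with h(i) == k."""

    def __init__(self, elements, n, h):
        self.n = n
        self.h = h
        self.table = [[] for _ in range(n)]
        for i in elements:
            self.table[h(i)].append(i)

    def contains(self, i):
        # Step 1: compute the slot. Step 2: scan the list stored there.
        return i in self.table[self.h(i)]

    def colliding_with(self, i):
        """Number of stored elements, other than i itself, in i's slot."""
        return sum(1 for j in self.table[self.h(i)] if j != i)


# Hypothetical example data: three of the four elements share slot 3.
n = 5
S = [3, 8, 13, 21]
T = ChainedHashTable(S, n, lambda i: i % n)
```

Here a search for 3 has to scan past the two elements colliding with it, while a search for 21 finds its slot holding only 21.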


Page 6: Randomized Algorithms CS648

Universal Hash Family

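Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any distinct $i, j \in U$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$, where $h \in_r H$ means that $h$ is chosen uniformly at random from $H$.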

This definition appears strange in the beginning! But we shall soon see that there is a very natural way to arrive at this definition.

Page 7: Randomized Algorithms CS648

Perfect hashing using O($s^2$) space

Let $H$ be a universal hash family. Let $X$ be the number of collisions for $S$ when $h \in_r H$. Question: What is $E[X]$?

Page 8: Randomized Algorithms CS648

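Perfect hashing using O($s^2$) space

Let $H$ be a universal hash family. Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Lemma 1: $E[X] \le \binom{s}{2} \cdot \frac{c}{n}$.
Lemma 2: For $n = c s^2$, there will be no collision with probability at least $1/2$.
(With $n = c s^2$, Lemma 1 gives $E[X] < 1/2$, so $P[X \ge 1] < 1/2$ by Markov's inequality.)

Algorithm 1: Perfect hashing for $S$
Fix $n \leftarrow c s^2$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ the number of collisions for $S$ under $h$.
Until $t = 0$.
Build the hash table.

Theorem: A perfect hash function can be computed for $S$ in expected O($s^2$) time.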
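The retry loop of Algorithm 1 might look as follows in Python. This is only a sketch under stated assumptions: it uses the family $h_x(i) = (ix \bmod p) \bmod n$ built later in this lecture (so $c = 2$), and the prime `p = 10007`, the helper names, and the sample set are hypothetical choices for illustration.

```python
import random
from itertools import combinations


def count_collisions(S, h):
    """Number of unordered pairs {i, j} of S with h(i) == h(j)."""
    return sum(1 for i, j in combinations(S, 2) if h(i) == h(j))


def perfect_hash(S, p, c=2, rng=random):
    """Retry random hash functions until S is collision-free in a table of size c*s^2.

    Assumes p is a prime larger than every element of S.
    """
    s = len(S)
    n = max(1, c * s * s)
    while True:
        x = rng.randrange(1, p)                  # pick h uniformly from the family
        h = lambda i, x=x: ((i * x) % p) % n     # bind x now, not at call time
        if count_collisions(S, h) == 0:
            return h, n


# Hypothetical example: five keys, all smaller than the prime 10007.
keys = [12, 57, 103, 999, 4096]
h, n = perfect_hash(keys, p=10007, rng=random.Random(0))
```

Since each attempt succeeds with probability at least 1/2, the expected number of attempts is at most 2, and each attempt here costs O($s^2$) for the pairwise collision count.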

Page 9: Randomized Algorithms CS648

HASHING WITH OPTIMAL SPACE AND WORST CASE O(1) SEARCH TIME

Page 10: Randomized Algorithms CS648

Optimal space hashing with worst case O(1) search time

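Let $H$ be a universal hash family. $X$: no. of collisions for $S$ when $h \in_r H$. Lemma 1: $E[X] \le \binom{s}{2} \cdot \frac{c}{n}$.

Question: What is $E[X]$ when $n = cs$?
Answer: $E[X] \le \frac{s-1}{2}$.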

Page 11: Randomized Algorithms CS648

Optimal space hashing with worst case O(1) search time

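Let $H$ be a universal hash family. $X$: no. of collisions for $S$ when $h \in_r H$. Lemma 1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.

Algorithm:
Fix $n \leftarrow cs$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ no. of collisions for $S$ under $h$;
Until $t < s$;
Build the hash table $T$; // primary hash table

By Markov's inequality, $P[X \ge s] \le E[X]/s < 1/2$, so the expected number of iterations of the loop is less than 2.

For each $0 \le i < n$:
If size of list $T[i]$ > 1
1. Build a perfect hash table for list $T[i]$;
2. Make $T[i]$ point to this hash table;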
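The two-level construction can be sketched as below. This is a hedged illustration rather than the lecture's implementation: it assumes the family $h_x(i) = (ix \bmod p) \bmod n$ with $c = 2$, accepts the primary function when it has fewer than $s$ collisions, and builds a secondary table even for single-element and empty lists to keep `lookup` uniform. The names `build_two_level`, `lookup`, the prime 10007, and the sample keys are all hypothetical.

```python
import random


def make_h(x, p, n):
    """h_x(i) = ((i * x) mod p) mod n."""
    return lambda i: ((i * x) % p) % n


def count_collisions(S, h, n):
    """Number of colliding pairs, computed from the list sizes."""
    sizes = [0] * n
    for i in S:
        sizes[h(i)] += 1
    return sum(k * (k - 1) // 2 for k in sizes)


def build_two_level(S, p, c=2, rng=random):
    """Assumes p is a prime larger than every element of S."""
    s = len(S)
    n = max(1, c * s)
    # Primary table: retry until there are fewer than s colliding pairs.
    while True:
        h = make_h(rng.randrange(1, p), p, n)
        if count_collisions(S, h, n) < max(1, s):
            break
    lists = [[] for _ in range(n)]
    for i in S:
        lists[h(i)].append(i)
    # Secondary tables: a collision-free table of size c * s_i^2 per list.
    secondary = []
    for lst in lists:
        size = max(1, c * len(lst) * len(lst))
        while True:
            g = make_h(rng.randrange(1, p), p, size)
            if count_collisions(lst, g, size) == 0:
                break
        slots = [None] * size
        for i in lst:
            slots[g(i)] = i
        secondary.append((g, slots))
    return h, secondary


def lookup(structure, i):
    """Two hash evaluations and one comparison: worst case O(1)."""
    h, secondary = structure
    g, slots = secondary[h(i)]
    return slots[g(i)] == i


# Hypothetical example keys, all smaller than the prime 10007.
keys = [5, 17, 42, 1000, 2024, 9999]
structure = build_two_level(keys, p=10007, rng=random.Random(1))
```

A query never scans a list: it evaluates the primary function, then the secondary function of that slot, and compares a single stored element.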


Page 15: Randomized Algorithms CS648

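Let $H$ be a universal hash family. $X$: no. of collisions for $S$ when $h \in_r H$. Lemma 1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.

Let $s_i$: number of elements in $T[i]$.
Extra space required: $\sum_i c\, s_i^2 = c \sum_i \big( s_i(s_i - 1) + s_i \big) = 2ct + cs$, where $t = \sum_i \binom{s_i}{2}$ is the number of collisions under the chosen primary hash function. With $t < s$ this is less than $3cs$, that is, O($s$).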

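[Figure: primary hash table $T$ with slots 0, 1, 2, ...; a slot holding more than one element points to its own secondary hash table with slots 0, 1, 2, ...]

Is there any relation between $X$ and the $s_i$'s?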
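One relation that can be checked directly is that every colliding pair lies inside a single list, so the number of collisions equals $\sum_i \binom{s_i}{2}$, or equivalently $\sum_i s_i^2$ equals twice the number of collisions plus $s$. The snippet below is a small numerical check; the set, the table size, and the hash function $i \bmod n$ are assumptions chosen only for illustration.

```python
from itertools import combinations

# Hypothetical example: 7 elements hashed into 5 slots by i mod 5.
S = [1, 6, 11, 16, 2, 7, 3]
n = 5
h = lambda i: i % n

# s_i = number of elements of S that land in slot i.
sizes = [0] * n
for i in S:
    sizes[h(i)] += 1

# Number of colliding pairs, counted directly from the definition.
pairs = sum(1 for i, j in combinations(S, 2) if h(i) == h(j))

sum_of_squares = sum(k * k for k in sizes)
```

This identity is what turns a bound on the number of collisions into a bound on the total size of the secondary tables.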

Page 16: Randomized Algorithms CS648

Theorem: A given set $S$ can be preprocessed in expected O($s$) time to build a data structure (2-level hash table) of O($s$) size such that any search query can be answered in worst case O(1) time.

Page 17: Randomized Algorithms CS648

WHY SUCH A DEFINITION FOR UNIVERSAL HASH FAMILY ?

Page 18: Randomized Algorithms CS648

Why does hashing work so well in Practice ?

A simple hash function: $h(i) = i \bmod n$.
• $h(i) = i \bmod n$ works so well in practice because the set $S$ is usually a uniformly random subset of $U$. As a result, few elements of $S$ collide.
• It is easy to fool this hash function such that it achieves O(s) search time.

This makes us think: "Can we achieve expected O(1) search time for any given set $S$?"

We asked a similar question about Quick Sort, which led to Randomized Quick Sort.
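The claim that $i \bmod n$ is easy to fool can be seen with a set made of multiples of $n$, which all land in slot 0. The following is a small sketch; the helper name `longest_list` and the two sample sets are hypothetical.

```python
def longest_list(S, n):
    """Length of the longest list when S is hashed by i mod n."""
    sizes = [0] * n
    for i in S:
        sizes[i % n] += 1
    return max(sizes)


n = 100
adversarial = [k * n for k in range(50)]   # 0, 100, 200, ...: every key is 0 mod n
spread = list(range(50))                   # 50 keys in 50 different slots
```

With the adversarial set a search may scan all 50 elements, while the spread-out set of the same size never needs more than one comparison.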

Page 19: Randomized Algorithms CS648

Universal Hash Family

A simple hash function: $h(i) = i \bmod n$.

Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any distinct $i, j \in U$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$.

Page 20: Randomized Algorithms CS648

A SIMPLE AND COMPACT UNIVERSAL HASH FAMILY

Page 21: Randomized Algorithms CS648

The starting point
The simple hash function: $h(i) = i \bmod n$.

Problem: Two elements $i, j$ in $U$ are bound to collide if $n$ divides $|i - j|$.

Is there some operation which, when applied to any $i, j$, distributes $|i - j|$ uniformly at random over $\{0, 1, \dots, n-1\}$?

Page 22: Randomized Algorithms CS648

mod operation
$a$: a non-negative integer
$n$: a positive integer
$a \bmod n \in \{0, 1, \dots, n-1\}$.

Question: How is $|a \bmod n - b \bmod n|$ related to $|a - b| \bmod n$?
Consider some examples:
• $|55 \bmod 31 - 43 \bmod 31| = 12$ and $|55 - 43| \bmod 31 = 12$
• $|91 \bmod 31 - 102 \bmod 31| = 20$ and $|91 - 102| \bmod 31 = 11$

Answer: Let $\Delta = |a - b| \bmod n$. Then $|a \bmod n - b \bmod n| \in \{\Delta, n - \Delta\}$.
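The two worked examples, and the general answer, can be checked mechanically. This is a small sketch; the function names are hypothetical, and the exhaustive check covers only whatever range of integers one chooses to try.

```python
def diff_of_mods(a, b, n):
    """|a mod n - b mod n|"""
    return abs(a % n - b % n)


def mod_of_diff(a, b, n):
    """|a - b| mod n"""
    return abs(a - b) % n


def satisfies_relation(a, b, n):
    """True if |a mod n - b mod n| is Delta or n - Delta, with Delta = |a - b| mod n."""
    delta = mod_of_diff(a, b, n)
    return diff_of_mods(a, b, n) in (delta, n - delta)
```

For 55 and 43 both quantities are 12; for 91 and 102 they are 20 and 11, and 20 = 31 - 11.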

Page 23: Randomized Algorithms CS648

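mod operation
$p$: a prime number
Consider any $x \in \{1, \dots, p-1\}$.
Question: What can we say about the set $\{\, ix \bmod p \mid i \in \{1, \dots, p-1\} \,\}$?
Example: $p = 7$, $x = 3$.

$i$:           1 2 3 4 5 6
$3i \bmod 7$:  3 6 2 5 1 4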

Page 24: Randomized Algorithms CS648

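mod operation
$p$: a prime number
Consider any $x \in \{1, \dots, p-1\}$.
Question: What can we say about the set $\{\, ix \bmod p \mid i \in \{1, \dots, p-1\} \,\}$?
Example: $p = 7$, $x = 3$ and $x = 4$.

$i$:           1 2 3 4 5 6
$3i \bmod 7$:  3 6 2 5 1 4
$4i \bmod 7$:  4 1 5 2 6 3

Fact: $\{\, ix \bmod p \mid i \in \{1, \dots, p-1\} \,\} = \{1, \dots, p-1\}$ for all $x \in \{1, \dots, p-1\}$.
Proof: Suppose $ix \bmod p = jx \bmod p$ for some $i \ne j$. Then $p$ divides $(i - j)x$, so $p$ divides $(i - j)$ or $p$ divides $x$. Neither is possible, since $0 < |i - j| < p$ and $0 < x < p$. Hence the $p - 1$ values $ix \bmod p$ are distinct, and none of them is 0.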
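The permutation property is easy to confirm for small primes, and it is equally easy to see it fail for a composite modulus. A minimal sketch follows; the function name `multiples_mod` is a hypothetical helper.

```python
def multiples_mod(x, p):
    """The list [1*x mod p, 2*x mod p, ..., (p-1)*x mod p]."""
    return [(i * x) % p for i in range(1, p)]


def is_permutation(values, p):
    """True if values is a rearrangement of 1, 2, ..., p-1."""
    return sorted(values) == list(range(1, p))
```

For the composite modulus 6 and $x = 2$ the list is 2, 4, 0, 2, 4, so primality of $p$ is what makes the Fact work.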

Page 25: Randomized Algorithms CS648

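mod operation
$p$: a prime number
Consider any $x \in \{1, \dots, p-1\}$.
Fact: $\{\, ix \bmod p \mid i \in \{1, \dots, p-1\} \,\} = \{1, \dots, p-1\}$ for all $x \in \{1, \dots, p-1\}$.

Question: If $x \in_r \{1, \dots, p-1\}$, then what can we say about $ix \bmod p$ for a fixed $i \in \{1, \dots, p-1\}$?
Answer: $ix \bmod p$ is distributed uniformly at random over $\{1, \dots, p-1\}$.

Can you now see that the above answer plays the key role in formulating the hash function $h_x(i) = (ix \bmod p) \bmod n$?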

Page 26: Randomized Algorithms CS648

Good fact: An element $i$ is mapped to a random element in $\{1, \dots, p-1\}$.

Slightly bad fact: Once element $i$ is mapped to a location, the mapping of $i + \Delta$ is no longer random.

So it is not clear whether $|i - j|$ is mapped uniformly at random over $\{0, \dots, n-1\}$. So let us look at $h_x()$ a bit more closely…

[Figure: locations 1, 2, ..., with element $i$ mapped to $ix \bmod p$ and element $i + \Delta$ shown alongside.]

Page 27: Randomized Algorithms CS648

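Probability of collision between $i$ and $j$

Let $h_x(i) = (ix \bmod p) \bmod n$.

$i$ and $j$ will collide under $h_x$ if $|ix \bmod p - jx \bmod p|$ is divisible by $n$.

Question: What is the relation between $|ix \bmod p - jx \bmod p|$ and $(i - j)x \bmod p$?

Answer: $|ix \bmod p - jx \bmod p|$ is either $(i - j)x \bmod p$ or $p - ((i - j)x \bmod p)$.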

Page 28: Randomized Algorithms CS648

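Probability of collision between $i$ and $j$

Let $h_x(i) = (ix \bmod p) \bmod n$.
Lemma: If $i$ and $j$ collide under $h_x$, then either $(i - j)x \bmod p$ is divisible by $n$ or $p - ((i - j)x \bmod p)$ is divisible by $n$.

$\{\, (i - j)x \bmod p \mid x \in \{1, \dots, p-1\} \,\} = \{1, \dots, p-1\}$

Let $x \in_r \{1, \dots, p-1\}$.
Probability of collision between $i$ and $j$
$\le P\big((i - j)x \bmod p$ is divisible by $n$, or $p - ((i - j)x \bmod p)$ is divisible by $n\big)$
$\le 2\, P\big((i - j)x \bmod p$ is divisible by $n\big)$
$= \frac{2 \lfloor (p-1)/n \rfloor}{p - 1} \le \frac{2}{n}$.

The second step uses the union bound together with the fact that both $(i - j)x \bmod p$ and $p - ((i - j)x \bmod p)$ are uniformly distributed over $\{1, \dots, p-1\}$.

Students must realize that the condition in the Lemma is a necessary condition and not a sufficient condition for collision. To get an idea, study the example given on the last slide of this lecture.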
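That the condition is necessary but not sufficient can be seen on a tiny instance. The sketch below assumes the hash function $h_x(i) = (ix \bmod p) \bmod n$; the values $p = 7$, $n = 3$ and the elements 2 and 3 are chosen here for illustration and are not necessarily the ones used in the lecture's example.

```python
def condition_holds(i, j, x, p, n):
    """The Lemma's condition: d or p - d is divisible by n, where d = (i - j)x mod p."""
    d = ((i - j) * x) % p          # Python's % is non-negative for a positive modulus
    return d % n == 0 or (p - d) % n == 0


def collide(i, j, x, p, n):
    """True if i and j collide under h_x(i) = ((i * x) mod p) mod n."""
    return ((i * x) % p) % n == ((j * x) % p) % n


# Assumed small instance: p = 7, n = 3, elements 2 and 3, multiplier x = 3.
holds_but_no_collision = condition_holds(2, 3, 3, 7, 3) and not collide(2, 3, 3, 7, 3)
```

For $x = 3$ the elements 2 and 3 map to 6 and 2 modulo 7: the difference seen through $p - d$ is 3, which is divisible by 3, yet the two slots $6 \bmod 3 = 0$ and $2 \bmod 3 = 2$ differ.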

Page 29: Randomized Algorithms CS648

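Theorem: Let $h_x(i) = (ix \bmod p) \bmod n$, where $p$ is a prime larger than every element of $U$. Then $H = \{\, h_x \mid x \in \{1, \dots, p-1\} \,\}$ is universal (with $c = 2$).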
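For small parameters the bound can be verified exhaustively by trying every multiplier $x$ for every pair of elements. This is a sanity check under the assumption that the family is $h_x(i) = (ix \bmod p) \bmod n$ with the universe taken as $\{0, \dots, p-1\}$; it is not a proof, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def collision_probability(i, j, p, n):
    """Exact probability, over x uniform in {1, ..., p-1}, that h_x(i) == h_x(j)."""
    hits = sum(1 for x in range(1, p)
               if ((i * x) % p) % n == ((j * x) % p) % n)
    return Fraction(hits, p - 1)


def worst_pair_probability(p, n):
    """Largest collision probability over all pairs of distinct elements of {0, ..., p-1}."""
    return max(collision_probability(i, j, p, n)
               for i, j in combinations(range(p), 2))
```

For $p = 7$ and $n = 3$, elements 1 and 2 collide for exactly two of the six multipliers, giving probability 1/3, which is within the bound 2/3.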

Page 30: Randomized Algorithms CS648

Example

$p = 7$, [...]. Observe that [...] = 1.
Question: How many collisions between [...] and [...]?
Answer: two (for $x = 3, 4$). Here [...] for $x = 4$, and [...] for $x = 3$.

Answer: No collisions! (although [...] here.)

Table storing $ix \bmod 7$ (row $x$, column $i$):

         i=1  i=2  i=3  i=4  i=5  i=6
x=1       1    2    3    4    5    6
x=2       2    4    6    1    3    5
x=3       3    6    2    5    1    4
x=4       4    1    5    2    6    3
x=5       5    3    1    6    4    2
x=6       6    5    4    3    2    1

Page 31: Randomized Algorithms CS648

Homework:

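Let $h_{x,y}(i) = ((ix + y) \bmod p) \bmod n$. Then prove that $H = \{\, h_{x,y} \mid 1 \le x \le p-1,\ 0 \le y \le p-1 \,\}$ is universal. In particular, show that for any distinct $i, j \in U$, $P_{h \in_r H}[h(i) = h(j)] \le \frac{1}{n}$.

Hence it is slightly better than the hash family discussed just now.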