lecture 11-cs648-2013 randomized algorithms

Upload: anshul-yadav

Post on 26-Jun-2015

Page 1: Lecture 11-cs648-2013 Randomized Algorithms

Randomized Algorithms
CS648

Lecture 11
Hashing - I

Page 2: Lecture 11-cs648-2013 Randomized Algorithms

Problem Definition

• $U = \{1, 2, \ldots, m\}$, called the universe
• $S \subseteq U$ and $s = |S|$
• $s \ll m$

Examples: ,

Aim: Maintain a data structure for storing $S$ that supports the search query

“Does $i \in S$?” for any given $i \in U$.

Page 3: Lecture 11-cs648-2013 Randomized Algorithms

Solutions

Solutions with worst case guarantees

Solution for static $S$:
• Array storing $S$ in sorted order

Solution for dynamic $S$:
• Height balanced search trees (AVL trees, Red-Black trees, …)

Time per operation: $O(\log s)$, Space: $O(s)$

Alternative (a table indexed directly by the elements of $U$): Time per operation: $O(1)$, Space: $O(m)$
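The sorted-array solution for a static set can be sketched in a few lines. This is only an illustration: the function name `contains` and the sample keys are assumptions, not part of the lecture. Binary search over the sorted array gives the logarithmic search time.

```python
from bisect import bisect_left


def contains(sorted_keys, i):
    """Binary search: answer "is i in S?" when S is stored as a sorted list."""
    pos = bisect_left(sorted_keys, i)  # leftmost position where i could be inserted
    return pos < len(sorted_keys) and sorted_keys[pos] == i


# Hypothetical static set S, sorted once up front.
S = sorted([42, 7, 1000003, 19])
```

A query such as `contains(S, 19)` inspects about $\log_2 s$ entries, whereas the direct-addressing alternative would need one table entry for every element of the universe.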

Solutions used in practice with no worst case guarantees
• Hashing

Page 4: Lecture 11-cs648-2013 Randomized Algorithms

Hashing

• Hash table $T$: an array of size $n$.
• Hash function $h : U \to \{0, 1, \ldots, n-1\}$

Answering a Query: “Does $i \in S$?”
1. $k \leftarrow h(i)$;
2. Search the list stored at $T[k]$.

Properties of $h$:
• $h(i)$ computable in O(1) time.
• Space required by $h$: O(1).

[Figure: the hash table $T$, indexed from 0, with a list of elements of $S$ stored at each index.]

How many bits are needed to encode $h$?
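The query procedure above can be sketched as a small chained hash table. The class name `ChainedHashTable`, the table size, and the use of `i % n` as the hash function are assumptions made for illustration only.

```python
class ChainedHashTable:
    """Hash table T of size n with chaining; h maps keys to range(n)."""

    def __init__(self, n, h):
        self.n = n
        self.h = h
        self.table = [[] for _ in range(n)]  # table[k] is the list stored at T[k]

    def insert(self, key):
        bucket = self.table[self.h(key)]
        if key not in bucket:
            bucket.append(key)

    def search(self, key):
        # Step 1: k <- h(key). Step 2: scan the list stored at T[k].
        return key in self.table[self.h(key)]


n = 8  # hypothetical table size
table = ChainedHashTable(n, lambda i: i % n)
for key in [3, 11, 42, 1000003]:
    table.insert(key)
```

The cost of `search` is the length of the list it scans, which is why the number of colliding elements governs the search time.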

Page 6: Lecture 11-cs648-2013 Randomized Algorithms

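Collision

Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.

Worst case time complexity of searching an item $i$: number of elements in $S$ colliding with $i$.

A Discouraging fact: No hash function can be found which is good for all $S$.
Proof: By the pigeonhole principle, at least $m/n$ elements from $U$ are mapped to a single index in $T$. When $m/n \ge s$, a set $S$ chosen from among these elements has all of its elements colliding.

[Figure: the hash table $T$, indexed from 0, with $m/n$ elements of $U$ mapped to one index.]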

Page 7: Lecture 11-cs648-2013 Randomized Algorithms

Hashing

• A very popular heuristic since the 1950s
• Achieves O(1) search time in practice
• Worst case guarantee on search time: $O(s)$

Question: Can we have a hashing scheme ensuring
• O(1) worst case guarantee on search time,
• $O(s)$ space,
• expected $O(s)$ preprocessing time?

The following result answers this in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM (Volume 31, Issue 3), 1984.

Page 8: Lecture 11-cs648-2013 Randomized Algorithms

WHY DOES HASHING WORK SO WELL IN PRACTICE ?

Page 9: Lecture 11-cs648-2013 Randomized Algorithms

Why does hashing work so well in Practice ?

Question: What is the simplest hash function $h : U \to \{0, 1, \ldots, n-1\}$?
Answer: $h(i) = i \bmod n$

Hashing works so well in practice because the set $S$ is usually a uniformly random subset of $U$.

Let us give a theoretical justification for this fact.

Page 10: Lecture 11-cs648-2013 Randomized Algorithms

Why does hashing work so well in Practice ?

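Let $y_1, y_2, \ldots, y_s$ denote $s$ elements selected uniformly at random from $U$ to form $S$.

Question: What is the expected number of elements colliding with $y_1$?

Answer: Let $y_1$ take value $i$. For $j \neq 1$, $P(y_j \text{ collides with } y_1) = \; ??$

• How many possible values can $y_j$ take? $m - 1$
• How many possible values can collide with $i$?

[Figure: the universe $1, 2, \ldots, m$ with the values $\ldots, i-n, i, i+n, i+2n, i+3n, \ldots$ marked.]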

Page 11: Lecture 11-cs648-2013 Randomized Algorithms

Why does hashing work so well in Practice ?

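Let $y_1, y_2, \ldots, y_s$ denote $s$ elements selected uniformly at random from $U$ to form $S$.

Question: What is the expected number of elements colliding with $y_1$?

Answer: Let $y_1$ take value $i$. Each other element $y_j$ takes one of the remaining $m - 1$ values with equal probability, and at most $\lceil m/n \rceil - 1$ of these values collide with $i$ under $h(i) = i \bmod n$. Hence

$$P(y_j \text{ collides with } y_1) \le \frac{\lceil m/n \rceil - 1}{m - 1}.$$

By linearity of expectation over the $s - 1$ other elements,

$$E[\text{number of elements of } S \text{ colliding with } y_1] \le (s - 1) \cdot \frac{\lceil m/n \rceil - 1}{m - 1} = O(1) \quad \text{for } n = \Theta(s).$$

[Figure: the universe $1, 2, \ldots, m$ with the values $\ldots, i-n, i, i+n, i+2n, i+3n, \ldots$ marked as the values which may collide with $i$ under the hash function $h(i) = i \bmod n$.]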
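The expectation can be checked numerically. The sketch below assumes the hash function $h(i) = i \bmod n$ and a universe $\{1, \ldots, m\}$; the function names and the sample values of `m`, `n`, `s`, `i` are hypothetical. It counts exactly how many values collide with a fixed value `i` and applies linearity of expectation to the other `s - 1` elements.

```python
from fractions import Fraction


def colliding_values(i, m, n):
    """Number of values j in {1..m}, j != i, with j mod n == i mod n."""
    r = i % n
    # Values in {1..m} congruent to r modulo n: r, r+n, ... (or n, 2n, ... when r == 0).
    congruent = (m - r) // n + (1 if r != 0 else 0)
    return congruent - 1  # exclude i itself


def expected_collisions(i, m, n, s):
    """Expected number of the other s-1 random elements that collide with i."""
    return (s - 1) * Fraction(colliding_values(i, m, n), m - 1)


m, n, s, i = 1000, 10, 10, 7  # hypothetical parameters with n = s
expected = expected_collisions(i, m, n, s)
```

With the table size equal to the set size, the expected number of collisions stays below a small constant regardless of how large the universe is.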

Page 12: Lecture 11-cs648-2013 Randomized Algorithms

Why does hashing work so well in Practice ?

Conclusion

1. $h(i) = i \bmod n$ works so well because, for a uniformly random subset $S$ of $U$, the expected number of collisions at an index of $T$ is O(1).

2. It is easy to fool this hash function so that search takes $O(s)$ time (do it as a simple exercise).

This makes us think:

“How can we achieve worst case O(1) search time for a given set $S$?”
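One possible way to fool the hash function $h(i) = i \bmod n$ is to pick keys that are all congruent modulo $n$. The sketch below is an assumption about what the exercise intends; the helper name `adversarial_set` and the parameters are hypothetical, and the universe must be large enough to contain the largest key.

```python
def adversarial_set(n, s, start=1):
    """s keys that all leave the same remainder modulo n."""
    return [start + k * n for k in range(s)]


n, s = 16, 10  # hypothetical table size and set size
S = adversarial_set(n, s)

# Group the keys by their hash value under h(i) = i mod n.
buckets = {}
for key in S:
    buckets.setdefault(key % n, []).append(key)
```

All keys land in a single list, so a search degenerates into a linear scan over the whole set.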

Page 13: Lecture 11-cs648-2013 Randomized Algorithms

HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME

Page 14: Lecture 11-cs648-2013 Randomized Algorithms

Key idea to achieve worst case O(1) search time

Observation: Of course, no single hash function is good for every possible $S$. But we may strive for a hash function which is good for a given $S$.

A promising direction: Find a set $H$ of hash functions such that
• For any given $S$, many of them are good.
• Select a function randomly from $H$ and try it for $S$.

The notion of goodness is captured formally by the Universal hash family, defined in the following slide.

Page 15: Lecture 11-cs648-2013 Randomized Algorithms

UNIVERSAL HASH FAMILY

Page 16: Lecture 11-cs648-2013 Randomized Algorithms

Universal Hash Family

Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \neq j$,

$$P_{h \in_r H}\big(h(i) = h(j)\big) \le \frac{c}{n},$$

where $h \in_r H$ denotes a function chosen uniformly at random from $H$. Such a family is called $c$-universal.

Fact: The set of all functions from $U$ to $\{0, 1, \ldots, n-1\}$ is a universal hash family (do it as homework).

Question: Can we use the set of all functions as a universal hash family in real life?
Answer: No.
• There are $n^m$ possible functions.
• The encodings of any two of them must differ in at least one bit, so at least one of them requires at least $m \log_2 n$ bits to encode.
So the space occupied by a randomly chosen hash function is too large.

Question: Does there exist a Universal hash family whose hash functions have a compact encoding?

Page 17: Lecture 11-cs648-2013 Randomized Algorithms

Universal Hash Family

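Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \neq j$,

$$P_{h \in_r H}\big(h(i) = h(j)\big) \le \frac{c}{n}.$$

There indeed exist many $c$-universal hash families whose hash functions have a compact encoding.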

Example: Let : defined as

is -universal.

This looks complicated. In the next class we shall show that it is very natural and intuitive. For today’s lecture, you don’t need it.
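For a concrete feel of a compactly encoded family, here is a sketch of one well-known construction due to Carter and Wegman, $h_{a,b}(x) = ((a x + b) \bmod p) \bmod n$ with $p$ prime. This is an assumption chosen for illustration and is not necessarily the family on the slide. The names `make_hash` and `collision_fraction` are hypothetical. For this family the collision probability of any two distinct keys below $p$ is at most $1/n$, and the second function verifies that exhaustively for a tiny prime.

```python
import random
from fractions import Fraction


def make_hash(p, n, rng=random):
    """Draw h(x) = ((a*x + b) mod p) mod n with a in [1, p-1], b in [0, p-1].

    Assumes p is a prime larger than every key; only a and b need storing.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % n


def collision_fraction(x, y, p, n):
    """Exact fraction of (a, b) pairs under which x and y collide."""
    hits = sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % n == ((a * y + b) % p) % n
    )
    return Fraction(hits, p * (p - 1))


h = make_hash(p=17, n=5, rng=random.Random(1))
frac = collision_fraction(3, 11, p=17, n=5)
```

Each function is described by just two numbers below $p$, in contrast to the $m \log_2 n$ bits needed for an arbitrary function.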

Page 18: Lecture 11-cs648-2013 Randomized Algorithms

STATIC HASHING WORST CASE O(1) SEARCH TIME

Page 19: Lecture 11-cs648-2013 Randomized Algorithms

The Journey

One Milestone in Our Journey:
• A perfect hash function using a hash table of size $O(s^2)$

Tools Needed:
• $c$-Universal Hash Family, where $c$ is a small constant
• Elementary Probability

Page 20: Lecture 11-cs648-2013 Randomized Algorithms

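Perfect hashing using $O(s^2)$ space

Let $H$ be a $c$-Universal Hash Family.
Let $X$: the number of collisions for $S$ when $h \in_r H$, that is, the number of pairs of distinct elements of $S$ mapped to the same index.

Question: What is $E[X]$?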

Page 21: Lecture 11-cs648-2013 Randomized Algorithms

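Perfect hashing using $O(s^2)$ space

Let $H$ be a $c$-Universal Hash Family.
Let $X$: the number of collisions for $S$ when $h \in_r H$.

Lemma 1: $E[X] \le \binom{s}{2} \cdot \frac{c}{n} = \frac{c\, s(s-1)}{2n}$

Question: How large should $n$ be to achieve no collision?

Question: How large should $n$ be to achieve $E[X] \le \frac{1}{2}$?
Answer: Pick $n = c s^2$.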

Page 22: Lecture 11-cs648-2013 Randomized Algorithms

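Perfect hashing using $O(s^2)$ space

Let $H$ be a $c$-Universal Hash Family.
Let $X$: the number of collisions for $S$ when $h \in_r H$.

Lemma 1: $E[X] \le \binom{s}{2} \cdot \frac{c}{n} = \frac{c\, s(s-1)}{2n}$

Observation: $E[X] \le \frac{1}{2}$ when $n = c s^2$.

Question: What is the probability of no collision when $n = c s^2$?
Answer: “No collision” $\Leftrightarrow$ “$X = 0$”
$P(\text{No collision}) = P(X = 0) = 1 - P(X \ge 1)$

Use Markov’s Inequality to bound it.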
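The Markov step can be written out as follows. The derivation assumes the bound $E[X] \le 1/2$ for $n = c s^2$ and uses that $X$ takes only non-negative integer values.

```latex
\begin{align*}
P(\text{no collision}) &= P(X = 0) = 1 - P(X \ge 1) \\
                       &\ge 1 - \frac{E[X]}{1} && \text{(Markov's inequality)} \\
                       &\ge 1 - \frac{1}{2} = \frac{1}{2}.
\end{align*}
```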

Page 23: Lecture 11-cs648-2013 Randomized Algorithms

Perfect hashing using $O(s^2)$ space

Let $H$ be a $c$-Universal Hash Family.

Lemma 2: For $n = c s^2$, there will be no collision with probability at least $\frac{1}{2}$.

Algorithm 1: Perfect hashing for $S$
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ the number of collisions for $S$ under $h$.
Until $t = 0$.

Theorem: A perfect hash function can be computed for $S$ in expected $O(s^2)$ time.

Corollary: A hash table for $S$ occupying $O(s^2)$ space with worst case $O(1)$ search time.
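Algorithm 1 can be sketched as below. The hash family used here, $((a x + b) \bmod p) \bmod n$ with a fixed prime `P`, is an assumption standing in for whichever $c$-universal family is chosen (for this family $c = 1$ suffices). The names `count_collisions`, `perfect_hash`, and the sample keys are hypothetical.

```python
import random

P = 2**31 - 1  # assumed prime larger than every key (hypothetical choice)


def count_collisions(S, h, n):
    """Number of pairs of keys in S that h maps to the same index."""
    load = [0] * n
    for key in S:
        load[h(key)] += 1
    return sum(b * (b - 1) // 2 for b in load)


def perfect_hash(S, c=1, rng=random):
    """Repeat: pick h at random from the family; until S has no collision under h."""
    s = len(S)
    n = max(1, c * s * s)  # table size n = c * s^2
    while True:
        a, b = rng.randrange(1, P), rng.randrange(0, P)
        h = lambda x, a=a, b=b: ((a * x + b) % P) % n
        if count_collisions(S, h, n) == 0:
            return h, n


S = [12, 907, 4021, 77777, 1000003]  # hypothetical distinct keys below P
h, n = perfect_hash(S, rng=random.Random(0))
table = [None] * n
for key in S:
    table[h(key)] = key  # each key gets its own slot
```

A search for `key` then costs one hash evaluation and one comparison, `table[h(key)] == key`. Since each round succeeds with probability at least 1/2, the expected number of rounds is at most 2.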

Page 24: Lecture 11-cs648-2013 Randomized Algorithms

Hashing with $O(s)$ space and O(1) worst case search time

We have completed almost 90% of our journey.
To achieve the goal of $O(s)$ space and worst case $O(1)$ search time, here is the sketch (the details will be given at the beginning of the next class):
• Use the same hashing scheme as in Algorithm 1, except with a table of size $n = O(s)$.
• Of course, there will be collisions. Use an additional level of hash tables to take care of collisions.
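The two-level idea can be sketched as follows. Everything specific here is an assumption made for illustration and may differ from the construction given in the next class: the hash family $((a x + b) \bmod p) \bmod n$, the first-level size $n = s$, the retry threshold $4s$ on the sum of squared bucket sizes, and second-level tables of size equal to the squared bucket size.

```python
import random

P = 2**31 - 1  # assumed prime larger than every key (hypothetical choice)


def draw(n, rng):
    """Pick a random function ((a*x + b) mod P) mod n, encoded as (a, b, n)."""
    return (rng.randrange(1, P), rng.randrange(0, P), n)


def apply(h, x):
    a, b, n = h
    return ((a * x + b) % P) % n


def build_two_level(S, rng=random):
    s = len(S)
    # Level 1: table of size s; retry until the second level fits in O(s) space.
    while True:
        h1 = draw(s, rng)
        buckets = [[] for _ in range(s)]
        for key in S:
            buckets[apply(h1, key)].append(key)
        if sum(len(b) ** 2 for b in buckets) <= 4 * s:
            break
    # Level 2: a collision-free table of size len(bucket)^2 for each bucket.
    second = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h2 = draw(max(1, size), rng)
            slots = [None] * size
            ok = True
            for key in bucket:
                k = apply(h2, key)
                if slots[k] is not None:
                    ok = False
                    break
                slots[k] = key
            if ok:
                break
        second.append((h2, slots))
    return h1, second


def search(key, h1, second):
    h2, slots = second[apply(h1, key)]
    return bool(slots) and slots[apply(h2, key)] == key


S = [12, 907, 4021, 77777, 1000003, 5, 64, 2**20]  # hypothetical keys
h1, second = build_two_level(S, rng=random.Random(0))
```

A search evaluates two hash functions and makes one comparison, and the total number of second-level slots is bounded by the retry threshold, so the space stays linear in the number of keys.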

In the next class:
• We shall complete our algorithm for hashing with $O(s)$ space and O(1) worst case search time.
• We shall present a very natural way to design various Universal Hash Families.