Randomized Algorithms CS648 · 2019-03-13 · Bernoulli Random Variable
TRANSCRIPT
Randomized Algorithms CS648
Lecture 7
• Tools for bounding P[X > (1+δ)E[X]]
• Hashing
How to show P[X > (1+δ)E[X]] ?
Tools
• Markov's Inequality
• Chebyshev's Inequality
• Chernoff Bound
Markov's Inequality
Theorem: Suppose X is a random variable defined over a probability space (Ω, P)
such that X(ω) ≥ 0 for each ω ∈ Ω.
Then for any positive real number a,
P(X ≥ a) ≤ E[X]/a
Important points:
• Applicable only to a nonnegative random variable.
• Makes sense only for a > E[X].
• Can be used only to bound the probability of an event of the form "X ≥ a"
  (it cannot be used for "X ≤ a").
• Gives a very loose bound, so it is not useful most of the time.
• Plays a key role in proving other, stronger inequalities
  (the Chernoff bound and Chebyshev's inequality).
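The looseness of Markov's inequality can be seen numerically. The experiment below is an illustration of mine, not part of the lecture: X is the number of HEADS in 100 tosses of a fair coin, so X ≥ 0 and E[X] = 50.

```python
# Numerical sanity check of Markov's inequality (illustration, not from the
# slides): X = number of HEADS in 100 tosses of a fair coin, E[X] = 50.
import random

random.seed(1)
trials = 50_000
a = 75  # threshold; Markov is meaningful only because a > E[X]

hits = sum(
    1
    for _ in range(trials)
    if bin(random.getrandbits(100)).count("1") >= a  # X for one experiment
)
empirical = hits / trials
markov_bound = 50 / a  # E[X]/a = 2/3

print(f"empirical P(X >= {a}) ~= {empirical:.5f}")
print(f"Markov bound E[X]/a  =  {markov_bound:.4f}")
```

Note how loose the bound is here: the true probability is astronomically small (a Chernoff-type bound captures this), yet Markov can only promise 2/3.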
Bernoulli Random Variable
Bernoulli Random Variable: A random variable X is said to be a Bernoulli random variable with parameter p
if it takes value 1 with probability p and value 0 with probability 1 − p.
The corresponding random experiment is usually called a Bernoulli trial.
Example:
Tossing a coin (with HEADS probability p) once;
HEADS corresponds to 1 and TAILS corresponds to 0.
E[X] = p
Chernoff's Bound
Theorem (a): Let X₁, X₂, …, Xₙ be n independent Bernoulli random variables
with parameters p₁, p₂, …, pₙ.
Let X = Σᵢ Xᵢ and μ = E[X] = Σᵢ pᵢ.
For any δ > 0,
P[X ≥ (1+δ)μ] ≤ ( e^δ / (1+δ)^(1+δ) )^μ
Alternate and more usable forms:
If 1+δ ≥ 2e, then P[X ≥ (1+δ)μ] ≤ 2^(−(1+δ)μ)
If 1 < 1+δ < 2e, then P[X ≥ (1+δ)μ] ≤ e^(−μδ²/4)
Chernoff's Bound
Theorem (b): Let X₁, X₂, …, Xₙ be n independent Bernoulli random variables
with parameters p₁, p₂, …, pₙ.
Let X = Σᵢ Xᵢ and μ = E[X] = Σᵢ pᵢ.
For any δ > 0,
P[X ≤ (1−δ)μ] ≤ e^(−μδ²/2)
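The upper-tail form for 1 < 1+δ < 2e can also be checked numerically. The experiment below is my illustration: X is a sum of 1000 independent fair Bernoulli variables, so μ = 500, and we take δ = 0.1.

```python
# Numerical sanity check of the form e^(-mu*delta^2/4) (illustration, not
# from the slides): X = sum of 1000 independent fair Bernoulli variables.
import math
import random

random.seed(2)
n, mu, delta = 1000, 500.0, 0.1
trials = 20_000

hits = sum(
    1
    for _ in range(trials)
    if bin(random.getrandbits(n)).count("1") >= (1 + delta) * mu
)
empirical = hits / trials
chernoff = math.exp(-mu * delta ** 2 / 4)  # valid since 1 < 1+delta < 2e

print(f"empirical P(X >= (1+delta)*mu)   ~= {empirical:.5f}")
print(f"Chernoff bound e^(-mu*delta^2/4) =  {chernoff:.5f}")
```

The empirical probability (about 0.001) is far below the bound (about 0.29); the Chernoff bound is loose in absolute terms but, unlike Markov, decays exponentially in μδ².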
Hashing
(part 1)
Problem Definition
• 𝒰 = {1, 2, …, m}, called the universe
• S ⊆ 𝒰 and s = |S|
• s ≪ m
Example:
m = 10^18, s = 10^3
Aim: Build a data structure for a given S to support the search query
"Does i ∈ S ?" for any i ∈ 𝒰.
Solutions with worst case guarantees:
              Query time   Data structure    Space
Static S      O(log s)     Array             O(s)
Dynamic S     O(log s)     Red-Black trees   O(s)
Dynamic S     O(1)         Array             O(m)
Practical solution with no worst case guarantees: Hashing
Hashing
• Hash table:
  T: an array of size n.
• Hash function:
  h : 𝒰 → [n]
Answering a query "Does i ∈ S ?":
1. t ← h(i);
2. Search the list stored at T[t].
Properties of h:
• h(i) computable in O(1) time.
• Space required by h: O(1).
[Figure: hash table T with indices 0, 1, …, n−1; each slot stores the list of elements of S hashed to it.]
How many words are needed to encode h?
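The scheme above (hash, then scan the list at T[h(i)]) is hashing with chaining. A minimal sketch in code, with class and variable names of my choosing:

```python
# Minimal chained hash table for the scheme above (a sketch; the class and
# variable names are illustrative, not from the slides).
class ChainedHashTable:
    def __init__(self, n, h):
        self.n = n
        self.h = h                       # hash function h : U -> [n]
        self.T = [[] for _ in range(n)]  # T: an array of n lists

    def insert(self, i):
        self.T[self.h(i)].append(i)

    def search(self, i):
        # Worst case time = number of elements of S colliding with i.
        return i in self.T[self.h(i)]

n = 7
table = ChainedHashTable(n, lambda i: i % n)  # the simplest hash function
for x in [3, 10, 25, 48]:
    table.insert(x)

print(table.search(10))  # True
print(table.search(17))  # False, but 17 mod 7 == 3, so the list [3, 10] is scanned
```

Note that search time is governed entirely by how many stored elements share a slot with the query, which is exactly the notion of collision introduced next.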
Collision
Definition: Two elements i, j ∈ 𝒰 are said to collide under hash function h if h(i) = h(j).
Worst case time for searching an item i:
the number of elements in S colliding with i.
A discouraging fact:
no hash function can be good for every S.
Proof: By the pigeonhole principle, at least m/n elements of 𝒰
are mapped to a single index of T; choosing S from among those elements forces O(s) search time.
[Figure: hash table T with indices 0, 1, …, n−1; one index receives m/n elements of 𝒰.]
History of Hashing
• A very popular heuristic since the 1950s
• Achieves O(1) search time in practice
• Worst case guarantee on search time: O(s)
Question: Can we have a hashing scheme ensuring, for a given S:
• O(1) worst case search time,
• O(s) space,
• expected O(s) preprocessing time?
The following result answered this question in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, 31(3), 1984.
WHY DOES HASHING WORK SO WELL IN PRACTICE?
Why does hashing work so well in practice?
Question: What is the simplest hash function h : 𝒰 → [n] ?
Answer: h(i) = i mod n
Hashing works so well in practice because the set S is usually (close to) a uniformly random subset of 𝒰.
Let us give a theoretical justification for this fact.
Why does hashing work so well in practice?
Let y₁, y₂, …, y_s denote s elements selected uniformly at random from 𝒰 to form S.
Question: What is the expected number of elements colliding with y₁?
Answer: Suppose y₁ takes the value j.
P(y_i collides with y₁) = ??
How many possible values can y_i take? m − 1.
How many of those values can collide with j? Fewer than m/n:
under h(x) = x mod n, only the values congruent to j modulo n
(j + n, j + 2n, j + 3n, …) can collide with j.
[Figure: the universe 1, 2, …, m with the values congruent to j modulo n highlighted.]
Why does hashing work so well in practice?
Let y₁, y₂, …, y_s denote s elements selected uniformly at random from 𝒰 to form S.
Question: What is the expected number of elements colliding with y₁?
Answer: Suppose y₁ takes the value j. Fewer than m/n values may collide with j
under the hash function h(x) = x mod n, so
P(y_i collides with y₁) ≤ (m/n − 1)/(m − 1)
Expected number of elements of S colliding with y₁
≤ ((m/n − 1)/(m − 1)) · (s − 1)
< (s − 1)/n
= O(1) for n = Ω(s)
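The calculation above can be confirmed by simulation. The experiment below is my illustration: it draws S uniformly at random from 𝒰 and counts collisions with y₁ under h(x) = x mod n, with n proportional to s.

```python
# Simulation of the analysis above (illustration, not from the slides):
# draw S uniformly at random from U = {1, ..., m} and count the elements
# colliding with y1 under h(x) = x mod n, with n = Omega(s).
import random

random.seed(3)
m, s = 10**6, 10**3
n = 2 * s  # n proportional to s

trials = 200
total = 0
for _ in range(trials):
    S = random.sample(range(1, m + 1), s)
    y1 = S[0]
    total += sum(1 for y in S[1:] if y % n == y1 % n)

avg = total / trials
print(f"average number of elements colliding with y1: {avg:.3f}")
```

The average stays close to (s − 1)/n ≈ 0.5, i.e., O(1), matching the derivation.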
Why does hashing work so well in practice?
Conclusion:
1. h(i) = i mod n works so well because, for a uniformly random subset of 𝒰,
the expected number of collisions at an index of T is O(1).
2. It is easy to fool this hash function into O(s) search time
(do it as a simple exercise).
This makes us ask:
"How can we achieve worst case O(1) search time for a given set S?"
HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME
Key idea to achieve worst case O(1) search time
Of course, no single hash function is good for every possible S. But we may strive for a hash function that is good for the given S.
A promising direction:
Find a family of hash functions H such that
• for any given S, many of the functions in H are good for S;
• then select a function randomly from H and try it on S.
The notion of goodness is captured formally by the Universal Hash Family on the following slide.
UNIVERSAL HASH FAMILY
Inspiration
From the theoretical explanation of the popularity of hashing in practice:
for all i ≠ j ∈ 𝒰,   P[h(i) = h(j)] ≤ 1/n,   where h ∈R H.
Universal Hash Family
Definition: A collection H of hash functions is said to be c-universal
if there exists a constant c such that, for any distinct i, j ∈ 𝒰,
P_{h ∈R H}[ h(i) = h(j) ] ≤ c/n.
Question: Does there exist a universal hash family whose hash functions have a compact encoding?
Answer: Yes, and it is very simple too.
But for the time being, let us see how we can use a universal hash family to solve our problem.
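The slides defer the construction, but for concreteness here is a sketch of the classic family h_{a,b}(x) = ((a·x + b) mod p) mod n with p a prime ≥ m, which is c-universal for a small constant c. This specific family is my illustration, not stated on the slide.

```python
# Sketch of a standard universal family (my illustration; the slide only
# asserts such a family exists): h_{a,b}(x) = ((a*x + b) % p) % n,
# with p a prime >= m, a in {1,...,p-1}, b in {0,...,p-1}.
import random

P = 2_147_483_647  # 2^31 - 1, a prime; works for any universe with m <= P

def random_hash(n):
    """Pick h uniformly from the family H."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % n

random.seed(4)
n = 100
h = random_hash(n)
assert all(0 <= h(x) < n for x in range(1, 1000))  # h maps U into [n]

# For one fixed pair i != j, estimate P[h(i) = h(j)] over random h:
trials = 10_000
collisions = sum(1 for _ in range(trials)
                 if (g := random_hash(n))(123) == g(456))
print(f"empirical P[h(123) = h(456)] ~= {collisions / trials:.4f}")
```

Note that each h is encoded by just the two words (a, b), which also answers the compact-encoding question above.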
STATIC HASHING: WORST CASE O(1) SEARCH TIME
Michael Fredman, János Komlós, Endre Szemerédi.
Storing a Sparse Table with O(1) Worst Case Access Time.
Journal of the ACM, 31(3), 1984.
The Journey
One milestone: • A perfect hash function using a hash table of size O(s²)
Tools needed: • A c-universal hash family, where c is a small constant
• Elementary probability
Perfect hashing using O(s²) space
Let S be any set of size s.
Let H be a c-universal hash family.
X: the number of collisions for S when h ∈R H.
Question: What is E[X] ?
For each pair i, j ∈ S with i < j, define
X_{i,j} = 1 if h(i) = h(j), and 0 otherwise.
X = Σ_{i<j; i,j ∈ S} X_{i,j}
E[X] = Σ_{i<j; i,j ∈ S} E[X_{i,j}]
     = Σ_{i<j; i,j ∈ S} P[X_{i,j} = 1]
     ≤ Σ_{i<j; i,j ∈ S} c/n
     = c·s(s−1)/(2n)
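The bound just derived can be checked empirically. The code below is my illustration; it borrows the classic (a·x + b) mod p family as a stand-in for a c-universal H with c close to 1.

```python
# Empirical check of E[X] <= c*s(s-1)/(2n) (my illustration; assumes the
# (a*x + b) % p family as a concrete c-universal H with small c).
import random

random.seed(6)
P = 2_147_483_647  # a prime >= m
m, s, n = 10**5, 200, 1000
S = random.sample(range(1, m), s)

def num_collisions(S, h):
    """Count colliding pairs of S under h via bucket sizes."""
    buckets = {}
    for x in S:
        buckets[h(x)] = buckets.get(h(x), 0) + 1
    return sum(k * (k - 1) // 2 for k in buckets.values())

trials = 300
total = 0
for _ in range(trials):
    a, b = random.randrange(1, P), random.randrange(0, P)
    total += num_collisions(S, lambda x: ((a * x + b) % P) % n)

avg_X = total / trials
bound = s * (s - 1) / (2 * n)  # the derived bound with c = 1
print(f"average X ~= {avg_X:.2f}  vs  s(s-1)/(2n) = {bound:.2f}")
```

The average number of collisions stays near s(s−1)/(2n), consistent with the derivation for a small constant c.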
Perfect hashing using O(s²) space
Let H be a c-universal hash family.
X: the number of collisions for S when h ∈R H.
Lemma 1: E[X] ≤ c·s(s−1)/(2n)
Question: How large should n be to achieve no collision?
Question: How large should n be to achieve E[X] ≤ 1/2 ?
Answer: Pick n = c·s².
Perfect hashing using O(s²) space
Let H be a c-universal hash family.
X: the number of collisions for S when h ∈R H.
Lemma 1: E[X] ≤ c·s(s−1)/(2n)
Observation: E[X] ≤ 1/2 when n = c·s².
Question: What is the probability of no collision when n = c·s² ?
Answer: "No collision" is the event "X = 0". Use Markov's inequality to bound the complementary event "X ≥ 1":
P(no collision) = P(X = 0)
               = 1 − P(X ≥ 1)
               ≥ 1 − E[X]/1
               ≥ 1 − 1/2 = 1/2
Perfect hashing using O(s²) space
Let H be a c-universal hash family.
Lemma 2: For n = c·s², there will be no collision with probability at least 1/2.
Algorithm 1: Perfect hashing for S
Repeat
  1. Pick h ∈R H;
  2. t ← the number of collisions for S under h;
Until t = 0.
Theorem: A perfect hash function can be computed for S in expected O(s) time:
each iteration takes O(s) time, and by Lemma 2 the expected number of iterations is at most 2.
Corollary: A hash table occupying O(s²) space with worst case O(1) search time.
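Algorithm 1 can be sketched in runnable code. This is my illustration: it uses the (a·x + b) mod p family as a stand-in for H, and takes c = 1 in n = c·s².

```python
# Runnable sketch of Algorithm 1 (my illustration; the hash family and the
# choice c = 1 in n = c*s^2 are stand-ins, not fixed by the slides).
import random

P = 2_147_483_647  # a prime >= m

def num_collisions(S, h):
    """The quantity t in Algorithm 1: colliding pairs of S under h."""
    buckets = {}
    for x in S:
        buckets[h(x)] = buckets.get(h(x), 0) + 1
    return sum(k * (k - 1) // 2 for k in buckets.values())

def perfect_hash(S):
    """Repeat: pick h from H until t = 0, with table size n = s^2."""
    n = len(S) ** 2
    while True:
        a = random.randrange(1, P)
        b = random.randrange(0, P)
        h = lambda x, a=a, b=b: ((a * x + b) % P) % n
        if num_collisions(S, h) == 0:  # t = 0: h is perfect for S
            return h, n

random.seed(5)
S = random.sample(range(1, 10**6), 1000)
h, n = perfect_hash(S)
print(f"found a perfect hash function into a table of size n = {n}")
```

By Lemma 2 each trial succeeds with probability at least 1/2, so the loop runs an expected ≤ 2 times; each trial costs O(s), giving the expected O(s) bound of the theorem.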
HASHING WITH OPTIMAL SPACE AND WORST CASE O(1) SEARCH TIME
Think it over before coming to the next class.