Randomized Algorithms CS648 · 2019-03-13


Page 1

Randomized Algorithms CS648

Lecture 7

• Tools for bounding $P[X > (1 + \delta)E[X]]$

• Hashing

Page 2

How do we show that $P[X > (1 + \delta)E[X]]$ is small?

Page 3

Tools

• Markov's Inequality

• Chebyshev's Inequality

• Chernoff bound

Page 4

Markov's Inequality

Theorem: Suppose $X$ is a random variable defined over a probability space $(\Omega, P)$ such that $X(\omega) \ge 0$ for each $\omega \in \Omega$.

Then for any positive real number $a$,

$P(X \ge a) \le \frac{E[X]}{a}$

Important points:

• Applicable only to a nonnegative random variable.

• Makes sense only for $a > E[X]$ (otherwise the bound is at least 1).

• Applies only to bounding the probability of the event "$X \ge a$" (it can't be used for "$X \le a$").

• Usually gives a very loose bound, so on its own it is not useful most of the time.

• Plays a key role in proving other, stronger inequalities (Chernoff bound, Chebyshev's Inequality).
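A quick numerical check, not part of the lecture: the sketch below assumes X ~ Binomial(20, 0.3), an arbitrary nonnegative random variable chosen only for illustration, and compares the exact tail P(X ≥ a) with Markov's bound E[X]/a.

```python
from math import comb


def binomial_tail(n: int, p: float, a: int) -> float:
    """Exact P(X >= a) for X ~ Binomial(n, p), a nonnegative random variable."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(max(a, 0), n + 1))


def markov_bound(mean: float, a: float) -> float:
    """Markov's inequality: P(X >= a) <= E[X] / a, for X >= 0 and a > 0."""
    return mean / a


n, p = 20, 0.3
mean = n * p  # E[X] = np for a binomial random variable
for a in (4, 7, 10, 15):
    exact = binomial_tail(n, p, a)
    bound = markov_bound(mean, a)
    print(f"a={a:2d}  exact={exact:.6f}  markov={bound:.4f}")
```

For a = 4 < E[X] the bound exceeds 1 and says nothing; for a = 15 the exact tail is orders of magnitude below Markov's 0.4, which is the looseness noted above.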

Page 5

Bernoulli Random Variable

Bernoulli Random Variable: A random variable $X$ is said to be a Bernoulli random variable with parameter $p$ if it takes value 1 with probability $p$ and value 0 with probability $1 - p$.

The corresponding random experiment is usually called a Bernoulli trial.

Example:

Tossing a coin (with HEADS probability $p$) once; HEADS corresponds to 1 and TAILS corresponds to 0.

$E[X] = p$

Page 6

Chernoff's Bound

Theorem (a): Let $X_1, X_2, \ldots, X_n$ be $n$ independent Bernoulli random variables with parameters $p_1, p_2, \ldots, p_n$.

Let $X = \sum_i X_i$ and $\mu = E[X] = \sum_i p_i$.

For any $\delta > 0$,

$P[X \ge (1 + \delta)\mu] \le \left(\frac{e^{\delta}}{(1 + \delta)^{1 + \delta}}\right)^{\mu}$

Alternate and more usable forms:

If $1 + \delta \ge 2e$, then $P[X \ge (1 + \delta)\mu] \le 2^{-(1 + \delta)\mu}$

If $1 < 1 + \delta < 2e$, then $P[X \ge (1 + \delta)\mu] \le e^{-\mu\delta^2/4}$
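To see how the upper-tail bounds compare with the truth, the sketch below assumes all parameters are equal, p_1 = … = p_n = p (the theorem itself allows different p_i), so X is Binomial(n, p) and its tail can be computed exactly. The values n = 200, p = 0.05 (so μ = 10) are illustrative choices.

```python
from math import ceil, comb, exp


def binomial_upper_tail(n: int, p: float, t: float) -> float:
    """Exact P(X >= t) where X is a sum of n independent Bernoulli(p) variables."""
    k0 = max(0, ceil(t - 1e-9))  # tolerance guards against float error in (1 + delta) * mu
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k0, n + 1))


def chernoff_a(mu: float, delta: float) -> float:
    """Theorem (a): (e^delta / (1 + delta)^(1 + delta))^mu."""
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** mu


def chernoff_a_usable(mu: float, delta: float) -> float:
    """The 'more usable' forms, selected by comparing 1 + delta with 2e."""
    if 1 + delta >= 2 * exp(1):
        return 2.0 ** (-(1 + delta) * mu)
    return exp(-mu * delta**2 / 4)


n, p = 200, 0.05
mu = n * p  # mu = sum of the p_i
for delta in (0.5, 1.0, 2.0, 6.0):
    exact = binomial_upper_tail(n, p, (1 + delta) * mu)
    print(f"delta={delta}: exact={exact:.3e}  "
          f"bound(a)={chernoff_a(mu, delta):.3e}  usable={chernoff_a_usable(mu, delta):.3e}")
```

Unlike Markov's bound, these decay exponentially in μ, which is what makes them useful for sums of many independent trials.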

Page 7

Chernoff's Bound

Theorem (b): Let $X_1, X_2, \ldots, X_n$ be $n$ independent Bernoulli random variables with parameters $p_1, p_2, \ldots, p_n$.

Let $X = \sum_i X_i$ and $\mu = E[X] = \sum_i p_i$.

For any $\delta > 0$,

$P[X \le (1 - \delta)\mu] \le e^{-\mu\delta^2/2}$
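The same kind of check for the lower tail, again assuming equal parameters p_i = p so that X is Binomial(n, p); n = 200 and p = 0.05 are illustrative values.

```python
from math import comb, exp, floor


def binomial_lower_tail(n: int, p: float, t: float) -> float:
    """Exact P(X <= t) where X is a sum of n independent Bernoulli(p) variables."""
    k1 = min(n, floor(t + 1e-9))  # tolerance guards against float error in (1 - delta) * mu
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, k1 + 1))


def chernoff_b(mu: float, delta: float) -> float:
    """Theorem (b): P[X <= (1 - delta) mu] <= e^(-mu delta^2 / 2)."""
    return exp(-mu * delta**2 / 2)


n, p = 200, 0.05
mu = n * p
for delta in (0.2, 0.5, 0.8):
    exact = binomial_lower_tail(n, p, (1 - delta) * mu)
    print(f"delta={delta}: exact={exact:.3e}  bound(b)={chernoff_b(mu, delta):.3e}")
```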

Page 8

Hashing

(part 1)

Page 9

Problem Definition

โ€ข ๐‘ผ = 1,2, โ€ฆ ,๐‘š called universe

โ€ข ๐‘บ โŠ† ๐‘ผ and ๐‘  = |๐‘บ|

โ€ข ๐‘  โ‰ช ๐‘š

Examples:

๐‘š = 1018 , ๐‘  = 103

Aim Build a data structure for a given ๐‘บ to support the search query :

โ€œDoes ๐‘– โˆˆ ๐‘บ ?โ€ for any ๐‘– โˆˆ ๐‘ผ.

Page 10

Solutions with worst case guarantees:

Set        Query time  Data structure      Space
Static S   O(log s)    Sorted array        O(s)
Dynamic S  O(log s)    Red-Black trees     O(s)
Dynamic S  O(1)        Array indexed by U  O(m)

Practical solution with no worst case guarantees: Hashing
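Two rows of the table can be sketched with the Python standard library: a sorted array with binary search for static S, and a direct-address array indexed by U for dynamic S. The class names are hypothetical, a red-black tree is omitted because the standard library has none, and the direct-address version is only feasible for a small m (for m = 10^18 it is hopeless, which is exactly the O(m) cost in the table).

```python
from bisect import bisect_left
from typing import Iterable


class SortedArraySet:
    """Static S: sorted array, O(s) space, O(log s) query by binary search."""

    def __init__(self, elements: Iterable[int]) -> None:
        self.items = sorted(set(elements))

    def contains(self, i: int) -> bool:
        k = bisect_left(self.items, i)  # first position with items[k] >= i
        return k < len(self.items) and self.items[k] == i


class DirectAddressSet:
    """Dynamic S over U = {1, ..., m}: one byte per element of U,
    so O(m) space and O(1) query, insert and delete."""

    def __init__(self, m: int) -> None:
        self.present = bytearray(m + 1)  # index 0 unused

    def insert(self, i: int) -> None:
        self.present[i] = 1

    def delete(self, i: int) -> None:
        self.present[i] = 0

    def contains(self, i: int) -> bool:
        return self.present[i] == 1


static = SortedArraySet([42, 7, 1000, 13])
dynamic = DirectAddressSet(m=1000)
for x in (7, 13, 42):
    dynamic.insert(x)
dynamic.delete(13)
print(static.contains(13), static.contains(14))
print(dynamic.contains(13), dynamic.contains(42))
```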

Page 11

Hashing

• Hash table: $T$, an array of size $n$.

• Hash function: $h : U \to [n]$

Answering a query "Does $i \in S$?":

1. $k \leftarrow h(i)$;

2. Search the list stored at $T[k]$.

Properties of $h$:

• $h(i)$ is computable in O(1) time.

• Space required by $h$: O(1). (How many words are needed to encode $h$?)

[Figure: hash table $T$ with indices 0, 1, …, $n - 1$; each slot stores the list of elements of $S$ hashed to it]
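The two-step query above can be sketched as a hash table with chaining. This is a minimal illustration, not the course's code: the class name is hypothetical and h is passed in as a parameter, with h(x) = x mod n used only for the demo.

```python
from typing import Callable, List


class ChainedHashTable:
    """Hash table T of size n; slot T[k] holds the list of elements x with h(x) = k."""

    def __init__(self, n: int, h: Callable[[int], int]) -> None:
        self.n = n
        self.h = h
        self.T: List[List[int]] = [[] for _ in range(n)]

    def insert(self, x: int) -> None:
        k = self.h(x)
        if x not in self.T[k]:
            self.T[k].append(x)

    def contains(self, i: int) -> bool:
        k = self.h(i)          # step 1: k <- h(i)
        return i in self.T[k]  # step 2: search the list stored at T[k]


n = 8
table = ChainedHashTable(n, lambda x: x % n)  # hypothetical choice of h for the demo
for x in (3, 11, 19, 6):
    table.insert(x)
print(table.contains(11), table.contains(5), table.T[3])
```

The cost of step 2 is the length of the list at T[k], which is why collisions matter.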

Page 12

Collision

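Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.

Worst case time for searching an item $i$: the number of elements in $S$ colliding with $i$.

A discouraging fact:

No hash function can be found which is good for every $S$.

Proof:

By the pigeonhole principle, at least $m/n$ elements from $U$ are mapped to a single index in $T$. When $m/n \ge s$ (as in the example $m = 10^{18}$, $s = 10^3$ with $n \approx s$), choosing $S$ among these elements makes all of $S$ collide, so a search takes $\Theta(s)$ time.

[Figure: the hash table $T$ with indices 0, 1, …, $n - 1$]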
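The pigeonhole argument is constructive for a small universe: given any fixed h, collect the preimage of each slot and take s elements from the largest one. The function name and the particular h below are hypothetical, chosen only to illustrate the proof.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def adversarial_set(h: Callable[[int], int], m: int, s: int) -> List[int]:
    """Return s elements of U = {1, ..., m} that all collide under h.

    Some slot receives at least m/n elements of U (pigeonhole), so this
    succeeds whenever that slot holds at least s elements.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for x in range(1, m + 1):
        buckets[h(x)].append(x)
    largest = max(buckets.values(), key=len)
    if len(largest) < s:
        raise ValueError("the largest slot holds fewer than s elements")
    return largest[:s]


m, n, s = 10_000, 50, 40
h = lambda x: (7 * x + 3) % n  # an arbitrary fixed hash function U -> [n]
S = adversarial_set(h, m, s)
print(len(S), {h(x) for x in S})
```

Whatever fixed h is supplied, the returned S lands entirely in one slot, so every search in S scans a list of length s.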

Page 13

Collision

[Figure: at least $m/n$ elements of $U$ mapped to a single index of the hash table $T$ (indices 0, 1, …, $n - 1$)]

Page 14

History of Hashing

• A very popular heuristic since the 1950s

• Achieves O(1) search time in practice

• Worst case guarantee on search time: O($s$)

Question: Can we have a hashing scheme ensuring, for a given $S$:

• O(1) worst case search time.

• O($s$) space.

• Expected O($s$) preprocessing time.

The following result answered this in the affirmative:

Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, Volume 31, Issue 3, 1984.

Page 15

WHY DOES HASHING WORK SO WELL IN PRACTICE?

Page 16

Why does hashing work so well in practice?

Question: What is the simplest hash function $h : U \to [n]$?

Answer: $h(i) = i \bmod n$

Hashing works so well in practice because the set $S$ is usually a uniformly random subset of $U$.

Let us give a theoretical justification of this claim.


Page 18

Why does hashing work so well in practice?

Let ๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘  denote ๐‘  elements

selected randomly uniformly from ๐‘ผ to form ๐‘บ.

Question:

What is expected number of elements

colliding with ๐‘ฆ1?

Answer: Let ๐‘ฆ1 takes value ๐‘–.

P(๐‘ฆ๐‘— collides with ๐‘ฆ1) = ??

1 2

m

๐‘–

๐‘– โˆ’ ๐‘›

๐‘– + ๐‘›

๐‘– + 2๐‘›

๐‘– + 3๐‘› โ‹ฎ

โ‹ฎ

How many possible values can ๐‘ฆ๐‘— take ?

๐‘š โˆ’ 1

How many possible values can collide with

๐‘– ?

Page 19

Why does hashing work so well in practice?

Let ๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘  denote ๐‘  elements

selected randomly uniformly from ๐‘ผ to form ๐‘บ.

Question:

What is expected number of elements

colliding with ๐‘ฆ1?

Answer: Let ๐‘ฆ1 takes value ๐‘–.

P(๐‘ฆ๐‘— collides with ๐‘ฆ1) =

๐‘š

๐‘›โˆ’1

๐‘šโˆ’1

Expected number of elements of ๐‘บ colliding with ๐‘ฆ1 =

=๐‘š

๐‘›โˆ’1

๐‘šโˆ’1 (๐‘  โˆ’ 1)

= ๐‘ถ 1 for ๐‘› = ๐‘ถ(๐‘ )

1 2

m

๐‘–

๐‘– โˆ’ ๐‘›

๐‘– + ๐‘›

๐‘– + 2๐‘›

๐‘– + 3๐‘› โ‹ฎ

โ‹ฎ

Values which may collide with ๐‘–

under the hash function

๐’‰ ๐‘ฅ = ๐’™ ๐ฆ๐จ๐ ๐‘›

<๐‘š

๐‘›

๐‘ 

๐‘š
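A Monte Carlo check of this calculation, under the illustrative assumptions m = 10000, n = 100, s = 100 (n divides m, so every residue class mod n holds exactly m/n elements of U and the formula is exact):

```python
import random


def collision_formula(m: int, n: int, s: int) -> float:
    """(m/n - 1)/(m - 1) * (s - 1); exact when n divides m."""
    return (m // n - 1) / (m - 1) * (s - 1)


def simulate(m: int, n: int, s: int, trials: int, seed: int = 0) -> float:
    """Average number of elements of S colliding with y1 under h(x) = x mod n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # s distinct elements chosen uniformly at random from U = {1, ..., m};
        # sample() returns them in selection order, so S[0] plays the role of y1.
        S = rng.sample(range(1, m + 1), s)
        y1 = S[0]
        total += sum(1 for y in S[1:] if y % n == y1 % n)
    return total / trials


m, n, s = 10_000, 100, 100
print(simulate(m, n, s, trials=5000), collision_formula(m, n, s), s / n)
```

With n = s the expected number of collisions stays just below 1, matching the bound s/n = O(1).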

Page 20

Why does hashing work so well in practice?

Conclusion

1. ๐’‰ ๐‘– = ๐‘– ๐ฆ๐จ๐ ๐‘› works so well because

for a uniformly random subset of ๐‘ผ,

the expected number of collision at an index of ๐‘ป is O(1).

It is easy to fool this hash function such that it achieves O(s) search time.

(do it as a simple exercise).

This makes us think:

โ€œHow can we achieve worst case O(1) search time for a given set ๐‘บ.โ€

Page 21

HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME

Page 22

Key idea to achieve worst case O(1) search time

Of course, no single hash function is good for every possible $S$. But we may strive for a hash function which is good for a given $S$.

A promising direction:

• Find a family $H$ of hash functions such that, for any given $S$, many of them are good.

• Select a function randomly from $H$ and try it for $S$.

The notion of goodness is captured formally by the Universal hash family on the following slide.

Page 23

UNIVERSAL HASH FAMILY

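Inspiration: the theoretical explanation of the popularity of hashing in practice. For the fixed function $h(x) = x \bmod n$ and a random pair of distinct elements,

$P_{i,j \in_r U}[h(i) = h(j)] \le \frac{1}{n}$

Universal hashing swaps the roles: $i, j$ are fixed and the hash function is chosen at random, $h \in_r H$.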

Page 24

Universal Hash Family

Definition: A collection $H$ of hash functions is said to be c-universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \ne j$,

$P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$

Question: Does there exist a Universal hash family whose hash functions have a compact encoding?

Answer: Yes, and it is very simple too.

But for the time being, let us see how we can use a Universal Hash family to solve our problem.

Page 25

STATIC HASHING WORST CASE O(1) SEARCH TIME

Michael Fredman, János Komlós, Endre Szemerédi.

Storing a Sparse Table with O(1) Worst Case Access Time.

Journal of the ACM (Volume 31, Issue 3), 1984.

Page 26

The Journey

One milestone: • A perfect hash function using a hash table of size $O(s^2)$

Tools needed: • Universal Hash Family where $c$ is a small constant

• Elementary probability

Page 27

Perfect hashing using $O(s^2)$ space

Let $S$ be any set of size $s$.

Let $H$ be a Universal Hash Family.

$X$: the number of collisions for $S$ when $h \in_r H$.

Question: What is $E[X]$?

For each $i, j \in S$ with $i < j$, define

$X_{i,j} = 1$ if $h(i) = h(j)$, and $X_{i,j} = 0$ otherwise.

Then $X = \sum_{i<j,\ i,j \in S} X_{i,j}$, and by linearity of expectation

$E[X] = \sum_{i<j,\ i,j \in S} E[X_{i,j}] = \sum_{i<j,\ i,j \in S} P[X_{i,j} = 1] \le \sum_{i<j,\ i,j \in S} \frac{c}{n} = \frac{c}{n} \cdot \frac{s(s-1)}{2}$

Page 28

Perfect hashing using $O(s^2)$ space

Let $H$ be a Universal Hash Family.

Let $X$ be the number of collisions for $S$ when $h \in_r H$.

Lemma1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$

Question: How large should $n$ be to achieve no collision?

Question: How large should $n$ be to achieve $E[X] \le \frac{1}{2}$?

Answer: Pick $n = cs^2$.

Page 29

Perfect hashing using $O(s^2)$ space

Observation: By Lemma1, when $n = cs^2$, $E[X] \le \frac{c}{cs^2} \cdot \frac{s(s-1)}{2} = \frac{s-1}{2s} < \frac{1}{2}$.

Question: What is the probability of no collision when $n = cs^2$?

Answer: The event "No collision" is the event "$X = 0$". Use Markov's Inequality to bound $P(X \ge 1) \le E[X]/1 < \frac{1}{2}$. Hence

$P(\text{No collision}) = P(X = 0) = 1 - P(X \ge 1) \ge 1 - \frac{1}{2} = \frac{1}{2}$

Page 30

Perfect hashing using O(๐’”๐Ÿ) space

Let ๐‘ฏ be Universal Hash Family.

Lemma2: For ๐’ = ๐’„๐’”๐Ÿ, there will be no collision with probability at least 1

2.

Algorithm1: Perfect hashing for ๐‘บ Repeat

1. Pick ๐’‰ โˆˆ๐‘Ÿ ๐‘ฏ ;

2. ๐’• the number of collisions for ๐‘บ under ๐’‰.

Until ๐’• = ๐ŸŽ.

Theorem: A perfect hash function can be computed for ๐‘บ in expected ? time.

Corollary: A hash table occupying O(๐’”๐Ÿ) space and worst case O(๐Ÿ) search time.

O(๐’”๐Ÿ)
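A minimal sketch of Algorithm1, assuming the Carter–Wegman family ((a·x + b) mod P) mod n as H (the lecture has not fixed H yet) with the prime P = 2^31 − 1, so every key must be smaller than P. For this family c = 1, so n = s². The function names are illustrative.

```python
import random
from typing import Callable, List, Tuple

P = 2_147_483_647  # the prime 2^31 - 1; all keys must be smaller than P


def count_collisions(S: List[int], h: Callable[[int], int], n: int) -> int:
    """Number of pairs i < j in S with h(i) = h(j), using a table of size n."""
    counts = [0] * n
    for x in S:
        counts[h(x)] += 1
    return sum(c * (c - 1) // 2 for c in counts)


def perfect_hash(S: List[int], rng: random.Random) -> Tuple[Callable[[int], int], int, int]:
    """Algorithm1 with n = c * s^2 (c = 1 for this family): repeat until no collision."""
    s = len(S)
    n = s * s
    attempts = 0
    while True:
        attempts += 1
        a, b = rng.randrange(1, P), rng.randrange(0, P)  # pick h uniformly from H
        h = lambda x, a=a, b=b: ((a * x + b) % P) % n
        if count_collisions(S, h, n) == 0:  # t = 0
            return h, n, attempts


rng = random.Random(2019)
S = rng.sample(range(1, 10**9), 50)
h, n, attempts = perfect_hash(S, rng)
print(n, attempts, len({h(x) for x in S}) == len(S))
```

By Lemma2 each attempt succeeds with probability at least 1/2, so attempts is usually 1 or 2; the price is a table of n = s² = 2500 slots for s = 50 keys.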

Page 31

HASHING WITH OPTIMAL SPACE AND WORST CASE O(1) SEARCH TIME

Think it over before coming to the next class.