hashing€¦ · chained hashing for random hash functions • conclusion: with random constant time...
TRANSCRIPT
![Page 1: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/1.jpg)
Hashing
Philip Bille
![Page 2: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/2.jpg)
Outline
• Dictionaries
• Chained Hashing
• Universal Hashing
• Static Dictionaries and Perfect Hashing
![Page 3: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/3.jpg)
Dictionaries
![Page 4: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/4.jpg)
Dictionaries
• The dictionary problem: Maintain a set S ⊆ U = {0, ..., u-1} under operations
• lookup(x): return true if x ∈ S and false otherwise.
• insert(x): set S = S ∪ {x}
• delete(x): set S = S - {x}
• We may also have associated satellite information for the keys
• Goal: A compact data structure (linear space) with fast operations (constant time).
![Page 5: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/5.jpg)
Applications
• Maintain a dictionary (!)
• Key component in many data structures and algorithms. (A few examples in exercises and later lectures).
![Page 6: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/6.jpg)
Solutions to the Dictionary Problem
• Which solutions do we know?
• Direct addressing (bitvector)
• Linked lists.
• Binary search trees (balanced)
• Chained hashing
![Page 7: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/7.jpg)
Chained Hashing
![Page 8: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/8.jpg)
Chained Hashing [Dumey 1956]
• Simplifying assumption: |S| ≤ N at all times and we can use space O(N).
• Pick some crazy, chaotic, random function h (the hash function) mapping U to {0, ..., N-1}.
• Initialize an array A[0, ..., N-1].
• A[i] stores a linked list containing the keys in S whose hash value is i.
![Page 9: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/9.jpg)
41 1
54
66 96 16
1
2
3
0
4
5
6
7
8
9
A Satellite info
• U = {0, ..., 99}
• S = {1, 16, 41, 54, 66, 96}
• h(x) = x mod 10
![Page 10: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/10.jpg)
• How can we support lookup, insert, and delete?
• lookup(x): Compute h(x). Scan through list for h(x). Return true if x is in list and false otherwise.
• insert(x): Compute h(x). Scan through list for h(x). If x is in list do nothing. Otherwise, add x to the front of list.
• delete(x): Compute h(x). Scan through list for h(x). If x is in list remove it. Otherwise, do nothing.
• Time: O(1 + length of linked list for h(x))
Operations
![Page 11: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/11.jpg)
Hash Functions
• A crazy, chaotic hash function (like h(x) = x mod 10) sounds good. But there is a big problem. What?
• For any choice of h, we can find a set whose elements all map to the same slot
• => We end up with a single linked list.
• How can we overcome this?
![Page 12: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/12.jpg)
Randomness
• There are two ways to get around hard set instances:
• Assume the input set is random.
• Choose the hash function at random.
![Page 13: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/13.jpg)
Chained Hashing for Random Hash Functions
• Assumption 1: h: U - > {0, ..., N-1} is chosen uniformly at random from the set of all functions from U to {0, ..., N-1}.
• Assumption 2: h can be evaluated in constant time.
• What is the expected time for an operation OP(x), where OP = {lookup, insert, delete}?
![Page 14: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/14.jpg)
Chained Hashing for Random Hash Functions
Time for OP(x) = O (1 + E [length of linked list for h(x)])= O (1 + E [|{y 2 S | h(y) = h(x)}|])
= O
0
@1 + E
2
4X
y2S
(1 if h(y) = h(x)0 if h(y) 6= h(x)
3
5
1
A
= O
0
@1 +X
y2S
E
"(1 if h(y) = h(x)0 if h(y) 6= h(x)
#1
A
= O(1 +X
y2S
Pr[h(x) = h(y)])
= O(1 + 1 +X
y2S\{x}
Pr[h(x) = h(y)])
= O(1 + 1 +X
y2S\{x}
1/N)
= O(1 + 1 + N(1/N)) = O(1)
N2 choices for pair (h(x), h(y)), N of which cause collision
![Page 15: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/15.jpg)
Chained Hashing for Random Hash Functions
• Conclusion: With random constant time hash function we solve the dictionary problem in
• O(N) space.
• O(1) expected time per operation (lookup, insert, delete).
• Expectation is over the choice of hash function.
• Independent of the input set.
![Page 16: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/16.jpg)
Random Hash Functions?
• How do we find and represent a random hash function?
• We need u log N bits to store an arbitrary function from {0,..., u-1} to {0,..., N-1} (specify for each element x in U the value h(x)).
• We need a lot of random bits to generate the function.
• We need a lot of time to generate the function.
![Page 17: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/17.jpg)
Random Hash Functions?
• Do we actually need a truly random hash function?
• When did we use the fact that h was random in our analysis?
Time for OP(x) = O (1 + E [length of linked list for h(x)])= O (1 + E [|{y 2 S | h(y) = h(x)}|])
= O
0
@1 + E
2
4X
y2S
(1 if h(y) = h(x)0 if h(y) 6= h(x)
3
5
1
A
= O
0
@1 +X
y2S
E
"(1 if h(y) = h(x)0 if h(y) 6= h(x)
#1
A
= O(1 +X
y2S
Pr[h(x) = h(y)])
= O(1 + 1 +X
y2S\{x}
Pr[h(x) = h(y)])
= O(1 + 1 +X
y2S\{x}
1/N)
= O(1 + 1 + N(1/N)) = O(1)
For all x 6= y, Pr[h(x) = h(y)] 1/N
![Page 18: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/18.jpg)
Random Hash Functions?
• We do not need truly random hash function!
• We only need: For all x ≠ y, Pr[h(x) = h(y)] ≤ 1/N
• Captured in definition of universal hashing.
![Page 19: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/19.jpg)
Universal Hashing
![Page 20: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/20.jpg)
Universal Hashing [Carter and Wegman 1979]
• Def: Let H be a set of functions mapping U to {0, ..., N-1}. H is universal if for any x≠y in U and h chosen uniformly at random in H, we have
• Pr[h(x) = h(y)] ≤ 1/N
• If we can find family of universal hash functions such that
• we can store it in small space
• we can evaluate it in constant time
• => efficient chained hashing in the “real-world”.
![Page 21: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/21.jpg)
A Universal Family• Fact: For any N there exists a prime p such that N < p < 2N [Chebyshev 1850]
• Integer encoding: We identify each element x ∈ U with a base-p integer x = (x1, x2, .., xr) containing r-digits.
• Example: 107 in base 7 = (2, 1, 2) (why?)
• 10710 = 1⋅102 + 0⋅101 + 7⋅100 = 2⋅72 + 1⋅71 + 2⋅70 = 2127
• For a = (a1, a2, .., ar) ∈ {0, ..., p-1}r define
• ha(x1, x2, .., xr) = a1x1 + a2x2 + ... + arxr mod p
• H = {ha | a ∈ {0, ..., p-1}r }
• H is universal (next slides).
• O(1) time evaluation
• O(1) space
• Fast construction (finding primes?)
![Page 22: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/22.jpg)
Number Theory
• Fact: Let p be a prime. For any a ∈ {1, ..., p-1} there exists a unique inverse a-1 such that a-1 ⋅ a ≡ 1 mod p. (Zp is a field)
• Example: p = 7
a 1 2 3 4 5 6a-1
a 1 2 3 4 5 6a-1 1 4 5 2 3 6
![Page 23: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/23.jpg)
Analysis
• Goal: For random a = (a1, a2, .., ar), show that if x = (x1, x2, .., xr) ≠ y = (y1, y2, .., yr) then Pr[ha(x) = ha(y)] ≤ 1/N
• (x1, x2, .., xr) ≠ (y1, y2, .., yr) => xi ≠ yi for some i. Assume wlog. that xr ≠ yr.
p choices for ar, exactly one causes a collision by uniqueness of inverses.
existence of inverses
Pr[ha(x1, . . . , xr) = ha(y1, . . . , yr)]
= Pr [a1x1 + · · ·+ arxr ⌘ a1y1 + · · ·+ aryr mod p]
= Pr [arxr � aryr ⌘ a1y1 � a1x1 + · · ·+ ar�1yr�1 � ar�1xr�1 mod p]
= Pr [ar(xr � yr) ⌘ a1(y1 � x1) + · · ·+ ar�1(yr�1 � xr�1) mod p]
= Pr
⇥ar(xr � yr)(xr � yr)
�1 ⌘ (a1(y1 � x1) + · · ·+ ar�1(yr�1 � xr�1))(xr � yr)�1
mod p
⇤
= Pr
⇥ar ⌘ (a1(y1 � x1) + · · ·+ ar�1(yr�1 � xr�1))(xr � yr)
�1mod p
⇤=
1
p
1
N
![Page 24: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/24.jpg)
Universal Hashing and Dictionaries
• We have a universal family H:
• O(1) time evaluation
• O(1) space
• Theorem: We can solve the dictionary problem (without special assumptions) in:
• O(N) space.
• O(1) expected time per operation (lookup, insert, delete).
![Page 25: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/25.jpg)
• For prime p > 0, a ∈ {1, .., p-1}, b ∈ {0, ..., p-1}
• Hash function from k-bit numbers to l-bit numbers. a is an odd k-bit integer.
Other Universal Families
ha,b(x) = (ax + b mod p) mod N
H = {ha,b | a 2 {1, . . . , p� 1}, b 2 {0, . . . , p� 1}}
ha(x) = (ax mod 2
k)� (k � l)
H = {ha | a is an odd integer in {1, . . . , 2k � 1}}
l most significant bits of the k least significant bits of ax
![Page 26: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/26.jpg)
Static Dictionaries and Perfect Hashing
![Page 27: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/27.jpg)
Static Dictionaries
• The static dictionary problem: Given a set S ⊆ U = {0, ..., u-1} of size N for preprocessing support the following operation
• lookup(x): return true if x ∈ S and false otherwise.
• As the dictionary problem with no updates (insert and deletes). Set given in advance.
![Page 28: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/28.jpg)
Perfect Hashing
• Chained hashing with a universal hash function solves the static dictionary problem for S in O(N) space and O(1) expected time per lookup.
• Can we do even better?
• A perfect hash function for a set S is a collision-free hash function.
• With a perfect hash function for S we get O(1) worst-case lookup time for static dictionary problem on S. (Why?)
![Page 29: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/29.jpg)
Static Dictionaries
• Combine two solutions for static hashing using universal hashing:
• A solution that uses too much space but is collision-free.
• A solution that uses too little space with many collision.
• Two-level solution:
• Use too little space at level 1.
• Resolve collisions at level 1 with collision-free solution at level 2.
• lookup(x): look-up in level 1 to find the correct level 2 dictionary. Lookup in level 2 dictionary.
![Page 30: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/30.jpg)
Using too much Space
• Use a universal hash function to map into an array of size N2. What is the expected total number of collisions in the array?
• We get a perfect hash function with probability 1/2. If not perfect try again.
• Expected number of trials before we get a perfect hash function is O(1).
• => For a static set S we can support lookups in O(1) worst-case time using O(N2) space.
E[#collisions] = E
2
4X
x,y2S,x6=y
(1 if h(y) = h(x)
0 if h(y) 6= h(x)
3
5
=
X
x,y2S,x6=y
E
"(1 if h(y) = h(x)
0 if h(y) 6= h(x)
#
=
X
x,y2S,x6=y
Pr[h(x) = h(y)] =
✓N
2
◆1
N
2 N
2
2
· 1
N
2= 1/2
#distinct pairs Universal hashing into N2 range
![Page 31: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/31.jpg)
Using too little Space
• If we instead use an array of size N the expected total number of collisions is
E[#collisions] = E
2
4X
x,y2S,x6=y
(1 if h(y) = h(x)
0 if h(y) 6= h(x)
3
5
=
X
x,y2S,x6=y
E
"(1 if h(y) = h(x)
0 if h(y) 6= h(x)
#
=
X
x,y2S,x6=y
Pr[h(x) = h(y)] =
✓N
2
◆1
N
N
2
2
· 1
N
= 1/2N
![Page 32: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/32.jpg)
FKS-scheme [Fredman, Komlós, Szemerédi 1984]
• Two-level solution.
• At level 1 use solution with O(N) collisions.
• Resolve collisions at level 2 with collision-free quadratic space solution.
![Page 33: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/33.jpg)
• S = {1, 16, 41, 54, 66, 96}
• Level 1 hash partitions into S1 = {1, 41}, S4 = {54}, S6 = {16, 66, 96}
• Level 2 hash info (e.g., a and size of subtable) stored with subtable.
• O(1) worst-case lookup. What is the space?
41
54
1
2
3
0
4
5
6
7
8
9
16 96 66
1
![Page 34: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/34.jpg)
• Space is the total size of level 1 and level 2 hash tables:
• The total number of collisions at level 1 (by choice of hash function and by definition):
How much space does FKS use?
space = O
0
@N +X
i2{0,...,N�1}
|Si|21
A
#collisions =
X
i2{0,...,N�1}
✓|Si|2
◆#collisions = O(N)
For any integer a, a2= a + 2
�a2
�
space = O
N +
X
i
|Si|2!
= O
N +
X
i
✓|Si| + 2
✓|Si|2
◆◆!
= O
N +
X
i
|Si| + 2X
i
✓|Si|2
◆!= O(N + N + 2N) = O(N)
![Page 35: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/35.jpg)
FKS-scheme
• O(N) space and O(N) expected preprocessing time.
• Lookups with two evaluations of a universal hash function.
• Theorem: We can solve the static dictionary problem for a set S of size N in:
• O(N) space and O(N) expected preprocessing time.
• O(1) worst-case time per lookup.
• FKS example of important data structure technique. Combine different solutions for same problem to get an improved solution.
![Page 36: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/36.jpg)
Summary
• Dictionaries
• Chained Hashing
• Universal Hashing
• Static Dictionaries and Perfect Hashing
![Page 37: Hashing€¦ · Chained Hashing for Random Hash Functions • Conclusion: With random constant time hash function we solve the dictionary problem in • O(N) space. • O(1) expected](https://reader033.vdocuments.site/reader033/viewer/2022060520/604eacf4207bd8279725bd0d/html5/thumbnails/37.jpg)
References
• J. Carter and M. Wegman, Universal Classes of Hash Functions. J. Comp. Sys. Sci., 1977
• M. Fredman, J. Komlos and E. Szemeredi, Storing a Sparse Table with O(1) Worst Case Access Time, J. ACM., 1984
• Scribe notes from MIT
• Peter Bro Miltersen’s notes from Aarhus