
Page 1:

CS 261: Graduate Data Structures

Week 2: Dictionaries and hash tables

David Eppstein, University of California, Irvine

Spring Quarter, 2021

Page 2:

Dictionaries

Page 3:

What is a dictionary?

Maintain a collection of key–value pairs

- Keys are often integers or strings but can be other types
- At most one copy of each different key
- Values may be any type of data

Operations update the collection and look up the value associated with a given key

Dictionary in everyday usage: a book of definitions of words. So the keys are words and the values are their definitions

Page 4:

Example application

We’ve already seen one in week 1!

In the depth-first-search example, we represented the input graph as a dictionary with keys = vertices and values = collections of neighboring vertices

The algorithm didn't need to know what kind of object the vertices are, only that they are usable as keys in the dictionary

Page 5:

Dictionary API

- Create new empty dictionary
- Look up key k and return the associated value (exception if k is not included in the collection)
- Add key–value pair (k, x) to the collection (replace old value if k is already in the collection)
- Check whether key k is in the collection and return a Boolean result
- Enumerate pairs in the collection

Page 6:

Dictionaries in Java and Python

Python: dict type

- Create: D = {key1: value1, key2: value2, ...}
- Get value: value = D[key]
- Set value: D[key] = value
- Test membership: if key in D:
- List key–value pairs: for key, value in D.items():

Java: HashMap

Similar access methods with different syntax

Page 7:

Python example: Counting occurrences

def count_occurrences(sequence):
    D = {}                 # Create dictionary
    for x in sequence:
        if x in D:         # Test membership
            D[x] += 1      # Get and then set
        else:
            D[x] = 1       # Set
    return D
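A quick usage check (my own example, not from the slides):

    count_occurrences("mississippi")   # returns {'m': 1, 'i': 4, 's': 4, 'p': 2}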

Page 8:

Non-hashing dictionaries: Association list

Store an unsorted collection of key–value pairs

- Very slow (each get/set must scan the whole dictionary): O(n) time per operation, where n is the number of key–value pairs
- Can be ok when you know the entire dictionary will have size O(1)
- We will use these as a subroutine in hash chaining
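A minimal Python sketch of an association list, for concreteness (class and method names are my own, not from the slides):

import random  # not needed here; shown only if you extend the sketch

class AssociationList:
    def __init__(self):
        self.pairs = []    # unsorted list of (key, value) pairs

    def get(self, k):
        for key, value in self.pairs:   # O(n) scan
            if key == k:
                return value
        raise KeyError(k)

    def set(self, k, x):
        for i, (key, _) in enumerate(self.pairs):
            if key == k:
                self.pairs[i] = (k, x)  # replace old value
                return
        self.pairs.append((k, x))       # new key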

Page 9:

Non-hashing dictionaries: Direct addressing

Use the key as an index into an array

- Only works when keys are small integers
- Wastes space unless most keys are present
- Fast: O(1) time to look up a key or change its value
- Important as motivation for hashing
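A minimal sketch of direct addressing, assuming integer keys drawn from range(N) (naming is my own):

class DirectAddressTable:
    def __init__(self, N):
        self.A = [None] * N    # one cell per possible key; wasteful if few are used

    def get(self, k):
        if self.A[k] is None:  # an empty cell means the key is absent
            raise KeyError(k)
        return self.A[k]

    def set(self, k, x):
        self.A[k] = x          # the key itself is the array index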

Page 10:

Non-hashing dictionaries: Search trees

Binary search trees, B-trees, tries, and flat trees

- We'll see these in weeks 6 and 7! (Until then, you won't need to know much about them)
- Unnecessarily slow if you need only dictionary operations (searching for exact matches)
- But they can be useful for other kinds of query (inexact matches)

Page 11:

Hashing

Page 12:

Hash table intuition

The short version:

Use a hash function h(k) to map keys to small integers

Use direct addressing with key h(k) instead of k itself

Page 13:

Hash table intuition

Maintain a (dynamic) array A whose cells store key–value pairs

Construct and use a hash function h(k) that "randomly" scrambles the keys, mapping them to positions in A

Store key–value pair (k, x) in cell A[h(k)] and do lookups by checking whether the pair stored in that cell has the correct key

When the table doubles in size, update h for the larger set of positions

All operations: O(1) amortized time (assuming computing h is fast)

Complication: Collisions. What happens when two keys k1 and k2 have the same hash value h(k1) = h(k2)?

Page 14:

Hash functions: Perfect hashing

Sometimes, you can construct a function h that maps the n keys one-to-one to the integers 0, 1, ..., n − 1

By definition, there are no collisions!

Works when the set of keys is small, fixed, and known in advance, so one can spend a lot of time searching for a perfect hash function

Example: reserved words in programming languages / compilers

Use a fixed (not dynamic) array A of size n, and store key–value pair (k, x) in cell A[h(k)] (include the value so invalid keys can be detected)

Page 15:

Hash functions: Random

Standard assumption for the analysis of hashing:

The value of h(k) is a random number, independent of the values of h on all the other keys

Not actually true in most applications of hashing

Results in this model are not mathematically valid for non-randomly-chosen hash functions

(Nevertheless, the analysis tends to match practical performance, because many nonrandom functions behave like random ones)

This assumption is valid for Java IdentityHashMap: each object has a hash value randomly chosen at its creation

Page 16:

Hash functions: Cryptographic

There has been much research into cryptographic hash functions that map arbitrary information to large integers (e.g. 512 bits)

These could be used as hash functions for dictionaries by taking the result modulo n

Any detectable difference between the results and a random function ⇒ the cryptographic hash is considered broken

Too slow to be practical for most purposes

Page 17:

Hash functions: Fast, practical, and provable

It’s possible to construct hash functions that are fast and practical

... and at the same time use them in a valid mathematical analysis of hashing algorithms

Details depend on the choice of hashing algorithm

We’ll see more on this topic later this week!

Page 18:

Quick review of probability

Page 19:

Basic concepts

Random event: Something that might or might not happen

Probability of an event: a number Pr[X], the fraction of times that an event will happen over many repetitions of an experiment

Discrete random variable: Can take one of finitely many values, with (possibly different) probabilities summing to one

Uniformly random variable: Each value is equally likely

Independent variables: Knowing the value of any subset doesn't help predict the rest (their probabilities stay the same)

Page 20:

Expected values

If R is any function R(X) of the random choices we're making, its expected value is just the weighted average of its values:

    E[R] = ∑_{outcomes X} Pr[X] · R(X)

Linearity of expectations: For all collections of functions R_i,

    ∑_i E[R_i] = E[∑_i R_i]

(It expands into a double summation; the sums can be done in either order)

If X can only take the values 0 or 1, then E[X] = Pr[X = 1]

So E[number of X_i that are one] = ∑_i Pr[X_i = 1]
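A quick numeric check of that last identity, using indicator variables with different probabilities (my own example):

import random

probs = [0.1, 0.5, 0.9]          # Pr[X_i = 1] for three independent indicators
trials = 100_000
total = sum(sum(random.random() < p for p in probs) for _ in range(trials))
print(total / trials)            # ≈ 0.1 + 0.5 + 0.9 = 1.5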

Page 21:

Analysis of random algorithms

We will analyze algorithms that use randomly-chosen numbers to guide their decisions

...but on inputs that are not random (usually worst-case)

The amount of time (number of steps) that such an algorithm takes is a random variable

Most common measure of time complexity: E[time]

Also used: "with high probability", meaning that the probability of seeing a given time bound is 1 − o(1)

Page 22:

Chernoff bound: Intuition

Suppose you have a collection of random events

(independent, but possibly with different probabilities)

Define a random variable X : how many of these events happen

Main idea: X is very unlikely to be far away from E [X ]

Example: Flip a coin n times

X = how many times do you flip heads?

Very unlikely to be far away from E [X ] = n/2

Page 23:

Chernoff bound (multiplicative form)

Let X be a sum of independent 0–1 random variables

Then for any c > 1,

    Pr[X ≥ c·E[X]] ≤ (e^{c−1} / c^c)^{E[X]}

(where e ≈ 2.718281828... is Euler's number)

Similar formula for the probability that X is at most E [X ]/c
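To get a feel for the bound, here is a small evaluation for n = 100 fair coin flips, so E[X] = 50 (the numbers are my own, not from the slides):

from math import e

def chernoff_bound(c, EX):
    # (e^(c-1) / c^c)^EX, the multiplicative Chernoff bound above
    return ((e ** (c - 1)) / (c ** c)) ** EX

print(chernoff_bound(1.5, 50))   # Pr[X >= 75]  <= ~4.5e-3
print(chernoff_bound(2.0, 50))   # Pr[X >= 100] <= ~4.1e-9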

Page 24:

Three special cases of Chernoff

    Pr[X ≥ c·E[X]] ≤ (e^{c−1} / c^c)^{E[X]}

If c is constant, but E[X] is large ⇒ the probability is an exponentially small function of E[X]

If E[X] is constant, but c is large ⇒ the probability is ≤ 1/c^{Θ(c)}, even smaller than exponential in c

If c is close to one (c = 1 + δ for 0 < δ < 1) ⇒ the probability is ≤ e^{−δ²·E[X]/3}

Page 25:

Collisions and collision resolution

Page 26:

Random functions have many collisions

Suppose we map n keys randomly to a hash table with N cells.

Then the expected number of pairs of keys that collide is

    ∑_{keys i,j} Pr[i collides with j] = (n choose 2) · 1/N ≈ n²/2N

The sum comes from linearity of expectation. There are (n choose 2) key pairs, each colliding with probability 1/N.

E.g. birthday paradox: a class of 80 students has in expectation (80 choose 2) · 1/365 ≈ 8.66 pairs with the same birthday
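The birthday number can be checked directly (using Python's math.comb):

from math import comb

print(comb(80, 2) / 365)   # 3160/365 ≈ 8.66 expected colliding pairs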

Conclusion: Avoiding all collisions with random functions needs hash table size Ω(n²), way too big to be efficient

Page 27:

Resolving collisions: Open addressing

Instead of having a single array cell that each key can go to, use a "probe sequence" of multiple cells

Look for the key at each position of the probe sequence until either:

- You find a cell containing the key
- You find an empty cell (and deduce that the key is not present)

Many variations; we’ll see two later this week

Page 28:

Resolving collisions: Hash chaining (bucketing)

Instead of storing a single key–value pair in each array cell, store an association list (a collection of key–value pairs)

To set or get the value for key k, search the list in A[h(k)] only

Time will be fast enough if these lists are small

(But in practice, the overhead of having a multi-level data structure and keeping a list per array cell means the constant factors in time and space bounds are larger than for other methods.)

First hashing method ever published: Peter Luhn, 1953
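A minimal Python sketch of hash chaining (fixed table size and my own naming; a real implementation would resize as n grows):

class ChainedHashTable:
    def __init__(self, size=8):
        self.N = size
        self.A = [[] for _ in range(size)]   # one association list per cell

    def _bucket(self, k):
        return self.A[hash(k) % self.N]      # the only list we ever search

    def get(self, k):
        for key, value in self._bucket(k):
            if key == k:
                return value
        raise KeyError(k)

    def set(self, k, x):
        bucket = self._bucket(k)
        for i, (key, _) in enumerate(bucket):
            if key == k:
                bucket[i] = (k, x)           # replace old value
                return
        bucket.append((k, x))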

Page 29:

Expected analysis of hash chaining

Assumption: N cells with load factor α = n/N = O(1)

(Use dynamic tables and increase the table size when N ≪ n)

Then, for any key k, the time to set or get the value for k is O(ℓ_k), the number of key–value pairs in A[h(k)].

By linearity of expectation,

    E[ℓ_k] = 1 + ∑_{keys j ≠ k} Pr[j collides with k] = 1 + (n − 1)/N < 1 + α = O(1).

So with tables of this size, the expected time per operation is O(1).

Page 30:

Expected size of the biggest chain

A random or arbitrary key has expected time per operation O(1)

But what if an attacker chooses the key whose cell has the maximum number of keys?

(Again, assuming constant load factor)

Chernoff bound: The probability that any given cell has ≥ xα keys is at most

    (e^x / x^x)^α

(and is lower-bounded by a similar expression)

For x = C·log n / log log n, this simplifies to 1/N^c for some c (depending on C)

c > 1 ⇒ high probability of no cells bigger than xα

c < 1 ⇒ expect many cells bigger than xα

Can prove: The expected size of the largest cell is Θ(log n / log log n)

Page 31:

Linear probing

Page 32:

What is linear probing?

The simplest possible probe sequence: Look for key k in cells h(k), h(k) + 1, h(k) + 2, ... (mod N)

Invented by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel, 1954, and first analyzed by Donald Knuth, 1963

- Fast in practice: simple lookups and insertions, works well with cached memory
- Commonly used
- Requires a high-quality hash function (next week)
- Load factor α must be < 1 (else no room for all keys)

Page 33:

Lookups and insertions

With array A of length N, holding n keys, and hash function h:

To look up the value for key k:

    i = h(k)
    while A[i] has wrong key:
        i = (i + 1) % N
    if A[i] is empty:
        raise exception
    return value from A[i]

To set the value for k to x:

    i = h(k)
    while A[i] has wrong key:
        i = (i + 1) % N
    if A[i] is empty:
        n += 1
        if n/N too large:
            expand A and rebuild
    A[i] = k,x

Page 34:

Example

key  value  hash
X    2      6
Y    3      7
Z    5      6

Successive states of A (cells i = 5, 6, 7, 8):

    i =   5    6    7    8
          -   X,2   -    -
          -   X,2  Y,3   -
          -   X,2  Y,3  Z,5

The default position for Z was filled ⇒ it is placed at the next available cell

Have to be careful with deletion: blanking the cell for X would make the lookup algorithm think Z is also missing!

Page 35:

Deletion

First, find the key and blank its cell:

    i = h(k)
    while A[i] has wrong key:
        i = (i + 1) % N
    if A[i] is empty:
        raise exception
    A[i] = empty

Then, pull other keys forward:

    j = (i + 1) % N
    while A[j] nonempty:
        k = key in A[j]
        if h(k) <= i < j (mod N):
            A[i] = A[j]
            A[j] = empty
            i = j
        j = (j + 1) % N

Result: as if the remaining keys had been inserted without the deleted key
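A runnable Python sketch combining the lookup, insertion, and deletion pseudocode above (fixed-size table with no resizing, so it assumes n stays below N; all names are my own):

class LinearProbingTable:
    def __init__(self, size=16):
        self.N = size
        self.A = [None] * size       # each cell holds (key, value) or None
        self.n = 0

    def _find(self, k):
        # Probe h(k), h(k)+1, ... until an empty cell or a cell holding k.
        i = hash(k) % self.N
        while self.A[i] is not None and self.A[i][0] != k:
            i = (i + 1) % self.N
        return i

    def get(self, k):
        i = self._find(k)
        if self.A[i] is None:
            raise KeyError(k)
        return self.A[i][1]

    def set(self, k, x):
        i = self._find(k)
        if self.A[i] is None:
            self.n += 1              # real code would expand A when n/N is too large
        self.A[i] = (k, x)

    def delete(self, k):
        i = self._find(k)
        if self.A[i] is None:
            raise KeyError(k)
        self.A[i] = None
        j = (i + 1) % self.N
        while self.A[j] is not None:                  # pull later keys forward
            h = hash(self.A[j][0]) % self.N
            if (i - h) % self.N < (j - h) % self.N:   # h <= i < j (mod N)
                self.A[i] = self.A[j]
                self.A[j] = None
                i = j
            j = (j + 1) % self.N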

Page 36:

Blocks of nonempty cells

The analysis is controlled by the lengths of contiguous blocks of nonempty cells (with empty cells at both ends)

Any sequence of B cells is a block only when:

- Nothing hashes to the empty cells at both ends
- Exactly B items hash to somewhere inside
- For each i, at least i items hash to the first i cells

Our analysis will only use the middle condition: exactly B items hash into the block

Page 37:

Probability of forming a block

Suppose we have a sequence of exactly B cells

Expected number of keys hashing into it: αB

To be a block, the actual number must be B, i.e. 1/α times the expected number

Chernoff bound: the probability that this happens is at most

    (e^{1/α − 1} / (1/α)^{1/α})^{αB} = c^B

for some constant c < 1 depending on α

Long sequences of cells are exponentially unlikely to form blocks!
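Evaluating that constant c for a few load factors (my own numbers, not from the slides) shows how it approaches 1 as the table fills:

from math import e

def block_constant(alpha):
    c = 1 / alpha   # the Chernoff parameter from the slide
    return ((e ** (c - 1)) / (c ** c)) ** alpha

for alpha in (0.25, 0.5, 0.75):
    print(alpha, block_constant(alpha))   # ≈ 0.53, 0.82, 0.96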

Page 38:

Linear probing analysis

Expected time per operation on key k

    = O(expected length of the block containing h(k))

    = ∑_{possible blocks} Pr[it is a block] × length

    = ∑_ℓ (number of blocks of length ℓ containing h(k)) × c^ℓ × ℓ

    = ∑_ℓ ℓ × c^ℓ × ℓ

    = O(1)

(because the exponentially-small c^ℓ overwhelms the factors of ℓ)

Page 39:

Expected length of the longest block

A random or arbitrary key has expected time per operation O(1)

But what if an attacker chooses a key whose block has maximum length?

Same analysis ⇒ blocks of length C·log n have probability 1/N^{Θ(1)} (inverse-exponential in length) of existing

Large C ⇒ high probability of no blocks bigger than C·log n

Small C ⇒ expect many blocks bigger than C·log n

Can prove: The expected size of the largest block is Θ(log n)

Page 40:

Cuckoo hashing

Page 41:

Main ideas

Cuckoo = a bird that lays its eggs in other birds' nests; when it hatches, the baby pushes the other eggs and chicks out

Cuckoo hashing = open addressing, only two possible cells per key

Invented by Rasmus Pagh and Flemming Friche Rodler in 2001

Lookup: always O(1) time – just look in those two cells

Deletion: easy, just blank your cell

Insertion: if a key is already in your cell, push it out (and recursively insert it into its other cell)

Page 42:

Comparison with other methods

When an adversary can pick a bad key, other methods can be slow:

Θ(log n/ log log n) for hash chaining

Θ(log n) for linear probing

In contrast, cuckoo hash lookup is always O(1)

Cuckoo hashing is useful when keys can be adversarial or when lookups must always be fast

e.g. internet packet filter

But for fast average case, linear probing may be better

Page 43:

Requirements and variations

Sometimes the two cells for each key are in two different arrays, but it works equally well for both to be in a single array

(We will use the single-array version)

The basic method needs load factor α < 1/2 to avoid infinite recursion

More complicated variations can handle any α < 1:

- Store a fixed number of keys per cell, not just one
- Keep a "stash" of O(1) keys that don't fit

Page 44:

Easy operations: Lookup and delete

With array A and hash functions h1 and h2 such that h1(k) ≠ h2(k):

To look up the value for key k:

    if A[h1(k)] contains k:
        return value in A[h1(k)]
    if A[h2(k)] contains k:
        return value in A[h2(k)]
    raise exception

To delete key k:

    if A[h1(k)] contains k:
        empty A[h1(k)] and return
    if A[h2(k)] contains k:
        empty A[h2(k)] and return
    raise exception

Page 45:

Insertion

To insert key k with value x:

    i = h1(k)
    while A[i] is non-empty:
        swap k,x with A[i]
        i = h1(k) + h2(k) - i    # the other cell of the evicted key
    A[i] = k,x

Not shown: the test for an infinite loop

If we ever return to a state where we are trying to re-insert the original key k into its original cell h1(k), the insertion fails

(put k into the stash or rebuild the whole data structure)
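A minimal single-array cuckoo-hashing sketch in Python (all naming is my own; building h1 and h2 from salted hashes is an assumption, and the infinite-loop test is replaced by a simple eviction cap):

import random

class CuckooTable:
    def __init__(self, size=16):
        self.N = size
        self.A = [None] * size            # each cell: (key, value) or None
        self.s1 = random.randrange(2**61)
        self.s2 = random.randrange(2**61)

    def h1(self, k):
        return hash((self.s1, k)) % self.N

    def h2(self, k):
        h = hash((self.s2, k)) % self.N
        return h if h != self.h1(k) else (h + 1) % self.N   # keep h1(k) != h2(k)

    def lookup(self, k):
        for i in (self.h1(k), self.h2(k)):    # always O(1): just two cells
            if self.A[i] is not None and self.A[i][0] == k:
                return self.A[i][1]
        raise KeyError(k)

    def delete(self, k):
        for i in (self.h1(k), self.h2(k)):
            if self.A[i] is not None and self.A[i][0] == k:
                self.A[i] = None
                return
        raise KeyError(k)

    def insert(self, k, x):
        for i in (self.h1(k), self.h2(k)):    # replace value if k already present
            if self.A[i] is not None and self.A[i][0] == k:
                self.A[i] = (k, x)
                return
        i = self.h1(k)
        for _ in range(self.N):               # cap the eviction chain
            if self.A[i] is None:
                self.A[i] = (k, x)
                return
            (k, x), self.A[i] = self.A[i], (k, x)   # push out the occupant
            i = self.h1(k) + self.h2(k) - i         # its other cell
        raise RuntimeError("insertion failed: rebuild with new hash functions")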

Page 46:

Example

key  value  h1  h2
X    7      2   4
Y    9      3   4
Z    8      2   3
W    6      2   5

Successive states of A (cells i = 2, 3, 4, 5):

    i =   2    3    4    5
         X,7   -    -    -
         X,7  Y,9   -    -
         Z,8  Y,9  X,7   -
         X,7  Z,8  Y,9  W,6

(Inserting Z evicts X to its other cell; inserting W sets off a chain of evictions that ends with W in cell 5)

Page 47:

Graph view of cuckoo state

Vertices = array cells

Edges = pairs of cells used by the same key, directed away from the cell where the key is stored

key  value  h1  h2
X    7      2   4
Y    9      3   4
Z    8      2   3
W    6      2   5

    i =   2    3    4    5
         X,7  Z,8  Y,9  W,6

[Figure: cells A[2], A[3], A[4], A[5] drawn as graph vertices, with one directed edge per key]

Key properties: Each vertex has at most one outgoing edge

Each component is a pseudotree (a cycle with trees attached)

Page 48:

Graph view of key insertion

Add edge h1(k) → h2(k) to the graph

If h1(k) already had an outgoing edge: follow the path of outgoing edges, reversing each edge in the path

Case analysis:

- h1(k) was in a tree: the path stops at the previous root
- h1(k) was in a pseudotree and h2(k) was in a tree: the path loops through the cycle, comes back out, reverses the new edge, and stops at the root of the tree
- h1(k) and h2(k) are both in pseudotrees (or the same pseudotree): the insertion fails; there are too many edges for all vertices to have only one outgoing edge

Page 49:

Summary of graph interpretation

The insertion algorithm:

- Works correctly when the undirected graph with edges h1(k)–h2(k) has at most one cycle per component
- Fails when any component has two cycles (equivalently: more edges than vertices)
- Takes time proportional to the component size

The graph is random (each pair of vertices is equally likely to be an edge)

Page 50:

We know a lot about random graphs!

For n vertices and αn edges, α < 1/2:

- With high probability, all components are trees or pseudotrees
- The probability that the component of v has size ℓ is exponentially small in ℓ
- The expected size of the component containing v is O(1)
- But the expected largest component size is Θ(log n)
- The expected number of pseudotrees is O(1)

For α = 1/2:

- There is a giant component with Θ(n^{2/3}) vertices (with high probability)
- The rest of the graph is still small trees and pseudotrees

Page 51:

What random graphs imply for cuckoo hashing

For load factor α < 1/2:

- With high probability, all keys can be inserted
- Expected time per insertion is O(1)
- Expected time of the slowest insertion is O(log n)

For load factor α ≥ 1/2:

- It doesn't work (in the simplest variation)

Page 52:

k-independent hash functions

Page 53:

The problem

All analysis so far has assumed the hash function is random

But that is rarely achievable in practice:

- Cryptographic functions act like random, but are too slow
- IdentityHashMap is not usable in all applications and doesn't allow changing to a new hash function

Many software libraries use ad-hoc hash functions that are arbitrary, but not random

- We can't prove anything about how well they work!

Instead, we want a function that:

- Can be constructed using only a small seed of randomness
- Is fast (theoretically and in practice) to evaluate
- Can be proven to work well with hashing algorithms

Page 54:

k-independence

Choose function h randomly from a bigger family H of functions

If H = all functions, h is uniformly random (the previous assumption)

If H is smaller, h will be less random

Define H to be k-independent if every k-tuple of keys has independent outputs (every tuple of outputs is equally likely)

Bigger values of k give stronger independence guarantees

An example of a (bad) 1-independent hash function: choose one random number r and define h_r to ignore its argument and return r

So we are selecting a function randomly from the set H = {h_r}

Page 55:

Is k-independence enough?

Expected-time analysis of hash chaining only considers pairs of keys

(expected time is a sum of probabilities that another key collides with a given key)

If we use a 2-independent hash function, these probabilities are the same as for a fully independent hash function

Expected-time analysis of linear probing has been done with 5-independent hashing [Pagh, Pagh & Ruzic]

But there exist 4-independent hash functions designed to make linear probing bad [Patrascu & Thorup]

Cuckoo hashing requires Θ(log n)-independence

Page 56:

Algebraic method for k-independence

From b-bit numbers (that is, 0 ≤ value < 2^b) to the range [0, N − 1]:

Choose a (nonrandom) prime number p > 2^b

Choose k random coefficients a_0, a_1, ..., a_{k−1} (mod p)

    h(x) = ((∑_i a_i x^i) mod p) mod N

Works because, for every k-tuple of keys and every k-tuple of outputs, exactly one polynomial mod p produces that output

O(k) arithmetic operations, but multiplications can be slow
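A sketch of this construction for 32-bit keys (choosing the Mersenne prime p = 2^61 − 1 > 2^32 is my own choice, not from the slides):

import random

P = (1 << 61) - 1    # prime p > 2^b for b = 32

def make_k_independent_hash(k, N):
    coeffs = [random.randrange(P) for _ in range(k)]   # a_0 ... a_{k-1}
    def h(x):
        value = 0
        for a in reversed(coeffs):   # Horner's rule for (sum_i a_i x^i) mod p
            value = (value * x + a) % P
        return value % N
    return h

h = make_k_independent_hash(5, 128)   # a 5-independent function into [0, 127]
print(h(12345), h(67890))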

Page 57:

Tabulation hashing

Represent key x as a base-B number, for an appropriate base B: x = ∑_i x_i B^i with 0 ≤ x_i < B

E.g. for B = 256, x_i can be calculated by (x>>(i<<3))&0xff (using only shifts and masks, no multiplication)

Let d be the number of digits (d = 4 for 32-bit keys and B = 256)

Initialize: fill a d × B table T with random numbers

    h(x) = bitwise exclusive or of T[i, x_i] for i = 0, 1, ..., d − 1

3-independent but not 4-independent; works anyway for linear probing and static cuckoo hashing [Patrascu & Thorup]
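A sketch for 32-bit keys with B = 256 and d = 4 (using 64-bit table entries is my own choice):

import random

B, d = 256, 4
T = [[random.getrandbits(64) for _ in range(B)] for _ in range(d)]

def tabulation_hash(x):
    h = 0
    for i in range(d):
        xi = (x >> (i << 3)) & 0xff   # the i-th base-256 digit of x
        h ^= T[i][xi]                 # XOR together one table entry per digit
    return h

print(tabulation_hash(0xDEADBEEF))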

Page 58:

Summary

Page 59:

Summary

- Dictionary problem: updating and looking up key–value pairs
- In standard libraries (e.g. Java HashMap, Python dict)
- Hash tables: direct addressing + hash function
- Hash functions: perfect, cryptographic, random, k-independent (algebraic and tabulation)
- Hash chaining: expected time, expected worst case
- Linear probing: expected time, expected worst case
- Cuckoo hashing and random graphs
- Reviewed basic probability theory + Chernoff bound