ch8.dictionaries hashing.handouts
TRANSCRIPT
-
8/8/2019 Ch8.Dictionaries Hashing.handouts
1/7
Dictionaries 2/19/08 14:
2/19/08 14:15 Dictionaries 1
Dictionaries & Hashing (Ch 8)
2/19/08 14:15 Dictionaries 2
Outline and Reading
Dictionary ADT (8.1)
Hash Tables (8.2)
Ordered Dictionaries (8.3)
Binary search (8.3.3)
Lookup table (8.3.2)
Skip Lists (8.4)
2/19/08 14:15 Dictionaries 3
Dictionary ADT (8.1.1)
The dictionary ADT models a searchable collection of key-element items
The main operations of a dictionary are searching, inserting, and deleting items
Multiple items with the same key are allowed
Applications:
  address book
  credit card authorization
  mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
  find(k): if the dictionary has an item with key k, returns the position of this element, else, returns a null position.
  insertItem(k, o): inserts item (k, o) into the dictionary
  removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element. An error occurs if there is no such element.
  size(), isEmpty()
  keys(), Elements()
2/19/08 14:15 Dictionaries 4
Unordered Sequence - Log File (8.1.2)
A log file is a dictionary implemented by means of an unsorted sequence
We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
Performance:
  insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
  find and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
Space - can be O(n), where n is the number of elements in the dictionary
The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
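The log-file idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Python list stands in for the sequence, the method names are snake_case versions of the ADT methods, and find returns the element rather than a position.

```python
class LogFile:
    """Dictionary kept as an unsorted sequence of (key, element) pairs."""

    def __init__(self):
        self._items = []  # arbitrary (insertion) order

    def insert_item(self, k, o):
        # Constant (amortized) time: append at the end of the sequence.
        self._items.append((k, o))

    def find(self, k):
        # Linear scan; the whole sequence is traversed when k is absent.
        for key, elem in self._items:
            if key == k:
                return elem
        return None

    def remove_element(self, k):
        # Linear scan, then removal of the first matching item.
        for idx, (key, elem) in enumerate(self._items):
            if key == k:
                del self._items[idx]
                return elem
        raise KeyError(k)  # error when there is no such element

    def size(self):
        return len(self._items)


log = LogFile()
log.insert_item("alice", "login 1")
log.insert_item("bob", "login 2")
log.insert_item("alice", "login 3")  # duplicate keys are allowed
```

Duplicate keys are stored side by side, so find and remove_element act on the earliest matching item in this sketch.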
2/19/08 14:15 Dictionaries 5
Direct Address Table
A direct address table is a dictionary in which
  The keys are in the range {0, 1, 2, ..., N-1}
  Stored in an array of size N - T[0,N-1]
  Item with key k stored in T[k]
Performance:
  insertItem, find, and removeElement all take O(1) time
  Space - requires space O(N), independent of n, the number of items stored in the dictionary
The direct address table is not space efficient unless the range of the keys is close to the number of elements to be stored in the dictionary, i.e., unless n is close to N.
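A minimal sketch of a direct address table, assuming integer keys in 0..N-1 and at most one item per key (the class and method names are illustrative, not taken from the slides):

```python
class DirectAddressTable:
    """Array T of size N; the item with key k lives in T[k]."""

    def __init__(self, N):
        # O(N) space is allocated up front, however few items are stored.
        self._T = [None] * N

    def _check(self, k):
        if not 0 <= k < len(self._T):
            raise IndexError("key outside the range 0..N-1")

    def insert_item(self, k, o):
        self._check(k)
        self._T[k] = o  # O(1)

    def find(self, k):
        self._check(k)
        return self._T[k]  # O(1); None means no item with key k

    def remove_element(self, k):
        self._check(k)
        o = self._T[k]
        if o is None:
            raise KeyError(k)
        self._T[k] = None  # O(1)
        return o


dat = DirectAddressTable(8)
dat.insert_item(3, "three")
```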
2/19/08 14:15 Dictionaries 6
Hash Tables
Hashing
  Hash table (an array) of size N, H[0,N-1]
  Hash function h that maps keys to indices in H
Issues
  Hash functions - need a method to transform a key to an index in H that will have nice properties.
  Collisions - some keys will map to the same index of H (otherwise we have a Direct Address Table). Several methods to resolve the collisions:
    Chaining - put elements that hash to the same location in a linked list
    Open addressing - if a collision occurs, have a method to select another location in the table.
      Probe sequences
2/19/08 14:15 Dictionaries 7
Hash Functions and Hash Tables (8.2)
A hash function h maps keys of a given type to integers in a fixed interval [0, N-1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x
A hash table for a given key type consists of
  Hash function h
  Array (called table) of size N
When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)
2/19/08 14:15 Dictionaries 8
Example
We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
[Figure: table with cells 0 to 9999; 025-612-0001 in cell 1, 981-101-0002 in cell 2, 451-229-0004 in cell 4, 200-751-9998 in cell 9998]
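Taking the last four digits is the same as computing x mod 10,000. The sketch below places the four SSNs from the example into a table; parse_ssn is a hypothetical helper introduced only for this illustration, and the SSN string itself is stored in place of a name.

```python
N = 10_000  # table size from the example


def h(x):
    """Hash value of an integer key: its last four decimal digits."""
    return x % N


def parse_ssn(text):
    # Hypothetical helper: "451-229-0004" -> 4512290004
    return int(text.replace("-", ""))


table = [None] * N
for ssn in ["451-229-0004", "981-101-0002", "200-751-9998", "025-612-0001"]:
    table[h(parse_ssn(ssn))] = ssn

occupied = [i for i, cell in enumerate(table) if cell is not None]
print(occupied)  # [1, 2, 4, 9998]
```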
2/19/08 14:15 Dictionaries 9
Hash Functions (8.2.2)
A hash function is usually specified as the composition of two functions:
  Hash code map: h1: keys -> integers
  Compression map: h2: integers -> [0, N-1]
The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to "disperse" the keys in an apparently random way
2/19/08 14:15 Dictionaries 10
Hash Code Maps (8.2.3)
Memory address:
  We reinterpret the memory address of the key object as an integer
  Good in general, except for numeric and string keys
Integer cast:
  We reinterpret the bits of the key as an integer
  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines)
Component sum:
  We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines)
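As an illustration of the component sum, the sketch below splits a nonnegative 64-bit key into 32-bit components and adds them, masking the running total so that overflow is ignored. The function name and the default widths are assumptions made for this example.

```python
def component_sum(key, key_bits=64, comp_bits=32):
    """Sum the comp_bits-wide components of a nonnegative integer key."""
    mask = (1 << comp_bits) - 1
    total = 0
    for shift in range(0, key_bits, comp_bits):
        component = (key >> shift) & mask
        total = (total + component) & mask  # drop any overflow bits
    return total


print(component_sum(0x00000001_00000002))  # 3
print(component_sum(0xFFFFFFFF_00000001))  # 0, the sum overflowed 32 bits
```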
2/19/08 14:15 Dictionaries 11
Hash Code Maps (cont.)
Polynomial accumulation:
  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits)
    $a_0 \, a_1 \, \dots \, a_{n-1}$
  We evaluate the polynomial
    $p(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_{n-1} z^{n-1}$
  at a fixed value z, ignoring overflows
  Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
Polynomial p(z) can be evaluated in O(n) time using Horner's rule:
  The following polynomials are successively computed, each from the previous one in O(1) time
    $p_0(z) = a_{n-1}$
    $p_i(z) = a_{n-i-1} + z \, p_{i-1}(z)$   $(i = 1, 2, \dots, n-1)$
  We have $p(z) = p_{n-1}(z)$
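Horner's rule translates directly into a loop that walks the components from the last to the first. In this sketch the components are taken to be the character codes of a string and overflow is ignored by masking to 32 bits; both choices are assumptions for illustration.

```python
def poly_hash(s, z=33, bits=32):
    """Polynomial hash code a_0 + a_1*z + ... + a_{n-1}*z^(n-1), via Horner's rule."""
    mask = (1 << bits) - 1
    p = 0
    for ch in reversed(s):          # a_{n-1} first, a_0 last
        p = (ord(ch) + z * p) & mask  # p_i = a_{n-i-1} + z * p_{i-1}
    return p


def poly_hash_direct(s, z=33, bits=32):
    """Same value computed term by term, for comparison only."""
    mask = (1 << bits) - 1
    return sum(ord(ch) * z**i for i, ch in enumerate(s)) & mask


print(poly_hash("ab"))  # 97 + 33 * 98 = 3331
```

The direct version recomputes powers of z for every term, while the Horner loop uses one multiplication and one addition per component.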
2/19/08 14:15 Dictionaries 12
Compression Maps (8.2.4)
Division:
  h2(y) = y mod N
  The size N of the hash table is usually chosen to be a prime
  The reason has to do with number theory and is beyond the scope of this course
Multiply, Add and Divide (MAD):
  h2(y) = (ay + b) mod N
  a and b are nonnegative integers such that a mod N ≠ 0
  Otherwise, every integer would map to the same value b
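The two compression maps, and the reason for the condition on a, can be checked with a small sketch. The values N = 13, a = 3, b = 5 are arbitrary choices for illustration.

```python
def division(y, N):
    """Division compression map: h2(y) = y mod N."""
    return y % N


def mad(y, a, b, N):
    """Multiply, Add and Divide compression map: h2(y) = (a*y + b) mod N."""
    return (a * y + b) % N


N = 13
good = {mad(y, 3, 5, N) for y in range(100)}   # a mod N != 0
bad = {mad(y, 26, 5, N) for y in range(100)}   # a mod N == 0: degenerate

print(len(good))  # 13, every cell of the table is reachable
print(bad)        # {5}, every integer maps to b
```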
2/19/08 14:15 Dictionaries 13
Collision Handling (8.2.5)
Collisions occur when different elements are mapped to the same cell
Chaining: let each cell in the table point to a linked list of elements that map there
Chaining is simple, but requires additional memory outside the table
[Figure: table cells 0 to 4; cell 1 points to 025-612-0001; cell 4 points to the chain 451-229-0004, 981-101-0004]
2/19/08 14:15 Dictionaries 14
Exercise: chaining
Assume you have a hash table H with N = 9 slots (H[0,8]) and let the hash function be h(k) = k mod N.
Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by chaining: 5, 28, 19, 15, 20, 33, 12, 17, 10
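One way to check a hand-drawn answer is to simulate the insertions. The sketch below uses Python lists as the chains and appends each new key at the end of its chain; inserting at the front of the chain would be an equally valid convention and would reverse the order within each chain.

```python
def build_chained_table(keys, N):
    """Insert keys into a table of N chains using h(k) = k mod N."""
    table = [[] for _ in range(N)]
    for k in keys:
        table[k % N].append(k)  # colliding keys share the same chain
    return table


table = build_chained_table([5, 28, 19, 15, 20, 33, 12, 17, 10], 9)
for cell, chain in enumerate(table):
    print(cell, chain)
# 0 []
# 1 [28, 19, 10]
# 2 [20]
# 3 [12]
# 4 []
# 5 [5]
# 6 [15, 33]
# 7 []
# 8 [17]
```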
2/19/08 14:15 Dictionaries 15
Linear Probing
Open addressing: the colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. So the i-th cell checked is: H(k,i) = (h(k) + i) mod N
Each table cell inspected is referred to as a "probe"
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Example:
  h(x) = x mod 13
  Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
  Resulting table:
  cell  0  1  2  3  4  5  6  7  8  9 10 11 12
  key   -  - 41  -  - 18 44 59 32 22 31 73  -
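The example can be replayed with a short sketch of linear-probing insertion. None marks an empty cell here, and the function name is an assumption made for illustration.

```python
def linear_probe_insert(table, k):
    """Place key k in the first free cell of the sequence (h(k) + i) mod N."""
    N = len(table)
    for i in range(N):
        cell = (k % N + i) % N  # i-th probe
        if table[cell] is None:
            table[cell] = k
            return cell
    raise RuntimeError("table is full")


table = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    linear_probe_insert(table, k)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Keys 44, 32, 31 and 73 all collide and end up in one contiguous run from cell 5 to cell 11, which is the lumping effect described above.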
2/19/08 14:15 Dictionaries 16
Search with Linear Probing
Consider a hash table A that uses linear probing
find(k)
  We start at cell h(k)
  We probe consecutive locations until one of the following occurs
    An item with key k is found, or
    An empty cell is found, or
    N cells have been unsuccessfully probed
Algorithm find(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return Position(null)
    else if c.key() = k
      return Position(c)
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return Position(null)
2/19/08 14:15 Dictionaries 17
Updates with Linear Probing
To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
removeElement(k)
  We search for an item with key k
  If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return the position of this item
  Else, we return a null position
insertItem(k, o)
  We throw an exception if the table is full
  We start at cell h(k)
  We probe consecutive cells until one of the following occurs
    A cell i is found that is either empty or stores AVAILABLE, or
    N cells have been unsuccessfully probed
  We store item (k, o) in cell i
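The search and update rules can be combined into one small class. This is a sketch under stated assumptions: keys are integers hashed with k mod N, a cell index plays the role of a position, None plays the role of both the empty cell and the null position, and a unique sentinel object stands for AVAILABLE.

```python
AVAILABLE = object()  # sentinel that replaces deleted items


class LinearProbingTable:
    def __init__(self, N):
        self._A = [None] * N  # None marks a never-used (empty) cell
        self._N = N
        self._n = 0           # number of stored items

    def find(self, k):
        i = k % self._N
        for _ in range(self._N):          # at most N probes
            c = self._A[i]
            if c is None:                 # empty cell: k is not in the table
                return None
            if c is not AVAILABLE and c[0] == k:
                return i
            i = (i + 1) % self._N         # AVAILABLE or other key: keep probing
        return None

    def insert_item(self, k, o):
        if self._n == self._N:
            raise OverflowError("table is full")
        i = k % self._N
        while self._A[i] is not None and self._A[i] is not AVAILABLE:
            i = (i + 1) % self._N         # terminates: a free cell exists
        self._A[i] = (k, o)
        self._n += 1
        return i

    def remove_element(self, k):
        i = self.find(k)
        if i is not None:
            self._A[i] = AVAILABLE        # keep later items reachable
            self._n -= 1
        return i


t = LinearProbingTable(13)
t.insert_item(18, "a")   # cell 5
t.insert_item(44, "b")   # collides at 5, goes to cell 6
t.remove_element(18)     # cell 5 now holds AVAILABLE
```

Marking cell 5 as AVAILABLE instead of emptying it matters here: a search for 44 starts at cell 5 and must be able to continue past it to cell 6.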
2/19/08 14:15 Dictionaries 18
Exercise: open addressing & linear probing
Assume you have a hash table H with N = 11 slots (H[0,10]) and let the hash function be h(k) = k mod N.
Demonstrate (by picture) the insertion of the following keys into a hash table with collisions resolved by linear probing: 10, 22, 31, 4, 15, 28, 17, 88, 59
2/19/08 14:15 Dictionaries 19
Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
  h(k,i) = (h(k) + i d(k)) mod N   for i = 0, 1, ..., N-1
The secondary hash function d(k) cannot have zero values
The table size N must be a prime to allow probing of all the cells
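A sketch of insertion with double hashing follows. The table size N = 13, the keys, and the secondary function d(k) = 7 - (k mod 7) are illustrative assumptions chosen for this example; the only properties relied on are that d(k) is never zero and that N is prime.

```python
def double_hash_insert(table, k, d):
    """Place key k in the first free cell of (h(k) + i*d(k)) mod N, i = 0..N-1."""
    N = len(table)
    for i in range(N):
        cell = (k % N + i * d(k)) % N
        if table[cell] is None:
            table[cell] = k
            return cell
    raise RuntimeError("no free cell found")


def d(k):
    # Assumed secondary hash function: values lie in 1..7, never zero.
    return 7 - k % 7


table = [None] * 13
for k in [18, 41, 22, 44, 59, 32, 31, 73]:
    double_hash_insert(table, k, d)

print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```

Key 44 collides with 18 at cell 5 and jumps by d(44) = 5 to cell 10, while key 31 jumps by d(31) = 4 through cells 5 and 9 to cell 0, so colliding keys follow different probe sequences.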
Common choice of compression map for the secondary hash function:
  d2(k) = q - k mod q
  where q
2/19/08 14:15 Dictionaries 31
What is a Skip List
A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, ..., Sh such that
  Each list Si contains the special keys +∞ and -∞
  List S0 contains the keys of S in nondecreasing order
  Each list is a subsequence of the previous one, i.e., S0 ⊇ S1 ⊇ ... ⊇ Sh
  List Sh contains only the two special keys
We show how to use a skip list to implement the dictionary ADT
[Figure: example skip list
  S3: -∞, +∞
  S2: -∞, 31, +∞
  S1: -∞, 23, 31, 34, 64, +∞
  S0: -∞, 12, 23, 26, 31, 34, 44, 56, 64, 78, +∞]
2/19/08 14:15 Dictionaries 32
Search
We search for a key x in a skip list as follows:
  We start at the first position of the top list
  At the current position p, we compare x with y ← key(after(p))
    x = y: we return element(after(p))
    x > y: we "scan forward"
    x < y: we "drop down"
  If we try to drop down past the bottom list, we return NO_SUCH_KEY
Example: search for 78
[Figure: search path for key 78 in a skip list with lists S0 to S3; S0 holds 12, 23, 26, 31, 34, 44, 56, 64, 78]
2/19/08 14:15 Dictionaries 33
Randomized Algorithms
A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
It contains statements of the type
  b ← random()
  if b = 0
    do A
  else { b = 1 }
    do B
Its running time depends on the outcomes of the coin tosses
We analyze the expected running time of a randomized algorithm under the following assumptions
  the coins are unbiased, and
  the coin tosses are independent
The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give "heads")
We use a randomized algorithm to insert items into a skip list
2/19/08 14:15 Dictionaries 34
Insertion
To insert an item (x, o) into a skip list, we use a randomized algorithm:
  We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads
  If i ≥ h, we add to the skip list new lists Sh+1, ..., Si+1, each containing only the two special keys
  We search for x in the skip list and find the positions p0, p1, ..., pi of the items with largest key less than x in each list S0, S1, ..., Si
  For j ← 0, ..., i, we insert item (x, o) into list Sj after position pj
Example: insert key 15, with i = 2
[Figure: before: S2: -∞, +∞; S1: -∞, 23, +∞; S0: -∞, 10, 23, 36, +∞. After: S3: -∞, +∞; S2: -∞, 15, +∞; S1: -∞, 15, 23, +∞; S0: -∞, 10, 15, 23, 36, +∞, with p0, p1, p2 marking the positions that precede 15]
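The coin-tossing step that chooses i can be sketched on its own. The function name and the optional rng parameter are assumptions for this illustration; heads is modelled as a random number below 0.5, which corresponds to an unbiased coin with independent tosses.

```python
import random


def coin_toss_height(rng=random):
    """Toss a fair coin until tails; return the number of heads seen."""
    i = 0
    while rng.random() < 0.5:  # heads
        i += 1
    return i


rng = random.Random(0)  # seeded generator, only to make the run repeatable
heights = [coin_toss_height(rng) for _ in range(10_000)]
at_least_one = sum(1 for i in heights if i >= 1) / len(heights)
print(at_least_one)  # close to 1/2: about half the items also enter S1
```

An item with i heads is inserted into lists S0 through Si, so roughly half of the items reach S1, a quarter reach S2, and so on.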
2/19/08 14:15 Dictionaries 35
Deletion
To remove an item with key x from a skip list, we proceed as follows:
  We search for x in the skip list and find the positions p0, p1, ..., pi of the items with key x, where position pj is in list Sj
  We remove positions p0, p1, ..., pi from the lists S0, S1, ..., Si
  We remove all but one list containing only the two special keys
Example: remove key 34
[Figure: before: S3: -∞, +∞; S2: -∞, 34, +∞; S1: -∞, 23, 34, +∞; S0: -∞, 12, 23, 34, 45, +∞, with p0, p1, p2 marking the positions of 34. After: S2: -∞, +∞; S1: -∞, 23, +∞; S0: -∞, 12, 23, 45, +∞]
2/19/08 14:15 Dictionaries 36
Implementation
We can implement a skip list with quad-nodes
A quad-node stores:
  item
  link to the node before
  link to the node after
  link to the node below
  link to the node above
Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them
[Figure: a quad-node holding item x]
2/19/08 14:15 Dictionaries 37
Space Usage
The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
We use the following two basic probabilistic facts:
  Fact 1: The probability of getting i consecutive heads when flipping a coin is $1/2^i$
  Fact 2: If each of n items is present in a set with probability p, the expected size of the set is np
Consider a skip list with n items
  By Fact 1, we insert an item in list Si with probability $1/2^i$
  By Fact 2, the expected size of list Si is $n/2^i$
The expected number of nodes used by the skip list is
  $\sum_{i=0}^{h} \frac{n}{2^i} = n \sum_{i=0}^{h} \frac{1}{2^i} < 2n$
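The bound can be checked empirically with a short simulation. The sketch below only counts nodes: each item contributes one node per list it appears in, the sentinel nodes for the special keys are not counted, and the function name and seed are arbitrary choices for illustration.

```python
import random


def total_item_nodes(n, rng):
    """Number of item nodes in a skip list built by n randomized insertions."""
    total = 0
    for _ in range(n):
        i = 0
        while rng.random() < 0.5:  # heads: the item also goes one list higher
            i += 1
        total += i + 1             # the item appears in S0, ..., Si
    return total


n = 100_000
nodes = total_item_nodes(n, random.Random(42))
print(nodes / n)  # close to 2 nodes per item on average
```

The average stays near 2 nodes per item, in line with the expected total of fewer than 2n nodes.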