![Page 1: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/1.jpg)
Dictionaries and Their Implementations
CS 302 - Data Structures
Mehmet H Gunes
Modified from authors’ slides
![Page 2: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/2.jpg)
Contents
• The ADT Dictionary
• Possible Implementations
• Selecting an Implementation
• Hashing
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
![Page 3: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/3.jpg)
The ADT Dictionary
• Recall concept of a sort key in Chapter 11
• Often must search a collection of data for specific information
– Use a search key
• Applications that require value-oriented operations are frequent
![Page 4: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/4.jpg)
The ADT Dictionary
• A collection of data about certain cities
Consider the need to search this data by something other than the name of the city
![Page 5: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/5.jpg)
ADT Dictionary Operations
• Test whether dictionary is empty.
• Get number of items in dictionary.
• Insert new item into dictionary.
• Remove item with given search key from dictionary.
• Remove all items from dictionary.
![Page 6: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/6.jpg)
ADT Dictionary Operations
• Get item with a given search key from dictionary.
• Test whether dictionary contains an item with given search key.
• Traverse items in dictionary in sorted search-key order.
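The operations listed above can be sketched as a small C++ class. This is only an illustration, not the textbook's Listing 18-1: the class name `MapDictionary` is hypothetical, the method names are chosen to mirror the listed operations, and `std::map` is assumed as the underlying container because it already keeps entries in sorted search-key order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a dictionary class (an assumption, not the
// textbook's code). std::map keeps entries sorted by search key.
template <class Key, class Item>
class MapDictionary {
public:
    bool isEmpty() const { return entries_.empty(); }
    std::size_t getNumberOfItems() const { return entries_.size(); }

    // Returns false if an item with this search key already exists.
    bool add(const Key& key, const Item& item) {
        return entries_.emplace(key, item).second;
    }

    // Returns false if no item has this search key.
    bool remove(const Key& key) { return entries_.erase(key) > 0; }

    void clear() { entries_.clear(); }

    // Throws std::out_of_range when the key is absent.
    const Item& getItem(const Key& key) const { return entries_.at(key); }

    bool contains(const Key& key) const { return entries_.count(key) > 0; }

    // Visits every item in sorted search-key order.
    void traverse(const std::function<void(const Item&)>& visit) const {
        for (const auto& entry : entries_) visit(entry.second);
    }

private:
    std::map<Key, Item> entries_;
};
```

With city names as search keys, `traverse` would visit "Austin" before "Reno" regardless of insertion order.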
![Page 7: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/7.jpg)
ADT Dictionary
• View interface, Listing 18-1
• UML diagram for a class of dictionaries
![Page 8: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/8.jpg)
Possible Implementations
• Sorted (by search key), array-based
• Sorted (by search key), link-based
• Unsorted, array-based
• Unsorted, link-based
![Page 9: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/9.jpg)
Possible Implementations
• A dictionary entry
![Page 10: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/10.jpg)
Possible Implementations
• The data members for two sorted linear implementations of the ADT dictionary for the data: (a) array based; (b) link based
• View header file for class of dictionary entries, Listing 18-2
![Page 11: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/11.jpg)
Possible Implementations
• The data members for a binary search tree implementation of the ADT dictionary for the data
![Page 12: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/12.jpg)
Sorted Array-Based Implementation of ADT Dictionary
• Consider header file for the class ArrayDictionary, Listing 18-3
• Note definition of method add, Listing 18-A
– Bears responsibility for keeping the array items sorted
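A minimal sketch of how an add operation can keep an array sorted, assuming plain `int` search keys stored in a `std::vector`. The function name `sortedAdd` is hypothetical, and this is not the textbook's Listing 18-A: it only illustrates the shifting step that makes sorted array insertion O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts key into an already sorted vector so that it stays sorted.
// Items larger than key are shifted one position to the right.
void sortedAdd(std::vector<int>& items, int key) {
    items.push_back(key);                    // make room at the end
    std::size_t i = items.size() - 1;
    while (i > 0 && items[i - 1] > key) {    // shift larger items right
        items[i] = items[i - 1];
        --i;
    }
    items[i] = key;                          // drop key into the gap
}
```

Finding the position could use a binary search, but the shifting still costs O(n) in the worst case.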
![Page 13: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/13.jpg)
Binary Search Tree Implementation of ADT Dictionary
• Dictionary class will use composition
– Will have a binary search tree as one of its data members
– Reuses the class BinarySearchTree from Chapter 16
• View header file, Listing 18-4
![Page 14: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/14.jpg)
Selecting an Implementation
• Reasons for considering linear implementations
– Perspective
– Efficiency
– Motivation
• Questions to ask
– What operations are needed?
– How often is each operation required?
![Page 15: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/15.jpg)
Selecting an Implementation
• Three scenarios
– Insertion and traversal in no particular order
– Retrieval – consider:
• Is a binary search of a linked chain possible?
• How much more efficient is a binary search of an array than a sequential search of a linked chain?
– Insertion, removal, retrieval, and traversal in sorted order – add and remove must:
• Find the appropriate position in the dictionary.
• Insert into (or remove from) this position.
![Page 16: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/16.jpg)
Selecting an Implementation
• Insertion for unsorted linear implementations: (a) array based; (b) link based
![Page 17: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/17.jpg)
Selecting an Implementation
• Insertion for sorted linear implementations: (a) array based; (b) pointer based
![Page 18: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/18.jpg)
Selecting an Implementation
• The average-case order of the ADT dictionary operations for various implementations
![Page 19: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/19.jpg)
Hashing
![Page 20: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/20.jpg)
The Search Problem
• Unsorted list
– O(N)
• Sorted list
– O(log N) using arrays (i.e., binary search)
– O(N) using linked lists
• Binary search tree
– O(log N) if the tree is balanced
– O(N) if the tree is unbalanced
• Can we do better than this?
– Direct addressing
– Hashing
![Page 21: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/21.jpg)
Direct Addressing
• Assumptions:
– Key values are distinct
– Each key is drawn from a universe U = {0, 1, . . . , n - 1}
• Idea:
– Store the items in an array, indexed by keys
![Page 22: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/22.jpg)
Direct Addressing (cont’d)
• Direct-address table representation:
– An array T[0 . . . n - 1]
– Each slot, or position, in T corresponds to a key in U
– For an element x with key k, a pointer to x will be placed in location T[k]
– If there are no elements with key k in the set, T[k] is empty, represented by NIL
Search, insert, delete in O(1) time!
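A minimal sketch of a direct-address table, under a few assumptions: the class name `DirectAddressTable` is hypothetical, records are plain strings, and an empty `std::optional` (C++17) stands in for NIL instead of a null pointer. Each operation indexes the array once, which is where the O(1) bound comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Direct-address table: slot k holds the record whose key is k.
// An empty std::optional plays the role of NIL (assumption of this sketch).
class DirectAddressTable {
public:
    // universeSize is n, so valid keys are 0 .. n - 1.
    explicit DirectAddressTable(std::size_t universeSize)
        : slots_(universeSize) {}

    void insert(std::size_t key, const std::string& record) {
        assert(key < slots_.size());
        slots_[key] = record;
    }

    void remove(std::size_t key) {
        assert(key < slots_.size());
        slots_[key].reset();   // back to NIL
    }

    // Returns an empty optional when no record has this key.
    std::optional<std::string> search(std::size_t key) const {
        assert(key < slots_.size());
        return slots_[key];
    }

private:
    std::vector<std::optional<std::string>> slots_;
};
```

The cost is in space: the table always has one slot per possible key, whether or not that key is in use.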
![Page 23: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/23.jpg)
Direct Addressing (cont’d)
Example 1: Suppose that the keys are integers from 1 to 100 and that there are about 100 records.
Create an array A of 100 items and store the record whose key is equal to i in A[i].
|K| = |U|
|K|: # elements in K
|U|: # elements in U
![Page 24: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/24.jpg)
Direct Addressing (cont’d)
Example 2: Suppose that the keys are 9-digit social security numbers (SSN)
Although we could use the same idea, it would be very inefficient (e.g., an array of 1 billion slots to store 100 records)
|K| << |U|
![Page 25: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/25.jpg)
Hashing
• Binary search tree retrieval has order O(log₂ n)
• Need a different strategy to locate an item
• Consider a “magic box” as an address calculator
– Place/retrieve item from that address in an array
– Ideally maps each key to a unique number
![Page 26: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/26.jpg)
Hashing
• Address calculator
![Page 27: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/27.jpg)
Hashing
• Hashing is a means of ordering and accessing elements in a list quickly by using a function of the key value to identify its location in the list.
– The goal is O(1) time
• The function of the key value is called a hash function.
![Page 28: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/28.jpg)
Hashing
Idea:
– Use a function h to compute the slot for each key
– Store the element in slot h(k)
• A hash function h transforms a key into an index in a hash table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
![Page 29: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/29.jpg)
Hashing (cont’d)
[Figure: keys k1, …, k5 from the set K of actual keys, inside the universe of keys U, are mapped by h : U → {0, 1, . . . , m - 1} to slots 0 to m - 1 of a hash table of size m; k2 and k5 land in the same slot, h(k2) = h(k5).]
![Page 30: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/30.jpg)
Hashing (cont’d)
Example 2: Suppose that the keys are 9-digit social security numbers (SSN)
![Page 31: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/31.jpg)
Advantages of Hashing
• Reduce the range of array indices handled:
m instead of |U|
where m is the hash table size: T[0, …, m-1]
• Storage is reduced.
• O(1) search time (under certain assumptions).
![Page 32: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/32.jpg)
Properties of Good Hash Functions
• Good hash function properties
(1) Easy to compute
(2) Approximates a random function, i.e., for every input, every output is equally likely.
(3) Minimizes the chance that similar keys hash to the same slot, e.g., strings such as pt and pts should hash to different slots.
![Page 33: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/33.jpg)
Hashing
• Pseudocode for getItem
![Page 34: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/34.jpg)
Hashing
• Pseudocode for remove
![Page 35: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/35.jpg)
Hash Functions
• Possible algorithms
– Selecting digits
– Folding
– Modulo arithmetic
– Converting a character string to an integer
• Use ASCII values
• Factor the results, Horner’s rule
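As a rough illustration of the last item, a string can be hashed by treating its characters as digits of a number and evaluating that number with Horner's rule, taking the remainder at every step. The function name `hornerHash` and the default radix of 128 (one digit per 7-bit ASCII code) are assumptions of this sketch, not values taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hashes a string by reading its characters as digits of a base-`radix`
// number and evaluating it with Horner's rule:
//   ((c0 * radix + c1) * radix + c2) * radix + ...
// Reducing modulo tableSize at each step keeps the running value small,
// which avoids overflow for table sizes far below the range of std::size_t.
std::size_t hornerHash(const std::string& key, std::size_t tableSize,
                       std::size_t radix = 128) {
    assert(tableSize > 0);
    std::size_t h = 0;
    for (unsigned char ch : key) {
        h = (h * radix + ch) % tableSize;
    }
    return h;
}
```

For a table of size 101, the string "AB" gives (65 · 128 + 66) mod 101 = 8386 mod 101 = 3.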
![Page 36: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/36.jpg)
Collisions
[Figure: keys k1, …, k5 from the set K of actual keys, inside the universe of keys U, map to slots 0 to m - 1 of the hash table; k2 and k5 collide, h(k2) = h(k5).]
Collisions occur when h(ki) = h(kj), i ≠ j
![Page 37: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/37.jpg)
Collisions (cont’d)
• For a given set K of keys:
– If |K| ≤ m, collisions may or may not happen, depending on the hash function!
– If |K| > m, collisions will definitely happen
• i.e., there must be at least two keys that have the same hash value
• Avoiding collisions completely might not be easy.
![Page 38: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/38.jpg)
Resolving Collisions
• A collision
![Page 39: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/39.jpg)
Resolving Collisions
• Approach 1: Open addressing
– Probe for another available location
– Can be done linearly, quadratically
– Removal requires specifying the state of an item
• Occupied, emptied, removed
– Clustering is a problem
– Double hashing can reduce clustering
![Page 40: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/40.jpg)
Open Addressing
• Idea: store the keys in the table itself
• No need to use linked lists anymore
• Basic idea:
– Insertion: if a slot is full, try another one, until you find an empty one.
– Search: follow the same probe sequence.
– Deletion: need to be careful!
• Search time depends on the length of probe sequences!
e.g., insert 14
probe sequence: <1, 5, 9>
![Page 41: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/41.jpg)
Generalize hash function notation:
• A hash function contains two arguments now: (i) key value, and (ii) probe number
h(k,p), p=0,1,...,m-1
• Probe sequence: <h(k,0), h(k,1), h(k,2), …. >
• Example:
e.g., insert 14
Probe sequence: <h(14,0), h(14,1), h(14,2)>=<1, 5, 9>
![Page 42: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/42.jpg)
Generalize hash function notation:
– Probe sequence must be a permutation of <0,1,...,m-1>
– There are m! possible permutations
e.g., insert 14
Probe sequence: <h(14,0), h(14,1), h(14,2)>=<1, 5, 9>
![Page 43: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/43.jpg)
Resolving Collisions
• Linear probing with h ( x ) = x mod 101
![Page 44: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/44.jpg)
Linear probing: Inserting a key
• Idea: when there is a collision, check the next available position in the table:
h(k,i) = (h1(k) + i) mod m, i=0,1,2,...
• i=0: first slot probed: h1(k)
• i=1: second slot probed: h1(k) + 1
• i=2: third slot probed: h1(k) + 2, and so on
• How many probe sequences can linear probing generate? m probe sequences maximum
probe sequence: < h1(k), h1(k)+1, h1(k)+2, ... >
wrap around
![Page 45: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/45.jpg)
Linear probing: Searching for a key
• Given a key, generate a probe sequence using the same procedure.
• Three cases:
(1) Position in table is occupied with an element of equal key: FOUND
(2) Position in table is occupied with a different element: KEEP SEARCHING
(3) Position in table is empty: NOT FOUND
![Page 46: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/46.jpg)
Linear probing: Searching for a key
• Running time depends on the length of the probe sequences.
• Need to keep probe sequences short to ensure fast search.
![Page 47: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/47.jpg)
Linear probing: Deleting a key
• First, find the slot containing the key to be deleted.
• Can we just mark the slot as empty?
– It would be impossible to retrieve keys inserted after that slot was occupied!
• Solution
– “Mark” the slot with a sentinel value DELETED
• The deleted slot can later be used for insertion.
[Figure: hash table with slots 0 to m - 1, e.g., delete 98]
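The insert, search, and delete rules for linear probing can be sketched together in one small class. Everything named here is an assumption for illustration: the class `LinearProbingTable`, non-negative `int` keys, and h1(k) = k mod m. The three slot states correspond to empty, occupied, and the DELETED sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open-addressing table of non-negative int keys with linear probing.
// A removed slot is marked Deleted rather than Empty so that searches
// for keys inserted past it do not stop early.
class LinearProbingTable {
    enum class State { Empty, Occupied, Deleted };

public:
    explicit LinearProbingTable(std::size_t m)
        : keys_(m, 0), state_(m, State::Empty) {}

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (contains(key)) return false;
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = probe(key, i);
            if (state_[slot] != State::Occupied) {  // Empty or Deleted: reusable
                keys_[slot] = key;
                state_[slot] = State::Occupied;
                return true;
            }
        }
        return false;
    }

    bool contains(int key) const { return find(key) != keys_.size(); }

    // Marks the slot with the Deleted sentinel instead of emptying it.
    bool remove(int key) {
        std::size_t slot = find(key);
        if (slot == keys_.size()) return false;
        state_[slot] = State::Deleted;
        return true;
    }

private:
    // h(k, i) = (h1(k) + i) mod m, with h1(k) = k mod m (assumed here).
    std::size_t probe(int key, std::size_t i) const {
        return (static_cast<std::size_t>(key) % keys_.size() + i) % keys_.size();
    }

    // Returns the slot holding key, or keys_.size() if it is absent.
    std::size_t find(int key) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t slot = probe(key, i);
            if (state_[slot] == State::Empty) break;  // NOT FOUND
            if (state_[slot] == State::Occupied && keys_[slot] == key)
                return slot;                          // FOUND
            // Deleted slot or different key: KEEP SEARCHING
        }
        return keys_.size();
    }

    std::vector<int> keys_;
    std::vector<State> state_;
};
```

With m = 13, the keys 7, 20, and 33 all hash to slot 7 and occupy slots 7, 8, and 9. After removing 20, a search for 33 passes over the Deleted slot 8 and still succeeds; had slot 8 been marked empty, the search would have stopped there.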
![Page 48: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/48.jpg)
Primary Clustering Problem
• Long chunks of occupied slots are created.
• As a result, some slots become more likely than others.
• Probe sequences increase in length, so search time increases!
[Figure: initially, all slots have probability 1/m of being filled next; once clusters form, slot b has probability 2/m, slot d 4/m, and slot e 5/m.]
![Page 49: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/49.jpg)
Resolving Collisions
• Quadratic probing with h ( x ) = x mod 101
![Page 50: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/50.jpg)
Quadratic probing
i=0,1,2,...
• Clustering is less serious but still a problem
• secondary clustering
• How many probe sequences can quadratic probing generate?
m -- the initial position determines probe sequence
![Page 51: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/51.jpg)
Resolving Collisions
• Double hashing during the insertion of 58, 14, and 91
![Page 52: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/52.jpg)
Double Hashing
(1) Use one hash function to determine the first slot.
(2) Use a second hash function to determine the increment for the probe sequence:
h(k,i) = (h1(k) + i h2(k)) mod m, i=0,1,...
• Initial probe: h1(k)
• Second probe is offset by h2(k) mod m, and so on
• Advantage: handles clustering better
• Disadvantage: more time consuming
• How many probe sequences can double hashing generate? m²
![Page 53: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/53.jpg)
Double Hashing: Example
h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
h(k,i) = (h1(k) + i h2(k)) mod 13
• Insert key 14:
i=0: h(14,0) = h1(14) = 14 mod 13 = 1
i=1: h(14,1) = (h1(14) + h2(14)) mod 13 = (1 + 4) mod 13 = 5
i=2: h(14,2) = (h1(14) + 2 h2(14)) mod 13 = (1 + 8) mod 13 = 9
[Figure: hash table with slots 0 to 12 holding keys 79, 69, 98, 72, and 50; key 14 is placed in slot 9]
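The arithmetic of this example can be checked with a short sketch. The hash functions and table size come from the example; the helper names `doubleHashProbe` and `probeUntilFree` are hypothetical, and the only assumption about the table contents is that slots 1 and 5 are already occupied, which is what the probe sequence <1, 5, 9> implies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hash functions from the example, with table size m = 13.
int h1(int k) { return k % 13; }
int h2(int k) { return 1 + (k % 11); }

// i-th slot of the probe sequence: h(k, i) = (h1(k) + i * h2(k)) mod 13.
int doubleHashProbe(int k, int i) { return (h1(k) + i * h2(k)) % 13; }

// Returns the slots examined for key k, up to and including the first
// free one. occupied[s] is true when slot s already holds a key.
std::vector<int> probeUntilFree(int k, const std::vector<bool>& occupied) {
    assert(occupied.size() == 13);
    std::vector<int> visited;
    for (int i = 0; i < 13; ++i) {
        int slot = doubleHashProbe(k, i);
        visited.push_back(slot);
        if (!occupied[static_cast<std::size_t>(slot)]) break;
    }
    return visited;
}
```

Calling `probeUntilFree(14, occupied)` with only slots 1 and 5 marked occupied visits slots 1, 5, and 9, matching the hand calculation.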
![Page 54: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/54.jpg)
Resolving Collisions
• Approach 2: Restructuring the hash table
– Each hash location can accommodate more than one item
– Each location is a “bucket” or an array itself
– Alternatively, design the hash table as an array of linked chains – called “separate chaining”
![Page 55: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/55.jpg)
Resolving Collisions
• Separate chaining
![Page 56: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/56.jpg)
Chaining
• How to choose the size of the hash table m?
– Small enough to avoid wasting space.
– Large enough to avoid many collisions and keep linked-lists short.
– Typically 1/5 or 1/10 of the total number of elements.
• Should we use sorted or unsorted linked lists?
– Unsorted
• Insert is fast
• Can easily remove the most recently inserted elements
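A minimal sketch of separate chaining with unsorted chains, assuming non-negative `int` search keys, string items, and h(k) = k mod m. The class name `ChainedTable` is hypothetical and this is not the textbook's HashedDictionary; it only shows why insertion at the head of an unsorted chain is fast while search walks a single chain.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash table of (int key, string item) pairs using separate chaining.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    // Unsorted chain: the new entry goes at the head in O(1) time.
    // This sketch assumes the key is not already in the table.
    void add(int key, const std::string& item) {
        buckets_[slotFor(key)].emplace_front(key, item);
        ++count_;
    }

    // Sequential search of one chain; returns nullptr when absent.
    const std::string* find(int key) const {
        for (const auto& entry : buckets_[slotFor(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Returns false if no entry has this key.
    bool remove(int key) {
        auto& chain = buckets_[slotFor(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) {
                chain.erase(it);
                --count_;
                return true;
            }
        }
        return false;
    }

    // Number of stored items divided by the number of slots.
    double loadFactor() const {
        return static_cast<double>(count_) / static_cast<double>(buckets_.size());
    }

private:
    // h(k) = k mod m (an assumption of this sketch).
    std::size_t slotFor(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets_;
    std::size_t count_ = 0;
};
```

Because new entries go to the head of the chain, the most recently inserted entry for a slot is also the first one a search or removal reaches.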
![Page 57: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/57.jpg)
Analysis of Hashing with Chaining: Worst Case
• How long does it take to search for an element with a given key?
• Worst case:
– All n keys hash to the same slot
– Then search takes O(n) time, plus the time to compute the hash function
[Figure: table T with slots 0 to m - 1 and a single chain]
![Page 58: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/58.jpg)
Analysis of Hashing with Chaining: Average Case
• It depends on how well the hash function distributes the n keys among the m slots
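• Under the following assumptions:
(1) n = O(m)
(2) any given element is equally likely to hash into any of the m slots, i.e., the simple uniform hashing property
then search takes O(1) time, plus the time to compute the hash function
[Figure: table T in which slot j holds a chain of length nj; here n0 = 0 and nm–1 = 0, while slots 2, 3, j, and k hold chains of lengths n2, n3, nj, and nk]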
![Page 59: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/59.jpg)
The Efficiency of Hashing
• Efficiency of hashing involves the load factor alpha (α)
![Page 60: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/60.jpg)
The Efficiency of Hashing
• Linear probing – average value for α
![Page 61: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/61.jpg)
The Efficiency of Hashing
• Quadratic probing and double hashing – efficiency for given α
![Page 62: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/62.jpg)
The Efficiency of Hashing
• Separate chaining – efficiency for given α
![Page 63: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/63.jpg)
The Efficiency of Hashing
• The relative efficiency of four collision-resolution methods
![Page 64: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/64.jpg)
The Efficiency of Hashing
• The relative efficiency of four collision-resolution methods
![Page 65: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/65.jpg)
Maintaining Hashing Performance
• Collisions and their resolution typically cause the load factor α to increase
• To maintain efficiency, restrict the size of α
– α < 0.5 for open addressing
– α < 1.0 for separate chaining
• If load factor exceeds these limits
– Increase size of hash table
– Rehash with new hashing function
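The grow-and-rehash step can be sketched as follows, assuming the load factor is the number of stored items divided by the table size, a chained table of non-negative `int` keys, and h(k) = k mod m. Choosing a prime just above double the old size is a common convention assumed here, not something the slides specify; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using Buckets = std::vector<std::list<int>>;

// Load factor alpha = (number of stored items) / (table size).
double loadFactor(std::size_t itemCount, std::size_t tableSize) {
    assert(tableSize > 0);
    return static_cast<double>(itemCount) / static_cast<double>(tableSize);
}

// Trial-division primality test, adequate for small table sizes.
bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Smallest prime greater than twice the old size (assumed growth policy).
std::size_t grownSize(std::size_t oldSize) {
    std::size_t candidate = 2 * oldSize + 1;
    while (!isPrime(candidate)) ++candidate;
    return candidate;
}

// Builds a larger table and re-inserts every key. Because the table size
// changes, h(k) = k mod m changes with it, so every key must be rehashed.
Buckets rehash(const Buckets& old) {
    Buckets bigger(grownSize(old.size()));
    for (const auto& chain : old) {
        for (int key : chain) {
            bigger[static_cast<std::size_t>(key) % bigger.size()].push_front(key);
        }
    }
    return bigger;
}
```

A table of size 5 holding keys 3, 8, and 13 in one chain grows to size 11, where the same keys spread across slots 3, 8, and 2.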
![Page 66: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/66.jpg)
What Constitutes a Good Hash Function?
• Easy and fast to compute?
• Scatter data evenly throughout hash table?
• How well does it scatter random data?
• How well does it scatter non-random data?
• Note: traversal in sorted order is inefficient when using hashing
![Page 67: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/67.jpg)
Hashing and Separate Chaining for ADT Dictionary
• A dictionary entry when separate chaining is used
![Page 68: Dictionaries and Their Implementations CS 302 - Data Structures Mehmet H Gunes Modified from authors’ slides](https://reader035.vdocuments.site/reader035/viewer/2022062423/56649d345503460f94a0a8b4/html5/thumbnails/68.jpg)
Hashing and Separate Chaining for ADT Dictionary
• View Listing 18-5, The class HashedEntry
• Note the definitions of the add and remove functions, Listing 18-B