hashing cs 110: data structures and algorithms first semester, 2010-2011
TRANSCRIPT
HashingCS 110: Data Structures and
Algorithms
First Semester, 2010-2011
Hashing
►In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast
►Example: empdata[1000]index = employee ID number►Search for employee with ID number 500► return empdata[500]►Running Time: O(1)
Hash Table
►A data structure implemented as an array of objects, where the search keys correspond to the array indices
►Insert and find operations involve straight forward array accesses: O(1) time complexity
About Hash Tables
►In the first example shown, it was relatively easy since employee number is an integer
►A few problems may arise in different situations
About Hash Table
►Problem 1: possible integer key values might be too large; creating an appropriate array might be impractical►Need to map large integer values to
smaller array indices►Problem 2: What if the key is a word in
the English Alphabet (e.g. last names)►Need to map names to integers
(indices)
Large Values to Small Values
►Hash function: converts a number from a large range into a number from a smaller range (the range of array indices)
►Size of the array►Rule of thumb: the array size should be
about twice the size of the data set►For 50,000 words, use an array of
100,000 elements
Hash Function and Modulo
►Simplest Hash Function: achieved by using the modulo function (returns the remainder)►For example, 33 % 10 = 3►General Formula:
LargeNumber % SmallRange
Hash Functions for Names
►Sum of Digits Method►Map the alphabet A to Z to the
numbers 1 to 26 (a=1, b=2, c=3, etc)►Add the total of the letters►For example, “cats”
►c=3, a=1, t=20, s=19, 3+1+20+19=43►“cats” will be stored using index 43
►Use modulo to map to a smaller array
Collisions
►Problem►Too many words with the same index►“was”, “tin”, “give”, “tend”, “moan”,
“tick” and several other words add to 43
►These are called collisions: case where two different search keys hash to the same index value
Collisions
►Can occur even when dealing with integers►Suppose the size of the hash table is
100►Keys 158 and 358 hash to the same
value when using the modulo hash function
Collision Resolution Policy
►Need to know what to do when a collision occurs; i.e. during an insert operation; What if the array slot is already occupied?
►Most common policy: go to the next available slot►“wrap around” the array if necessary
Collision Resolution Policy
►Consequence: when searching, use the hash function, first check whether the element is the one you are looking for
►If not, try the next slots►How do you know if the element is
not in the array?
Probe Sequence
►Sequence of indices that serve as array slots where a key value would map to
►The first index in the probe sequence is the home position; the value of the hash function
►The next indices are the alternative slots
Probe Sequence
►Suppose the array size is 10, and the hash function is h(K) = K%10.
►The probe sequence for K=25 is:►5,6,7,8,9,0,1,2,3,4►Here, we assume that most common
collision resolution policy of going to the next slot: p(K,i) = I
►Goal: exhaust array slots
Hash Table Operations
►Insert object Obj with key value K► home h(K)for i 0 to M-1 do pos = (home + p(K,i)) % 10 if HT[pos].getKey() = K then throw exception “error” // or overwrite it else if HT[pos] is null then HT[pos] Obj break;
Hash Table Operations
►Finding an object with key value K► home h(K)for i 0 to M-1 do pos = (home + p(K,i)) % 10 if HT[pos].getKey() = K then return HT[pos] else if HT[pos] is null then throw exception “not found”
Hash Table Operations
►Although insert and find run in O(1) time during typical conditions, the time complexity in the worst-case is O(n)
►Something to think about: characterize the worst-case scenarios for insert and find
Removing Elements
►Removing an element from a hash table during a delete operation poses a problem
►If we set the corresponding hash table entry to null, then succeeding find operations might not work properly► Recall that for the find algorithm, seeing a
null means a target element is not found but in fact the element might be in a next slot
Removing Elements
►Solution: tombstone►Arrange it so that deleted entries
seem null when inserting, but don’t seem null when searching
►Requires a simple flag on the objects stored
Hash Tables in Java
►java.util.Hashtable►Important methods for Hashtable
class►put(Object key, Object entry)►Object get(Object key)► remove(Object key)►boolean constainsKey(Object key)
Summary
►Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations►Caveat: O(n) in the worst-case
because of the possibility of collisions
Summary
►Requires a hash function(maps keys to array indices) and a collision resolution policy►Probe sequence depicts a sequence
of array slots that an object would occupy, given its key
►In Java: use the Hashtable class