Hashing CS 110: Data Structures and Algorithms First Semester, 2010-2011

Upload: alexandra-butler

Post on 17-Jan-2016

Hashing

►In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast

►Example: empdata[1000], index = employee ID number
  ►Search for employee with ID number 500
  ►return empdata[500]
  ►Running time: O(1)
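
The employee example above can be sketched in Java as follows. This is only an illustration: the Employee class, its fields, and the sample name are assumptions that do not come from the slides.

```java
class DirectAddressDemo {
    // Hypothetical employee record; only the ID matters for this sketch.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // One slot per possible ID (0..999); the ID itself is the array index.
    static final Employee[] empdata = new Employee[1000];

    // O(1): a single array write.
    static void insert(Employee e) {
        empdata[e.id] = e;
    }

    // O(1): a single array read; returns null if no employee has this ID.
    static Employee find(int id) {
        return empdata[id];
    }

    public static void main(String[] args) {
        insert(new Employee(500, "Cruz"));
        System.out.println(find(500).name);
    }
}
```

Both operations touch exactly one array slot, which is where the O(1) running time comes from.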

Hash Table

►A data structure implemented as an array of objects, where the search keys correspond to the array indices

►Insert and find operations involve straightforward array accesses: O(1) time complexity

About Hash Tables

►In the first example shown, it was relatively easy since employee number is an integer

►A few problems may arise in different situations

About Hash Tables

►Problem 1: possible integer key values might be too large; creating an appropriate array might be impractical
  ►Need to map large integer values to smaller array indices
►Problem 2: What if the key is a word written in the English alphabet (e.g. last names)?
  ►Need to map names to integers (indices)

Large Values to Small Values

►Hash function: converts a number from a large range into a number from a smaller range (the range of array indices)

►Size of the array
  ►Rule of thumb: the array size should be about twice the size of the data set
  ►For 50,000 words, use an array of 100,000 elements

Hash Function and Modulo

►Simplest hash function: achieved by using the modulo operator (returns the remainder)
  ►For example, 33 % 10 = 3
  ►General formula: LargeNumber % SmallRange
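
A minimal Java sketch of the modulo hash function described above. The method name hash and its parameters are illustrative assumptions. The use of Math.floorMod is also an addition not found in the slides: Java's % operator can return a negative result for a negative key, and floorMod keeps the index inside 0..tableSize-1.

```java
class ModuloHashDemo {
    // Maps a key from a large range into an array index in 0..tableSize-1.
    // For non-negative keys this is exactly key % tableSize.
    static int hash(long key, int tableSize) {
        return (int) Math.floorMod(key, (long) tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10)); // prints 3, as in the slide example
    }
}
```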

Hash Functions for Names

►Sum of Digits Method
  ►Map the alphabet A to Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
  ►Add up the numbers for the letters in the word
  ►For example, “cats”
    ►c=3, a=1, t=20, s=19; 3+1+20+19=43
    ►“cats” will be stored using index 43
►Use modulo to map to a smaller array
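
The letter-sum method above might look like this in Java. The names letterSum and hash are hypothetical. Lowercasing the word and skipping non-letter characters are assumptions the slides do not address.

```java
class LetterSumHashDemo {
    // Adds up the letter values a=1, b=2, ..., z=26.
    // Assumption: case is ignored and non-letters contribute nothing.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Uses modulo to squeeze the sum into a smaller array.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats")); // prints 43
        System.out.println(hash("cats", 10));  // prints 3
    }
}
```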

Collisions

►Problem
  ►Too many words with the same index
  ►“was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words add to 43
►These are called collisions: cases where two different search keys hash to the same index value
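
To see the collisions listed above, the following sketch groups words by their letter sum (a=1 to z=26). The class and method names are illustrative assumptions, and the input is assumed to contain only letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollisionDemo {
    // Letter-sum hash: a=1, b=2, ..., z=26 (assumes the word has only letters).
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase().toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Groups the words by the index they hash to; any group with more
    // than one word is a collision.
    static Map<Integer, List<String>> groupByIndex(String... words) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String w : words) {
            groups.computeIfAbsent(letterSum(w), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(
            groupByIndex("cats", "was", "tin", "give", "tend", "moan", "tick"));
    }
}
```

All seven words end up in the single group for index 43.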

Collisions

►Can occur even when dealing with integers
  ►Suppose the size of the hash table is 100
  ►Keys 158 and 358 hash to the same value (58) when using the modulo hash function

Collision Resolution Policy

►Need to know what to do when a collision occurs, i.e. during an insert operation, what if the array slot is already occupied?
►Most common policy: go to the next available slot
  ►“Wrap around” the array if necessary

Collision Resolution Policy

►Consequence: when searching, use the hash function to find the home slot, then first check whether the element there is the one you are looking for
►If not, try the next slots
►How do you know if the element is not in the array?

Probe Sequence

►Sequence of indices of the array slots that a key value could map to

►The first index in the probe sequence is the home position: the value of the hash function

►The next indices are the alternative slots

Probe Sequence

►Suppose the array size is 10, and the hash function is h(K) = K % 10
►The probe sequence for K=25 is:
  ►5, 6, 7, 8, 9, 0, 1, 2, 3, 4
  ►Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i
►Goal: exhaust array slots
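
The probe sequence for this example can be generated with a short Java sketch. The functions h and p follow the slide's definitions; the method probeSequence and the parameter m (the array size) are names assumed for illustration.

```java
import java.util.Arrays;

class ProbeSequenceDemo {
    // Hash function from the slide: h(K) = K % m, with m = 10 in the example.
    static int h(int key, int m) {
        return key % m;
    }

    // Probe function for "go to the next slot": p(K, i) = i.
    static int p(int key, int i) {
        return i;
    }

    // Lists every slot the key may occupy, starting at its home position
    // and wrapping around the end of the array.
    static int[] probeSequence(int key, int m) {
        int[] seq = new int[m];
        int home = h(key, m);
        for (int i = 0; i < m; i++) {
            seq[i] = (home + p(key, i)) % m;
        }
        return seq;
    }

    public static void main(String[] args) {
        // prints [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(probeSequence(25, 10)));
    }
}
```

Because i runs from 0 to m-1, the sequence visits every slot exactly once, which is the "exhaust array slots" goal.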

Hash Table Operations

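►Insert object Obj with key value K (M is the array size)

    home ← h(K)
    for i ← 0 to M-1 do
        pos ← (home + p(K,i)) % M
        if HT[pos] is null then          // check for an empty slot first
            HT[pos] ← Obj
            return
        else if HT[pos].getKey() = K then
            throw exception “error”      // or overwrite it
    throw exception “table full”         // every slot was probed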

Hash Table Operations

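►Finding an object with key value K (M is the array size)

    home ← h(K)
    for i ← 0 to M-1 do
        pos ← (home + p(K,i)) % M
        if HT[pos] is null then          // check for an empty slot first
            throw exception “not found”
        else if HT[pos].getKey() = K then
            return HT[pos]
    throw exception “not found”          // every slot was probed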
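
The insert and find operations on these slides can be sketched in Java roughly as follows, assuming integer keys, String values, and the "next slot" policy p(K,i) = i. The class name, the slotOf helper, and the choice to return null from find instead of throwing are assumptions made for this sketch.

```java
class LinearProbingTable {
    private final Integer[] keys;   // null marks an empty slot
    private final String[] values;
    private final int m;            // array size (M in the slides)

    LinearProbingTable(int m) {
        this.m = m;
        this.keys = new Integer[m];
        this.values = new String[m];
    }

    private int h(int key) {
        return Math.floorMod(key, m);
    }

    // Stores the value in the first empty slot of the key's probe sequence.
    void insert(int key, String value) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {
                keys[pos] = key;
                values[pos] = value;
                return;
            } else if (keys[pos] == key) {
                throw new IllegalStateException("duplicate key " + key);
            }
        }
        throw new IllegalStateException("table full");
    }

    // Returns the slot holding the key, or -1 if the key is absent.
    int slotOf(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {
                return -1;          // an empty slot ends the search
            } else if (keys[pos] == key) {
                return pos;
            }
        }
        return -1;                  // probed every slot without success
    }

    // Returns the stored value, or null when the key is not found.
    String find(int key) {
        int pos = slotOf(key);
        return pos < 0 ? null : values[pos];
    }
}
```

With a table of size 10, inserting keys 25 and 35 places them in slots 5 and 6, and inserting 9 and then 19 shows the wrap-around: 19 lands in slot 0.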

Hash Table Operations

►Although insert and find run in O(1) time under typical conditions, the time complexity in the worst case is O(n)

►Something to think about: characterize the worst-case scenarios for insert and find

Removing Elements

►Removing an element from a hash table during a delete operation poses a problem

►If we set the corresponding hash table entry to null, then succeeding find operations might not work properly
  ►Recall that for the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot

Removing Elements

►Solution: tombstone
  ►Arrange it so that deleted entries seem null when inserting, but don’t seem null when searching
►Requires a simple flag on the objects stored
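
One possible way to implement tombstones is sketched below for integer keys with the "next slot" policy. Note one deviation from the slide: instead of a flag on the stored objects, this sketch keeps a per-slot state (EMPTY, OCCUPIED, DELETED). All class and method names are assumptions.

```java
import java.util.Arrays;

class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final State[] state;
    private final int[] keys;
    private final int m;

    TombstoneTable(int m) {
        this.m = m;
        this.keys = new int[m];
        this.state = new State[m];
        Arrays.fill(state, State.EMPTY);
    }

    // Searching treats a tombstone as occupied: it keeps probing past it.
    int slotOf(int key) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (state[pos] == State.EMPTY) return -1;
            if (state[pos] == State.OCCUPIED && keys[pos] == key) return pos;
        }
        return -1;
    }

    boolean contains(int key) {
        return slotOf(key) >= 0;
    }

    // Inserting treats a tombstone as free: the first tombstone or empty
    // slot on the probe sequence is reused.
    void insert(int key) {
        int home = Math.floorMod(key, m);
        int firstFree = -1;
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (state[pos] == State.OCCUPIED) {
                if (keys[pos] == key) return;      // already present
            } else {
                if (firstFree < 0) firstFree = pos;
                if (state[pos] == State.EMPTY) break;
            }
        }
        if (firstFree < 0) throw new IllegalStateException("table full");
        keys[firstFree] = key;
        state[firstFree] = State.OCCUPIED;
    }

    // Deleting leaves a tombstone instead of an empty slot.
    boolean remove(int key) {
        int pos = slotOf(key);
        if (pos < 0) return false;
        state[pos] = State.DELETED;
        return true;
    }
}
```

For example, in a table of size 10 holding 25, 35, and 45 (slots 5, 6, 7), removing 35 still lets a search for 45 succeed, and a later insert of 55 reuses slot 6.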

Hash Tables in Java

►java.util.Hashtable
►Important methods for Hashtable class
  ►put(Object key, Object entry)
  ►Object get(Object key)
  ►remove(Object key)
  ►boolean containsKey(Object key)
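
A short usage sketch of the four methods listed above, using the generic form Hashtable<Integer, String>. The keys and names stored here are made-up sample data.

```java
import java.util.Hashtable;

class HashtableDemo {
    // Builds a small table keyed by (hypothetical) employee ID.
    static Hashtable<Integer, String> build() {
        Hashtable<Integer, String> table = new Hashtable<>();
        table.put(500, "Cruz");
        table.put(158, "Reyes");
        table.put(358, "Santos");
        table.remove(158);          // delete the entry with key 158
        return table;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> table = build();
        System.out.println(table.get(500));         // prints Cruz
        System.out.println(table.containsKey(158)); // prints false
        System.out.println(table.get(158));         // prints null
    }
}
```

Note that get returns null for a missing key, and that Hashtable rejects null keys and null values.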

Summary

►Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations
  ►Caveat: O(n) in the worst case because of the possibility of collisions

Summary

►Requires a hash function (maps keys to array indices) and a collision resolution policy
  ►A probe sequence is the sequence of array slots that an object could occupy, given its key
►In Java: use the Hashtable class