hashing hashing is another method for sorting and searching data. –hashing makes it easier to add...


Upload: kory-miller

Post on 05-Jan-2016



Hashing

• Hashing is another method for storing and searching data.
– Hashing makes it easier to add and remove elements from a data structure.
– The worst-case behavior for locating a key is linear: O(n).
– Java’s standard hash table class is java.util.Hashtable.


Hashing

• Hashing is usually implemented with a data structure called a hash table.
– A hash table is an effective data structure for storing and retrieving data by key.
– A hash table is a generalization of an array.
– A hash table requires a key to access data.


Hashing

– A hash table uses an array whose length is proportional to the number of keys actually stored.

– The array index is computed from the key, rather than using the key itself as the array index.

• The key is a unique identifying value.


Hashing Functions

• Hashing requires the use of a hashing function.
– The purpose of the hashing function is to compute the storage slot from the key.
• Maps key values to array indices.
– This calculation reduces the range of array indices that need to be handled.


Hashing Functions

– If a hashing function groups key values together, this is called clustering of the keys.
• A good hashing function distributes the key values uniformly through the array’s index range.
• Any hashing function that results in clustering should be changed.
• A good hashing function has an equal likelihood of hashing a key into any of the slots.
• java.util.Hashtable calls the hashCode method of each key to compute its hash value.
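The difference between clustering and a uniform spread can be seen by counting how many keys land in each slot. The sketch below is only an illustration: the class name ClusteringDemo, the deliberately poor firstCharHash function, and the sample keys are assumptions made for this example and are not part of the original slides.

```java
// Hypothetical demo: compare how two hash functions spread keys over the slots.
class ClusteringDemo {
    // A poor hash: uses only the first character, so keys that start
    // with the same letter all cluster in one slot.
    static int firstCharHash(String key, int tableLength) {
        return key.charAt(0) % tableLength;
    }

    // A hash based on String.hashCode(), which takes every character into account.
    static int hashCodeHash(String key, int tableLength) {
        return (int) (Math.abs((long) key.hashCode()) % tableLength);
    }

    // Counts how many keys land in each slot under the chosen hash function.
    static int[] slotCounts(String[] keys, int tableLength, boolean useFirstChar) {
        int[] counts = new int[tableLength];
        for (String key : keys) {
            int slot = useFirstChar ? firstCharHash(key, tableLength)
                                    : hashCodeHash(key, tableLength);
            counts[slot]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"stack", "set", "sort", "search", "slot", "size"};
        System.out.println(java.util.Arrays.toString(slotCounts(keys, 7, true)));
        System.out.println(java.util.Arrays.toString(slotCounts(keys, 7, false)));
    }
}
```

With the first-character hash every sample key starts with "s", so all of them fall into a single slot, which is exactly the clustering the slide warns about.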


Hashing Functions

• The division hash function depends upon the remainder of division.
– Math.abs(H(k)) % table.length

– When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3.

– Using the division hash function can result in many collisions.
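A minimal sketch of the division hash function follows. It assumes that H(k) is the key's own hashCode() value; the class name DivisionHash and the table length of 11 (a prime of the form 4n + 3 with n = 2) are choices made for this example only.

```java
// Hypothetical sketch of the division hash: Math.abs(H(k)) % table.length.
class DivisionHash {
    static int index(Object key, int tableLength) {
        // H(k) is assumed to be the key's hashCode(). Widening to long avoids
        // the one int edge case where Math.abs(Integer.MIN_VALUE) stays negative.
        long h = key.hashCode();
        return (int) (Math.abs(h) % tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 11; // a prime of the form 4n + 3 (n = 2)
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> slot " + index(key, tableLength));
        }
    }
}
```

Because the result is a remainder, it always falls in the range 0 to table.length – 1, whatever the size of the original hash value.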


Hashing Functions

• The mid-square hash function converts the key to an integer, then multiplies the key by itself (squares it). The function returns the middle digits of the result.

• The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
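Both functions can be sketched for integer keys as shown below. This is one possible reading of the descriptions above, not a definitive implementation: the class name, the digits parameter (how many decimal digits to return, assumed to be at most 9), and the constant 0.618 used in main are assumptions made for illustration.

```java
// Hypothetical sketches of the mid-square and multiplicative hash functions.
class MidSquareAndMultiplicative {
    // Mid-square: square the key, then return `digits` decimal digits
    // taken from the middle of the result.
    static int midSquare(int key, int digits) {
        long squared = (long) key * key;
        String s = Long.toString(squared);
        if (s.length() <= digits) {
            return (int) squared;            // too short to have a "middle"
        }
        int start = (s.length() - digits) / 2;
        return Integer.parseInt(s.substring(start, start + digits));
    }

    // Multiplicative: multiply the key by a constant less than one, then
    // return the first `digits` decimal digits of the fractional part.
    static int multiplicative(int key, double constant, int digits) {
        double product = Math.abs((double) key) * constant;
        double fraction = product - Math.floor(product);
        return (int) (fraction * Math.pow(10, digits));
    }

    public static void main(String[] args) {
        System.out.println(midSquare(1234, 3));
        System.out.println(multiplicative(1234, 0.618, 3));
    }
}
```

In a real table the returned digits would still have to be reduced to a valid index, for example with the remainder of division by the table length.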


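Example

[Figure: the actual keys K = {k1, k2, k3, k4, k5} are a subset of the universe of keys U. The hash function maps keys into table slots 0 through m - 1; the slots H(k1), H(k2), H(k3), and H(k4) are shown.]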


Collisions

• A collision occurs when the hashing function calculates the same array index for two different objects and one is already stored at that index.
– Two keys hash to the same slot.
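A small, concrete collision can be shown with Java strings. The strings "Aa" and "BB" are different keys, but String.hashCode() gives them the same value, so any table-index calculation based on that value sends both to the same slot. The class name CollisionDemo and the table length 11 are assumptions for this sketch.

```java
// Hypothetical demo: two different keys that hash to the same slot.
class CollisionDemo {
    static int slot(String key, int tableLength) {
        return (int) (Math.abs((long) key.hashCode()) % tableLength);
    }

    public static void main(String[] args) {
        // Different strings, same hash code, therefore the same slot.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        System.out.println(slot("Aa", 11) == slot("BB", 11)); // prints true
    }
}
```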


Collision Example

[Figure: the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, are mapped into table slots 0 through m - 1. Keys k2 and k5 collide: H(k2) = H(k5).]


Open Addressing

• Open addressing ensures that all elements are stored directly in the hash table.
– Every table slot contains either data or null.
– The problem is that the table can fill up.
– The good thing is that there are no external storage locations for the table elements.


Open Addressing

– Open addressing resolves collisions by probing for another open slot in the table; linear probing and double hashing are two such methods.


Linear Probing

• Linear probing resolves collisions by placing the data into the next open slot in the table.
• If the hashed slot is open, the data is stored in that slot.
• If the slot is not open, the algorithm looks at the next slot (index), wrapping around at the end of the table, until an open slot is found.


Linear Probing

– It is difficult to delete items from a hash table that uses open addressing.
• A deleted slot cannot simply be set to null, because a later search would stop at the null and miss items stored further along the probe sequence. Instead, place a special Deleted marker in the vacated slot.
– If H’(k) is the ordinary hash function, the linear probing hash function is:
• H(k, i) = (H’(k) + i) % m, where i = 0, 1, 2, …, m – 1 and m is the number of elements that can be stored in the table.
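The probing rule and the Deleted marker can be sketched together as a small open-addressing table. The probe index used here is (H'(k) + i) % m for i = 0, 1, ..., m – 1, that is, the ordinary hash plus the probe number. Everything named below (the class LinearProbingTable, the DELETED sentinel object, the method names) is an assumption made for this sketch, and H'(k) is assumed to be Math.abs(hashCode) % m.

```java
// Hypothetical sketch of an open-addressing table with linear probing.
class LinearProbingTable {
    private static final Object DELETED = new Object(); // marks a vacated slot
    private final Object[] keys;                         // null = never used
    private int size;

    LinearProbingTable(int capacity) {
        keys = new Object[capacity];
    }

    // Probe number i for a key: (H'(k) + i) % m.
    private int probe(Object key, int i) {
        int home = (int) (Math.abs((long) key.hashCode()) % keys.length);
        return (home + i) % keys.length;
    }

    // Returns the slot holding the key, or -1 if it is absent.
    private int find(Object key) {
        for (int i = 0; i < keys.length; i++) {
            int slot = probe(key, i);
            if (keys[slot] == null) {
                return -1;                 // a never-used slot ends the search
            }
            if (keys[slot] != DELETED && keys[slot].equals(key)) {
                return slot;               // DELETED slots are skipped, not stopped at
            }
        }
        return -1;
    }

    boolean contains(Object key) {
        return find(key) >= 0;
    }

    boolean add(Object key) {
        if (contains(key)) {
            return false;
        }
        for (int i = 0; i < keys.length; i++) {
            int slot = probe(key, i);
            if (keys[slot] == null || keys[slot] == DELETED) {
                keys[slot] = key;          // first open slot along the probe sequence
                size++;
                return true;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(Object key) {
        int slot = find(key);
        if (slot < 0) {
            return false;
        }
        keys[slot] = DELETED;              // not null, so later searches keep probing
        size--;
        return true;
    }

    int size() {
        return size;
    }
}
```

If remove wrote null instead of the marker, a key stored after the removed one in the same run of slots would no longer be found.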


Linear Probing

– A problem associated with linear probing is called primary clustering.

• Primary clustering occurs when many items hash into the same slot and long runs of slots are filled up.

• This results in increased search times.


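Linear Probing

[Figure: the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, are mapped into table slots 0 through m - 1. Keys k2 and k5 collide, H(k2) = H(k5), so k5 is stored in a different slot found by linear probing.]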


Double Hashing

• Double hashing is one of the best methods for dealing with collisions.
– The slot location is first calculated from the hash function H1(k). If that slot is full, a second hash function H2(k) is calculated and combined with the first to form H(k, i), which determines a new slot.


Double Hashing

– Assume that:
• H1(k) = Math.abs(H(k)) % table.length
• H2(k) = 1 + (Math.abs(H(k)) % (table.length – x)), where x is a small value: 1, 2, or 3.
– Then:
• H(k, i) = (H1(k) + i * H2(k)) % m, where m = table.length and i = 0, 1, 2, … is the probe number.
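The probe sequence produced by these formulas can be sketched as below. The class name DoubleHashing and the parameter names are assumptions made for this example; the key's hash value H(k) is passed in directly as hashCode.

```java
// Hypothetical sketch of the double hashing probe sequence.
class DoubleHashing {
    // H1(k) = Math.abs(H(k)) % table.length
    static int h1(int hashCode, int tableLength) {
        return (int) (Math.abs((long) hashCode) % tableLength);
    }

    // H2(k) = 1 + Math.abs(H(k)) % (table.length - x), so the step is never 0.
    static int h2(int hashCode, int tableLength, int x) {
        return 1 + (int) (Math.abs((long) hashCode) % (tableLength - x));
    }

    // H(k, i) = (H1(k) + i * H2(k)) % m
    static int probe(int hashCode, int i, int tableLength, int x) {
        long step = (long) i * h2(hashCode, tableLength, x);
        return (int) ((h1(hashCode, tableLength) + step) % tableLength);
    }

    public static void main(String[] args) {
        // For H(k) = 25, m = 11, x = 2: H1 = 3 and H2 = 8,
        // so the first three probes are slots 3, 0, 8.
        for (int i = 0; i < 3; i++) {
            System.out.println("probe " + i + " -> slot " + probe(25, i, 11, 2));
        }
    }
}
```

When the table length is prime, every step size from 1 to m – 1 has no common factor with m, so the sequence visits each slot once before repeating.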


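Double Hashing

[Figure: the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, are mapped into table slots 0 through m - 1. Keys k2 and k5 collide, H(k2) = H(k5), so k5 is stored in a different slot found by double hashing.]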


External Chaining

• In external chaining the hash table contains an array in which each component can hold more than one element of the hash table.
– Essentially, each table slot holds either an array (making the table a multidimensional array) or a linked list of elements.

• The typical implementation is that each slot contains a linked list.
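A minimal sketch of the linked-list version follows. The class name ChainedHashSet and its methods are assumptions made for this example, and the slot is assumed to be chosen with the division hash on the key's hashCode().

```java
import java.util.LinkedList;

// Hypothetical sketch of external chaining: one linked list per table slot.
class ChainedHashSet {
    private final LinkedList<Object>[] slots;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int tableLength) {
        slots = new LinkedList[tableLength];
        for (int i = 0; i < tableLength; i++) {
            slots[i] = new LinkedList<>();
        }
    }

    private int slotFor(Object key) {
        return (int) (Math.abs((long) key.hashCode()) % slots.length);
    }

    boolean add(Object key) {
        LinkedList<Object> chain = slots[slotFor(key)];
        if (chain.contains(key)) {
            return false;                  // already stored
        }
        chain.add(key);                    // colliding keys share the same chain
        size++;
        return true;
    }

    boolean contains(Object key) {
        return slots[slotFor(key)].contains(key);
    }

    boolean remove(Object key) {
        if (slots[slotFor(key)].remove(key)) {
            size--;                        // no Deleted marker is needed with chaining
            return true;
        }
        return false;
    }

    // Elements stored divided by the array length; may exceed 1 with chaining.
    double loadFactor() {
        return (double) size / slots.length;
    }
}
```

Because each slot holds a list, the table never fills up; colliding keys simply make one chain longer.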


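External Chaining

[Figure: the actual keys K = {k1, k2, k3, k4, k5}, a subset of the universe of keys U, are mapped into table slots 0 through m - 1; the entries H(k1), H(k2), H(k3), H(k4), and H(k5) are shown.]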


Load Factor

• The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table’s array.
– α = (the number of elements stored in the table) / (the size of the table’s array)


Load Factor

– If open addressing is used, then each table slot holds at most one element; therefore, the load factor can never be greater than 1.

– If external chaining is used, then each table slot can hold many elements; therefore, the load factor may be greater than 1.


Hashing Analysis

• The worst case for hashing is the case where every key is hashed into the same slot.
– O(n): linear time.

• The average time can be much faster.


Average Search Analysis

Average number of table elements examined in a successful search, where α is the load factor:

• Searching with linear probing.
– For a table that is not near full:
• ½ (1 + 1 / (1 – α))
– For a table that is full or near full:
• Math.sqrt(n (π / 8))
• Searching with double hashing.
– (–ln(1 – α)) / α, where ln is the natural logarithm
• Searching with chained hashing.
– 1 + (α / 2)
• See Figure 11.6 in Main, page 561.
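To get a feel for these estimates, the three load-factor formulas can be evaluated for a few sample load factors. The sketch assumes the missing symbol in the formulas is the load factor (written alpha in the code); the class and method names are made up for this example.

```java
// Hypothetical sketch: evaluate the average-search formulas for a load factor alpha.
class SearchAnalysis {
    // Linear probing, table not near full: 1/2 * (1 + 1 / (1 - alpha))
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Double hashing: -ln(1 - alpha) / alpha (Math.log is the natural logarithm)
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2
    static double chained(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.7, 0.9}) {
            System.out.printf("alpha=%.1f  linear=%.2f  double=%.2f  chained=%.2f%n",
                    alpha, linearProbing(alpha), doubleHashing(alpha), chained(alpha));
        }
    }
}
```

At a load factor of 0.5 the three estimates are 1.5, about 1.39, and 1.25 elements examined; the gap between linear probing and the other two widens as the load factor approaches 1.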


Coding Example

• Search Times program that demonstrates linear search, binary search, and hashing.
– The hashing uses the Hashtable class.
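The original Search Times program is not included in this transcript. The following is only a sketch of what such a comparison might look like; the class name SearchTimes, the helper methods, and the data set are all assumptions, and the timings printed depend on the machine.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical sketch comparing linear search, binary search, and hashing.
class SearchTimes {
    static boolean linearSearch(int[] data, int target) {
        for (int value : data) {
            if (value == target) {
                return true;
            }
        }
        return false;
    }

    // Requires the array to be sorted in ascending order.
    static boolean binarySearch(int[] sortedData, int target) {
        return Arrays.binarySearch(sortedData, target) >= 0;
    }

    // Stores each value as a key; the mapped value is its array index.
    static Hashtable<Integer, Integer> buildTable(int[] data) {
        Hashtable<Integer, Integer> table = new Hashtable<>();
        for (int i = 0; i < data.length; i++) {
            table.put(data[i], i);
        }
        return table;
    }

    static boolean hashSearch(Hashtable<Integer, Integer> table, int target) {
        return table.containsKey(target);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = 2 * i;               // sorted even numbers
        }
        Hashtable<Integer, Integer> table = buildTable(data);
        int target = 2 * (n - 1);          // last element: worst case for linear search

        long t0 = System.nanoTime();
        boolean foundLinear = linearSearch(data, target);
        long t1 = System.nanoTime();
        boolean foundBinary = binarySearch(data, target);
        long t2 = System.nanoTime();
        boolean foundHash = hashSearch(table, target);
        long t3 = System.nanoTime();

        System.out.println("linear:  " + foundLinear + " in " + (t1 - t0) + " ns");
        System.out.println("binary:  " + foundBinary + " in " + (t2 - t1) + " ns");
        System.out.println("hashing: " + foundHash + " in " + (t3 - t2) + " ns");
    }
}
```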


Hashing

• Java provides the Hashtable class, but it also provides two other classes.
– The HashMap class implements a map data structure using a hash table.
– The HashSet class implements a set using a hash table.
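A short usage sketch of the three classes is given below. The class name JavaHashClasses, the helper methods, and the sample words are assumptions made for this example; the library calls themselves (put, get, merge, add, and the Hashtable copy constructor) are standard java.util methods.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Hypothetical usage sketch of Hashtable, HashMap, and HashSet.
class JavaHashClasses {
    // HashMap: maps each word (the key) to the number of times it occurs.
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // HashSet: keeps each distinct word once.
    static Set<String> uniqueWords(String[] words) {
        Set<String> unique = new HashSet<>();
        for (String word : words) {
            unique.add(word);
        }
        return unique;
    }

    public static void main(String[] args) {
        String[] words = {"hash", "table", "hash", "set"};

        // Hashtable: the older synchronized class; it rejects null keys and values.
        Hashtable<String, Integer> table = new Hashtable<>(wordCounts(words));
        System.out.println(table.get("hash"));          // prints 2
        System.out.println(uniqueWords(words).size());  // prints 3
    }
}
```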