CSC 2300 Data Structures & Algorithms
February 27, 2007
Chapter 5. Hashing

Today – Splay Trees and Hashing
- Splay Trees – Delete a Node
- Hashing – Overview
- Hashing – Separate Chaining

Splay Tree – Delete a Node

How do we delete a node? First, access the node. This puts the node at the root. After we delete the root, we get two subtrees, TL and TR. Which node should we select as the new root? The largest element in TL. Find this element (easy), and rotate it to the root position of TL.

Will this new root of TL have a right child? If not, what can we do?
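A minimal C++ sketch of the deletion described above, assuming distinct integer keys and a recursive bottom-up splay. The names Node, splay, insert, remove and inorder are ours, and this is an illustration rather than the textbook's implementation. Because the deleted key is larger than every key in TL, splaying TL on that key brings the largest element of TL to its root. That root therefore has no right child, and TR can be attached there.

```cpp
#include <cassert>
#include <vector>

// Binary search tree node; keys are assumed distinct.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Single rotations: the left (or right) child of x becomes the subtree root.
Node* rotateRight(Node* x) {
    Node* y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

Node* rotateLeft(Node* x) {
    Node* y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

// Moves the node holding key to the root (zig, zig-zig and zig-zag cases).
// If key is absent, the last node on the search path becomes the root.
Node* splay(Node* root, int key) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) {
        if (root->left == nullptr) return root;
        if (key < root->left->key) {              // zig-zig (left-left)
            root->left->left = splay(root->left->left, key);
            root = rotateRight(root);
        } else if (key > root->left->key) {       // zig-zag (left-right)
            root->left->right = splay(root->left->right, key);
            if (root->left->right != nullptr) root->left = rotateLeft(root->left);
        }
        return root->left == nullptr ? root : rotateRight(root);
    }
    if (root->right == nullptr) return root;
    if (key > root->right->key) {                 // zig-zig (right-right)
        root->right->right = splay(root->right->right, key);
        root = rotateLeft(root);
    } else if (key < root->right->key) {          // zig-zag (right-left)
        root->right->left = splay(root->right->left, key);
        if (root->right->left != nullptr) root->right = rotateRight(root->right);
    }
    return root->right == nullptr ? root : rotateLeft(root);
}

// Inserts key (duplicates ignored); the new key ends up at the root.
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node{key, nullptr, nullptr};
    root = splay(root, key);  // root is now key's predecessor or successor
    if (root->key == key) return root;
    Node* n = new Node{key, nullptr, nullptr};
    if (key < root->key) {
        n->right = root;
        n->left = root->left;
        root->left = nullptr;
    } else {
        n->left = root;
        n->right = root->right;
        root->right = nullptr;
    }
    return n;
}

// Deletion as on the slide.
Node* remove(Node* root, int key) {
    if (root == nullptr) return nullptr;
    root = splay(root, key);             // access the node: it moves to the root
    if (root->key != key) return root;   // key not in the tree
    Node* tl = root->left;
    Node* tr = root->right;
    delete root;
    if (tl == nullptr) return tr;
    tl = splay(tl, key);                 // key exceeds all of TL, so TL's largest rises
    assert(tl->right == nullptr);        // the largest element has no right child
    tl->right = tr;                      // so TR becomes its right child
    return tl;
}

// In-order traversal, used to check that the search-tree order survives.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```

For example, inserting 5, 3, 8, 1, 4, 7, 9 and then removing 4 leaves 3 (the largest key below 4) at the root, and the in-order sequence becomes 1, 3, 5, 7, 8, 9.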

Examples in Class

We will illustrate the idea with a few examples in class:

http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm

Zig-zag and Zag-zig

The web site uses a different (and perhaps more intuitive) way to describe the actions of Zig-zag and Zag-zig. We will use the definitions in the text.

Zig-zag as in text:

The web site calls it zag-zig – we do a zag first, and then a zig.

Hashing – Overview

Hash Functions

Integer Input Keys

If the input keys are integers, then a good strategy is to compute key mod tableSize. When may this strategy become a bad choice? As an example, let tableSize = 10. Can you suggest a sequence of input keys that will all be mapped to the same cell?

What, then, is a good choice for tableSize?
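A small C++ sketch of the key mod tableSize strategy. The name hashInt is ours, and so is the adjustment for negative keys: in C++ the % operator can return a negative value when the key is negative, so the result is shifted back into the range 0 to tableSize - 1.

```cpp
#include <cassert>

// key mod tableSize, adjusted so negative keys still land in [0, tableSize).
int hashInt(int key, int tableSize) {
    assert(tableSize > 0);
    int h = key % tableSize;
    return h < 0 ? h + tableSize : h;
}
```

With tableSize = 10, every multiple of 10 (10, 20, 30, ...) lands in cell 0. With the prime 11 instead, the keys 10, 20, 30, 40, 50 go to cells 10, 9, 8, 7 and 6.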

String Input Keys

Choose tableSize as a prime number – a good choice according to the previous slide.

Since memory is cheap, construct a large table. Let tableSize=10,007. Suppose that all keys are eight or fewer characters long. Use this hash function:

What can happen to the hash table? Hint: what is the maximum integer value of an ASCII character?
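The hash function shown on this slide did not survive in the transcript. The next slide's remark that multiplication may be more appropriate than addition suggests that it adds up the character codes, and the sketch below assumes exactly that. The name hashAdd is ours.

```cpp
#include <cassert>
#include <string>

// Adds up the character codes of key, then reduces the sum mod tableSize.
int hashAdd(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    int hashVal = 0;
    for (unsigned char ch : key)
        hashVal += ch;
    return hashVal % tableSize;
}
```

Under that assumption, a key of at most eight ASCII characters (codes at most 127) gives hashVal no larger than 8 × 127 = 1,016, so only cells 0 to 1,016 of the 10,007 can ever be used. Anagrams such as "ab" and "ba" also always collide.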

String Input Keys

So, we want hashVal to be large – greater than 10,007. Multiplication may be more appropriate than addition. Try this hash function:

This function uses the first three characters of the input string. We can check that $26 \times 27^2 > 10{,}007$. Why do we use $26 \times 27^2$ and not $26^3$? What may go wrong with the distribution in this hash table?
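The function on this slide is also missing from the transcript. The sketch below is one reading of it. The slide says only the first three characters are used, and the bound $26 \times 27^2$ suggests the weights 1, 27 and $27^2 = 729$, with 27 presumably counting the 26 letters plus the blank. The name hashFirstThree and the zero padding for keys shorter than three characters are our assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Uses only the first three characters, weighted by 1, 27 and 27^2 = 729.
// Missing characters (keys shorter than three) count as 0; that is our assumption.
int hashFirstThree(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    auto code = [&key](std::size_t i) -> int {
        return i < key.size() ? static_cast<unsigned char>(key[i]) : 0;
    };
    return (code(0) + 27 * code(1) + 729 * code(2)) % tableSize;
}
```

Every key that starts with the same three characters lands in the same cell, for example "abcdef" and "abcxyz". Since English words do not start with uniformly random three-letter combinations, many cells are likely to stay empty.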

String Input Keys

Also include the ten digits 0 to 9. This gives the following hash function:

This function uses Horner’s rule. What is it?
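This slide's function is missing from the transcript as well. The sketch below assumes a multiplier of 37, which would count the 26 letters, the 10 digits and the blank. With that choice the loop uses Horner's rule to evaluate a polynomial in 37 whose coefficients are the character codes, with the first character as the leading coefficient. The name hashHorner is ours.

```cpp
#include <cassert>
#include <string>

// Horner-style string hash: hashVal = 37 * hashVal + next character code.
// Unsigned arithmetic wraps on overflow, which is well defined in C++.
unsigned int hashHorner(const std::string& key, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
```

Unlike the additive version, character order matters here: "ab" and "ba" hash to different values.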

Horner’s Rule

Problem 2.14 in text. Evaluate $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_2 x^2 + a_1 x + a_0$.

Code:

    poly = 0;
    for( i = n; i >= 0; i-- )
        poly = x * poly + a[i];

Why does this algorithm work? What is its running time?
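A runnable version of the loop above as a C++ function. The name horner and the use of std::vector<long long> for the coefficients are our choices. Here a[i] holds the coefficient of x^i, so the loop starts at the leading coefficient a[n]. After processing index i, poly equals a[n]x^(n-i) + ... + a[i+1]x + a[i], so when the loop finishes poly equals f(x). The loop does one multiplication and one addition per coefficient.

```cpp
#include <cassert>
#include <vector>

// Evaluates f(x) = a[n]*x^n + ... + a[1]*x + a[0] by Horner's rule.
// a[i] is the coefficient of x^i, so a holds n + 1 coefficients.
long long horner(const std::vector<long long>& a, long long x) {
    assert(!a.empty());
    const int n = static_cast<int>(a.size()) - 1;
    long long poly = 0;
    for (int i = n; i >= 0; i--)
        poly = x * poly + a[i];   // now poly = a[n]*x^(n-i) + ... + a[i+1]*x + a[i]
    return poly;
}
```

For f(x) = 2x^3 + 3x + 1, that is a = {1, 3, 0, 2}, the call horner(a, 2) returns 23.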

Compromises

A hash function need not be the best with respect to table distribution, but it should be simple and reasonably fast. If the keys are very long, a hash function that reads every character will take too long to compute.

A common practice is not to use all the characters. As an example, consider a complete street address. The hash function may include only a couple of characters from the street address, and a couple of characters from the city name and the zip code.

The idea is that the time saved in computing the hash function will make up for a slightly less evenly distributed function.
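A sketch of this compromise. It assumes a hypothetical Address record and takes "a couple of characters" to mean the first two characters of each field. The names Address, mixPrefix and hashAddress are ours. The multiplier 37 is likewise an assumption, borrowed from the usual Horner-style string hash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record type, for illustration only.
struct Address {
    std::string street;
    std::string city;
    std::string zip;
};

// Folds at most `count` leading characters of s into hashVal,
// using the Horner-style step hashVal = 37 * hashVal + character code.
unsigned int mixPrefix(unsigned int hashVal, const std::string& s, std::size_t count) {
    for (std::size_t i = 0; i < count && i < s.size(); i++)
        hashVal = 37 * hashVal + static_cast<unsigned char>(s[i]);
    return hashVal;
}

// Hashes only two characters from each field instead of the whole address.
unsigned int hashAddress(const Address& a, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int hashVal = 0;
    hashVal = mixPrefix(hashVal, a.street, 2);
    hashVal = mixPrefix(hashVal, a.city, 2);
    hashVal = mixPrefix(hashVal, a.zip, 2);
    return hashVal % tableSize;
}
```

Two addresses that agree in those six characters, such as "12 Main Street, Springfield, 12345" and "12 Maple Road, Spring Lake, 12999", always collide. That is the slightly less even distribution traded for speed.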

Collision Resolution

If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it.

There are several methods for dealing with collisions; we will discuss the simplest one: separate chaining.

How to Handle Collisions

Separate Chaining

Keep a list of all elements that hash to the same value. Example: the keys are the first 10 perfect squares, and the hash function is hash(x) = x mod 10.
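A minimal C++ sketch of separate chaining with integer keys and hash(x) = x mod tableSize. The class name ChainedHashTable and its member names are ours. It uses std::list for the chains and puts new keys at the front of their list, which is an assumption; appending at the back works equally well.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Separate chaining: cell i holds a linked list of every key whose hash is i.
class ChainedHashTable {
public:
    explicit ChainedHashTable(int tableSize) : lists_(tableSize) {
        assert(tableSize > 0);
    }

    // Adds x to the front of its list unless it is already there.
    void insert(int x) {
        std::list<int>& whichList = lists_[hash(x)];
        if (std::find(whichList.begin(), whichList.end(), x) == whichList.end())
            whichList.push_front(x);
    }

    bool contains(int x) const {
        const std::list<int>& whichList = lists_[hash(x)];
        return std::find(whichList.begin(), whichList.end(), x) != whichList.end();
    }

    void remove(int x) { lists_[hash(x)].remove(x); }

    const std::list<int>& bucket(int i) const { return lists_[i]; }

private:
    // hash(x) = x mod tableSize, shifted into range for negative x.
    int hash(int x) const {
        const int m = static_cast<int>(lists_.size());
        const int h = x % m;
        return h < 0 ? h + m : h;
    }

    std::vector<std::list<int>> lists_;
};
```

Inserting the first 10 perfect squares 0, 1, 4, 9, ..., 81 into a table of size 10 leaves cells 2, 3, 7 and 8 empty. Cell 1 holds 81 and 1, cell 4 holds 64 and 4, cell 6 holds 36 and 16, and cell 9 holds 49 and 9.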

Hash Table with Separate Chaining

Example

Discussion

Another data structure could be used to resolve the collisions; for example, binary search trees. Why do we use linked lists instead?

We define the load factor, λ, of a hash table to be the ratio of the number of elements in the table to the table size. The average length of a list is λ. The effort to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list.

In an unsuccessful search, what is the average number of nodes to examine? In a successful search, what is the average number of nodes to examine? Why 1 + (λ/2) and not λ/2?
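To make these questions concrete, the sketch below measures both averages on an actual separate-chaining table. It assumes that an unsuccessful search is equally likely to land in any cell, and that every stored key is equally likely to be the target of a successful search. The type alias Table and the function names are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using Table = std::vector<std::list<int>>;  // one chain per cell

// lambda = number of elements / table size = average list length.
double loadFactor(const Table& t) {
    assert(!t.empty());
    std::size_t n = 0;
    for (const auto& chain : t) n += chain.size();
    return static_cast<double>(n) / static_cast<double>(t.size());
}

// Unsuccessful search: the whole chain in the probed cell is examined.
// Averaged over all cells, assumed equally likely, this is exactly lambda.
double avgNodesUnsuccessful(const Table& t) {
    return loadFactor(t);
}

// Successful search for the k-th node of a chain examines k nodes.
// Averages that cost over all stored elements, each assumed equally likely.
double avgNodesSuccessful(const Table& t) {
    std::size_t n = 0, examined = 0;
    for (const auto& chain : t) {
        n += chain.size();
        examined += chain.size() * (chain.size() + 1) / 2;  // 1 + 2 + ... + len
    }
    assert(n > 0);
    return static_cast<double>(examined) / static_cast<double>(n);
}
```

For the perfect-squares table (λ = 1), an unsuccessful search examines 1 node on average and a successful one examines 14/10 = 1.4 nodes, close to 1 + (λ/2) = 1.5. Note that the target node itself is always among the nodes examined.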

Hashing Applet

In class, we will run some hashing examples using this web site:

http://www.engin.umd.umich.edu/CIS/course.des/cis350/hashing/WEB/HashApplet.htm