
Hashing

Hash Tables - Introduction

A structure that offers fast insertion and searching

Insertion and searching are almost O(1)

Hashing - a range of key values is transformed into a range of array index values

Hashing - Introduction

In a dictionary, if the main key were the array index, searching and inserting items would be very fast.

Example: EmpData[1000], an employee database indexed by employee number
- search for the employee with employee number = 500
- Answer: EmpData[500]
- Running time: O(1)
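A minimal sketch of this direct-indexing idea in C, assuming a hypothetical EmpData array of 1000 records indexed by employee number (the record fields are illustrative only):

```c
#include <stdio.h>

/* Hypothetical employee record; the field names are illustrative only. */
struct Employee {
    int  number;
    char name[32];
};

/* Records are stored so that EmpData[i] holds the employee whose number is i. */
struct Employee EmpData[1000];

int main(void) {
    EmpData[500] = (struct Employee){ 500, "Pat Doe" };

    /* Lookup is a single array access: O(1). */
    struct Employee e = EmpData[500];
    printf("Employee %d: %s\n", e.number, e.name);
    return 0;
}
```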

Hash Tables

In the previous example this was easy, since the employee number is an integer.

What if the main key is a word made of letters of the English alphabet (e.g., last names)?

How can such a key be mapped to an array index?

Hash Tables

Sum of Digits Method
- map the alphabet A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
- add up the letter values
- For example, "cats" (c=3, a=1, t=20, s=19): 3+1+20+19 = 43
- "cats" will be stored using index = 43
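A small sketch of this letter-sum scheme in C; the function name sumOfLetters is an assumption, not from the slides:

```c
#include <ctype.h>

/* Sum-of-letters hash: a=1, b=2, ..., z=26; non-letters are ignored.
   "cats" -> 3 + 1 + 20 + 19 = 43. */
int sumOfLetters(const char *word) {
    int sum = 0;
    for (; *word != '\0'; word++) {
        if (isalpha((unsigned char)*word))
            sum += tolower((unsigned char)*word) - 'a' + 1;
    }
    return sum;
}
```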

Hash Tables

Problem - too many words get the same index: "was", "tin", "give", "tend", "moan", "tick", and several other words also add up to 43

Hashing

Another Method (Multiply by Powers)
- an integer in the decimal system is written in powers of 10
- 7546 = 7x1000 + 5x100 + 4x10 + 6
- 7546 = 7x10^3 + 5x10^2 + 4x10^1 + 6

Can do the same thing with words - use 27 as the base (26 letters + blank)

Hashing

"cats" = 3x27^3 + 1x27^2 + 20x27^1 + 19 = 60,337

This gives a unique index for every word.

Main drawback: it takes too much space. For words of up to 10 letters (27^9), roughly 7,000,000,000,000 array entries would be needed (7000 gigabytes).
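A sketch of the multiply-by-powers encoding in C; wordToNumber is a hypothetical name, and a 64-bit result is used because the powers of 27 quickly exceed 32-bit range:

```c
#include <stdio.h>

/* Multiply-by-powers encoding with base 27 (a=1 ... z=26, 0 reserved for blank).
   Each letter is weighted by a decreasing power of 27, like digits in base 27. */
long long wordToNumber(const char *word) {
    long long value = 0;
    for (; *word != '\0'; word++)
        value = value * 27 + (*word - 'a' + 1);
    return value;
}

int main(void) {
    /* "cats" = 3*27^3 + 1*27^2 + 20*27 + 19 = 60337 */
    printf("%lld\n", wordToNumber("cats"));
    return 0;
}
```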

Hashing

While the scheme generates unique keys, it also allocates space for non-words (aaaaaa, zzzzzzz, aaacccc, etc.)

We need to be able to compress the huge range of numbers from the multiply-by-powers scheme into a smaller (reasonably sized) array

Hashing

Hash function - the process of converting a number in a large range into a number in a smaller range.

Size of the smaller range - twice the size of the data set (2s) - for 50,000 words, an array of 100,000 elements

Hashing

Hash Function
- achieved by using the modulo function (returns the remainder)
- for example, 33 mod 10 = 3
- LargeNumber mod SmallRange

Hashing

HugeNumber = C0*27^9 + C1*27^8 + C2*27^7 + … + C9*27^0

arraySize = numberOfWords * 2

arrayIndex = HugeNumber mod arraySize
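A sketch of this hash in C. The slides compute the huge number and then take the modulus; applying the modulus at every step (shown here) gives the same result by modular arithmetic while keeping intermediate values small, a standard refinement not stated on the slides. The names hashWord and arraySize are assumptions:

```c
/* Hash a lowercase word into [0, arraySize).
   Equivalent to (C0*27^(n-1) + ... + C(n-1)*27^0) mod arraySize,
   but the modulus is applied at each step so intermediate values stay small. */
unsigned hashWord(const char *word, unsigned arraySize) {
    unsigned long hash = 0;
    for (; *word != '\0'; word++)
        hash = (hash * 27 + (unsigned)(*word - 'a' + 1)) % arraySize;
    return (unsigned)hash;
}
```

With the slides' sizing rule (array about twice the word count), hashWord("cats", 100000) reduces the multiply-by-powers value 60,337 modulo 100,000.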

Hashing - Collisions

Hashing carries the risk of two keys getting the same index (although far less often than with sum-of-digits).

Collision - two keys that map to the same array index after hashing

Collisions

Two approaches to handle collisions
- Open Addressing
- Separate Chaining

Open Addressing - Finding the next available free cell

Separate Chaining - install a linked list at each index

Open Addressing

Three types - Linear Probing, Quadratic Probing, and Double Hashing

Linear Probing - find the next available cell (x+1, x+2, etc.) - leads to clustering (see the probing sketch after the three methods below)

Clustering

Quadratic Probing - finds the next available cell using squares as the steps (x+1, x+4, x+9, etc.)

Double Hashing - hash the key again using a different hash function to find the next free cell - the second hash gives the step size
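A minimal sketch of open addressing with linear probing in C; the table size, the EMPTY marker, and the name insertLinear are assumptions for illustration:

```c
#include <stdio.h>

#define TABLE_SIZE 11
#define EMPTY      (-1)

/* Open addressing with linear probing: on a collision, step to the
   next cell (x+1, x+2, ...) until a free cell is found. */
int insertLinear(int table[], int key) {
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;   /* probe sequence x, x+1, x+2, ... */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;  /* table full */
}

int main(void) {
    int table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; i++) table[i] = EMPTY;

    /* 12 and 23 both hash to 1 (mod 11); 23 is placed in the next free cell. */
    printf("%d\n", insertLinear(table, 12));  /* index 1 */
    printf("%d\n", insertLinear(table, 23));  /* index 2 after a collision */
    return 0;
}
```

Quadratic probing and double hashing differ only in how the probe index advances: by squares of the step count, or by a step size computed from a second hash function.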

Separate Chaining

A linked list is installed at each array index, and entries whose keys hash to that index are attached to that list

[Diagram: a hash table with separate chaining - entries such as 986, 1881, and 333 are kept in linked lists at their hashed indexes]
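A compact sketch of separate chaining in C; the node layout, the chainInsert name, and the table size are assumptions for illustration:

```c
#include <stdlib.h>

#define TABLE_SIZE 10

/* Each table slot is the head of a singly linked list of entries
   whose keys hash to that slot. */
struct Node {
    int          key;
    struct Node *next;
};

struct Node *table[TABLE_SIZE];   /* all heads start out NULL */

void chainInsert(int key) {
    int idx = key % TABLE_SIZE;              /* hash: key mod table size */
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) return;                   /* allocation failed */
    n->key  = key;
    n->next = table[idx];                    /* prepend to the chain at idx */
    table[idx] = n;
}
```

Keys that collide simply share a chain instead of displacing each other, so insertion never has to probe for a free cell.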

Hashing

Read Chapter 5 (Data Structures and Algorithm Analysis in C by Weiss)

Chapter 7 (in Goodrich and Tamassia Book)

Hash function implementations are presented in these chapters

Summary Notes

Data Structures and Algorithms

Conceptual approach - Java and C implementations are presented in both books

For further studies, focus more on the mathematical aspects (look at theorems & propositions) and on proving them

When to use what

General Purpose Data Structures
- arrays, linked lists, trees, and hash tables
- used to store and retrieve data using key values
- applications: storing personnel records, inventories, contact lists, etc.

General Purpose Data Structures

Arrays - best used:
- when the amount of data is reasonably small
- when the amount of data is predictable in advance

General Purpose Data Structures

Linked Lists
- when the amount of data to be stored cannot be predicted
- when data will be frequently inserted and deleted

Binary Search Trees
- used when arrays or linked lists are too slow
- O(log N) insertion, searching, and deletion

General Purpose Data Structures

Hash Tables
- fastest data storage structure
- used in spell checkers and as symbol tables in compilers
- may require additional memory for open addressing implementations

Special Purpose Data Structures

Stacks, Queues (Priority Queues)
- used by a computer program to aid in carrying out some algorithm
- for example, stacks and queues were used in the graph algorithms

Abstract Data Types
- implemented by a more fundamental data structure (array, linked list)
- conceptual aids

Special Purpose Data Structures

Stacks
- used when you want to access the last data item inserted (LIFO structure)
- implemented using an array or a linked list depending on size

Queues
- used when you want to access the first data item inserted (FIFO structure)

Graphs

Unique data structure - directly models real-world situations (maps, flights and airports, etc.)

The structure of the graph reflects the structure of the problem

Main choice is the representation: adjacency list or adjacency matrix

Sorting

For limited data sets (up to 1000-1500 entries), insertion sort may be sufficient

When insertion sort gets bogged down, use merge sort or quicksort (merge sort, however, requires additional memory)

Sorting - Running Times

Sort        Average Running Time
Bubble      O(n^2)
Selection   O(n^2)
Insertion   O(n^2)
Shellsort   O(n^(3/2))
Quicksort   O(n log n)
Mergesort   O(n log n)