
Hashing

Hash Tables - Introduction

A structure that offers fast insertion and searching

Insertion and searching are almost O(1)

Hashing - a range of key values is transformed into a range of array index values

Hashing - Introduction

In a dictionary, if the main key were the array index, searching and inserting items would be very fast.

Example: EmpData[1000], an employee database indexed by employee number
- search for the employee with employee number = 500
- Answer: EmpData[500]
- Running time: O(1)
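A minimal sketch of this direct-indexing idea in C, assuming a hypothetical EmpData array of 1000 records indexed by employee number (the record fields are illustrative only):

```c
#include <stdio.h>

/* Hypothetical employee record; the field names are illustrative only. */
struct Employee {
    int  number;
    char name[32];
};

/* Records are stored so that EmpData[i] holds the employee whose number is i. */
struct Employee EmpData[1000];

int main(void) {
    EmpData[500] = (struct Employee){ 500, "Pat Doe" };

    /* Lookup is a single array access: O(1). */
    struct Employee e = EmpData[500];
    printf("Employee %d: %s\n", e.number, e.name);
    return 0;
}
```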

Hash Tables

In the previous example this was easy, since the employee number is an integer.

What if the main key is a word made of letters of the English alphabet (e.g., last names)?

How can such a key be mapped to an array index?

Hash Tables

Sum of Digits Method
- map the alphabet A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
- add up the letter values
- For example, "cats" (c=3, a=1, t=20, s=19): 3+1+20+19 = 43
- "cats" will be stored using index = 43
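A small sketch of this letter-sum scheme in C; the function name sumOfLetters is an assumption, not from the slides:

```c
#include <ctype.h>

/* Sum-of-letters hash: a=1, b=2, ..., z=26; non-letters are ignored.
   "cats" -> 3 + 1 + 20 + 19 = 43. */
int sumOfLetters(const char *word) {
    int sum = 0;
    for (; *word != '\0'; word++) {
        if (isalpha((unsigned char)*word))
            sum += tolower((unsigned char)*word) - 'a' + 1;
    }
    return sum;
}
```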

Hash Tables

Problem - too many words get the same index: "was", "tin", "give", "tend", "moan", "tick", and several other words also add up to 43

Hashing

Another Method (Multiply by Powers)
- an integer in the decimal system is written in powers of 10
- 7546 = 7x1000 + 5x100 + 4x10 + 6
- 7546 = 7x10^3 + 5x10^2 + 4x10^1 + 6

Can do the same thing with words - use 27 as the base (26 letters + blank)

Hashing

"cats" = 3x27^3 + 1x27^2 + 20x27^1 + 19 = 60,337

This gives a unique index for every word.

Main drawback: it takes too much space. For words of up to 10 letters (27^9), roughly 7,000,000,000,000 array entries would be needed (7000 gigabytes).
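A sketch of the multiply-by-powers encoding in C; wordToNumber is a hypothetical name, and a 64-bit result is used because the powers of 27 quickly exceed 32-bit range:

```c
#include <stdio.h>

/* Multiply-by-powers encoding with base 27 (a=1 ... z=26, 0 reserved for blank).
   Each letter is weighted by a decreasing power of 27, like digits in base 27. */
long long wordToNumber(const char *word) {
    long long value = 0;
    for (; *word != '\0'; word++)
        value = value * 27 + (*word - 'a' + 1);
    return value;
}

int main(void) {
    /* "cats" = 3*27^3 + 1*27^2 + 20*27 + 19 = 60337 */
    printf("%lld\n", wordToNumber("cats"));
    return 0;
}
```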

Hashing

While the scheme generates unique keys, it also allocates space for non-words (aaaaaa, zzzzzzz, aaacccc, etc.)

We need to be able to compress the huge range of numbers from the multiply-by-powers scheme into a smaller (reasonably sized) array

Hashing

Hash function - the process of converting a number in a large range into a number in a smaller range.

Size of the smaller range - twice the size of the data set (2s) - for 50,000 words, an array of 100,000 elements

Hashing

Hash Function
- achieved by using the modulo function (returns the remainder)
- for example, 33 mod 10 = 3
- LargeNumber mod SmallRange

Hashing

HugeNumber = C0*27^9 + C1*27^8 + C2*27^7 + … + C9*27^0

arraySize = numberOfWords * 2

arrayIndex = HugeNumber mod arraySize
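A sketch of this hash in C. The slides compute the huge number and then take the modulus; applying the modulus at every step (shown here) gives the same result by modular arithmetic while keeping intermediate values small, a standard refinement not stated on the slides. The names hashWord and arraySize are assumptions:

```c
/* Hash a lowercase word into [0, arraySize).
   Equivalent to (C0*27^(n-1) + ... + C(n-1)*27^0) mod arraySize,
   but the modulus is applied at each step so intermediate values stay small. */
unsigned hashWord(const char *word, unsigned arraySize) {
    unsigned long hash = 0;
    for (; *word != '\0'; word++)
        hash = (hash * 27 + (unsigned)(*word - 'a' + 1)) % arraySize;
    return (unsigned)hash;
}
```

With the slides' sizing rule (array about twice the word count), hashWord("cats", 100000) reduces the multiply-by-powers value 60,337 modulo 100,000.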

Hashing - Collisions

Hashing carries the risk of two keys getting the same index (although far less often than with sum-of-digits).

Collision - two keys that map to the same array index after hashing

Collisions

Two approaches to handle collisions
- Open Addressing
- Separate Chaining

Open Addressing - Finding the next available free cell

Separate Chaining - install a linked list at each index

Open Addressing

Three types - Linear Probing, Quadratic Probing, and Double Hashing

Linear Probing - find the next available cell (x+1, x+2, etc.) - leads to clustering (see the probing sketch after the three methods below)

Clustering

Quadratic Probing - finds the next available cell using squares as the steps (x+1, x+4, x+9, etc.)

Double Hashing - hash the key again using a different hash function to find the next free cell - the second hash gives the step size
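A minimal sketch of open addressing with linear probing in C; the table size, the EMPTY marker, and the name insertLinear are assumptions for illustration:

```c
#include <stdio.h>

#define TABLE_SIZE 11
#define EMPTY      (-1)

/* Open addressing with linear probing: on a collision, step to the
   next cell (x+1, x+2, ...) until a free cell is found. */
int insertLinear(int table[], int key) {
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (start + i) % TABLE_SIZE;   /* probe sequence x, x+1, x+2, ... */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;  /* table full */
}

int main(void) {
    int table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; i++) table[i] = EMPTY;

    /* 12 and 23 both hash to 1 (mod 11); 23 is placed in the next free cell. */
    printf("%d\n", insertLinear(table, 12));  /* index 1 */
    printf("%d\n", insertLinear(table, 23));  /* index 2 after a collision */
    return 0;
}
```

Quadratic probing and double hashing differ only in how the probe index advances: by squares of the step count, or by a step size computed from a second hash function.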

Separate Chaining

A linked list is installed at each array index, and entries whose keys hash to that index are attached to that list

[Diagram: a hash table with separate chaining - entries such as 986, 1881, and 333 are kept in linked lists at their hashed indexes]
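A compact sketch of separate chaining in C; the node layout, the chainInsert name, and the table size are assumptions for illustration:

```c
#include <stdlib.h>

#define TABLE_SIZE 10

/* Each table slot is the head of a singly linked list of entries
   whose keys hash to that slot. */
struct Node {
    int          key;
    struct Node *next;
};

struct Node *table[TABLE_SIZE];   /* all heads start out NULL */

void chainInsert(int key) {
    int idx = key % TABLE_SIZE;              /* hash: key mod table size */
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) return;                   /* allocation failed */
    n->key  = key;
    n->next = table[idx];                    /* prepend to the chain at idx */
    table[idx] = n;
}
```

Keys that collide simply share a chain instead of displacing each other, so insertion never has to probe for a free cell.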

Hashing

Read Chapter 5 (Data Structures and Algorithm Analysis in C by Weiss)

Chapter 7 (in Goodrich and Tamassia Book)

Hash function implementations are presented in these chapters

Summary Notes

Data Structures and Algorithms

Conceptual approach - Java and C implementations are presented in both books

For further studies, focus more on the mathematical aspects (look at theorems & propositions) and on proving them

When to use what

General Purpose Data Structures
- arrays, linked lists, trees, and hash tables
- used to store and retrieve data using key values
- applications: storing personnel records, inventories, contact lists, etc.

General Purpose Data Structures

Arrays - best used:
- when the amount of data is reasonably small
- when the amount of data is predictable in advance

General Purpose Data Structures

Linked Lists
- when the amount of data to be stored cannot be predicted
- when data will be frequently inserted and deleted

Binary Search Trees
- used when arrays or linked lists are too slow
- O(log N) insertion, searching, and deletion

General Purpose Data Structures

Hash Tables
- fastest data storage structure
- used in spell checkers and as symbol tables in compilers
- may require additional memory for open addressing implementations

Special Purpose Data Structures

Stacks, Queues (Priority Queues)
- used by a computer program to aid in carrying out some algorithm
- for example, stacks and queues were used in the graph algorithms

Abstract Data Types
- implemented by a more fundamental data structure (array, linked list)
- conceptual aids

Special Purpose Data Structures

Stacks
- used when you want to access the last data item inserted (LIFO structure)
- implemented using an array or a linked list depending on size

Queues
- used when you want to access the first data item inserted (FIFO structure)

Graphs

Unique data structure - directly models real-world situations (maps, flights and airports, etc.)

The structure of the graph reflects the structure of the problem

Main choice is the representation: adjacency list or adjacency matrix

Sorting

For limited data sets (up to 1000-1500 entries), insertion sort may be sufficient

When insertion sort gets bogged down, use merge sort or quicksort (merge sort, however, requires additional memory)

Sorting - Running Times

Sort        Average Running Time
Bubble      O(n^2)
Selection   O(n^2)
Insertion   O(n^2)
Shellsort   O(n^(3/2))
Quicksort   O(n log n)
Mergesort   O(n log n)