week 12 hashing

CS221A Data Structures & Algorithms

Hashing

Agenda

Hashing Concepts and Preliminaries

Hash Function

Separate Chaining

Hashing

A technique for performing insertions, deletions, and searches in constant average time.

A scenario in which the keys themselves point directly to records.

Information encoded directly within a key can point us to its associated record.

Examine the key and simply know where to look.

Hashing

Determine the location of a record by performing an arithmetic computation on its key.

The result of this computation gives the location of the record in a table called the hash table.

This computation is referred to as the hash function.

Hashing

A typical hash table is an array of some fixed size containing the keys.

A key is typically a string with an associated value.

Each key is mapped to some number in the range 0 to TableSize - 1 and placed in the appropriate cell.

The mapping is provided by the hash function.

A hash function should be simple to compute and should ideally send any two distinct keys to different cells. Since there are usually far more possible keys than cells, this cannot always be achieved.

Hashing

Index  Key
0      Alpha
1      Beta
2      Gamma
3      Theta
4      Omega
5      Delta
6      Epsilon
7      Pi

The figure shows an ideal hash table: all the distinct Greek letter names hash to distinct indices.

Beta hashes to 1, Theta hashes to 3, and Epsilon hashes to 6.

Hashing

In this case, the keys are the names of the contacts, and the hash function maps each name to the index of the array where that contact's phone number is stored.

The hash function transforms the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value can be found.
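A minimal C sketch of the contact-list idea, assuming a hypothetical 11-cell table, a hypothetical additive `NameHash` (the character-sum hash discussed later in these notes), and made-up names and numbers. Each phone number is stored directly in the cell its name hashes to; the sketch does not resolve collisions, it only reports them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 11   /* hypothetical small prime table size */

/* One slot: the key (contact name) and its value (phone number). */
struct Slot
{
    const char *Name;    /* NULL means the slot is empty */
    const char *Phone;
};

/* Hypothetical hash: sum of character codes, reduced to 0..TABLE_SIZE-1. */
static unsigned int NameHash(const char *Name)
{
    unsigned int Sum = 0;

    assert(Name != NULL);
    while (*Name != '\0')
        Sum += (unsigned char) *Name++;
    return Sum % TABLE_SIZE;
}

/* Store Phone in the slot chosen by hashing Name.
   Returns 0 if a different name already occupies that slot (a collision). */
static int Put(struct Slot Table[], const char *Name, const char *Phone)
{
    struct Slot *S = &Table[NameHash(Name)];

    if (S->Name != NULL && strcmp(S->Name, Name) != 0)
        return 0;            /* collision: this sketch does not resolve it */
    S->Name = Name;
    S->Phone = Phone;
    return 1;
}

/* Look up Name by jumping straight to its slot; NULL if absent. */
static const char *Get(const struct Slot Table[], const char *Name)
{
    const struct Slot *S = &Table[NameHash(Name)];

    if (S->Name != NULL && strcmp(S->Name, Name) == 0)
        return S->Phone;
    return NULL;
}
```

For example, "Ali" sums to 278 and lands in cell 278 % 11 = 3. Its anagram "ilA" has the same sum, so `Put` reports a collision for it, which is exactly the problem the next slide raises.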

Hashing

The only issues are choosing the hash function and deciding what to do when two keys hash to the same value (a phenomenon known as a collision).

Hashing

Get the juice flowing, guys!

Problem Statement:

Let's assume we have to build an application that supports the customer service department of some company. To simplify the operation for both representatives and customers, how would you store the data?

Hashing

Simple Solution:

Key the account records by telephone number: when answering a call, the service representative retrieves the account information by entering the customer's telephone number into the system.

What sort of hash function can you come up with?

Hash Function

For integer keys, simply returning Key % TableSize is generally a reasonable strategy for a hash function.

If 0 ≤ Key ≤ 99 and the table size is 10, what is the worst-case scenario for this hash function?


Answer:

All the keys end in 0. Every key then hashes to cell 0, so all of them collide in a single cell.

Hash Function

In situations like this, it is preferable to make the table size a prime number.

When the keys are random integers, this function distributes them evenly across the table.
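A small C sketch of the worst case above, assuming ten hypothetical keys 0, 10, 20, ..., 90 (all ending in 0). `CellsUsed` is a hypothetical helper that counts how many distinct cells Key % TableSize sends those keys to.

```c
#include <assert.h>

/* Integer hash from the slides: Key % TableSize. */
static unsigned int IntHash(unsigned int Key, unsigned int TableSize)
{
    assert(TableSize > 0);
    return Key % TableSize;
}

/* Count the distinct cells used by the keys 0, 10, 20, ..., 90,
   i.e. ten keys that all end in 0. */
static int CellsUsed(unsigned int TableSize)
{
    int Used[64] = {0};      /* big enough for TableSize <= 64 */
    int Count = 0;
    unsigned int Key;

    assert(TableSize <= 64);
    for (Key = 0; Key <= 90; Key += 10)
    {
        unsigned int Cell = IntHash(Key, TableSize);
        if (!Used[Cell])
        {
            Used[Cell] = 1;
            Count++;
        }
    }
    return Count;
}
```

With TableSize 10 every key lands in cell 0. With the prime 11 the ten keys occupy ten different cells. With 12, which shares the factor 2 with every key, they squeeze into only six cells.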

When the keys are strings, one simple hash function adds up the ASCII values of the characters and applies the same mod operation to map the sum into the table.

Hash Function

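typedef unsigned int Index;

/* Add up the character codes of Key, then reduce the sum mod TableSize. */
Index Hash(const char *Key, int TableSize)
{
    unsigned int HashValue = 0;

    while (*Key != '\0')
    {
        HashValue += (unsigned char) *Key;   /* add this character's code */
        Key++;                               /* advance, or the loop never ends */
    }
    return HashValue % TableSize;
}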

Hash Function

Where is the hash function on the previous slide ineffective?

Hash Function

If the table size is large, this function does not distribute the keys well.

For example, take the prime table size 10,007 and suppose all keys are at most 8 characters long. The largest value a char can have in ASCII is 127, so the largest value the hash function can produce is 127 * 8 = 1,016.

The hash values can therefore only fall between 0 and 1,016. Taking them mod 10,007 leaves them unchanged, so cells 1,017 through 10,006 are never used and the keys crowd into the start of the table.
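The arithmetic above can be checked with a short C sketch, assuming keys of at most 8 characters with codes from 1 to 127. `AddHash` is the additive hash from the earlier slide; `ReachableCells` is a hypothetical helper that counts how many cells that hash can ever produce.

```c
#include <assert.h>

/* Additive hash from the slide: sum of character codes mod TableSize. */
static unsigned int AddHash(const char *Key, unsigned int TableSize)
{
    unsigned int Sum = 0;

    assert(TableSize > 0);
    while (*Key != '\0')
        Sum += (unsigned char) *Key++;
    return Sum % TableSize;
}

/* Hypothetical helper: how many cells AddHash can reach for keys of at most
   MaxLen characters with codes 1..127. Every sum from 0 to 127 * MaxLen is
   possible, so the count is the size of that range, capped at TableSize. */
static unsigned int ReachableCells(unsigned int MaxLen, unsigned int TableSize)
{
    unsigned int Sums = 127u * MaxLen + 1;   /* sums 0 .. 127 * MaxLen */

    return Sums < TableSize ? Sums : TableSize;
}
```

An 8-character key made entirely of code 127 hashes to 1,016, and `ReachableCells(8, 10007)` is 1,017, so 8,990 of the 10,007 cells (about 90%) can never be used.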

When two or more keys hash to the same value, this is known as a collision. The fewer the collisions, the better the hash function.
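Anagrams show why the additive hash collides so easily: the same characters in a different order give the same sum. A small C sketch, where `Collide` is a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Additive hash from the slide: sum of character codes mod TableSize. */
static unsigned int AddHash(const char *Key, unsigned int TableSize)
{
    unsigned int Sum = 0;

    assert(TableSize > 0);
    while (*Key != '\0')
        Sum += (unsigned char) *Key++;
    return Sum % TableSize;
}

/* Two distinct keys collide when they hash to the same cell. */
static int Collide(const char *A, const char *B, unsigned int TableSize)
{
    return strcmp(A, B) != 0 && AddHash(A, TableSize) == AddHash(B, TableSize);
}
```

"stop" and "pots" (or "listen" and "silent") collide for any table size, while "abc" and "abd" differ by one in their sums and land in neighbouring cells of a 10,007-cell table.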

Separate Chaining

Keep a list of all elements that hash to the same value.

Separate Chaining

To perform a Find, we use the hash function to determine which list to traverse. We then traverse that list in the usual way, returning the position where the item is found (or NULL if it is absent).

Separate Chaining

To perform an Insert, we traverse down the appropriate list to check whether the element is already in place.

If duplicates are expected, an extra field is usually kept, and this field is incremented in the event of a match.

If the element turns out to be new, it is inserted either at the front of the list or at the end, whichever is easier. Inserting at the front is convenient and also helps when recently inserted elements are the ones most likely to be retrieved soon.
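A minimal C sketch of Insert with the extra count field, assuming string keys, a hypothetical 11-cell table, the additive hash, and a hypothetical `Node` layout. Unlike the implementation on the next slide, it uses no header nodes: each cell holds a pointer to the first node of its chain, or NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 11   /* hypothetical small prime size */

/* Hypothetical node layout: the key plus a Count field for duplicates. */
struct Node
{
    const char *Key;     /* stored pointer, not a copy */
    int Count;           /* how many times Key has been inserted */
    struct Node *Next;
};

static struct Node *Table[TABLE_SIZE];   /* each cell heads a chain; NULL = empty */

/* Additive hash: sum of character codes mod TABLE_SIZE. */
static unsigned int AddHash(const char *Key)
{
    unsigned int Sum = 0;

    while (*Key != '\0')
        Sum += (unsigned char) *Key++;
    return Sum % TABLE_SIZE;
}

/* Insert Key: bump Count on a match, else push a new node on the front. */
static void InsertCounted(const char *Key)
{
    struct Node **Head;
    struct Node *P;

    assert(Key != NULL);
    Head = &Table[AddHash(Key)];
    for (P = *Head; P != NULL; P = P->Next)
        if (strcmp(P->Key, Key) == 0)
        {
            P->Count++;      /* duplicate: record it instead of adding a node */
            return;
        }

    P = malloc(sizeof *P);
    if (P == NULL)
        abort();             /* sketch: no graceful out-of-memory handling */
    P->Key = Key;
    P->Count = 1;
    P->Next = *Head;         /* front insertion: the easy option */
    *Head = P;
}

/* Return how many times Key was inserted (0 if never). */
static int CountOf(const char *Key)
{
    const struct Node *P;

    for (P = Table[AddHash(Key)]; P != NULL; P = P->Next)
        if (strcmp(P->Key, Key) == 0)
            return P->Count;
    return 0;
}
```

Inserting "apple" twice leaves one node with Count 2. The anagrams "stop" and "pots" share a chain, and the one inserted last sits at its front.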

Hashing Implementation

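#include <stdlib.h>
#include <string.h>

typedef unsigned int Index;
Index Hash(const char *Key, int TableSize);   /* string hash from the earlier slide */

typedef const char *ElementType;   /* keys and records are strings, hence strcmp */

typedef struct ListNode *Position;
typedef struct HashTbl *HashTable;
typedef Position List;

struct ListNode
{
    ElementType Key;       /* key the record is filed under */
    ElementType Element;   /* record value; pointers are stored, not copied */
    Position Next;
};

struct HashTbl
{
    int TableSize;
    List *TheLists;        /* one list (with a header node) per cell */
};

/* Smallest prime >= N, by trial division. */
static int NextPrime(int N)
{
    int d, IsPrime;

    if (N <= 2)
        return 2;
    if (N % 2 == 0)
        N++;
    for (;; N += 2)
    {
        IsPrime = 1;
        for (d = 3; d * d <= N; d += 2)
            if (N % d == 0) { IsPrime = 0; break; }
        if (IsPrime)
            return N;
    }
}

HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof(struct HashTbl));
    int i;

    if (H == NULL)
        exit(EXIT_FAILURE);                  /* out of memory */
    H->TableSize = NextPrime(TableSize);     /* prime size spreads keys better */
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    if (H->TheLists == NULL)
        exit(EXIT_FAILURE);

    /* Give every cell an empty list consisting of just a header node. */
    for (i = 0; i < H->TableSize; i++)
    {
        H->TheLists[i] = malloc(sizeof(struct ListNode));
        if (H->TheLists[i] == NULL)
            exit(EXIT_FAILURE);
        H->TheLists[i]->Next = NULL;
    }
    return H;
}

Position Find(ElementType Key, HashTable H)
{
    List L = H->TheLists[Hash(Key, H->TableSize)];
    Position P = L->Next;                    /* skip the header node */

    while (P != NULL && strcmp(P->Key, Key) != 0)
        P = P->Next;
    return P;                                /* NULL if Key is not present */
}

void Insert(ElementType Key, HashTable H, ElementType RecordValue)
{
    Position NewCell;
    List L;

    if (Find(Key, H) == NULL)                /* ignore keys already present */
    {
        NewCell = malloc(sizeof(struct ListNode));
        if (NewCell == NULL)
            exit(EXIT_FAILURE);
        L = H->TheLists[Hash(Key, H->TableSize)];
        NewCell->Key = Key;
        NewCell->Element = RecordValue;
        NewCell->Next = L->Next;             /* insert at the front of the list */
        L->Next = NewCell;
    }
}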
