week 12 hashing
-
CS221A Data Structures & Algorithms
Hashing
-
Agenda
Hashing Concepts and Preliminaries
Hash Function
Separate Chaining
-
Hashing
A technique for performing insertions, deletions and
searches in constant average time.
A scenario in which the keys themselves point directly to
records.
Information encoded directly within a key can point us to
its associated record.
Examine the key and simply know where to look.
-
Hashing
Determine the location of the record by performing an
arithmetic computation on its key.
The result of this computation gives the location of the
record in a table called the hash table.
This computation is referred to as the hash function.
-
Hashing
A typical hash table is an array of some fixed size,
containing the keys.
A key is typically a string with an associated value.
Each key is mapped into some number in the range 0 to
TableSize - 1 and placed in the appropriate cell.
The mapping is provided by the hash function.
A hash function should be simple to compute and should
ideally ensure that any two distinct keys get different cells.
-
Hashing
0 Alpha
1 Beta
2 Gamma
3 Theta
4 Omega
5 Delta
6 Epsilon
7 Pi
The figure shows an ideal hash table.
All the distinct Greek letter
names hash to distinct cells.
Beta hashes to 1.
Theta hashes to 3.
Epsilon hashes to 6.
-
Hashing
In this case the keys are the names of the contacts, and the hash function maps each name to the index of the array cell where that contact's phone number is stored.
The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be found.
-
Hashing
The only issues are choosing the hash function
and deciding what to do when two keys hash to the same
value (a phenomenon known as a collision).
-
Hashing
Get the juices flowing, guys.
Problem Statement:
Let's assume we have to build an application that supports a
customer service department for some company. To simplify
the operation for both representatives and customers, how
will you store the data?
-
Hashing
Simple Solution:
Key the account records by telephone number, so that when
answering a call, the service representative can retrieve
account information by entering the customer's telephone
number into the system.
What sort of hash function can you come up with?
-
Hash Function
For integer keys, simply returning Key % TableSize is
generally a reasonable strategy for a hash function.
If the keys satisfy 0 ≤ key ≤ 99 and our table size is 10, what
is the worst-case scenario for this hash function?
-
Hash Function
Answer:
If all the keys end in 0, every key hashes to cell 0.
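The worst case above can be checked with a short sketch. The names HashInt and CellsUsed are hypothetical helpers, not part of the lecture code, and the table size of 11 is only an assumed prime chosen for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hash an integer key by taking the remainder modulo the table size. */
unsigned int HashInt(unsigned int Key, unsigned int TableSize)
{
    assert(TableSize > 0);
    return Key % TableSize;
}

/* Count how many distinct cells a set of keys occupies.
   Assumes TableSize <= 64 to keep the sketch small. */
unsigned int CellsUsed(const unsigned int *Keys, size_t Count,
                       unsigned int TableSize)
{
    int Seen[64] = {0};
    unsigned int Used = 0;

    assert(TableSize <= 64);
    for (size_t i = 0; i < Count; i++)
    {
        unsigned int Cell = HashInt(Keys[i], TableSize);
        if (!Seen[Cell])
        {
            Seen[Cell] = 1;     /* first key to land in this cell */
            Used++;
        }
    }
    return Used;
}
```

Calling CellsUsed on the keys 0, 10, 20, ..., 90 with a table size of 10 reports a single occupied cell, while the same keys with a table size of 11 spread over ten different cells.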
-
Hash Function
For situations like this, it is preferable to make the table
size prime.
When the keys are random integers, this function is
effective in distributing the keys evenly.
When the keys are strings, a simple hash function is
to add up the ASCII values of the characters and use the
mod function to create the mapping.
-
Hash Function
typedef unsigned int Index;

Index Hash(const char *Key, int TableSize)
{
    unsigned int HashValue = 0;

    /* Add up the character codes of the key. */
    while (*Key != '\0')
    {
        HashValue += *Key++;    /* advance to the next character */
    }
    return HashValue % TableSize;
}
-
Hash Function
Where is the hash function on the previous slide ineffective?
-
Hash Function
If the table size is large, the function doesn't distribute
keys well.
Take a large prime table size, for example 10,007, and
suppose all keys are at most 8 characters long. The largest
value an ASCII char can have is 127, so 127 * 8 = 1,016 is
the largest value the sum can reach.
Only the values 0 to 1,016 are possible, and taking them
mod 10,007 leaves them unchanged, so most of the table is
never used.
When two or more keys hash to the same value, this is
known as a collision. The fewer the collisions, the better
your hash function.
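The limited range can be seen by running the additive hash on a few keys. This is a minimal sketch of the function from the earlier slide. The cast to unsigned char and the assert are additions made here so the sum behaves the same on platforms where char is signed.

```c
#include <assert.h>

typedef unsigned int Index;

/* Additive hash: sum the character codes, then take the remainder. */
Index Hash(const char *Key, int TableSize)
{
    unsigned int HashValue = 0;

    assert(TableSize > 0);
    while (*Key != '\0')
    {
        HashValue += (unsigned char)*Key++;
    }
    return HashValue % (unsigned int)TableSize;
}
```

With a table size of 10,007, a key made of eight characters of code 127 hashes to 1,016, the top of the reachable range, and "zzzzzzzz" hashes to 976. Because addition ignores order, anagrams such as "listen" and "silent" always collide.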
-
Separate Chaining
Keep a list of all elements that hash to the same value.
-
Separate Chaining
To perform a Find, we use the hash function to determine
which list to traverse. We then traverse the list in a
normal manner, returning the position where the item is
found.
-
Separate Chaining
To perform an insert, we traverse down the appropriate
list to check whether the element is already in place.
If duplicates are expected, an extra field is usually kept
and this field would be incremented in the event of a
match.
If the element turns out to be new, it is inserted either at
the front of the list or at the end, whichever is easier;
how frequently it will be retrieved can also guide the choice.
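The duplicate-counting variant described above might look like the following sketch. Everything here is assumed for illustration: the names CountNode, Buckets, FindCounted and InsertCounted, the fixed table size of 11, string keys, and lists without header nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 11      /* hypothetical small prime table size */

struct CountNode
{
    const char *Key;        /* not copied: the caller keeps the string alive */
    int Count;              /* the extra field: times Key was inserted */
    struct CountNode *Next;
};

static struct CountNode *Buckets[NUM_BUCKETS];  /* all lists start empty */

static unsigned int HashString(const char *Key)
{
    unsigned int Sum = 0;

    assert(Key != NULL);
    while (*Key != '\0')
        Sum += (unsigned char)*Key++;
    return Sum % NUM_BUCKETS;
}

/* Return the node holding Key, or NULL if it is absent. */
struct CountNode *FindCounted(const char *Key)
{
    struct CountNode *P = Buckets[HashString(Key)];

    while (P != NULL && strcmp(P->Key, Key) != 0)
        P = P->Next;
    return P;
}

/* Insert Key. On a match, increment the counter; otherwise push a
   new node on the front of the list. Returns the resulting count,
   or 0 if memory allocation fails. */
int InsertCounted(const char *Key)
{
    struct CountNode *P = FindCounted(Key);
    unsigned int Cell;

    if (P != NULL)
        return ++P->Count;

    P = malloc(sizeof *P);
    if (P == NULL)
        return 0;
    Cell = HashString(Key);
    P->Key = Key;
    P->Count = 1;
    P->Next = Buckets[Cell];
    Buckets[Cell] = P;
    return 1;
}
```

Inserting "alpha" twice leaves a single node whose Count is 2. Front insertion is used here only because it needs no traversal to the tail.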
-
Hashing Implementation
typedef struct ListNode *Position;
typedef struct HashTbl *HashTable;
typedef Position List;

struct ListNode
{
    ElementType Element;
    Position Next;
};

struct HashTbl
{
    int TableSize;
    List *TheLists;     /* array of lists, one per cell */
};
-
Hashing Implementation
HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof(struct HashTbl));
    H->TableSize = NextPrime(TableSize);
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    for (int i = 0; i
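The slide above is cut off partway through the loop. One plausible completion is sketched here under stated assumptions: ElementType is taken to be int, NextPrime is a hypothetical trial-division helper, and each list is given an empty header node because Find and Insert on the following slides read the Next field of the list itself. The allocation checks are additions, not something the slide shows.

```c
#include <assert.h>
#include <stdlib.h>

typedef int ElementType;    /* assumption: the slides leave this open */
typedef struct ListNode *Position;
typedef struct HashTbl *HashTable;
typedef Position List;

struct ListNode { ElementType Element; Position Next; };
struct HashTbl { int TableSize; List *TheLists; };

/* Hypothetical helper: smallest prime >= N, by trial division. */
static int NextPrime(int N)
{
    if (N < 2)
        return 2;
    for (;; N++)
    {
        int IsPrime = 1;
        for (int D = 2; D * D <= N; D++)
            if (N % D == 0) { IsPrime = 0; break; }
        if (IsPrime)
            return N;
    }
}

HashTable InitializeTable(int TableSize)
{
    assert(TableSize > 0);

    HashTable H = malloc(sizeof(struct HashTbl));
    if (H == NULL)
        return NULL;

    H->TableSize = NextPrime(TableSize);
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    if (H->TheLists == NULL) { free(H); return NULL; }

    for (int i = 0; i < H->TableSize; i++)
    {
        /* One empty header node per list. */
        H->TheLists[i] = malloc(sizeof(struct ListNode));
        if (H->TheLists[i] == NULL)
        {
            while (i-- > 0)             /* undo earlier allocations */
                free(H->TheLists[i]);
            free(H->TheLists);
            free(H);
            return NULL;
        }
        H->TheLists[i]->Next = NULL;
    }
    return H;
}
```

Requesting a table of size 10 gives a table of 11 cells, each holding a header node whose Next pointer is NULL.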
-
Hashing Implementation
Position Find(ElementType Key, HashTable H)
{
    Position P;
    List L = H->TheLists[Hash(Key, H->TableSize)];

    P = L->Next;    /* skip the header node */
    while (P != NULL && P->Element != Key)
    {
        /* For string keys, compare with strcmp instead of != */
        P = P->Next;
    }
    return P;
}
-
Hashing Implementation
void Insert(ElementType Key, HashTable H, ElementType RecordValue)
{
    Position Pos, NewCell;
    List L;

    Pos = Find(Key, H);
    if (Pos == NULL)    /* Key is not in the table yet */
    {
        NewCell = malloc(sizeof(struct ListNode));
        L = H->TheLists[Hash(Key, H->TableSize)];
        NewCell->Next = L->Next;    /* insert at the front, after the header */
        NewCell->Element = RecordValue;
        L->Next = NewCell;
    }
}