week 12 hashing

CS221A Data Structures & Algorithms Hashing

Upload: ahmadafaq09

Post on 06-Nov-2015


  • CS221A Data Structures & Algorithms

    Hashing

  • Agenda

    Hashing Concepts and Preliminaries

    Hash Function

    Separate Chaining

  • Hashing

    A technique for performing insertions, deletions and searches in constant average time.

    A scenario in which the keys themselves point directly to records.

    Information encoded directly within a key can point us to its associated record.

    Examine the key and simply know where to look.

  • Hashing

    Determine the location of the record by performing an arithmetic computation on its key.

    The result of this computation gives the location of the record in a table called the hash table.

    This computation is called the hash function.

  • Hashing

    A typical hash table is an array of some fixed size containing the keys.

    A key is typically a string with an associated value.

    Each key is mapped to some number in the range 0 to TableSize - 1 and placed in the corresponding cell.

    The mapping is provided by the hash function.

    A hash function should be simple to compute and should ideally ensure that any two distinct keys get different cells.


  • Hashing

    Index  Key
    0      Alpha
    1      Beta
    2      Gamma
    3      Theta
    4      Omega
    5      Delta
    6      Epsilon
    7      Pi

    The figure shows an ideal hash table: every distinct Greek letter name hashes to a distinct index.

    Beta hashes to 1.

    Theta hashes to 3.

    Epsilon hashes to 6.

  • Hashing

    In this case the keys are the names of the contacts, and the hash function maps each name to the index of the array where that contact's phone number is stored.

    The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be found.

  • Hashing

    The only remaining issues are choosing the hash function and deciding what to do when two keys hash to the same value (a phenomenon known as a collision).

  • Hashing

    Get the juice flowing, guys:

    Problem Statement:

    Let's assume we have to build an application that supports a customer service department for some company. To simplify the operation for both representatives and customers, how would you store the data?

  • Hashing

    Simple Solution:

    Key the account records by telephone number. When answering a call, the service representative retrieves the account information by entering the customer's telephone number into the system.

    What sort of hash function can you come up with?

  • Hash Function

    For integer keys, simply returning Key % TableSize is generally a reasonable strategy for a hash function.

    If 0 ≤ Key ≤ 99 and the table size is 10, what is the worst-case scenario for this hash function?

    Answer:

    All the keys end in 0, so Key % 10 sends every key to cell 0.
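A minimal sketch that makes this worst case concrete, assuming the keys are the multiples of 10 in the range 0 to 99 (0, 10, ..., 90). `IntHash` is the slide's `Key % TableSize`; `CellsUsedByMultiplesOfTen` is a hypothetical helper that counts how many distinct cells those ten keys land in, so a table size of 10 can be compared with a prime table size such as 11.

```c
/* The integer hash function from the slide: Key % TableSize. */
unsigned int IntHash(unsigned int Key, unsigned int TableSize)
{
    return Key % TableSize;
}

/* Hypothetical helper: counts the distinct cells used by the keys
   0, 10, 20, ..., 90. Returns -1 if TableSize is outside 1..100. */
int CellsUsedByMultiplesOfTen(unsigned int TableSize)
{
    int Used[100] = {0};   /* Used[c] becomes 1 once some key lands in cell c */
    int Count = 0;

    if (TableSize == 0 || TableSize > 100)
        return -1;

    for (unsigned int Key = 0; Key <= 90; Key += 10)
    {
        unsigned int Cell = IntHash(Key, TableSize);
        if (!Used[Cell])
        {
            Used[Cell] = 1;
            Count++;
        }
    }
    return Count;
}
```

With a table size of 10, all ten keys share cell 0; with 11, they spread over ten different cells.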

  • Hash Function

    In situations like this, it is preferable to make the table size prime.

    When the keys are random integers, Key % TableSize is effective in distributing the keys evenly.

    When keys are strings, a simple hash function adds up the ASCII values of the characters and applies the same mod operation to map the sum into the table.

  • Hash Function

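    typedef unsigned int Index;

    /* Sum the character codes of Key, then reduce modulo TableSize. */
    Index Hash(const char *Key, int TableSize)
    {
        unsigned int HashValue = 0;

        while (*Key != '\0')
            HashValue += (unsigned char) *Key++;   /* add this character, then advance */

        return HashValue % TableSize;
    }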

  • Hash Function

    Where is the hash function on the previous slide ineffective?

  • Hash Function

    If the table size is large, the function does not distribute keys well.

    For example, take a large prime table size such as 10,007, and suppose all keys are at most 8 characters long. The largest value a char can have in ASCII is 127, so 127 * 8 = 1,016 is the largest value the hash function can produce.

    The sums therefore lie between 0 and 1,016, and taking them mod 10,007 leaves them unchanged: only cells 0 to 1,016 can ever be used, so most of the table stays empty while the keys crowd into the low cells.

    When two or more keys hash to the same value, this is known as a collision. The fewer the collisions, the better the hash function.
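A minimal sketch for checking the numbers above, assuming ASCII keys of at most eight characters. `AddHash` restates the additive hash as a standalone function (reading characters as `unsigned char` so they are never negative); the keys in the note below are illustrative examples, not from the slides.

```c
/* Additive string hash: sum of the character codes of Key, mod TableSize. */
unsigned int AddHash(const char *Key, unsigned int TableSize)
{
    unsigned int HashValue = 0;

    while (*Key != '\0')
        HashValue += (unsigned char) *Key++;   /* add this character, then advance */

    return HashValue % TableSize;
}
```

With `TableSize` = 10007, an eight-character key made entirely of character code 127 hashes to 1016, the highest cell any such key can reach. Anagrams also collide: "listen" and "silent" contain the same letters, so both hash to 655.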

  • Separate Chaining

    Keep a list of all elements that hash to the same value.

  • Separate Chaining

    To perform a Find, we use the hash function to determine which list to traverse. We then traverse that list in the normal manner, returning the position where the item is found.

  • Separate Chaining

    To perform an Insert, we traverse the appropriate list to check whether the element is already there.

    If duplicates are expected, an extra count field is usually kept in each node and incremented in the event of a match.

    If the element turns out to be new, it is inserted either at the front of the list or at the end, whichever is easier. Inserting at the front is common because it is convenient and because recently inserted elements are often the most likely to be retrieved soon.
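A minimal sketch of these Find and Insert steps for non-negative integer keys. It assumes a fixed prime table size of 11 (`TABLE_SIZE`, hypothetical), lists without header nodes, and a hypothetical `Count` field that plays the role of the extra duplicate counter; new keys go to the front of their list.

```c
#include <stdlib.h>

#define TABLE_SIZE 11   /* hypothetical prime table size */

/* A list node with an extra Count field for duplicate keys. */
struct Node
{
    unsigned int Key;
    unsigned int Count;
    struct Node *Next;
};

static struct Node *Table[TABLE_SIZE];   /* list heads, all NULL initially */

static unsigned int HashInt(unsigned int Key)
{
    return Key % TABLE_SIZE;
}

/* Find: hash to pick the list, then walk it until Key is found or the list ends. */
struct Node *FindKey(unsigned int Key)
{
    struct Node *P = Table[HashInt(Key)];
    while (P != NULL && P->Key != Key)
        P = P->Next;
    return P;
}

/* Insert: on a match, bump Count; otherwise push a new node on the front.
   Returns 0 only if memory runs out. */
int InsertKey(unsigned int Key)
{
    struct Node *P = FindKey(Key);
    if (P != NULL)
    {
        P->Count++;
        return 1;
    }

    P = malloc(sizeof *P);
    if (P == NULL)
        return 0;

    unsigned int Cell = HashInt(Key);
    P->Key = Key;
    P->Count = 1;
    P->Next = Table[Cell];   /* new node becomes the head of its list */
    Table[Cell] = P;
    return 1;
}
```

Inserting 3, then 14 (also cell 3), then 3 again leaves the list in cell 3 as 14 followed by 3, with the count for 3 at 2.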

  • Hashing Implementation

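    /* ElementType is assumed to be defined elsewhere, e.g. typedef int ElementType; */
    typedef struct ListNode *Position;
    typedef struct HashTbl *HashTable;
    typedef Position List;

    struct ListNode { ElementType Element; Position Next; };
    struct HashTbl { int TableSize; List *TheLists; };   /* TheLists: array of list headers */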

  • Hashing Implementation

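    HashTable InitializeTable(int TableSize)
    {
        HashTable H = malloc(sizeof(struct HashTbl));   /* error checks omitted */
        H->TableSize = NextPrime(TableSize);             /* NextPrime: defined elsewhere */
        H->TheLists = malloc(sizeof(List) * H->TableSize);
        for (int i = 0; i < H->TableSize; i++)
        {
            /* each list starts with an empty header node */
            H->TheLists[i] = malloc(sizeof(struct ListNode));
            H->TheLists[i]->Next = NULL;
        }
        return H;
    }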

  • Hashing Implementation

    Position Find(ElementType Key, HashTable H)
    {
        Position P;
        List L = H->TheLists[Hash(Key, H->TableSize)];

        P = L->Next;   /* skip the header node */
        while (P != NULL && P->Element != Key)   /* for string keys, compare with strcmp */
            P = P->Next;

        return P;
    }

  • Hashing Implementation

    void Insert(ElementType Key, HashTable H)
    {
        Position Pos, NewCell;
        List L;

        Pos = Find(Key, H);
        if (Pos == NULL)   /* key not found, so insert it at the front of its list */
        {
            NewCell = malloc(sizeof(struct ListNode));   /* error check omitted */
            L = H->TheLists[Hash(Key, H->TableSize)];
            NewCell->Next = L->Next;
            NewCell->Element = Key;   /* Find compares Element with Key, so the key itself is stored */
            L->Next = NewCell;
        }
    }
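A self-contained sketch of the implementation above for string keys, comparing with `strcmp` as the comment in `Find` suggests. It assumes `ElementType` is `const char *` (the table stores the pointers, so the caller must keep the strings alive), reuses the additive hash, keeps the header-node layout, and supplies a simple trial-division `NextPrime` because the slides do not define one. All of these choices are assumptions, not the slides' own code.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: keys are C strings; the table stores the pointers, not copies. */
typedef const char *ElementType;

typedef struct ListNode *Position;
typedef struct HashTbl *HashTable;
typedef Position List;

struct ListNode { ElementType Element; Position Next; };
struct HashTbl { int TableSize; List *TheLists; };   /* TheLists[i]: header of list i */

/* Hypothetical helper: smallest prime >= N, by trial division. */
static int NextPrime(int N)
{
    if (N <= 2)
        return 2;
    if (N % 2 == 0)
        N++;
    for (;; N += 2)
    {
        int IsPrime = 1;
        for (int d = 3; d * d <= N; d += 2)
            if (N % d == 0) { IsPrime = 0; break; }
        if (IsPrime)
            return N;
    }
}

/* Additive hash: sum of character codes mod TableSize. */
static unsigned int Hash(ElementType Key, int TableSize)
{
    unsigned int HashValue = 0;
    while (*Key != '\0')
        HashValue += (unsigned char) *Key++;
    return HashValue % (unsigned int) TableSize;
}

HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof *H);
    if (H == NULL)
        return NULL;
    H->TableSize = NextPrime(TableSize);
    H->TheLists = malloc(sizeof(List) * H->TableSize);
    struct ListNode *Headers = malloc(sizeof(struct ListNode) * H->TableSize);
    if (H->TheLists == NULL || Headers == NULL)
    {
        free(H->TheLists);
        free(Headers);
        free(H);
        return NULL;
    }
    for (int i = 0; i < H->TableSize; i++)
    {
        Headers[i].Next = NULL;           /* every list starts empty */
        H->TheLists[i] = &Headers[i];     /* header node for cell i */
    }
    return H;
}

Position Find(ElementType Key, HashTable H)
{
    List L = H->TheLists[Hash(Key, H->TableSize)];
    Position P = L->Next;                 /* skip the header node */
    while (P != NULL && strcmp(P->Element, Key) != 0)
        P = P->Next;
    return P;
}

void Insert(ElementType Key, HashTable H)
{
    if (Find(Key, H) != NULL)
        return;                           /* already present */
    Position NewCell = malloc(sizeof *NewCell);
    if (NewCell == NULL)
        return;                           /* out of memory */
    List L = H->TheLists[Hash(Key, H->TableSize)];
    NewCell->Element = Key;
    NewCell->Next = L->Next;              /* insert at the front of the list */
    L->Next = NewCell;
}
```

Asking for a table of size 10 gives 11 lists. "listen" and "silent" collide in the same list, but `strcmp` still tells them apart, and inserting "listen" a second time does not add another node.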