![Page 1: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/1.jpg)
COSC 2007 Data Structures II
Chapter 12: Advanced Implementation of Tables III
![Page 2: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/2.jpg)
Topics
Hashing: definition, hash function, key, hash value, collision
Open hashing
![Page 3: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/3.jpg)
Common Problem
A common pattern in many programs is to store and look up data:
Find a student record, given an ID number.
Find a person's address, given a phone number.
Because it is so common, many data structures for it have been investigated.
![Page 4: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/4.jpg)
Phone Number Problem
Problem: the phone company wants to implement caller ID.
Given a phone number (the key), look up the person's name or address (the data).
There are lots of phone numbers (P = 10^7 - 1) in a given area code.
Only a small fraction of them are in use: nobody has the phone number 0000000 or 0000001.
![Page 5: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/5.jpg)
Comparison of Time Complexity (average)

| Data structure | Insertion | Deletion | Search |
|---|---|---|---|
| Unsorted array | O(1) | O(n) | O(n) |
| Unsorted reference | O(1) | O(n) | O(n) |
| Sorted array | O(n) | O(n) | O(log n) |
| Sorted reference | O(n) | O(n) | O(n) |
| BST | O(log n) | O(log n) | O(log n) |

Can we do better than O(log n)?
![Page 6: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/6.jpg)
Can we do better than O(log N)?
All previous searching techniques require a specified amount of time (O(log n) or O(n)).
The time usually depends on the number of elements (n) stored in the table.
In some situations searching should be almost instantaneous. Examples: 911 emergency system, air-traffic control system.
![Page 7: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/7.jpg)
Can we do better than O(log N)?
Answer: Yes … sort of, if we're lucky.
General idea: take the key of the data record you're inserting, and use that number directly as the item number in a list (array).
Search is O(1), but a huge amount of space is wasted.
[Figure: an array indexed directly by phone number. Slots 000-0000, 000-0001, and 000-0002 are Null; slot 259-1623 holds Xu; slot 263-3049 holds Sub; the slots in between are Null.]
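The direct-indexing idea above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `DirectAddressTable` and its methods are hypothetical, phone numbers are treated as 7-digit integers, and the two sample entries are the ones shown in the figure.

```java
// Sketch of direct addressing: the key itself is used as the array index.
// DirectAddressTable, insert, and lookup are hypothetical names for illustration.
class DirectAddressTable {
    // One slot for every possible 7-digit number, 0000000 through 9999999.
    private final String[] names = new String[10_000_000];

    void insert(int phoneNumber, String name) {
        names[phoneNumber] = name;   // O(1): no searching is needed
    }

    String lookup(int phoneNumber) {
        return names[phoneNumber];   // null means the number is not in use
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable();
        table.insert(2591623, "Xu");
        table.insert(2633049, "Sub");
        System.out.println(table.lookup(2591623));
        System.out.println(table.lookup(1));
    }
}
```

Two entries occupy an array of ten million slots, which is the wasted space the slide refers to.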
![Page 8: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/8.jpg)
Hashing
Basic idea: don't use the data value directly.
Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index (“bucket”) in the array.
[Figure: x is passed through h to reach slot h(x) in an array with slots 0 to B-1.]
![Page 9: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/9.jpg)
What is a Hash Table?
The simplest kind of hash table is an array of records.
This example has 101 records.
[Figure: an array of records, indexed [0] through [100].]
![Page 10: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/10.jpg)
What is a Hash Table?
Each record has a special field, called its key.
In this example, the key is a long integer field called Number.
[Figure: the array of records, [0] through [100]. The record at [4] has Number 256-2879 and holds Linda Kim, 8888 Queen St.]
![Page 11: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/11.jpg)
What is a Hash Table?
The number is a person's phone number, and the rest of the record is the person's name or address.
[Figure: the array of records, [0] through [100]. The record at [4] has Number 256-2879.]
![Page 12: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/12.jpg)
What is a Hash Table?
When a hash table is in use, some spots contain valid records, and other spots are "empty".
[Figure: the array of records, [0] through [100]. Some slots hold records with Number 506643548, 233667136, 281942902, and 155778322; the others are empty.]
![Page 13: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/13.jpg)
Inserting a New Record
In order to insert a new record, the key must somehow be converted to an array index.
The index is called the hash value of the key.
[Figure: the partly filled array of records, [0] through [100], and a new record with Number 265-1556 waiting to be inserted.]
![Page 14: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/14.jpg)
Inserting a New Record
Typical way to create a hash value: (Number mod 101)
What is (2651556 mod 101)?
[Figure: the partly filled array of records, [0] through [100], and the new record with Number 265-1556.]
![Page 15: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/15.jpg)
Inserting a New Record
Typical way to create a hash value: (Number mod 101)
What is (2651556 mod 101)? It is 3.
[Figure: the partly filled array of records, [0] through [100], and the new record with Number 265-1556.]
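The computation on this slide can be checked with a short Java sketch. The class name `ModHash` and the method `hash` are hypothetical, and the sketch assumes keys are non-negative, since Java's `%` operator can return a negative result for a negative operand.

```java
// Sketch of the "Number mod 101" hash from the slide.
// ModHash and hash are hypothetical names; keys are assumed non-negative.
class ModHash {
    static final int TABLE_SIZE = 101;

    static int hash(long number) {
        return (int) (number % TABLE_SIZE);
    }

    public static void main(String[] args) {
        // 2651556 = 101 * 26253 + 3, so this prints 3.
        System.out.println(hash(2651556L));
    }
}
```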
![Page 16: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/16.jpg)
Inserting a New Record
The hash value is used for the location of the new record.
[Figure: the new record with Number 265-1556 is directed to slot [3] of the array of records.]
![Page 17: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/17.jpg)
Inserting a New Record
The hash value is used for the location of the new record.
[Figure: the array of records, [0] through [100], after the insertion. Records shown: Number 506643548, 233667136, 281942902, 155778322, and 580625685.]
![Page 18: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/18.jpg)
What is Hashing?
Each item has a unique key.
Use a large array called a hash table.
Use a hash function.
Hashing is like indexing in that it involves associating a key with a relative record address.
Hashing, however, is different from indexing in two important ways:
With hashing, there is no obvious connection between the key and the location.
With hashing, two different keys may be transformed to the same address.
A hash function is a function h(K) which transforms a key K into an address.
![Page 19: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/19.jpg)
What is Hashing?
An address calculator (hashing function) is used to determine the location of the item.
[Figure: a search key goes into the address calculator (hash function), which points to a slot in the array (hash table) indexed 0 to N-1.]
![Page 20: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/20.jpg)
What Can Be Hashed?
Anything!
We can hash on numbers, strings, structures, etc.
Java defines a hashing method for general objects, Object.hashCode(), which returns an integer value.
![Page 21: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/21.jpg)
Where do we use Hashing?
Databases (phone book, student name list).
Spell checkers.
Computer chess games.
Compilers.
![Page 22: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/22.jpg)
Hashing and Tables
Hashing gives us another implementation of the Table ADT.
Hashing operations:
Initialize: all locations in the hash table are empty.
Insert, Search, Delete: hash the key; this gives an index; use it to find the value stored in the table in O(1).
This is a great improvement over O(log N).
![Page 23: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/23.jpg)
Hashing
Insert pseudocode
tableInsert(newItem)
    i = the array index that the address calculator gives you for the new item’s search key
    table[i] = newItem
Retrieval pseudocode
tableRetrieve(searchKey)
    i = array index for searchKey given by the hash function
    if (table[i] is not empty and table[i].getKey() equals searchKey)
        return table[i]
    else
        return null
![Page 24: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/24.jpg)
Hashing
Deletion pseudocode
tableDelete(searchKey)
    i = array index for searchKey given by the hash function
    success = (table[i] is not empty and table[i].getKey() equals searchKey)
    if (success)
        delete the item from table[i]
    return success
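The insert, retrieval, and deletion pseudocode can be sketched as runnable Java. This is a minimal illustration that, like the pseudocode, ignores collisions entirely. The class name `SimpleHashTable`, the long keys, the String values, and the table size of 101 are assumptions made for the sketch; a null value marks an empty slot, so stored values must be non-null.

```java
// Sketch of the three table operations with no collision handling.
// SimpleHashTable and its field layout are hypothetical; keys are assumed non-negative.
class SimpleHashTable {
    private static final int TABLE_SIZE = 101;
    private final long[] keys = new long[TABLE_SIZE];
    private final String[] values = new String[TABLE_SIZE]; // null marks an empty slot

    private int hashIndex(long key) {
        return (int) (key % TABLE_SIZE);
    }

    void tableInsert(long key, String value) {
        int i = hashIndex(key);
        keys[i] = key;
        values[i] = value; // overwrites any earlier record in this slot
    }

    String tableRetrieve(long searchKey) {
        int i = hashIndex(searchKey);
        return (values[i] != null && keys[i] == searchKey) ? values[i] : null;
    }

    boolean tableDelete(long searchKey) {
        int i = hashIndex(searchKey);
        boolean success = values[i] != null && keys[i] == searchKey;
        if (success) {
            values[i] = null; // the slot becomes empty again
        }
        return success;
    }
}
```

Because `tableInsert` overwrites whatever is already in the slot, two keys with the same hash value cannot coexist here; that is the collision problem the later slides address.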
![Page 25: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/25.jpg)
Hash Tables
Table size: entries are numbered 0 to TSIZE-1.
Mapping: simple to compute; ideally 1-1, which is not possible; even distribution.
Main problems: choosing the table size; choosing a good hash function; what to do on collisions.
![Page 26: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/26.jpg)
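How to choose the Table Size?
H(Key) = Key mod TSIZE
[Figure, TSIZE = 10, keys 20, 22, 54, 15, 26, 49: slot 0 holds 20, slot 2 holds 22, slot 4 holds 54, slot 5 holds 15, slot 6 holds 26, slot 9 holds 49.]
[Figure, TSIZE = 10, keys 110, 210, 320, 460, 520, 600: all six keys hash to slot 0.]
[Figure, TSIZE = 11, the same keys: slot 0 holds 110, slot 1 holds 210 and 320, slot 3 holds 520, slot 6 holds 600, slot 9 holds 460.]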
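The effect of the table size can be reproduced with a small Java sketch that counts how many keys land in each slot. The class name `TableSizeDemo` and the method `bucketCounts` are hypothetical, and keys are assumed non-negative.

```java
import java.util.Arrays;

// Sketch: count how many keys fall into each slot for a given table size.
// TableSizeDemo and bucketCounts are hypothetical names for illustration.
class TableSizeDemo {
    static int[] bucketCounts(int[] keys, int tableSize) {
        int[] counts = new int[tableSize];
        for (int key : keys) {
            counts[key % tableSize]++; // H(Key) = Key mod TSIZE
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = {110, 210, 320, 460, 520, 600};
        // TSIZE = 10: every key ends in 0, so all six share slot 0.
        System.out.println(Arrays.toString(bucketCounts(keys, 10)));
        // TSIZE = 11: the same keys spread over slots 0, 1, 3, 6, and 9.
        System.out.println(Arrays.toString(bucketCounts(keys, 11)));
    }
}
```

With TSIZE = 10 the counts are 6 for slot 0 and 0 elsewhere; with TSIZE = 11 only slot 1 holds two keys.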
![Page 27: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/27.jpg)
How to choose a Hashing Function?
The hash function we choose depends on the type of the key field (the key we use to do our lookup).
Finding a good one can be hard.
Rules: a hash function should be easy to calculate, use all of the key, and spread the keys uniformly.
![Page 28: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/28.jpg)
How to choose a Hashing Function?
Example:
Student IDs (integers)
h(idNumber) = idNumber % B
e.g. h(678921) = 678921 % 100 = 21
Names (char strings)
h(name) = (sum over the ASCII values) % B
e.g. h(“Bill”) = (66 + 105 + 108 + 108) % 101 = 84
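The string hash on this slide, the sum of the character codes modulo the table size, can be sketched in Java as follows. The class name `StringHash` and the method `hash` are hypothetical, and the sketch assumes plain ASCII input.

```java
// Sketch of h(name) = (sum over the ASCII values) % B.
// StringHash and hash are hypothetical names; input is assumed to be ASCII.
class StringHash {
    static int hash(String name, int tableSize) {
        int sum = 0;
        for (int i = 0; i < name.length(); i++) {
            sum += name.charAt(i); // add the character code
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        // 'B' = 66, 'i' = 105, 'l' = 108: the sum is 387, and 387 % 101 is 84.
        System.out.println(hash("Bill", 101));
    }
}
```

One weakness worth noting: the sum ignores character order, so any rearrangement of the same letters (for example "lliB") gets the same hash value.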
![Page 29: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/29.jpg)
Collision
Here is another new record to insert, with a hash value of 2.
[Figure: the array of records, [0] through [100], holding Number 506643548, 233667136, 281942902, 155778322, and 580625685. The new record, Number 2641455, says "My hash value is [2]."]
![Page 30: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/30.jpg)
What to do on collisions?
Open hashing (separate chaining)
Closed hashing (open addressing): linear probing, quadratic probing, double hashing
![Page 31: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/31.jpg)
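Open hashing (separate chaining)
Keep a list of all elements that hash to the same value.
[Figure: keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 in a table with slots 0 to 9. Slot 0 holds 0; slot 1 holds 1 and 81; slot 4 holds 4 and 64; slot 5 holds 25; slot 6 holds 16 and 36; slot 9 holds 9 and 49; the other slots are empty.]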
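The chained table in this figure can be rebuilt with a short Java sketch that uses one list per slot and hashes each key with key mod 10. The class name `ChainingDemo` and the method `build` are hypothetical; `java.util.LinkedList` stands in for the chains.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of separate chaining: each slot keeps a list of the keys that hash to it.
// ChainingDemo and build are hypothetical names; keys are assumed non-negative.
class ChainingDemo {
    static List<LinkedList<Integer>> build(int[] keys, int tableSize) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new LinkedList<>()); // start with an empty chain in every slot
        }
        for (int key : keys) {
            table.get(key % tableSize).add(key); // append to the chain for this slot
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};
        List<LinkedList<Integer>> table = build(keys, 10);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
    }
}
```

Slots 1, 4, 6, and 9 each end up with two keys, while slots 2, 3, 7, and 8 stay empty.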
![Page 32: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/32.jpg)
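Open hashing (separate chaining)
Secondary data structure: a list, a search tree, or another hash table.
We expect only a small number of collisions, so a list is a good fit: it is simple and has small overhead.
[Figure: the chained table for keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 with slots 0 to 9.]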
![Page 33: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/33.jpg)
Operations with Chaining
Insert with chaining:
Apply the hash function to get a position.
Insert the key into the linked list at this position.
Search with chaining:
Apply the hash function to get a position.
Search the linked list at this position.
![Page 34: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/34.jpg)
Open hashing (separate chaining)
public class ChainNode {
    private KeyedItem item;
    private ChainNode next;

    public ChainNode(KeyedItem newItem, ChainNode nextNode) {
        item = newItem;
        next = nextNode;
    }

    // set and get methods
} // end of ChainNode
![Page 35: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/35.jpg)
Open hashing (separate chaining)
public class HashTable {
    private final int HASH_TABLE_SIZE = 101; // size of hash table
    private ChainNode[] table;               // hash table
    private int size;                        // number of items in the table

    public HashTable() {
        table = new ChainNode[HASH_TABLE_SIZE];
        size = 0;
    }

    public boolean tableIsEmpty() { return size == 0; }
    public int tableLength() { return size; }
    // method bodies are omitted on this slide
    public void tableInsert(KeyedItem newItem) throws HashException {}
    public boolean tableDelete(Comparable searchKey) {}
    public KeyedItem tableRetrieve(Comparable searchKey) {}
} // end of HashTable
![Page 36: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/36.jpg)
Open hashing (separate chaining)
tableInsert(newItem)
    if (table is not full) {
        searchKey = the search key of newItem
        i = hashIndex(searchKey)
        node = reference to a new node containing newItem
        node.setNext(table[i])
        table[i] = node
    }
    else // table full
        throw new HashException()
![Page 37: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/37.jpg)
Open hashing (separate chaining)
tableRetrieve(searchKey)
    i = hashIndex(searchKey)
    node = table[i]
    while ((node != null) && node.getItem().getKey() != searchKey)
        node = node.getNext()
    if (node != null)
        return node.getItem()
    else
        return null
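The class outline and pseudocode for separate chaining can be put together as a runnable Java sketch. This is one possible reading, not the course's reference implementation. The `KeyedItem` here is a simplified stand-in with an int key and a String value (an assumption; the slides use Comparable search keys), the table is named `ChainedHashTable`, the "table full" check and `HashException` are left out because a chain can always grow, and `tableDelete`, which the slides only declare, is filled in as one plausible version.

```java
// Simplified stand-in for the KeyedItem class the slides assume;
// the int key and String value are assumptions made for this sketch.
class KeyedItem {
    private final int key;
    private final String value;

    KeyedItem(int key, String value) { this.key = key; this.value = value; }
    int getKey() { return key; }
    String getValue() { return value; }
}

class ChainNode {
    private final KeyedItem item;
    private ChainNode next;

    ChainNode(KeyedItem newItem, ChainNode nextNode) { item = newItem; next = nextNode; }
    KeyedItem getItem() { return item; }
    ChainNode getNext() { return next; }
    void setNext(ChainNode nextNode) { next = nextNode; }
}

class ChainedHashTable {
    private static final int HASH_TABLE_SIZE = 101;
    private final ChainNode[] table = new ChainNode[HASH_TABLE_SIZE];
    private int size = 0; // number of items stored, not the array length

    private int hashIndex(int searchKey) {
        return Math.floorMod(searchKey, HASH_TABLE_SIZE); // always in 0..100
    }

    boolean tableIsEmpty() { return size == 0; }
    int tableLength() { return size; }

    // Inserts at the head of the chain; duplicate keys are not checked.
    void tableInsert(KeyedItem newItem) {
        int i = hashIndex(newItem.getKey());
        table[i] = new ChainNode(newItem, table[i]);
        size++;
    }

    // Walks the chain for the key's slot until the key is found or the chain ends.
    KeyedItem tableRetrieve(int searchKey) {
        ChainNode node = table[hashIndex(searchKey)];
        while (node != null && node.getItem().getKey() != searchKey) {
            node = node.getNext();
        }
        return (node != null) ? node.getItem() : null;
    }

    // Unlinks the first node with the given key; returns false if it is absent.
    boolean tableDelete(int searchKey) {
        int i = hashIndex(searchKey);
        ChainNode prev = null;
        ChainNode node = table[i];
        while (node != null && node.getItem().getKey() != searchKey) {
            prev = node;
            node = node.getNext();
        }
        if (node == null) return false;
        if (prev == null) table[i] = node.getNext();
        else prev.setNext(node.getNext());
        size--;
        return true;
    }
}
```

For example, keys 2 and 103 both hash to index 2, so inserting both puts them on the same chain, and deleting one leaves the other retrievable.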
![Page 38: COSC 2007 Data Structures II Chapter 12 Advanced Implementation of Tables III](https://reader034.vdocuments.site/reader034/viewer/2022051314/55142479550346ec488b5998/html5/thumbnails/38.jpg)
Evaluation of Chaining
Disadvantages of chaining:
More complex to implement.
Search and delete are harder. To judge their cost we need to know the number of elements in the table (N), the number of buckets (B), and the quality of the hash function.
The worst case for searching is O(n).
Advantages of chaining:
Insertion is easy and quick.
Allows more records to be stored.
The size of the table is dynamic.