cs121 data structures cs121 © jas 2004 tables an abstract table, t, contains table entries that are...

34
CS121 © JAS 2004 CS121 Data Structures Tables An abstract table, T, contains table entries that are either empty, or •pairs of the form (K, I) where K is a key and I is some information associated with key K.

Upload: jody-potter

Post on 03-Jan-2016

220 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

TablesAn abstract table, T, contains table entries that are either

•empty, or

•pairs of the form (K, I) where K is a key and I is some information associated with key K.

Page 2: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

The operations we may require are:1. Initialise the table, T, to be the empty table. The empty table can be seen as filled with entries (Ko, Io)

where Ko is a special empty key, distinct from all other nonempty keys.

2. Determine whether or not the table, T, is full.3. Insert a new entry (K,I), into the table, T, provided

T is not already full.4. Delete the table entry (K,I) from the table T.5. Given, K, retrieve the information, I, from the

entry (K,I) 6. Update the table entry (K,I) in table, T, by

replacing it with a new table entry (K,I') 7. Enumerate the table entries (K,I) in table, T, in

increasing order of their keys, K.

Page 3: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Implementing a Table using a (Binary Search) Tree

• Initialise – create empty tree• Empty – test for empty tree (root is nil)• Full – tree is full only when memory exhausted• Insertion – add entry to tree • Deletion – delete a node (and reshape tree)• Retrieve – search tree• Update – search for entry and modify it• Enumerate – traverse tree in appropriate orderInitialise, Empty – O(1)Insert, Delete, Retrieve, Update – O(logn)Enumerate – O(n)

Page 4: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Hash Tables

Principle – calculate a value (hash value) from the key to determine the position of the entry in the table

If the key field is alphabetic a simple hash function is to associate an integer value with each letter; sum them and return a value modulo the table size

Page 5: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

• Associate the value 1 with a, 2 with b, and so on up to 26 with z

• Let the table size be 10 (indexes 0..9)

cat3+1+20 = 24mod 10 = 4

cat

dog4+15+7 = 26mod 10 = 6

test 20+5+19+20 = 64mod 10 = 4

dog

test

Page 6: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

• Unless our table size is such that it is big enough to contain all possible entries we will get collisions

Page 7: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Probability of Collisions

von Mises Birthday Paradox • for 23 or more people, there is a greater than 50%

chance that two or more will have the same birthday.• for 88 or more – three or more will have the same

birthday.• Hence for a 365 entry table (based on birthdays),

after 23 insertions there will be a greater than 50% chance that an insertion will result in a collision.

• This is counter-intuitive because the table is in fact only 23/365 = 6.3% full.

• This is known as the load factor.

Page 8: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

The Theory

For M people – probability same birthday P(M) – probability they don’t Q(M). Q(M) = 1 – P(M).For M = 1 Q(1) = 1 (nobody to share with!)For M = 2 Q(2) = 364/365

(only 364 days not the other person’s birthday)

For M = 3 Q(3) = (364x363) / (3652)For M > 1Q(M) = (364x353x…x(366-M)) / (365 (M-1))

P(22) = 0.476P(23) = 0.507

Page 9: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

• When a collision occurs we need to find an alternative space for the entry

• The simplest method is a linear search – look forward for the next empty space – open addressing

Page 10: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Primary clustering

Page 11: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Solution to clustering is to use a different search path for different key values - double hashing

If the initial hash function is based on remainder then a typical second hash would be based on the quotient

if a <> b and a mod t = b mod t

then a div t <> b div t

Page 12: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

cat3+1+20 = 24mod 10 = 4

cat

dog4+15+7 = 26mod 10 = 6

dogtest

test

Recall our initial example and let the rehash be divide by table size

test 20+5+19+20 = 64mod 10 = 4div 10 = 6

Page 13: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

The same principles of hashing and collision resolution are used for both insertion and retrieval/update.

The series of entries that are examined to see if they are empty is called the probe sequence.

Insertion terminates when we find an empty entry to insert into.

Retrieval/update terminates when either we find the entry we are looking for, or we have searched the whole table.

Page 14: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Need to ensure that all possible entries are probed.

Obvious for a linear probe sequence (increment by 1).

For a double hash probe sequence it is less obvious unless the table size is a prime number (or table is even and probe step size is odd).

To check whole table has been searched we can count the number of probes.

Alternative is define a full table as one with one empty space. A search thus ends when an empty space has been found.

This requires the insertion routine to know that it doesn’t fill the last space.

Page 15: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Some code:-

Hashing by Open Addressing

public class OpenAddress{ private Object[] Elt; private int Tablesize; private Object EmptyCell;

Page 16: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

OpenAddress(int tablesize)

{ int index;

Tablesize = tablesize;

EmptyCell = null;

Elt = new Object[tablesize];

for (index=0;index<tablesize;++index)

Elt[index] = EmptyCell;

}

Page 17: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

public void Store (Object newElt) throws OAException{ int index = Hash (newElt.key); probe = Hash2 (newElt.key); for(int cnt = 1; cnt < Tablesize; cnt++){ if (Elt[index].equals(EmptyCell))

{Elt[index] = newElt; return;} else if (Elt[index].key.equals(newElt.key)) throw new OAException(“already in”); else index = (index+probe)%Tablesize; } throw new OAException(“table full”);}

Page 18: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

public Object Retrieve(Object SearchKey) throws OAException{ int index = Hash(Searchkey); probe = Hash2(Searchkey); for (int cnt = 1; cnt < Tablesize; cnt++0 { if (Elt[index].equals(EmptyCell)) throw new OAException(“not in table”); else if (Elt[index].key.equals(SearchKey)) return Elt[index]; else index = (index + probe)%Tablesize; } throw new OAException(“not in table”);}

Page 19: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

cat

dog

testcat

dog

test

Alternative approach is to use chaining

Page 20: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Operations on a table

Initialisation • Chaining – set chain pointers to nil – O(n)• Open addressing – set keys to empty key – O(n)

Insert, Retrieval, Update• Chaining – use hash function, then search linked

list – O(s) where s is average list length• Open addressing – use hash functions to find

entry – O(k) where k is some constant based on

complexity of hash functions

Page 21: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Deletion

• Chaining – delete entry from list – O(s)• Open addressing – must mark entry has

having been deleted to avoid stopping search early – O(k)

Enumeration• Not normally feasible

Page 22: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Design of Hash functions – step one generate a key value – preconditioning

(1) Using codes for letters –

a=1 ascii posn

cat 3+1+20=24 99+97+116=312 3*1282+2*1281+20*1280

=49428

tac 20+1+3=24 116+97+99=312 20*1282+2*1281+3*1280

=327939

Page 23: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

(2) Using bit patterns

cat 1100011 1100001 1110100 = 1634548

tac 1110100 1100001 1100011 = 1913059

If the bit patterns get too long then a section of the bit pattern can be selected to use for the hash function.

Page 24: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Step two – generate an address from the key value

• Division by table size• Folding – The key is divided into sections, and the sections

are added together. – if Key = 013402122 ; hash(Key) = 013+402+122 = 537– addition, subtraction and multiplication could be used

• Midsquare – square the key and select a number of bits from the middle– if Key = 12345; Key2 = 152399025; select middle – 399

• Truncate – select last n digits– if Key = 13402122; select last 3 digits – 122

Page 25: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Hash Table Reordering

When a hash table is nearly full many items are not at the locations given by their hash address so many comparisons are made to locate them, and if an item is not present an entire list of rehash positions must be searched.

To address this problem the table can be reordered.

Page 26: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Ordered Hash Tables (Amble and Knuth)

All items that hash to the same address are maintained in descending order of the key. Assume that a nil entry is less than all possible keys.

A search can then terminate whenever a key less than that being searched for is found.

On insertion when a collision occurs the keys are compared and the rehash/search for free space process continues with the smaller key.

Page 27: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

0

1

2

3

4

5

6

7

8

H(cat)=(3+1+20)/9=24/9=6; rh=2

H(act)=(1+3+20)/9=24/9=6; rh=2

H(tac)=(20+1+3)/9=24/9=6; rh=2

cattac

act

Page 28: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Ordered Hash Tables reduce the average number of probes to determine an entry is in the table.

However, the number of probes for insertion is not reduced and equals the number required for an unsuccessful search in an unordered table.

Also ordered tables require significant data movement.

Page 29: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Brent’s Method

Works with double hashing.

Rehash search key until empty slot found.

Then consider all keys in rehash path and determine if placing one of these in an empty slot would require fewer rehashes.

If so place search key in this position and move existing key to empty rehash slot.

Page 30: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

0

1

2

3

4

5

6

7

8

Page 31: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Brent’s method reduces the average number of probes for a successful search, but not for an unsuccessful search.

Insertion is also more complex.

Brent’s method can be applied recursively – the displaced item is again inserted using Brent’s method (checking keys on rehash path for possible replacement – but if carried to extreme would yield unacceptably long insertion times.

Page 32: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

Consider a hash function which used the initial letter of the key word and inserted into a table of size 26

If collisions are resolved by chaining we have an index table – indexed by initial letter

If the chains are replaced by the same hash structure using the second letter of the key word as the hash function- we have a 26-ary tree

Page 33: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures

With an index table (or a 26-ary tree!) we regain the property that we can list the entries in sorted form

However, it is also possible to combine hash tables, trees and index tables

Eg Use an index table according to first letter, then use hash tables for each letter sub-table

Page 34: CS121 Data Structures CS121 © JAS 2004 Tables An abstract table, T, contains table entries that are either empty, or pairs of the form (K, I) where K is

CS121 © JAS 2004

CS121 Data Structures