ch8.dictionaries hashing.handouts

8/8/2019 Ch8.Dictionaries Hashing.handouts

1/7

Dictionaries 2/19/08 14:

2/19/08 14:15 Dictionaries 1

Dictionaries & Hashing (Ch 8)

01234 451-229-0004

981-101-0002

025-612-0001


Outline and Reading

Dictionary ADT (8.1)

Hash Tables (8.2)

Ordered Dictionaries (8.3)

Binary search (8.3.3)

Lookup table (8.3.2)

Skip Lists (8.4)


Dictionary ADT (8.1.1)The dictionary ADT models asearchable collection of key-element items

The main operations of adictionary are searching,inserting, and deleting items

Multiple items with the samekey are allowed

Applications: address book

credit card authorization

mapping host names (e.g.,cs16.net) to internet addresses(e.g., 128.148.34.101)

Dictionary ADT methods:

find(k): if the dictionary has

an item with key k, returnsthe position of this element,

else, returns a null position.

insertItem(k, o): inserts item(k, o) into the dictionary

removeElement(k): if thedictionary has an item withkey k, removes it from thedictionary and returns its

element. An error occurs ifthere is no such element.

size(), isEmpty()

keys(), Elements()


Unordered Sequence- Log File (8.1.2)

A log file is a dictionary implemented by means of an unsortedsequence

We store the items of the dictionary in a sequence (based on a doubly-linked lists or a circular array), in arbitrary order

Performance: insertItem takes O(1) time since we can insert the new item at the

beginning or at the end of the sequence

find and removeElement take O(n) time since in the worst case (the item isnot found) we traverse the entire sequence to look for an item with thegiven key

Space - can be O(n), where n is the number of elements in the dictionary

The log file is effective only for dictionaries of small size or fordictionaries on which insertions are the most common operations, whilesearches and removals are rarely performed (e.g., historical record oflogins to a workstation)


Direct Address Table

A direct address table is a dictionary in which

The keys are in the range {0,1,2,,N}

Stored in an array of size N - T[0,N]

Item with key k stored in T[k]

Performance: insertItem, find, and removeElement all take O(1) time

Space - requires space O(N), independent of n, the number ofitems stored in the dictionary

The direct address table is not space efficient unless the rangeof the keys is close to the number of elements to be stored inthe dictionary, I.e., unless n is close to N.


Hash Tables

Hashing

Hash table (an array) of size N, H[0,N]

Hash function h that maps keys to indices in H

Issues Hash functions - need method to transform key to an index in H

that will have nice properties.

Collisions - some keys will map to the same index of H (otherwisewe have a Direct Address Table). Several methods to resolve thecollisions

Chaining - put elts that hash to same location in a linked list

Open addressing - if a collision occurs, have a method to select anotherlocation in the table.

Probe sequences


2/7



Hash Functions andHash Tables (8.2)

Ahash functionh maps keys of a given type to

integers in a fixed interval [0,N1]Example:

h(x) =xmodN

is a hash function for integer keys

The integer h(x) is called the hash value of key x

A hash table for a given key type consists of

Hash function h

Array (called table) of size N

When implementing a dictionary with a hash table, the goal is tostore item (k, o) at index i=h(k)


Example

We design a hash table for

a dictionary storing items(SSN, Name), where SSN(social security number) is anine-digit positive integer

Our hash table uses anarray of sizeN= 10,000 andthe hash functionh(x) = last four digits ofx

0

1234

999799989999

451-229-0004

981-101-0002

200-751-9998

025-612-0001


Hash Functions (8.2.2)

A hash function isusually specified as the

composition of twofunctions:

Hash code map: h1:keysintegers

Compression map: h2: integers [0,N1]

The hash code map isapplied first, and thecompression map isapplied next on theresult, i.e.,

h(x) = h2(h1(x))

The goal of the hashfunction is todisperse the keys inan apparently randomway


Hash Code Maps (8.2.3)Memory address: We reinterpret the memory

address of the key object asan integer

Good in general, except fornumeric and string keys

Integer cast: We reinterpret the bits of the

key as an integer

Suitable for keys of lengthless than or equal to thenumber of bits of the integertype (e.g., char, short, intand float on many machines)

Component sum: We partition the bits of

the key into componentsof fixed length (e.g., 16or 32 bits) and we sumthe components(ignoring overflows)

Suitable for numeric keysof fixed length greaterthan or equal to thenumber of bits of theinteger type (e.g., longand double on many

machines)


Hash Code Maps (cont.)

Polynomial accumulation: We partition the bits of the key

into a sequence of componentsof fixed length (e.g., 8, 16 or 32bits)

a0 a1 an1 We evaluate the polynomial

p(z)= a0+a1z +a2z2+

+an1zn1

at a fixed valuez, ignoringoverflows

Especially suitable for strings(e.g., the choice z=33gives atmost 6 collisions on a set of50,000 English words)

Polynomial p(z) can beevaluated in O(n) time

using Horners rule:

The followingpolynomials aresuccessively computed,each from the previousone in O(1) time

p0(z)= an1

pi(z)= ani1+zpi1(z)

(i=1, 2, , n 1)

We havep(z) = pn1(z)


Compression Maps (8.2.4)

Division: h2 (y) = y mod N

The sizeNof thehash table is usuallychosen to be a prime

The reason has to dowith number theory

and is beyond thescope of this course

Multiply, Add andDivide (MAD): h2 (y) =(ay + b)mod N

a and b arenonnegative integerssuch thata mod N 0

Otherwise, everyinteger would map tothe same value b


3/7



Collision Handling(8.2.5)

Collisions occur whendifferent elements aremapped to the samecell

Chaining: let eachcell in the table pointto a linked list ofelements that mapthere

Chaining is simple,but requiresadditional memoryoutside the table

0

1234 451-229-0004 981-101-0004

025-612-0001


Exercise: chaining

Assume you have a hash table H withN=9 slots (H[0,8]) and let the hashfunction be h(k)=k mod N.

Demonstrate (by picture) the insertionof the following keys into a hash tablewith collisions resolved by chaining. 5, 28, 19, 15, 20, 33, 12, 17, 10


Linear ProbingOpen addressing: thecolliding item is placed in adifferent cell of the table

Linear probing handlescollisions by placing thecolliding item in the next(circularly) available table cell.So the i-th cell checked is: H(k,i) = (h(k)+i)mod N

Each table cell inspected isreferred to as a probe

Colliding items lump together,causing future collisions tocause a longer sequence of

probes

Example: h(x) = xmod13

Insert keys 18, 41, 22,44, 59, 32, 31, 73, in thisorder

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18445932223173

0 1 2 3 4 5 6 7 8 9 10 11 12


Search with Linear Probing

Consider a hash tableAthat uses linear probing

find(k) We start at cell h(k)

We probe consecutivelocations until one of thefollowing occurs An item with key kis

found, or

An empty cell is found,or

Ncells have beenunsuccessfully probed

Algorithmfind(k)

i h(k)

p0

repeat

c A[i]

ifc=

returnPosition(null)

else ifc.key () =k

returnPosition(c)

else

i(i+1)mod N

p p+1

until p=N

returnPosition(null)


Updates with Linear ProbingTo handle insertions anddeletions, we introduce aspecial object, called

AVAILABLE, which replacesdeleted elements

removeElement(k) We search for an item with

key k

If such an item (k, o) isfound, we replace it with thespecial itemAVAILABLEand we return the position ofthis item

Else, we return a nullposition

insertItem(k, o) We throw an exception

if the table is full

We start at cell h(k)

We probe consecutivecells until one of thefollowing occurs A cell i is found that is

either empty or storesAVAILABLE, or

Ncells have beenunsuccessfully probed

We store item (k, o) incell i


Exercise: open addressing &linear probing

Assume you have a hash table H withN=11 slots (H[0,10]) and let the hashfunction be h(k)=k mod N.

Demonstrate (by picture) the insertionof the following keys into a hash tablewith collisions resolved by linearprobing. 10, 22, 31, 4, 15, 28, 17, 88, 59


4/7



Double HashingDouble hashing uses asecondary hash function d(k)

and handles collisions byplacing an item in the firstavailable cell of the seriesh(k,i) =(h(k)+id(k)) modNfor i= 0, 1, ,N1

The secondary hash functiond(k) cannot have zero values

The table size Nmust be aprime to allow probing of allthe cells

Common choice of

compression map for thesecondary hash function:

d2(k) =qkmod q

where q


5/7


6/7



What is a Skip ListAskip list for a set Sof distinct (key, element) items is a series of lists S0,S1 , ,Sh such that

Each listSi contains the special keys + and ListS0 contains the keys ofSin nondecreasing order

Each list is a subsequence of the previous one, i.e.,S0S1 Sh

ListSh contains only the two special keys

We show how to use a skip list to implement the dictionary ADT

56 64 78 +31 34 44 12 23 26

+

+31

64 +31 34 23

S0

S1

S2

S3


SearchWe search for a keyxin a a skip list as follows:

We start at the first position of the top list

At the current positionp, we comparexwithy

key(after(p))x= y: we return element(after(p))

x> y: we scan forward

x< y: we drop down

If we try to drop down past the bottom list, we return NO_SUCH_KEY

Example: search for 78

+

S0

S1

S2

S3

+31

64 +31 34 23

56 64 78 +31 34 44 12 23 26


Randomized AlgorithmsArandomized algorithmperforms coin tosses (i.e.,uses random bits) to controlits execution

It contains statements of thetype

brandom()

if b= 0

do A

else { b= 1}

do B

Its running time depends onthe outcomes of the cointosses

We analyze the expectedrunning time of a randomizedalgorithm under the followingassumptions

the coins are unbiased, and

the coin tosses are independent

The worst-case running time ofa randomized algorithm is oftenlarge but has very lowprobability (e.g., it occurs whenall the coin tosses give heads)

We use a randomized algorithmto insert items into a skip list


To insert an item (x, o) into a skip list, we use a randomizedalgorithm: We repeatedly toss a coin until we get tails, and we denote with i

the number of times the coin came up heads

Ifi h, we add to the skip list new lists Sh+1, ,Si+1, eachcontaining only the two special keys

We search for xin the skip list and find the positions p0, p1 , ,piof the items with largest key less thanxin each list S0,S1, ,Si

For j 0, , i, we insert item (x, o) into list Sj after positionpjExample: insert key 15, with i= 2

Insertion

+ 10 36

+

23

23 +

S0

S1

S2

+

S0

S1

S2

S3

+ 10 362315

+ 15

+ 2315p0

p1

p2


Deletion

To remove an item with key xfrom a skip list, we proceed asfollows:

We search for xin the skip list and find the positions p0, p1 , ,piof the items with key x, where positionpj is in list Sj

We remove positions p0, p

1, ,p

i

from the listsS0,S

1, ,S

i

We remove all but one list containing only the two special keys

Example: remove key 34

+4512

+

23

23 +

S0

S1

S2

+

S0

S1

S2

S3

+4512 23 34

+34

+23 34p0

p1

p2


Implementation

We can implement a skip listwith quad-nodes

A quad-node stores:

item

link to the node before link to the node after

link to the node below

link to the node after

Also, we define special keysPLUS_INF and MINUS_INF,and we modify the keycomparator to handle them

x

quad-node


7/7



Space Usage

The space used by a skip listdepends on the random bits

used by each invocation of theinsertion algorithm

We use the following two basicprobabilistic facts:Fact 1: The probability of getting i

consecutive heads whenflipping a coin is 1/2i

Fact 2: If each ofn items ispresent in a set withprobabilityp, the expected sizeof the set is np

Consider a skip list with n items

By Fact 1, we insert an item in

list Si with probability 1/

2i

By Fact 2, the expected size oflist Si is n/2

i

The expected number of nodes

used by the skip list is

nnn

h

i

i

h

i

i2

2

1

200

ch8.dictionaries hashing.handouts

Documents