Data Structures And Algorithms Hashing Eng. Anis Nazer First Semester 2016-2017


Data Structures And Algorithms

Hashing

Eng. Anis Nazer
First Semester 2016-2017


Searching

● Search: find whether a “key” exists in a given set
● Searching algorithms:
  – linear (sequential) search
  – binary search
  – search based on a hash function


Linear/sequential Search
● Algorithm:
  – go through the elements one by one; if the key is found, return
● Code:

    bool linearSearch(int A[], int size, int key)
    {
        // Compare the key with every element in turn.
        for (int i = 0; i < size; i++)
            if (A[i] == key)
                return true;
        return false;   // reached the end without finding the key
    }

● What is the complexity?


Binary Search
● Assumption: the array elements are sorted (in ascending order)
● Algorithm:
  – compare the key with the element at the middle
    ● if ( key == element )
      – return true;
    ● if ( key > element )
      – search the right sub-array
    ● else
      – search the left sub-array
● Question: when to stop? How to determine that the key is not found?
● What is the complexity?


Binary Search

Code:

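    bool binarySearch(int A[], int size, int key)
    {
        int L = 0, R = size - 1;
        int M = (L + R) / 2;
        while (L <= R) {
            if (key == A[M])
                return true;
            else if (key > A[M])
                L = M + 1;      // key can only be in the right half
            else
                R = M - 1;      // key can only be in the left half
            M = (L + R) / 2;
        }
        return false;           // L > R: the range is empty, key not found
    }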


Hash function
● A hash function is a function that computes its result from the input or part of the input.
● Example of a hash function:
    f(x) = x % 10
● Assume we store the elements in an array based on the hash function
  – the index of value “x” is f(x)
  – A[ f(x) ] = x


Hash function
● Example: store the following in an array of size 10, given that the hash function is
    f(x) = x % 10

    1, 18, 15, 930, 77, 29

    index: 0    1    2    3    4    5    6    7    8    9
    value: 930  1                   15        77   18   29

● Is 44 in the array?
    f(44) = 44 % 10 = 4, A[4] is empty → 44 is not in the array
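
The example above can be sketched in C++ as follows. This is only a minimal sketch that assumes no two keys hash to the same index; the names EMPTY, makeTable, store and contains are illustrative and do not come from the slides.

```cpp
#include <array>

const int SIZE = 10;
const int EMPTY = -1;   // assumed marker for an unused slot (keys are non-negative)

int f(int x) { return x % SIZE; }   // the hash function from the slide

// Create a table in which every slot is marked empty.
std::array<int, SIZE> makeTable() {
    std::array<int, SIZE> A;
    A.fill(EMPTY);
    return A;
}

// Store x at index f(x). Assumes the slot is free (no collision handling).
void store(std::array<int, SIZE>& A, int x) { A[f(x)] = x; }

// x is in the table only if the slot at index f(x) holds exactly x.
bool contains(const std::array<int, SIZE>& A, int x) { return A[f(x)] == x; }
```

Storing 1, 18, 15, 930, 77, 29 and then calling contains(A, 44) inspects only A[4], so the lookup takes a single step regardless of how many elements are stored.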


Hash function
● What is the advantage of using a hash function?
● What is the problem when using a hash function?
  – two inputs can hash to the same value
    ● ex. f(x) = x % 10
        f(15) = 5
        f(225) = 5
● What to do if two values hash to the same index?


Collision
● Collision: when two distinct values v1 and v2 hash to the same index
● How to deal with collisions?
  – Use a perfect hash function:
    ● i.e. no two values hash to the same index
    ● this is practically impossible since the data is unknown in advance
● A good hash function is a function that keeps collisions to a minimum


Hash functions
● Some examples of hash functions:
  – Division
  – Folding
  – Mid-Square
  – Extraction
  – Radix transformation


Hash functions
● Division: based on the modulo operator:
  – h(x) = x % (array size)
  – It is better for the array size to be a prime number


Hash functions
● Folding: the key is divided into parts, and the parts are processed to generate the index (address)
  – Example: divide the key into parts of three digits, then add the parts, then take the sum modulo the array size
      ID = 199805535, array size = 101
      h(199805535) = (199 + 805 + 535) % 101 = 1539 % 101 = 24
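
A possible C++ sketch of this folding scheme is shown below. The function name foldingHash is illustrative, and the sketch assumes that the three-digit parts are taken starting from the right end of the key; for a nine-digit key this gives the same parts as in the example.

```cpp
// Folding hash: split the decimal key into three-digit parts, add the parts,
// then reduce the sum modulo the table size.
unsigned foldingHash(unsigned long key, unsigned tableSize) {
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;   // take the lowest three digits as one part
        key /= 1000;         // drop them and continue with the remaining digits
    }
    return sum % tableSize;
}
```

With key = 199805535 and tableSize = 101 the parts are 535, 805 and 199, so the function returns 1539 % 101 = 24.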


Hash functions
● Mid-Square: the key is squared and the middle part of the result is taken
    Example: key = 3121, size = 1000
    3121^2 = 9740641
    middle = 406
● It is better to use a size that is a power of 2 and take the middle of the binary representation
    Example: key = 3121, size = 1024
    3121^2 = 9740641 = 100101001010000101100001 (binary)
    → h(3121) = 0101000010 (binary) = 322
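
The binary variant can be sketched in C++ as below. This is a sketch under assumptions: the name midSquare and the parameter r (number of index bits, so the table size is 2^r) are illustrative, and the middle is located relative to the actual bit length of the square, which is how the example above reads; a production hash would more likely use a fixed word width.

```cpp
#include <cstdint>

// Mid-square hash: square the key and return the middle r bits of the square.
// Assumes r < 64 and that key * key fits in 64 bits.
unsigned midSquare(std::uint64_t key, unsigned r) {
    std::uint64_t sq = key * key;

    // Count how many bits the square occupies.
    unsigned bits = 0;
    for (std::uint64_t t = sq; t != 0; t >>= 1) ++bits;

    // Drop the same number of bits on the right as remain on the left.
    unsigned shift = bits > r ? (bits - r) / 2 : 0;
    return static_cast<unsigned>((sq >> shift) & ((1ull << r) - 1));
}
```

For key = 3121 and r = 10 the square has 24 bits, 7 bits are dropped on each side, and the result is 322.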


Hash functions
● Extraction: take a part of the key
    Example: take the last 4 digits of the ID number:
    h(199805535) = 5535
● This method is useful when part of the key is common to most of the data, so only the part that varies is kept
  – ID numbers usually start with the same digits


Hash functions
● Radix transformation: the key is converted to another number system, and the resulting digits are taken modulo the array size:
    Example: size = 100, base 9
    h(345) = (423) % 100 = 23      (345 in base 9 is 423)
    h(245) = (302) % 100 = 2       (245 in base 9 is 302)
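
One way to sketch the radix transformation in C++ is shown below. The names toBaseDigits and radixHash are illustrative, and the sketch assumes a base between 2 and 10 so that each digit of the converted number can be written as a single decimal digit.

```cpp
// Rewrite key in the given base (2..10) and read the resulting digits
// back as if they were a decimal number, e.g. 345 in base 9 -> 423.
unsigned long toBaseDigits(unsigned long key, unsigned base) {
    unsigned long result = 0, place = 1;
    while (key > 0) {
        result += (key % base) * place;  // next digit, put at its decimal position
        place *= 10;
        key /= base;
    }
    return result;
}

// Radix-transformation hash: convert, then take the result modulo the table size.
unsigned radixHash(unsigned long key, unsigned base, unsigned tableSize) {
    return toBaseDigits(key, base) % tableSize;
}
```

Here radixHash(345, 9, 100) converts 345 to 423 and returns 23.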


Collision resolution
● Collision: two keys hash to the same address (index)
● How to deal with collisions:
  – Use a perfect hash function: not practical
  – Open addressing: find an available position to place the colliding key
    ● linear probing
    ● quadratic probing
    ● double hashing
  – Chaining: use a linked list to store the keys


Collision resolution
● Linear probing: look for the next available position, wrapping around the end of the array
● Ex. h(x) = x % 10 , size = 10
    16, 22, 77, 48, 35, 62, 47, 99

    0 1 2 3 4 5 6 7 8 9
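
Insertion with linear probing can be sketched in C++ as below. The names EMPTY and insertLinear are illustrative, and the sketch assumes non-negative keys so that -1 can mark a free slot.

```cpp
#include <vector>

const int EMPTY = -1;   // assumed marker for a free slot (keys are non-negative)

// Insert key using linear probing.
// Returns the index where the key was placed, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;                 // h(x) = x % size
    for (int step = 0; step < size; ++step) {
        int idx = (home + step) % size;    // wrap around the end of the array
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting the eight keys of the exercise in order into a table of size 10, this sketch puts 62 at index 3, 47 at index 9, and 99 at index 0 after wrapping around; the other keys land on their home index.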


Collision resolution
● Linear probing tends to create clusters.
  – elements tend to group near each other
● The empty position following a cluster has a higher chance of being filled.
  – this chance is proportional to the cluster size
  – the bigger the cluster, the higher the probability


Collision resolution
● Quadratic probing: look for positions using a quadratic formula:
    h(x) + i      (taken modulo the array size)
    i = 1, -1, 4, -4, 9, -9, ...
● Ex. h(x) = x % 10 , size = 10
    16, 22, 77, 48, 35, 62, 47, 99

    0 1 2 3 4 5 6 7 8 9
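
A C++ sketch of insertion with quadratic probing follows. The names EMPTY, wrap and insertQuadratic are illustrative; the sketch assumes non-negative keys and gives up after trying offsets up to (size - 1) squared, since for an arbitrary table size the sequence is not guaranteed to reach every index.

```cpp
#include <vector>

const int EMPTY = -1;   // assumed marker for a free slot (keys are non-negative)

// Map a possibly negative position into the range [0, size).
int wrap(int pos, int size) { return ((pos % size) + size) % size; }

// Insert key using quadratic probing with offsets +1, -1, +4, -4, +9, -9, ...
// Returns the index used, or -1 if no free slot was found on the probe sequence.
int insertQuadratic(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;
    if (table[home] == EMPTY) { table[home] = key; return home; }
    for (int i = 1; i < size; ++i) {
        int candidates[2] = { wrap(home + i * i, size), wrap(home - i * i, size) };
        for (int idx : candidates) {
            if (table[idx] == EMPTY) { table[idx] = key; return idx; }
        }
    }
    return -1;
}
```

For the exercise data, 62 collides at index 2 and is placed at 2 + 1 = 3; 47 collides at 7, then 8, then 6, and is placed at (7 + 4) % 10 = 1; 99 finds index 9 free.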


Collision resolution
● Assume key = 9, h(x) = x % 19 and the array is full except A[3]. What is the sequence of indices (probes) that are tried?
● Quadratic probing avoids the clustering seen with linear probing, but it generates “secondary clusters”, since two elements that hash to the same index generate the same probe sequence
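
The probe sequence asked about above can be listed with a small C++ sketch. The function name quadraticProbes and its parameters are illustrative; it records every index tried, using the offsets +1, -1, +4, -4, ..., until the given free index is reached.

```cpp
#include <vector>

// List the indices tried by quadratic probing for `key` in a table of `size`
// slots, stopping as soon as the only free slot `freeIndex` is reached.
std::vector<int> quadraticProbes(int key, int size, int freeIndex) {
    std::vector<int> probes;
    int home = key % size;
    probes.push_back(home);
    if (home == freeIndex) return probes;
    for (int i = 1; i < size; ++i) {
        int plus = (home + i * i) % size;
        int minus = ((home - i * i) % size + size) % size;  // keep the index non-negative
        probes.push_back(plus);
        if (plus == freeIndex) return probes;
        probes.push_back(minus);
        if (minus == freeIndex) return probes;
    }
    return probes;   // free slot never reached on this sequence
}
```

For key = 9, size = 19 and free index 3, the sketch yields 9, 10, 8, 13, 5, 18, 0, 6, 12, 15, 3. Any other key with the same home index 9 would follow exactly the same sequence, which is the secondary clustering described above.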


Collision resolution
● How to know when to stop if the key is not in the array?
● If the size of the array is a prime number of the form 4j + 3, where j is an integer, the probing sequence is guaranteed to cover all the indices


Collision resolution
● Double hashing: if a collision occurs, use a second hash function to compute the step
● probe sequence:
    h(x), h(x) + h2(x), h(x) + 2h2(x), h(x) + 3h2(x), ...
● Example:
  – h(x) = x % 19
  – h2(x) = x % 13
  – What are the probe sequences for x = 3, x = 22?
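
The probe sequences of this example can be generated with the C++ sketch below. The function doubleHashProbes is an illustrative name, and the sketch assumes a table of 19 slots with every probe reduced modulo the table size, which the slide does not state explicitly.

```cpp
#include <vector>

int h(int x)  { return x % 19; }   // primary hash function from the example
int h2(int x) { return x % 13; }   // secondary hash function from the example

// First `count` indices tried for key x: (h(x) + i * h2(x)) mod tableSize.
std::vector<int> doubleHashProbes(int x, int tableSize, int count) {
    std::vector<int> probes;
    for (int i = 0; i < count; ++i)
        probes.push_back((h(x) + i * h2(x)) % tableSize);
    return probes;
}
```

For x = 3 the first five probes are 3, 6, 9, 12, 15, and for x = 22 they are 3, 12, 2, 11, 1. Both keys start at index 3 but then diverge, unlike quadratic probing. Note that h2(x) is 0 whenever x is a multiple of 13, in which case this sequence never moves; practical secondary hash functions are usually chosen so that they never return 0.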


Comparison


Collision resolution
● Chaining: store a pointer to a linked list in the array, and store the data in the linked list
● The list can be sorted for efficiency
● Chaining requires more space to store the pointers


Collision resolution
● Separate chaining:
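
Separate chaining might be sketched in C++ as below. The struct ChainedTable and its member names are illustrative; the sketch uses std::list for the chains, assumes non-negative keys, uses h(x) = x % size, and keeps the chains unsorted for brevity.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hash table with separate chaining: each array slot holds a linked list
// of all keys that hash to that index.
struct ChainedTable {
    std::vector<std::list<int>> buckets;

    explicit ChainedTable(int size) : buckets(size) {}

    int hash(int key) const { return key % static_cast<int>(buckets.size()); }

    // Colliding keys are simply appended to the same chain.
    void insert(int key) { buckets[hash(key)].push_back(key); }

    // Only the chain at index hash(key) has to be searched.
    bool contains(int key) const {
        const std::list<int>& chain = buckets[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
};
```

With a table of size 10, inserting 15 and 225 puts both keys in the chain at index 5, and a lookup for either one walks only that chain.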


Collision resolution
● Coalesced chaining:
  – 2D array: size x 2 → A[size][2]
  – the second column stores the index of the next element in the chain
● Example: store the following data,
    12, 23, 15, 72, 49, 35, 9, 22
    h(x) = x % 10
    -2 → position is available
    -1 → element is last in the chain
    collision resolution: linear probing


Example

12, 23, 15, 72, 49, 35, 9, 22

    index  value  next
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9


Example

12, 23, 15, 72, 49, 35, 9, 22

    index  value  next
    0      9      -1
    1             -2
    2      12     4
    3      23     -1
    4      72     7
    5      15     6
    6      35     -1
    7      22     -1
    8             -2
    9      49     0
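
The table above can be reproduced with the following C++ sketch of coalesced chaining. The names Slot, FREE, LAST and insertCoalesced are illustrative, and the sketch assumes that the linear probe for a free position starts right after the last element of the chain; for this data, probing from the home index would give the same table.

```cpp
#include <vector>

const int FREE = -2;   // position is available
const int LAST = -1;   // element is last in its chain

struct Slot { int value; int next; };   // the two columns of A[size][2]

// Insert key into a coalesced-chaining table.
// Returns the index used, or -1 if the table is full.
int insertCoalesced(std::vector<Slot>& table, int key) {
    int size = static_cast<int>(table.size());
    int idx = key % size;
    if (table[idx].next == FREE) {             // home slot is free: start a new chain
        table[idx] = {key, LAST};
        return idx;
    }
    while (table[idx].next != LAST)            // follow the chain to its last element
        idx = table[idx].next;
    for (int step = 1; step < size; ++step) {  // linear probing for a free position
        int probe = (idx + step) % size;
        if (table[probe].next == FREE) {
            table[probe] = {key, LAST};
            table[idx].next = probe;           // link the new element to the chain
            return probe;
        }
    }
    return -1;
}
```

Starting from a table in which every slot is {0, FREE}, inserting 22 last follows the chain 2 → 4, finds index 7 free, and sets the next field of index 4 to 7, as in the table.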


Deletion
● What happens if you delete a value from a hash table?
    Example: arrange the data: 11, 34, 62, 4, 91
  – use h(x) = x % 10, and linear probing
  – then delete data 34, 62
  – then search for 4

    0 1 2 3 4 5 6 7 8 9


Deletion
● The position of the deleted item should not be marked as empty. Why?
● Can we reuse the position of the deleted element?
● If you have many delete operations and few insert operations, you should rehash the table after a number of deletions
● Rehash: arrange the data using a different table size and/or a different hash function
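
One common way to handle deletion with linear probing is to mark a deleted position with a special value instead of marking it empty. The C++ sketch below illustrates that idea; the names EMPTY, DELETED, insertKey, findKey and removeKey are illustrative, keys are assumed to be non-negative, and duplicate keys are not checked.

```cpp
#include <vector>

const int EMPTY = -1;     // slot has never been used
const int DELETED = -2;   // slot was used, then its key was deleted

// Insert with linear probing; a DELETED slot may be reused.
int insertKey(std::vector<int>& t, int key) {
    int size = static_cast<int>(t.size());
    for (int step = 0; step < size; ++step) {
        int idx = (key % size + step) % size;
        if (t[idx] == EMPTY || t[idx] == DELETED) { t[idx] = key; return idx; }
    }
    return -1;   // table is full
}

// Search with linear probing; returns the index of key, or -1 if absent.
int findKey(const std::vector<int>& t, int key) {
    int size = static_cast<int>(t.size());
    for (int step = 0; step < size; ++step) {
        int idx = (key % size + step) % size;
        if (t[idx] == EMPTY) return -1;   // a never-used slot ends the search
        if (t[idx] == key) return idx;    // DELETED slots are skipped, not a stop
    }
    return -1;
}

// Delete by marking the slot DELETED so that later searches keep probing.
bool removeKey(std::vector<int>& t, int key) {
    int idx = findKey(t, key);
    if (idx < 0) return false;
    t[idx] = DELETED;
    return true;
}
```

With the data 11, 34, 62, 4, 91 in a table of size 10, key 4 ends up at index 5 because index 4 is taken by 34. After removing 34 and 62, findKey(t, 4) still returns 5, because index 4 is marked DELETED rather than EMPTY; had it been marked EMPTY, the search would have stopped there and reported the key as missing.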


THE END