Data Structures and Algorithms

Searching Algorithms

M. B. Fayek

CUFE 2006

Agenda

1. Introduction

2. Sequential Search

3. Binary Search

4. Interpolation Search

5. Indexed Search

1. Introduction

What is a Search?

"Searching is the task of finding a certain data item (record) in a large collection of such items."

A key field that identifies the item sought is given. (For simplicity we consider only the key field instead of the complete record.)

If the item is found, either its location or the complete item is returned.

If the item is not found, an indication is given, usually by returning a non-existent index such as -1.

2. Sequential Search

Sequential Search is also called Exhaustive Search because, in the worst case, the complete collection is searched.

[Figure: sequential search for key item = 20 in the list 15 13 6 20 21 17 8 41. The test "key item = list[i]?" answers NO at i = 0, i = 1 and i = 2, and YES at i = 3, so i = 3 is returned as the found location.]

The first implementation will be:

    for i = 0 to n-1 do
        get next item Ai
        if Ai == k return i
    endfor
    return -1
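The first implementation can be sketched in Python as follows. This is a minimal illustration, assuming the collection is a Python list; the function name `sequential_search` is chosen here and does not come from the slides.

```python
def sequential_search(items, key):
    """Return the index of the first item equal to key, or -1 if absent."""
    for i, item in enumerate(items):  # i runs from 0 to n-1
        if item == key:
            return i
    return -1  # not found


data = [15, 13, 6, 20, 21, 17, 8, 41]
print(sequential_search(data, 20))  # 3
print(sequential_search(data, 99))  # -1
```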

Another pseudocode version is:

    i <-- 0
    while i < n and item Ai <> k
        i <-- i+1
    if i < n
        return i
    else
        return -1

Check boundary conditions! The test i < n must come first, so that Ai is never read beyond the end of the collection.

How is the algorithm implemented?

The way the collection is constructed affects the way the next item Ai is retrieved.

In a static array: Ai is the indexed item A[i].

In a linked list: Ai is the next node to be fetched by following the "next pointer" in the present node. In this case, usually the address of the node found (a pointer to the found node) is returned, or a NULL pointer to indicate that it was not found.

In a file: Ai is the next record retrieved from the file.
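For the linked-list case, a minimal sketch could look like the following. The `Node` class and the function name `search_linked_list` are hypothetical, introduced only for illustration; Python's `None` plays the role of the NULL pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    next: Optional["Node"] = None  # the "next pointer"


def search_linked_list(head, key):
    """Return the node holding key, or None if no node holds it."""
    node = head
    while node is not None and node.key != key:
        node = node.next  # fetch the next node via the next pointer
    return node


# Build the list 15 -> 13 -> 6 and search it.
head = Node(15, Node(13, Node(6)))
found = search_linked_list(head, 13)
print(found.key if found else None)   # 13
print(search_linked_list(head, 99))   # None
```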

Complexity:

The basic operation is the comparison.

For a collection of n data items there are several cases:

Best case: item found at the first location. Number of comparisons = 1

Worst case: item found at the last location, or item not found. Number of comparisons = n

Average case (item found, all locations equally likely): number of comparisons = (1+n)/2
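The average-case figure can be derived as follows. This assumes the item is present and is equally likely to be at any of the n locations, so that finding it at location i (counting from 1) costs i comparisons; the symbol for the average count is introduced here only for the derivation.

```latex
\bar{C}(n) \;=\; \frac{1}{n}\sum_{i=1}^{n} i
          \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
          \;=\; \frac{n+1}{2}
```

For example, with n = 8 items a successful search takes 4.5 comparisons on average.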

Enhancements

Sequential Search may be enhanced using several techniques:

1. Sorting before searching (Presorting)

2. Sentinel Search

3. Probabilistic Search

Enhancement 1: Presorting

A good question to ask before searching is whether the collection is sorted or not.

How do we use that information? If the collection is sorted (in ascending order), the search is terminated as soon as the value of the indexed item in the collection exceeds that of the search item.

What is the effect? This does not affect the worst case of finding the element at the last position, but it decreases the average number of comparisons when the element is not found, because the search stops at the logical position of the item, which is usually somewhere before the end of the list.

A more efficient search is the binary search.


Enhancement 2: Sentinel Search

The basic loop in sequential search includes two comparisons at each iteration:

    while ( (i < n) && (key <> A[i]) )

To decrease the number of comparisons to one per iteration, a sentinel value equal to key is inserted at the end of the array (beyond its last item, i.e. at index n).

Hence the first comparison (i < n) becomes redundant. The search will always stop on finding key, either within A (if it already existed) or at the sentinel position outside A (if it originally did not exist).

A check on the location where key was found indicates whether it existed or not.
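The sentinel idea can be sketched as below. This is an illustration under the assumption that the collection is a Python list that may be temporarily extended by one slot; the function name `sentinel_search` is chosen here. The sentinel is removed again before returning so that the caller's list is left unchanged.

```python
def sentinel_search(items, key):
    """Sequential search with one comparison per loop iteration."""
    n = len(items)
    items.append(key)          # place the sentinel at index n
    i = 0
    while items[i] != key:     # no "i < n" test is needed
        i += 1
    items.pop()                # remove the sentinel again
    return i if i < n else -1  # stopping at index n means "not found"


data = [15, 13, 6, 20, 21, 17, 8, 41]
print(sentinel_search(data, 20))  # 3
print(sentinel_search(data, 99))  # -1
```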


Enhancement 3: Probabilistic Search

The basic idea here is that popular elements of the list, which are searched for more frequently, should require fewer comparisons to find.

This is implemented by advancing an element by one location each time it is found, by swapping it with the element before it.

Hence, each time an element is found, the number of comparisons needed to find it next time is decremented by one (until it reaches the front of the list).
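A minimal sketch of this swap-on-find scheme is shown below. The function name `probabilistic_search` is chosen here, and returning the element's new location after the swap is a design assumption, not something the slides specify.

```python
def probabilistic_search(items, key):
    """Sequential search that moves a found item one place forward.

    Returns the item's location after the swap, or -1 if absent.
    """
    for i, item in enumerate(items):
        if item == key:
            if i > 0:
                # Swap with the element before it.
                items[i - 1], items[i] = items[i], items[i - 1]
                return i - 1
            return i  # already at the front, nothing to swap
    return -1


data = [15, 13, 6, 20]
print(probabilistic_search(data, 20), data)  # 2 [15, 13, 20, 6]
print(probabilistic_search(data, 20), data)  # 1 [15, 20, 13, 6]
```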

Modifying the first sequential algorithm for the case of a sorted list gives:

    for i = 0 to n-1 do
        if Ai > k return -1    // as the list is sorted, the
                               // possible location has been passed
        if Ai == k return i
    endfor
    return -1

Modifying the second sequential algorithm for the case of a sorted list gives:

    i <-- 0
    while i < n and next item Ai < k
        i <-- i+1
    if i < n and Ai == k
        return i
    else
        return -1
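The sorted-list variant can be sketched in Python as follows, assuming a list sorted in ascending order; the function name `sorted_sequential_search` is chosen here for illustration.

```python
def sorted_sequential_search(items, key):
    """Sequential search in an ascending list with early termination."""
    for i, item in enumerate(items):
        if item > key:
            return -1  # the possible location has been passed
        if item == key:
            return i
    return -1


data = [6, 8, 13, 15, 17, 20, 21, 41]
print(sorted_sequential_search(data, 20))  # 5
print(sorted_sequential_search(data, 16))  # -1, stops at 17 instead of scanning to the end
```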

3. Binary Search

How does it work?

Basic idea: the list is sorted, and at each search step it is divided into 2 sublists by checking the mid item. The range to be searched for the possible location is then either the left or the right sublist (i.e. decreased to half).

Note, however, that determining the middle item in the collection is a simple task if the data collection is represented in memory by a sequential array, whereas it is not so if the collection is represented using a linked list. Hence we will assume that the collection is a sequential array.

[Figure: binary search for key item = 20 in a list of n = 8 items. The test "key item = list[mid]?" is made at mid = 4 (key item < list[mid]), then at mid = 2 (key item > list[mid]), then at mid = 3, where the key is found. 3 comparisons!]

For the same input and output specs as before, the algorithm is:

    low = 0; high = n-1
    while (low <= high) do
    {
        mid = (low+high)/2      // integer division
        if (k < A[mid]) then high = mid-1
        else if (k > A[mid]) then low = mid+1
        else return mid         // found
    }
    return -1                   // not found
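The binary search algorithm can be sketched in Python as below, assuming a list sorted in ascending order; the function name `binary_search` is chosen here. Note that the loop must also run when low equals high, otherwise a range of one remaining item is never examined.

```python
def binary_search(items, key):
    """Return the index of key in the ascending list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:             # a range of one item must still be checked
        mid = (low + high) // 2    # integer division
        if key < items[mid]:
            high = mid - 1         # continue in the left sublist
        elif key > items[mid]:
            low = mid + 1          # continue in the right sublist
        else:
            return mid             # found
    return -1                      # not found


data = [6, 8, 13, 15, 17, 20, 21, 41]
print(binary_search(data, 20))  # 5
print(binary_search(data, 7))   # -1
```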

Complexity:

For a collection of n data items:

In each step the mid item is compared to k and the range of search is divided by 2.

This is repeated until the range is zero (in the worst case).

I.e. we should ask: how many times will we divide n by 2 until the length of the sublists is zero?

→ about log2 n times, i.e. O(log n) comparisons, which is better than the n comparisons of sequential search.
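The question "how many times can n be halved?" can be checked numerically with a small sketch. The helper name `halving_steps` is hypothetical; it counts integer halvings until the range length reaches zero, which equals floor(log2 n) + 1 for n >= 1.

```python
def halving_steps(n):
    """Count how many times n can be halved (integer division) until it is 0."""
    steps = 0
    while n > 0:
        n //= 2
        steps += 1
    return steps


print(halving_steps(8))          # 4  (8 -> 4 -> 2 -> 1 -> 0)
print(halving_steps(1_000_000))  # 20, compared with up to 1,000,000 for sequential search
```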

4. Interpolation Search

What is meant by interpolation?

Here we try to guess more precisely where the search key resides.

Instead of calculating the middle as the physical middle (low+high)/2, it is calculated in a weighted manner, according to the value of k relative to the max and min values in the list:

$$ mid = low + \frac{k - A[low]}{A[high] - A[low]} \, (high - low) $$
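A minimal sketch of interpolation search using this weighted mid is given below. It assumes an ascending list of integer keys; the function name `interpolation_search`, the range check in the loop condition, and the guard against division by zero are additions made here for safety and are not taken from the slides.

```python
def interpolation_search(items, key):
    """Return the index of key in the ascending integer list items, or -1."""
    low, high = 0, len(items) - 1
    # Keep searching only while key can still lie inside items[low..high].
    while low <= high and items[low] <= key <= items[high]:
        if items[high] == items[low]:
            return low  # all values in the range are equal, hence equal to key
        # Weighted estimate of the position instead of the physical middle.
        mid = low + (key - items[low]) * (high - low) // (items[high] - items[low])
        if key < items[mid]:
            high = mid - 1
        elif key > items[mid]:
            low = mid + 1
        else:
            return mid
    return -1


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(interpolation_search(data, 70))  # 6, found at the first estimate
print(interpolation_search(data, 75))  # -1
```

With evenly spaced values, as in this example, the first estimate of mid already lands on the key.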

Analysis:

Calculations are more complex for mid.

Significant improvement in search time, especially when the values of the data items in the collection are evenly distributed.

5. Indexed Search

What is an index?

Similar to the index of a book (e.g. a telephone book), items in the index point to significant items in the collection.

This implies that in this search an additional table is used, the index table, where each item in the index table points to a specific location in the original search list.


Algorithm:

// Input: Search array A of n items + index table of d items + key item k

// Output: Location of the item with the search key, or an indication that the key was not found

Step 1: Determine the search range for the key within the index table by specifying the range (imin to imax) inside the original search list.

Step 2: Search sequentially for the key in the range (imin to imax) inside the original search list.


Algorithm (example): searching for key = 53

Index table:

    Key   Pos
    11    2
    38    5
    71    8
    92    11

Search list:

    Pos    0   1   2   3   4   5   6   7   8   9   10  11
    Item   4   7   11  15  17  38  53  67  71  74  83  92

Step 1: In the index table the key 53 lies between 38 (Pos 5) and 71 (Pos 8), so the search starts at Pos = 5+1 = 6.

Step 2: The sequential search from Pos 6 finds 53 at location 6.
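The two steps can be sketched in Python as follows. This is an illustration under stated assumptions: the index table is a list of (key, position) pairs sorted by key, its last entry points at the last item of the search list, and the function name `indexed_search` is chosen here.

```python
def indexed_search(items, index, key):
    """Two-step indexed search; returns the location of key or -1."""
    # Step 1: determine the search range (start..end) from the index table.
    start = 0
    for index_key, pos in index:
        if key <= index_key:
            end = pos
            break
        start = pos + 1
    else:
        return -1  # key is larger than every indexed key

    # Step 2: search sequentially inside the range in the original list.
    for i in range(start, end + 1):
        if items[i] == key:
            return i
    return -1


items = [4, 7, 11, 15, 17, 38, 53, 67, 71, 74, 83, 92]
index = [(11, 2), (38, 5), (71, 8), (92, 11)]
print(indexed_search(items, index, 53))  # 6
print(indexed_search(items, index, 50))  # -1
```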


Analysis:

Assuming that the original table is of size n and the index is of size d:

Step 1: Determining the search range has average complexity O(d/2).

Step 2: Searching for the key in the range (imin to imax) inside the original search list. Assuming the average range length is n/d, this takes (n/d)/2 = n/(2d) comparisons on average.

$$ \text{Total Average Complexity} = O\left(\frac{d}{2} + \frac{n}{2d}\right) $$
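As a worked example of this estimate, take a search list of n = 12 items with an index of d = 4 entries (three list items per index entry). The comparison with the sequential average (n+1)/2 is added here for illustration and assumes a successful search.

```latex
\frac{d}{2} + \frac{n}{2d} \;=\; \frac{4}{2} + \frac{12}{2\cdot 4} \;=\; 2 + 1.5 \;=\; 3.5
\qquad\text{versus}\qquad
\frac{n+1}{2} \;=\; \frac{13}{2} \;=\; 6.5
```

So, under these assumptions, the indexed search needs roughly half as many comparisons on average as a plain sequential search of the same list.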
