search algorithms prepared by john reif, ph.d. analysis of algorithms

Search Algorithms

Prepared by

John Reif, Ph.D.

Analysis of Algorithms

Search Algorithms

a) Binary Search: average caseb) Interpolation Searchc) Unbounded Search (Advanced material)

Readings

• Reading Selection:• CLR, Chapter 12

Binary Search Trees

• (in sorted Table of) keys k0, …, kn-1

• Binary Search Tree property:at each node x

key (x) > key (y) y nodes on left subtree of x

key (x) < key (z) z nodes on right subtree of x

Binary Search Trees (cont’d)


• Assume1) Keys inserted into tree in random order2) Search with all keys equally likely

length = # of edges n + 1 = number of leaves

• Internal path length I= sum of lengths of all internal paths of length > 1 (from root to nonleaves)


• External path length E= sum of lengths of all external

paths (from root to leaves) = I + 2n

Successful Search

• Expected # comparisons

n

n-1

i

0

C = 1 + (I/n)

= (C' 1) / ni

Unsuccessful Search

• Expected # comparisons

n

n

n-1

i

i=0

n

i=0

C' = E/(n+1) = (I+2n)/(n+1)

= (n C +n)/(n+1)

= (C' 2) / (n+1)

2 = 2 ln(n)

(i+1)

= 1.386 log n

Model of Random Real Inputs over an Interval

• InputSet S of n keys each independently randomly chosen over real interval |L,U| for 0 < L < U

• Operations• Comparison operations , operations:

r =largest integer below or equal to r r = smallest integer above of equal to r

Sorting and Selection with Random Real Inputs

• Results 1) Sorting in 0(n) expected time

2) Selection in 0(loglogn) expected time

Bucket Sorting with Random Inputs

ii

i 1 n B[i] empty list

n(x -L) i 1 n add x to B 1

(U-L)

i 1 to n do sort (B[i])

B[1] B[2] B[n]

begin

for to do

for to do

for

output

end

Input set of n keys, S randomly chosen over [L,U]

AlgorithmBUCKEY-SORT(S):

Bucket Sorting with Random Inputs (cont’d)

• Theorem The expected time T of BUCKET-SORT is 0(n)

• Generalizes to case keys have nonuniform distribution

-j

n-j

j=0

|B[i]| is 1

variable with parameters n, p = n

Hence c>1 i,j Prob {|B[i]| > j} < c

So T n c (jlogj) = O(n)

Proofupper bounded by a Binomial

Random Search Table

• Table X = (x0 < x1 < …< xn < xn+1) where x1, …, xn random reals chosen independently from real interval (x0, xn+1)

• Selection ProblemInput key Y

Problem find index k* with Xk* = Y

Note k* has Binomial distribution with parameters n,p = (Y-X0)/(Xn+1-X0)

AlgorithmPSEUDO INTERPOLATION-SEARCH (X,Y)

Random TableX = (X0 , X1 , …, Xn , Xn+1)

Algorithmpseudo interpolation search (X,Y)

[0] k pn where p = (Y-X0)/(Xn+1-X0)

[1] if Y = Xk then return k

Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)[2] If Y > Xk then

output pseudo interpolation search (X’,Y)

where

k'+ n

k' = k, k+ n , k+2 n ,...

Y < X exit with

for

if then

k' k'+ nX' = (X ,...,X )

Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)[3] Else if Y < Xk then

output pseudo interpolation search (X’,Y)

where

k'- n

k' = k, k- n , k-2 n ,...

Y > X exit with

for

if then

k'k'- nX" = (X ,..., X )

Probability Distribution of Pseudo Interpolation Search

Probabilistic Analysis of Pseudo Interpolation Search

• k* is Binomial with mean pn variance 2 = p(1-p)n

• So approximates normal as n

• Hence

*k pn

*k pnProb Z (Z)/Z

2-Z

2ewhere (Z) =

2

Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• So Prob ( i probes used in given call)

• where

*

i i

< Prob(|k pn | > (i - 2) n )

(Z )/Z

i

(i - 2) n (i - 2)Z = 2(i - 2)

p(i-p)

1 since p(1-p)

4

Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• Lemma C 2.03 where

C = expected number of probes

in given call

i>1

i>1

i ii 3

C = i Prob(i probes used)

= Prob( i probes used)

2 (Z )/Z 2.03

proof

Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• Theorem

Pseudo Interpolation Search has expected time

T(n) C + T( n )

C loglogn

proof

T(n) C loglogn

Algorithm INTERPOLATION-SEARCH (X,Y)

1) Initialize k np comment k = E(k*) 2) If Xk = Y then return k

3) If Xk < Y then

output INTERPOLATION-SEARCH (X’,Y) where X’ = (xk, …, xn+1 )

4) Else Xk > Y and

output INTERPOLATION-SEARCH (X”,Y) where X” = (x0, …, xk )

Probability Distribution of INTERPOLATION-SEARCH (X,Y)

Advanced Material: Probabilistic Analysis of Interpolation Search

• Lemma

• ProofSince k* is Binomial with parameters p,n

*α

1Prob(|k pn | 0( nlogn))

nwhere α is constant

2-Z

2*

α

2

2 e 1Prob(|k pn| Zσ)

nZ 2

for = p(1-p)n and Z = 0( logn )

Probabilistic Analysis of Interpolation Search (cont’d)

• Theorem• The expected number of comparisons

of Interpolation Search is

21T(n) loglogn + c (logloglogn)

Probabilistic Analysis of Interpolation Search (cont’d)

• Proof

α α

21

21

21

1 nT(n) 1+ (1- ) T (O( n log n ))

n n

1 loglog n log n + c logloglog n log n ) o(1)

1 1 log logn + c (logloglogn)

2

loglogn + c (logloglogn) since log2=1

Advance Material:Unbounded Search

• Input table X[1], X[2], …where for j = 1, 2, …

• Unbounded Search Problem• Find n such that X[n-1] = 0 and

X[n]=1• Cost for algorithm A:

• CA(n)=m if algorithm A uses m evaluations to determine that n is the solution to the unbounded search problem

0 j < nX[j] =

1 j n

Applications of Unbounded Search

1) Table Look-up in an ordered, infinite table2) Binary encoding of integers

if Sn represents integer n,then Sn is not a prefix of any Sj for n j{S1, S2, …} called a prefix set

3) Idea: use Sn (b1, b2, …, bCA(n))

where bm = 1if the m’th evaluation of X is 1in algorithm A for unbounded search

Unary Unbounded Search Algorithm

• Algorithm B0

• Try X[1], X[2], …, until X[n] = 1

• Cost CB0(n) = n

Binary Unbounded Search Algorithm

• Algorithm B1

i

m

m-1 m

m-1

m-1

try X[2 -1] for i=1,2,...,m

until X[2 -1]=1

( m = logn 1 where 2 n 2 1)

2 binary search over 2 elements

log(2 ) = m-1= logn

1st stage

cost

nd stage

cost

To

1B C (n) = 2 logn 1tal Cost

Binary Unbounded Search Algorithm (cont’d)

Double Unbounded Search Search

• Algorithm B2

m1 1

1

(2 1) (2 1)

1

B 1 1

1

try X[2 -1], ..., X[2 ] = 1

where m = logn 1

(cost is C (m ) = 2 logm 1)

2 same as 2nd stage of B after m was found.

1st stage

nd stage

0

2 1 0

B

B B 1 B

Cost is C (n) = m-1= logn

C (n) = C (m ) + C (n)

= 2( log ( logn+1) 1 ) logn

Total Cost

Double Unbounded Search Algorithm (cont’d)

Ultimate Unbounded Search Algorithm

Search Algorithms

Prepared by

John Reif, Ph.D.

Analysis of Algorithms

search algorithms prepared by john reif, ph.d. analysis of algorithms

Documents