search algorithms prepared by john reif, ph.d. analysis of algorithms
TRANSCRIPT
Search Algorithms
Prepared by
John Reif, Ph.D.
Analysis of Algorithms
Search Algorithms
a) Binary Search: average caseb) Interpolation Searchc) Unbounded Search (Advanced material)
Readings
• Reading Selection:• CLR, Chapter 12
Binary Search Trees
• (in sorted Table of) keys k0, …, kn-1
• Binary Search Tree property:at each node x
key (x) > key (y) y nodes on left subtree of x
key (x) < key (z) z nodes on right subtree of x
Binary Search Trees (cont’d)
Binary Search Trees (cont’d)
• Assume1) Keys inserted into tree in random order2) Search with all keys equally likely
length = # of edges n + 1 = number of leaves
• Internal path length I= sum of lengths of all internal paths of length > 1 (from root to nonleaves)
Binary Search Trees (cont’d)
• External path length E= sum of lengths of all external
paths (from root to leaves) = I + 2n
Successful Search
• Expected # comparisons
n
n-1
i
0
C = 1 + (I/n)
= (C' 1) / ni
Unsuccessful Search
• Expected # comparisons
n
n
n-1
i
i=0
n
i=0
C' = E/(n+1) = (I+2n)/(n+1)
= (n C +n)/(n+1)
= (C' 2) / (n+1)
2 = 2 ln(n)
(i+1)
= 1.386 log n
Model of Random Real Inputs over an Interval
• InputSet S of n keys each independently randomly chosen over real interval |L,U| for 0 < L < U
• Operations• Comparison operations , operations:
r =largest integer below or equal to r r = smallest integer above of equal to r
Sorting and Selection with Random Real Inputs
• Results 1) Sorting in 0(n) expected time
2) Selection in 0(loglogn) expected time
Bucket Sorting with Random Inputs
ii
i 1 n B[i] empty list
n(x -L) i 1 n add x to B 1
(U-L)
i 1 to n do sort (B[i])
B[1] B[2] B[n]
begin
for to do
for to do
for
output
end
Input set of n keys, S randomly chosen over [L,U]
AlgorithmBUCKEY-SORT(S):
Bucket Sorting with Random Inputs (cont’d)
• Theorem The expected time T of BUCKET-SORT is 0(n)
• Generalizes to case keys have nonuniform distribution
-j
n-j
j=0
|B[i]| is 1
variable with parameters n, p = n
Hence c>1 i,j Prob {|B[i]| > j} < c
So T n c (jlogj) = O(n)
Proofupper bounded by a Binomial
Random Search Table
• Table X = (x0 < x1 < …< xn < xn+1) where x1, …, xn random reals chosen independently from real interval (x0, xn+1)
• Selection ProblemInput key Y
Problem find index k* with Xk* = Y
Note k* has Binomial distribution with parameters n,p = (Y-X0)/(Xn+1-X0)
AlgorithmPSEUDO INTERPOLATION-SEARCH (X,Y)
Random TableX = (X0 , X1 , …, Xn , Xn+1)
Algorithmpseudo interpolation search (X,Y)
[0] k pn where p = (Y-X0)/(Xn+1-X0)
[1] if Y = Xk then return k
Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)[2] If Y > Xk then
output pseudo interpolation search (X’,Y)
where
k'+ n
k' = k, k+ n , k+2 n ,...
Y < X exit with
for
if then
k' k'+ nX' = (X ,...,X )
Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)[3] Else if Y < Xk then
output pseudo interpolation search (X’,Y)
where
k'- n
k' = k, k- n , k-2 n ,...
Y > X exit with
for
if then
k'k'- nX" = (X ,..., X )
Probability Distribution of Pseudo Interpolation Search
Probabilistic Analysis of Pseudo Interpolation Search
• k* is Binomial with mean pn variance 2 = p(1-p)n
• So approximates normal as n
• Hence
*k pn
*k pnProb Z (Z)/Z
2-Z
2ewhere (Z) =
2
Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• So Prob ( i probes used in given call)
• where
*
i i
< Prob(|k pn | > (i - 2) n )
(Z )/Z
i
(i - 2) n (i - 2)Z = 2(i - 2)
p(i-p)
1 since p(1-p)
4
Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• Lemma C 2.03 where
C = expected number of probes
in given call
i>1
i>1
i ii 3
C = i Prob(i probes used)
= Prob( i probes used)
2 (Z )/Z 2.03
proof
Probabilistic Analysis of Pseudo Interpolation Search (cont’d)• Theorem
Pseudo Interpolation Search has expected time
T(n) C + T( n )
C loglogn
proof
T(n) C loglogn
Algorithm INTERPOLATION-SEARCH (X,Y)
1) Initialize k np comment k = E(k*) 2) If Xk = Y then return k
3) If Xk < Y then
output INTERPOLATION-SEARCH (X’,Y) where X’ = (xk, …, xn+1 )
4) Else Xk > Y and
output INTERPOLATION-SEARCH (X”,Y) where X” = (x0, …, xk )
Probability Distribution of INTERPOLATION-SEARCH (X,Y)
Advanced Material: Probabilistic Analysis of Interpolation Search
• Lemma
• ProofSince k* is Binomial with parameters p,n
*α
1Prob(|k pn | 0( nlogn))
nwhere α is constant
2-Z
2*
α
2
2 e 1Prob(|k pn| Zσ)
nZ 2
for = p(1-p)n and Z = 0( logn )
Probabilistic Analysis of Interpolation Search (cont’d)
• Theorem• The expected number of comparisons
of Interpolation Search is
21T(n) loglogn + c (logloglogn)
Probabilistic Analysis of Interpolation Search (cont’d)
• Proof
α α
21
21
21
1 nT(n) 1+ (1- ) T (O( n log n ))
n n
1 loglog n log n + c logloglog n log n ) o(1)
1 1 log logn + c (logloglogn)
2
loglogn + c (logloglogn) since log2=1
Advance Material:Unbounded Search
• Input table X[1], X[2], …where for j = 1, 2, …
• Unbounded Search Problem• Find n such that X[n-1] = 0 and
X[n]=1• Cost for algorithm A:
• CA(n)=m if algorithm A uses m evaluations to determine that n is the solution to the unbounded search problem
0 j < nX[j] =
1 j n
Applications of Unbounded Search
1) Table Look-up in an ordered, infinite table2) Binary encoding of integers
if Sn represents integer n,then Sn is not a prefix of any Sj for n j{S1, S2, …} called a prefix set
3) Idea: use Sn (b1, b2, …, bCA(n))
where bm = 1if the m’th evaluation of X is 1in algorithm A for unbounded search
Unary Unbounded Search Algorithm
• Algorithm B0
• Try X[1], X[2], …, until X[n] = 1
• Cost CB0(n) = n
Binary Unbounded Search Algorithm
• Algorithm B1
i
m
m-1 m
m-1
m-1
try X[2 -1] for i=1,2,...,m
until X[2 -1]=1
( m = logn 1 where 2 n 2 1)
2 binary search over 2 elements
log(2 ) = m-1= logn
1st stage
cost
nd stage
cost
To
1B C (n) = 2 logn 1tal Cost
Binary Unbounded Search Algorithm (cont’d)
Double Unbounded Search Search
• Algorithm B2
m1 1
1
(2 1) (2 1)
1
B 1 1
1
try X[2 -1], ..., X[2 ] = 1
where m = logn 1
(cost is C (m ) = 2 logm 1)
2 same as 2nd stage of B after m was found.
1st stage
nd stage
0
2 1 0
B
B B 1 B
Cost is C (n) = m-1= logn
C (n) = C (m ) + C (n)
= 2( log ( logn+1) 1 ) logn
Total Cost
Double Unbounded Search Algorithm (cont’d)
Ultimate Unbounded Search Algorithm
Search Algorithms
Prepared by
John Reif, Ph.D.
Analysis of Algorithms