Randomized Algorithms: Quicksort and Selection
Version of October 2, 2014


Outline

Quicksort
    Average-Case Analysis of Quicksort
    Randomized Quicksort

Selection
    The Selection Problem
    First solution: Selection by sorting
    Randomized Selection


Quicksort: Review

Quicksort(A, p, r)

begin
    if p < r then
        q = Partition(A, p, r);
        Quicksort(A, p, q − 1);
        Quicksort(A, q + 1, r);
    end
end

Partition(A, p, r) reorders the items in A[p..r] around the pivot A[r]: items < A[r] end up to its left, items > A[r] to its right. It returns the pivot's final index q.

We showed that if the input is a random input (permutation) of n items, then the average running time is O(n log n).
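The slides do not spell out the body of Partition, so the following is only a minimal sketch of the review above. It assumes the common last-element (Lomuto-style) partition scheme and 0-based inclusive indices; the function names `partition` and `quicksort` are illustrative.

```python
def partition(a, p, r):
    """Reorder a[p..r] around the pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i = p - 1  # a[p..i] holds the items already known to be smaller than the pivot
    for j in range(p, r):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # place the pivot between the two groups
    return i + 1


def quicksort(a, p, r):
    """Sort a[p..r] in place (inclusive bounds, 0-based)."""
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [8, 1, 23, 100, 10, 33, 19]
quicksort(data, 0, len(data) - 1)
print(data)  # [1, 8, 10, 19, 23, 33, 100]
```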


Average Case Analysis of Quicksort

Formally, the average running time can be defined as follows:

$I_n$ is the set of all n! inputs of size n
$I \in I_n$ is any particular size-n input
$R(I)$ is the running time of the algorithm on input I

Then, the average running time over the random inputs is

$$\sum_{I \in I_n} \Pr(I)\, R(I) = \frac{1}{n!} \sum_{I \in I_n} R(I) = O(n \log n)$$

The only fact that was used was that A[r] was a random item in A[p..r], i.e., the partition item is equally likely to be any item in the subarray.


Randomized-Partition(A, p, r)

Idea:

In the algorithm Partition(A, p, r), A[r] is always used as the pivot x to partition the array A[p..r].

In the algorithm Randomized-Partition(A, p, r), we randomly choose j, p ≤ j ≤ r, and use A[j] as the pivot.

The idea is that if we choose randomly, then the chance that we get unlucky every time is extremely low.


Randomized-Partition(A, p, r)...

Let random(p, r) be a pseudorandom-number generator that returns a random integer between p and r, inclusive, each value being equally likely.

Randomized-Partition(A, p, r)

begin
    j = random(p, r);
    exchange A[r] and A[j];
    return Partition(A, p, r);
end


Randomized-Quicksort Algorithm

We make use of the Randomized-Partition idea to develop a new version of quicksort.

Randomized-Quicksort(A, p, r)

begin
    if p < r then
        q = Randomized-Partition(A, p, r);
        Randomized-Quicksort(A, p, q − 1);
        Randomized-Quicksort(A, q + 1, r);
    end
end
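The two procedures above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the course's reference code: `partition` is a last-element partition (its body is not given in the slides), indices are 0-based and inclusive, and the `rng` parameter is a hypothetical convenience so that the random choices can be made reproducible.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around a[r]; return the pivot's final index."""
    pivot, store = a[r], p
    for j in range(p, r):
        if a[j] < pivot:
            a[store], a[j] = a[j], a[store]
            store += 1
    a[store], a[r] = a[r], a[store]
    return store


def randomized_partition(a, p, r, rng):
    """Swap a uniformly chosen item into a[r], then partition as usual."""
    j = rng.randint(p, r)  # randint is inclusive on both ends, like random(p, r)
    a[r], a[j] = a[j], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p, r, rng):
    """Sort a[p..r] in place using a random pivot at every level."""
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)


values = list(range(20, 0, -1))  # reverse-sorted input
randomized_quicksort(values, 0, len(values) - 1, random.Random(2014))
print(values == list(range(1, 21)))  # True
```

The result is always sorted, whatever the seed; only the sequence of pivots, and therefore the running time, depends on the random choices.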


Running Time of Randomized-Quicksort

Let $I \in I_n$ be any input.

The running time R(I) depends upon the random choices made by the algorithm in the step

j = random(p, r); exchange A[r] and A[j]

so it can be different for different random choices.

We are actually interested in E(R(I)), the Expected (average) Running Time (ERT).

The average is now not over the input, which is fixed.
The average is over the random choices made by the algorithm.


Running Time of Randomized-Quicksort

Let $I \in I_n$ be any input.

We want E(R(I)), the Expected Running Time, where the average is taken over the random choices of the algorithm.

Surprisingly, we can use almost exactly the same analysis that we used for the average-case analysis of Quicksort. Recall that the only facts we used were:

The item used as a pivot is random among all items.

This statement is true in all subproblems as well.

Those two facts are still valid here, so the expected running time still satisfies

$$C_n = n - 1 + \frac{1}{n} \sum_{1 \le k \le n} \left( C_{k-1} + C_{n-k} \right)$$

which we already proved was O(n log n).
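As a sanity check on the recurrence, the sketch below evaluates $C_n$ exactly and compares it with a brute-force average over all n! inputs for small n. Several things here are assumptions made for illustration: $C_0 = 0$, the cost counted is the number of comparisons, and the brute force uses an out-of-place quicksort with the last item as pivot (so the two sublists are clearly again random permutations). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def expected_comparisons(n_max):
    """C_0..C_n_max from C_n = n - 1 + (1/n) * sum_{k=1..n} (C_{k-1} + C_{n-k})."""
    c = [Fraction(0)]  # assumed base case C_0 = 0
    for n in range(1, n_max + 1):
        total = sum(c[k - 1] + c[n - k] for k in range(1, n + 1))
        c.append(n - 1 + total / n)
    return c


def comparisons(items):
    """Comparisons made by quicksort with the last item as pivot (distinct items)."""
    if len(items) <= 1:
        return 0
    pivot = items[-1]
    smaller = [x for x in items[:-1] if x < pivot]
    larger = [x for x in items[:-1] if x > pivot]
    return len(items) - 1 + comparisons(smaller) + comparisons(larger)


def average_over_inputs(n):
    """Exact average of comparisons over all n! permutations of 0..n-1."""
    total = sum(comparisons(list(perm)) for perm in permutations(range(n)))
    return Fraction(total, factorial(n))


c = expected_comparisons(6)
for n in range(1, 7):
    print(n, c[n], average_over_inputs(n))  # the last two columns agree
```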


Running Time of Randomized-Quicksort

We just saw that for any fixed input of size n, the ERT is O(n log n).

Randomized Quicksort is a Randomized Algorithm

It makes random choices to determine what the algorithm does next.
When rerun on the same input, the algorithm can make different choices and have different running times.
The running time of a randomized algorithm is the worst-case ERT over all inputs I. In our case

$$\max_{I \in I_n} E[R(I)] = O(n \log n)$$

Contrast with Average Case Analysis

When rerun on the same input, the algorithm always does the same things, so R(I) is deterministic.
Given a probability distribution on inputs, calculate the average running time of the algorithm over all inputs:

$$\sum_{I \in I_n} \Pr(I)\, R(I)$$


The Selection Problem

Definition (Selection Problem)

Given a sequence of numbers 〈a1, . . . , an〉, and an integer i, 1 ≤ i ≤ n, find the ith smallest element. When i = ⌈n/2⌉, this is called the median problem.

Example

Given 〈1, 8, 23, 10, 19, 33, 100〉, the 4th smallest element is 19.

Question

How can this problem be solved efficiently?


First Solution: Selection by Sorting

1 Sort the elements in ascending order with any algorithm of complexity O(n log n).

2 Return the ith element of the sorted array.

The complexity of this solution is O(n log n)
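The two steps above can be sketched in a few lines. The function name `select_by_sorting` is illustrative, and Python's built-in `sorted` stands in for "any O(n log n) sorting algorithm".

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (1-based) by sorting a copy first."""
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(items)[i - 1]  # step 1: sort; step 2: index the i-th item


print(select_by_sorting([1, 8, 23, 10, 19, 33, 100], 4))  # 19
```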

Question

Can we do better?

Answer: YES, by using Randomized-Partition(A, p, r)!


Randomized-Select(A, p, r , i), 1 ≤ i ≤ r − p + 1

Problem: Select the ith smallest element in A[p..r], where 1 ≤ i ≤ r − p + 1.

Solution: Apply Randomized-Partition(A, p, r), which returns the pivot's final index q with p ≤ q ≤ r. Set k = q − p + 1, so the pivot A[q] is the kth smallest element of A[p..r].

1 i = k: the pivot is the solution.

2 i < k: the ith smallest element in A[p..r] must be the ith smallest element in A[p..q − 1].

3 i > k: the ith smallest element in A[p..r] must be the (i − k)th smallest element in A[q + 1..r].

If necessary, recursively apply the same procedure to the subarray.

Randomized-Select(A, p, r , i), 1 ≤ i ≤ r − p + 1

if p = r then
    return A[p]
end
q = Randomized-Partition(A, p, r);
k = q − p + 1;
if i = k then
    return A[q];   // the pivot is the answer
else if i < k then
    return Randomized-Select(A, p, q − 1, i)
else
    return Randomized-Select(A, q + 1, r, i − k)
end

To find the ith smallest element in A[1..n], call Randomized-Select(A, 1, n, i).
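A direct Python transcription of the pseudocode might look like the sketch below. The assumptions are the same as for the sorting examples: 0-based inclusive array bounds (so the top-level call uses 0 and n − 1 while the rank i stays 1-based), a last-element partition whose body is not given in the slides, and a hypothetical `rng` parameter for reproducibility.

```python
import random


def randomized_partition(a, p, r, rng):
    """Move a uniformly chosen pivot to a[r], partition a[p..r], return its index."""
    j = rng.randint(p, r)
    a[r], a[j] = a[j], a[r]
    pivot, store = a[r], p
    for t in range(p, r):
        if a[t] < pivot:
            a[store], a[t] = a[t], a[store]
            store += 1
    a[store], a[r] = a[r], a[store]
    return store


def randomized_select(a, p, r, i, rng):
    """Return the i-th smallest item (1-based) of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r, rng)
    k = q - p + 1  # rank of the pivot within a[p..r]
    if i == k:
        return a[q]  # the pivot is the answer
    if i < k:
        return randomized_select(a, p, q - 1, i, rng)
    return randomized_select(a, q + 1, r, i - k, rng)


numbers = [1, 8, 23, 10, 19, 33, 100]
print(randomized_select(numbers, 0, len(numbers) - 1, 4, random.Random(0)))  # 19
```

Only one of the two recursive calls is ever made, which is the structural difference from Randomized-Quicksort.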


Running Time of Randomized-Select(A, 1, n, i)

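Recall that if the pivot A[q] is the kth item in order, then the algorithm proceeds as follows:

If i = k, stop. If i < k, recurse on A[p..q − 1]. If i > k, recurse on A[q + 1..r].

Let m = r − p + 1 be the size of the current subarray. (In the analysis that follows, k is used for the pivot's final position in the array, so p ≤ k ≤ r.)

Note that if k = p + ⌊m/2⌋ were always true, this would halve the problem size at every step and the running time would be at most

$$n + \frac{n}{2} + \frac{n}{2^2} + \frac{n}{2^3} + \cdots = n \left( 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots \right) \le 2n$$

This isn't a realistic analysis because the pivot is chosen randomly, so k is actually a random position in p..r.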


Running Time of Randomized-Select(A, 1, n, i)

Recall that if the pivot A[q] is the kth item in order, then the algorithm proceeds as follows:

If i = k, stop. If i < k, recurse on A[p..q − 1]. If i > k, recurse on A[q + 1..r].

Let m = r − p + 1.

Suppose that we could guarantee that $p + \frac{m}{4} \le k \le p + \frac{3}{4}m$.

This would be enough to force linearity because the recursive call would always be to a subproblem of size $\le \frac{3}{4}m$, and the running time of the entire algorithm would be at most

$$n + \frac{3}{4}n + \left(\frac{3}{4}\right)^2 n + \left(\frac{3}{4}\right)^3 n + \cdots \le 4n$$


Running Time of Randomized-Select(A, 1, n, i)

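Set m = r − p + 1. We saw that if

$$p + \frac{m}{4} \le k \le p + \frac{3}{4}m$$

then the algorithm is linear.

While this is not always true, about half of the m equally likely pivot positions lie in this range, so (ignoring rounding when m is not a multiple of 4)

$$\Pr\left( p + \frac{m}{4} \le k \le p + \frac{3}{4}m \right) \ge \frac{1}{2}.$$

This means that each stage of the algorithm has probability at least 1/2 of reducing the problem size to at most 3/4 of its current size.
A careful analysis will show that this implies an O(n) expected running time.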


Running Time of Randomized-Select(A, 1, n, i)

More formally, suppose the t'th call to the algorithm is $A(p_t, r_t, i_t)$.
Let $M_t = r_t - p_t + 1$ be the size of the array in that subproblem and $k_t$ the location of the random pivot in that subarray. Note

$p_1 = 1$, $r_1 = n$, $M_1 = n$

$M_{t+1} \le M_t - 1$

The total cost of the algorithm is bounded by $\sum_t M_t$.

Set $E_t$ to be the event that is true if

$$p_t + \frac{M_t}{4} \le k_t \le p_t + \frac{3}{4}M_t,$$

and false otherwise. Then

$\Pr(E_t) \ge 1/2$
If $E_t$ occurs then $M_{t+1} \le \frac{3}{4}M_t$.


Running Time of Randomized-Select(A, 1, n, i)

Recall that $M_1 = n$; $M_{t+1} \le M_t - 1$; and if $E_t$ occurs then $M_{t+1} \le \frac{3}{4}M_t$.

Note that $E_t$ is undefined after the algorithm ends, i.e., once $M_t \le 1$. For larger t, define $E_t$ by flipping a fair coin and setting $E_t$ to true if the coin comes up heads.

Now define $M'_t$ as follows:

$M'_1 = n$

If $E_t$ occurs then $M'_{t+1} = \frac{3}{4}M'_t$. If $E_t$ does not occur then $M'_{t+1} = M'_t$.

Then $\forall t$, $M_t \le M'_t$.

In particular, since $\sum_t M_t$ bounds the algorithm's runtime, $\sum_t M'_t$ also bounds the algorithm's runtime!


Review of Geometric Random Variables

Consider a p-biased coin, i.e., a coin with probability p of turning up Heads and (1 − p) of turning up Tails.

Let X be the number of flips until seeing the first Head.

X is a Geometric Random Variable with parameter p:

$\Pr(X = i) = (1 - p)^{i-1} p$

$E(X) = \frac{1}{p}$

In particular, if the coin is fair, i.e., p = 1/2, then E(X) = 2.

If at every step the coin probability can change, BUT the probability of Heads is always ≥ 1/2, then E(X) ≤ 2.

In this case we say X is bounded by a geometric random variable with p = 1/2.
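The formula E(X) = 1/p can be checked numerically by summing the series $\sum_i i \cdot \Pr(X = i)$. The sketch below truncates the sum after a fixed number of terms, which is an approximation chosen for illustration; the helper name `geometric_mean_partial` is hypothetical.

```python
def geometric_mean_partial(p, terms=200):
    """Partial sum of i * (1 - p)**(i - 1) * p for i = 1..terms."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, terms + 1))


print(round(geometric_mean_partial(0.5), 6))   # 2.0, matching E(X) = 1/p for a fair coin
print(round(geometric_mean_partial(0.25), 6))  # 4.0
```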


Running Time of Randomized-Select(A, 1, n, i)

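Given a sequence of events $E_1, E_2, E_3, \ldots$ with $\forall t$, $\Pr(E_t) \ge 1/2$.

Set $Z_0 = 0$ and $Z_i$ to be the location of the ith true $E_t$.

Set $X_i = Z_{i+1} - Z_i$.

$X_i$ is the time from $Z_i$ until the next success, so it is bounded by a geometric random variable with p = 1/2. ⇒ Then $E(X_i) \le 2$.

Recall $M'_1 = n$; if $E_t$ occurs, $M'_{t+1} = \frac{3}{4}M'_t$; else $M'_{t+1} = M'_t$.

Then

$$\sum_t M'_t = \sum_{i \ge 0} X_i \left(\frac{3}{4}\right)^i n$$

Why? $M'_t = \left(\frac{3}{4}\right)^i n$ for exactly the $X_i$ steps t with $Z_i < t \le Z_{i+1}$, because exactly i of the events before step t were true.

By linearity of expectation

$$E\left( \sum_t M'_t \right) = \sum_{i \ge 0} E(X_i) \left(\frac{3}{4}\right)^i n \le 2n \sum_{i \ge 0} \left(\frac{3}{4}\right)^i = 8n$$

QED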


Running Time of Randomized-Select(A, 1, n, i)

Worst Case:

T(n) = n − 1 + T(n − 1), so T(n) = O(n²).

Expected Running Time:

O(n)

Expected running time much better than worst case!
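To make the worst case concrete, the sketch below counts comparisons for a selection routine that always uses the last item as pivot. That fixed-pivot variant is an assumption made only so the bad case can be reproduced deterministically; with random pivots the same behaviour requires an unlucky choice at every step. On an already sorted input with i = 1 it performs (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2 comparisons, matching the recurrence above.

```python
def select_last_pivot(items, i):
    """Return (i-th smallest item, comparisons made), always pivoting on the last item."""
    a = list(items)
    p, r = 0, len(a) - 1
    comparisons = 0
    while p < r:
        pivot, store = a[r], p
        for j in range(p, r):
            comparisons += 1
            if a[j] < pivot:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[r] = a[r], a[store]
        k = store - p + 1  # rank of the pivot within a[p..r]
        if i == k:
            return a[store], comparisons
        if i < k:
            r = store - 1  # continue in the left part
        else:
            p, i = store + 1, i - k  # continue in the right part
    return a[p], comparisons


n = 100
print(select_last_pivot(range(n), 1))  # (0, 4950), i.e. n(n - 1)/2 comparisons
```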


Randomized Quicksort vs Randomized Selection

Question

Why does Randomized Selection take O(n) expected time while Randomized Quicksort takes O(n log n) expected time?

Answer:

Randomized Selection needs to work on only one of the two subproblems.

Randomized Quicksort needs to work on both of the two subproblems.


Epilogue

How do we generate a random number?

Dice, coin flipping, roulette wheels, ...

How does a computer generate a random number?

By hardware: electronic noise, thermal noise, etc. Expensive but “true” random numbers in some sense.

By software: pseudorandom numbers. A long sequence of seemingly random numbers whose pattern is difficult to find.

Pseudorandom numbers are good enough for most applications.
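As one concrete illustration (Python is chosen here only as an example; the slides do not tie the discussion to any language), the standard library exposes both kinds of source. `random.Random` is a seeded pseudorandom generator, so the same seed reproduces the same sequence, while `random.SystemRandom` and the `secrets` module draw from the operating system's entropy source.

```python
import random
import secrets

# Software: a pseudorandom generator is deterministic once the seed is fixed.
first = random.Random(2014)
second = random.Random(2014)
rolls_a = [first.randint(1, 6) for _ in range(5)]
rolls_b = [second.randint(1, 6) for _ in range(5)]
print(rolls_a == rolls_b)  # True: same seed, same "random" sequence

# Operating-system randomness: not reproducible from a seed.
system_rng = random.SystemRandom()
roll = system_rng.randint(1, 6)  # integer in 1..6
index = secrets.randbelow(10)    # integer in 0..9
print(1 <= roll <= 6, 0 <= index < 10)  # True True
```

For Randomized-Partition, a seeded pseudorandom generator of this kind is typically sufficient.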


Another Analysis of the Running Time of Randomized-Select(A, 1, n, i)

T(n): upper bound on the expected number of comparisons made by Randomized-Select(A, 1, n, i) for any i.
T(1) = 0
For n > 1, we get

$$T(n) \le \underbrace{n}_{\text{initial partition}} + \underbrace{\sum_{k=1}^{n} \frac{1}{n}\, T(\max\{k-1,\ n-k\})}_{\text{recursion, assume the bad case}}$$

$$T(n) \le n + \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k)$$

Which is a complicated recurrence!
We use the guess & induction method.
Guess:

$T(n) \le c\,n$, for all n

for some constant c to be figured out later.

Proof that T (n) ≤ c n

Induction step: Assume that $T(m) \le c\,m$ for all $m \le n - 1$. Then try to show $T(n) \le cn$:

$$T(n) \le n + \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) \le n + \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck \le \cdots \le \frac{3c}{4}n + \frac{c}{2} + n$$

We want $\frac{3c}{4}n + \frac{c}{2} + n \le cn$, or $n \ge \frac{2c}{c-4}$.

If we choose c ≥ 12, then the induction step works for n ≥ 3.
Induction basis: T(1) ≤ c · 1, T(2) ≤ c · 2.
So if we choose c = max{12, T(1), T(2)/2}, then the entire proof works.
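The linear bound can also be checked numerically. The sketch below takes the first recurrence with equality, T(1) = 0 and T(n) = n + (1/n) Σ T(max{k − 1, n − k}), and tabulates it. Treating the inequality as an equality is an assumption made for illustration, and the function name `selection_bound` is hypothetical.

```python
def selection_bound(n_max):
    """Tabulate T(1..n_max) for T(n) = n + (1/n) * sum_k T(max(k - 1, n - k))."""
    t = [0.0, 0.0]  # t[0] is an unused placeholder; t[1] = T(1) = 0
    for n in range(2, n_max + 1):
        recursion = sum(t[max(k - 1, n - k)] for k in range(1, n + 1)) / n
        t.append(n + recursion)
    return t


t = selection_bound(300)
print(max(t[n] / n for n in range(1, 301)) <= 12)  # True: T(n) <= 12n on this range
```

On this range the ratio T(n)/n in fact stays below 4, so the constant c = 12 from the proof is a safe rather than a tight choice.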
