QuickSort Average Case Analysis

An Incompressibility Approach

Brendan Lucier

August 2, 2005

Outline

- Introduction to QuickSort
- Overview of Argument
- The Main Result

The QuickSort Algorithm

QuickSort( Array A )
    If |A| = 0 return
    Let p = A[1]
    Let B = ( x in A, x < p ) in stable order
    Let C = ( x in A, x > p ) in stable order
    QuickSort(B)
    QuickSort(C)
    A = B p C

- QuickSort is a sorting algorithm.
- We assume the input is a permutation.
- We use deterministic QuickSort: the pivot is always selected as the first element of the permutation.
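The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the function name `quicksort` and the choice to count one comparison per non-pivot element at each partitioning step are assumptions made for the example.

```python
def quicksort(a):
    """Deterministic QuickSort on a list of distinct values.

    Returns (sorted_list, comparisons). The comparison count charges one
    comparison per non-pivot element at each partitioning step (an
    assumption of this sketch).
    """
    if len(a) == 0:
        return [], 0
    p = a[0]                               # pivot: always the first element
    b = [x for x in a[1:] if x < p]        # smaller values, in stable order
    c = [x for x in a[1:] if x > p]        # larger values, in stable order
    sorted_b, cmp_b = quicksort(b)
    sorted_c, cmp_c = quicksort(c)
    return sorted_b + [p] + sorted_c, (len(a) - 1) + cmp_b + cmp_c


result, comparisons = quicksort([3, 6, 5, 7, 1, 2, 8, 4])
print(result, comparisons)
```

The list comprehensions preserve the input order of the remaining elements, which is what "stable order" requires and what makes the pivot sequence depend only on the input permutation.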

QuickSort -- An Example

Input permutation: 3 6 5 7 1 2 8 4

Each pivot becomes a node. The values smaller than the pivot form its left subtree and the larger values form its right subtree:

            3
          /   \
         1     6
          \   / \
           2 5   7
            /     \
           4       8

We call this the QuickSort Tree for the permutation (3,6,5,7,1,2,8,4)
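The tree for this permutation can be rebuilt programmatically. The sketch below assumes a nested-tuple representation `(pivot, left, right)`; the helper names `build_tree` and `levels` are hypothetical and chosen only for this illustration.

```python
def build_tree(perm):
    """Return the QuickSort tree of perm as (pivot, left, right), or None if empty."""
    if not perm:
        return None
    p = perm[0]
    left = build_tree([x for x in perm[1:] if x < p])
    right = build_tree([x for x in perm[1:] if x > p])
    return (p, left, right)


def levels(tree):
    """List the pivots level by level, top to bottom and left to right."""
    out, frontier = [], [tree]
    while frontier:
        frontier = [t for t in frontier if t is not None]
        if frontier:
            out.append([t[0] for t in frontier])
        frontier = [child for t in frontier for child in (t[1], t[2])]
    return out


tree = build_tree([3, 6, 5, 7, 1, 2, 8, 4])
for row in levels(tree):
    print(row)
```

The number of rows returned by `levels` is the height of the tree, counted in nodes along the longest root-to-leaf path.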

Properties of QuickSort Trees

- Each node in the tree represents an element at the moment it is chosen as a pivot.
- Let T(y) be the subtree of descendants of y, and let R(y) = |T(y)|.
- QuickSort uses fewer than ΣR(y) comparisons.
- ΣR(y) is at most n times the height of the QuickSort tree, because each of the n elements is counted only by the pivots on its own root path.
- To show that QuickSort runs in O(n log n) time on average, we just need to show that QuickSort trees have average height O(log n).

[Figure: the QuickSort tree for the permutation (3,6,5,7,1,2,8,4)]
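The bound ΣR(y) ≤ n · height can be checked on the example permutation. This sketch assumes that T(y) includes y itself and that height counts the nodes on the longest root-to-leaf path; the slides do not fix either convention, and the function name `sum_r_and_height` is hypothetical.

```python
def sum_r_and_height(perm):
    """Return (sum of R(y) over all pivots y, height of the QuickSort tree).

    Assumes R(y) counts y together with its descendants, and that height
    counts nodes on the longest root-to-leaf path.
    """
    if not perm:
        return 0, 0
    p = perm[0]
    total_l, height_l = sum_r_and_height([x for x in perm[1:] if x < p])
    total_r, height_r = sum_r_and_height([x for x in perm[1:] if x > p])
    # The root's own subtree holds all len(perm) elements.
    return len(perm) + total_l + total_r, 1 + max(height_l, height_r)


perm = [3, 6, 5, 7, 1, 2, 8, 4]
total, height = sum_r_and_height(perm)
print(total, height, len(perm) * height)
```

Under these conventions, ΣR(y) equals the sum over all elements of the number of pivots on their root paths, which is why it cannot exceed n times the height.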

Heuristic Argument

- Each pivot is equally likely to be any element in its range.
- Call a pivot balanced if it falls in the middle half of its range. We would expect half of all pivots to be balanced.
- Take a root-to-leaf path in a QuickSort tree. We expect half of the nodes on the path to be balanced, so each of them splits its range no more unevenly than (1/4, 3/4).
- Thus the range of nodes along the path shrinks to at most 3/4 of its previous size whenever a balanced node occurs. This means that at most log_{4/3} n = d log n balanced nodes can lie on a path, where d = 1/log(4/3) ≈ 2.41 and logarithms are base 2.
- Since half of the nodes are expected to be balanced, we expect a path to have length 2d log n = O(log n).

NOTE: This argument does NOT give us the result we want, since it only talks about expected path length. We need an argument showing that the average maximum path length is O(log n).
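The quantity the NOTE asks about, the average height (maximum path length) of a QuickSort tree, can be estimated empirically. The sketch below is only an experiment, not part of the proof; the function names, the trial counts, and the seed are arbitrary assumptions.

```python
import math
import random


def tree_height(perm):
    """Height of the QuickSort tree of perm, counted in nodes on the longest path."""
    if not perm:
        return 0
    p = perm[0]
    smaller = [x for x in perm[1:] if x < p]
    larger = [x for x in perm[1:] if x > p]
    return 1 + max(tree_height(smaller), tree_height(larger))


def average_height(n, trials, seed=0):
    """Average tree height over `trials` uniformly random permutations of 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += tree_height(perm)
    return total / trials


d = 1 / math.log2(4 / 3)     # the constant d from the heuristic argument
for n in (64, 256, 1024):
    # Columns: n, measured average height, heuristic path length 2 d log n
    print(n, average_height(n, trials=50), 2 * d * math.log2(n))
```

Comparing the second and third columns gives a rough sense of how the measured average height relates to the heuristic estimate 2d log n.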

Main Idea

- Suppose we have a QuickSort tree with a long path, say of length k. At most d log n of its nodes are balanced.
- Half of the possible values for a pivot are balanced and the other half are not. Thus, knowing whether a pivot is balanced or unbalanced is worth one bit of information.
- We could encode the balanced/unbalanced information for each pivot on the path; this is worth k bits of information. Say we encode it as a binary string, with 1 meaning balanced and 0 meaning unbalanced.
- But we know that at most d log n nodes are balanced, so the string has at most d log n 1’s. If the path is quite long (k >> 2d log n), then the string has far fewer 1’s than 0’s and is therefore very compressible.

Example string: 010010100

Specifying a Path

- We want to compress a path, but we also need to encode which path we are compressing.
- We could use a bit for each left/right choice (0 = left, 1 = right), but this adds too much extra information.
- Instead, we use the same trick as before: arrange the choices so that a sufficiently long path must have very few instances of one choice.
- Encode the path as: 0 = follow the larger subtree, 1 = follow the smaller subtree. Each 1 at least halves the range, so a path can have at most log n 1’s.
- If the path has length k >> 2 log n, this path encoding will have far more 0’s than 1’s and can therefore be heavily compressed.
- We can choose k large enough that specifying the path AND its balanced/unbalanced information takes fewer than k bits, so we save bits overall.
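The two bit strings described so far can be computed for a concrete path. This is a sketch under stated assumptions: a pivot of rank r in a range of m values is called balanced when m/4 < r ≤ 3m/4, ties between subtree sizes are encoded as 0, and the function name `path_strings` is hypothetical. The slides do not spell out any of these details.

```python
def path_strings(perm, target):
    """Walk from the root of the QuickSort tree of perm to the pivot `target`.

    Returns (path, x, z):
      path - pivots visited, root first
      x    - one bit per pivot, 1 = balanced (assumed: m/4 < rank <= 3m/4)
      z    - one bit per step, 1 = moved into the strictly smaller subrange
    """
    path, x, z = [], "", ""
    current = list(perm)
    while current:
        p, m = current[0], len(current)
        smaller = [v for v in current[1:] if v < p]
        larger = [v for v in current[1:] if v > p]
        rank = len(smaller) + 1              # rank of the pivot within its range
        path.append(p)
        x += "1" if m / 4 < rank <= 3 * m / 4 else "0"
        if p == target:
            break
        chosen, other = (smaller, larger) if target < p else (larger, smaller)
        z += "1" if len(chosen) < len(other) else "0"
        current = chosen
    return path, x, z


print(path_strings([3, 6, 5, 7, 1, 2, 8, 4], 4))
print(path_strings([3, 6, 5, 7, 1, 2, 8, 4], 2))
```

With this tie-breaking rule, every 1 in z moves into a range of fewer than half the previous size, which is what limits the number of 1’s on any path to at most log n.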

The Main Result

Lemma: There is a constant c such that if π is log n-incompressible, then the QuickSort tree for π has height less than c log n.
Proof: The rest of this presentation.

Corollary: The average height of QuickSort trees is O(log n).
Proof: At most a 2^(-log n) = 1/n fraction of all permutations are log n-compressible. These could all have height as high as n. The rest must have height < c log n by the lemma. So the average height is bounded by (1 - 1/n) c log n + (1/n)(n) < c log n + 1 = O(log n), as required.

Encoding Permutations

We want to encode permutations in a way that uses the recursive structure of QuickSort trees. Here is a recursive encoding scheme for a permutation of length n:

- Specify the pivot, p.
- Specify the locations of all values less than the pivot.
- Encode the two sub-permutations.

There are n options for the pivot, (n-1 choose p-1) options for the locations, and (p-1)! and (n-p)! choices for the two sub-permutations.

The total encoding length is then

log n + log(n-1 choose p-1) + log (p-1)! + log (n-p)!
= log[ n (n-1 choose p-1) (p-1)! (n-p)! ]
= log(n!)

If we encode the sub-permutations recursively, we get the same result by induction.
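The counting identity behind the encoding length, n · (n-1 choose p-1) · (p-1)! · (n-p)! = n!, can be checked numerically for small n. The function name `code_space_size` is a hypothetical label for this product.

```python
import math


def code_space_size(n, p):
    """Product of the option counts in the recursive encoding when the pivot is p."""
    return (
        n                                  # choices for the pivot
        * math.comb(n - 1, p - 1)          # locations of the values below the pivot
        * math.factorial(p - 1)            # orderings of the smaller values
        * math.factorial(n - p)            # orderings of the larger values
    )


# The product equals n! for every pivot value p, so the code length is log(n!).
for n in range(1, 9):
    print(n, all(code_space_size(n, p) == math.factorial(n) for p in range(1, n + 1)))
```

The identity holds because (n-1 choose p-1) · (p-1)! · (n-p)! simplifies to (n-1)!, independent of p.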

Encoding Permutations (cont’d)

- Take a path Y = (y_1, …, y_{c log n}) in the QuickSort tree. Suppose we know whether each y_i is balanced or not.
- Now modify E(π) so that whenever the pivot is y_i for some i, we use one fewer bit to represent y_i. If y_i is balanced, we index y_i among the balanced values. Otherwise we index y_i among the unbalanced values.
- Then the total length of E(π) is log(n!) - c log n.
- All that is left to do is specify the path Y and the balanced/unbalanced information.

Compressing Sparse Strings

Lemma: Suppose a binary string x of length n has at most tn 1’s, where t < 1/2. Then x can be represented in H(t)n + O(log n) bits, where H is the binary entropy function.

Proof sketch: Encode the number of 1’s in x, then encode the locations of those 1’s. Some manipulation with Stirling’s approximation yields the desired result. QED.

In particular, if |x| = c log n and n_1(x) < d log n, where n_1(x) is the number of 1’s in x and c > 2d, then x can be encoded in H(d/c) c log n + O(log log n) bits. Note that this encoding is self-delimiting if we know n.
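The proof sketch (encode the number of 1’s, then their locations) can be illustrated with a small encoder. This sketch uses the combinatorial number system to turn the set of 1-positions into a single integer; that particular choice, and the names `encode_sparse`, `decode_sparse`, and `encoded_bits`, are assumptions of the example rather than something the slides specify.

```python
import math


def binary_entropy(t):
    """H(t) = -t log t - (1 - t) log(1 - t), with logarithms base 2."""
    if t in (0, 1):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)


def encode_sparse(bits):
    """Encode a bit string as (number of 1's, rank of the set of 1-positions)."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    rank = sum(math.comb(pos, i + 1) for i, pos in enumerate(ones))
    return len(ones), rank


def decode_sparse(n, k, rank):
    """Invert encode_sparse for a string of known length n with k 1's."""
    bits = ["0"] * n
    for i in range(k, 0, -1):
        c = i - 1
        while math.comb(c + 1, i) <= rank:   # find the largest c with C(c, i) <= rank
            c += 1
        bits[c] = "1"
        rank -= math.comb(c, i)
    return "".join(bits)


def encoded_bits(n, k):
    """Bits for the count (values 0..n) plus bits for the rank (below C(n, k))."""
    return math.ceil(math.log2(n + 1)) + math.ceil(math.log2(math.comb(n, k)))


k, rank = encode_sparse("010010100")
print(k, rank, decode_sparse(9, k, rank))
print(encoded_bits(200, 20), binary_entropy(20 / 200) * 200)
```

The last line compares the cost of this encoding for a string of length 200 with 20 ones against the H(t)n term of the lemma; the gap between them is the lower-order overhead.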

Encoding Balance Information

Our encoding of π requires that we provide balance information about Y, in addition to E(π).

Let x be the string whose i-th bit is 1 iff y_i is balanced. Let z be the string whose i-th bit is 1 iff y_{i+1} is in the smaller range of y_i. Then |xz| = 2c log n. Recall that n_1(z) ≤ log n and n_1(x) ≤ d log n, so n_1(xz) ≤ (d+1) log n.

Therefore |E(xz)| ≤ H((d+1)/(2c)) · 2c log n + O(log log n).

Fully Specifying a Permutation

We now give a full encoding of a permutation. Encode the permutation as E(xz)E(π), which has length at most

log(n!) - c[1 - 2H((d+1)/(2c))] log n + O(log log n).

Simply take c large enough that c[1 - 2H((d+1)/(2c))] > 1. Then we have

C(π | n, p) ≤ |E(xz)E(π)| ≤ log(n!) - log n.

The program p extracts π by decoding E(xz), retrieving x and z, then decoding E(π), using z to find the values y_i and using x to interpret the encodings of the y_i.

Thus π is log n-compressible if the QuickSort tree for π has height at least c log n for sufficiently large c.

We conclude that QuickSort trees have average height O(log n), and hence the QuickSort algorithm runs in average time O(n log n).
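To get a feel for how large c must be, the condition c[1 - 2H((d+1)/(2c))] > 1 can be evaluated numerically. This sketch assumes d = 1/log(4/3) with base-2 logarithms, restricts the search to integer c, and ignores the constants hidden in the O(log log n) term, so it only indicates the order of magnitude. The name `savings_coefficient` is hypothetical.

```python
import math


def binary_entropy(t):
    """H(t) = -t log t - (1 - t) log(1 - t), with logarithms base 2."""
    if t in (0, 1):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)


d = 1 / math.log2(4 / 3)       # assumed value of d, from the 3/4 shrink factor


def savings_coefficient(c):
    """c * (1 - 2 H((d+1)/(2c))): the multiple of log n saved by the encoding."""
    return c * (1 - 2 * binary_entropy((d + 1) / (2 * c)))


# Start just above d + 1 so that the fraction of 1's, (d+1)/(2c), is below 1/2.
c = math.ceil(d + 1)
while savings_coefficient(c) <= 1:
    c += 1
print(c, savings_coefficient(c))
```

The loop stops at the first integer c for which the saving exceeds log n; any larger constant needed to absorb the lower-order term would come on top of this.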

Summary

We proved an average-case upper bound for QuickSort by analyzing the average height of a QuickSort tree.

Our approach was to separate balance information for a long path from the rest of the encoding, then heavily compress the balance information.

The compression works because a long path must have fewer balanced than unbalanced nodes.

Fin

Thank You