Sorting: Implementation

15-211

Fundamental Data Structures and Algorithms

Klaus Sutner

February 24, 2004

Announcements

Homework #5

Midterm

March 4

Review: March 2

Today

- Recall Sorting

- Implementation Issues

- Average case RT for quicksort

- Timing Results

Total Recall: Sorting Algorithms

The Bible

Robert Sedgewick

Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching

Addison-Wesley 1998

Multiple Keys

We could use a special comparator function (this would require a special function for each combination of keys).

It is often easier to

- first sort by name, then

- stable sort by year.

Done!
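
A minimal sketch of the two-pass idea in Python, whose built-in `sorted` is stable. The records and their field layout are made up for illustration; the result is ordered by year, with ties broken by name.

```python
# Hypothetical records: (name, year). Not taken from the lecture.
records = [("Dana", 2005), ("Alex", 2006), ("Casey", 2005), ("Blake", 2006)]

# Pass 1: sort by the secondary key (name).
by_name = sorted(records, key=lambda r: r[0])

# Pass 2: stable sort by the primary key (year).
# Stability keeps records with equal years in name order.
by_year_then_name = sorted(by_name, key=lambda r: r[1])

# The "special comparator" alternative: one sort on a combined key.
combined = sorted(records, key=lambda r: (r[1], r[0]))

print(by_year_then_name)
```

Both routes give the same order here; the two-pass version avoids writing a separate comparison for every combination of keys.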

Sorting Review

Several simple, quadratic algorithms (worst case and average).

- Bubble Sort

- Selection Sort

- Insertion Sort

Only Insertion Sort is of practical interest: its running time is linear in n plus the number of inversions of the input sequence.

Constants small. Also stable.
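
The link between Insertion Sort and inversions can be checked directly. The sketch below (function names are illustrative) counts the element shifts Insertion Sort performs and compares that with a brute-force inversion count.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of element shifts performed."""
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Each shift moves one larger element past key, removing one inversion.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts


def count_inversions(a):
    """Brute-force count of pairs i < j with a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


data = [3, 1, 2]
expected = count_inversions(data)
shifts = insertion_sort(data)
print(data, shifts, expected)
```

The strict comparison `a[j] > key` never moves equal elements past each other, which is why the method is stable.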

Sorting Review

Asymptotically optimal O(n log n) algorithms (worst case and average).

- Merge Sort

- Heap Sort

Merge Sort is purely sequential and stable.

But it requires extra memory: 2n + O(log n) space in total.

Quick Sort

Overall fastest. In place.

BUT:

Worst case quadratic.

Not stable.

Implementation details messy.

Picking An Algorithm

First Question: Is the input short?

Short means something like n < 500.

In this case Insertion Sort is probably the best choice.

Don't bother with asymptotically faster methods.

Picking An Algorithm

Second Question: Does the input have special properties?

E.g., if the number of inversions is small, Insertion Sort may be the best choice.

Or linear sorting methods may be appropriate.
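
The slide does not name a particular linear method. As one illustration, counting sort works when the keys are small non-negative integers; the sketch below assumes the caller knows an upper bound `max_key` (a hypothetical parameter).

```python
def counting_sort(keys, max_key):
    """Sort non-negative integers in the range 0..max_key in O(n + max_key) time."""
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1          # tally how often each key occurs
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)  # emit each key as many times as it occurred
    return out


print(counting_sort([3, 0, 2, 3, 1, 0], max_key=3))
```

No comparisons between keys are made, which is how the method sidesteps the n log n lower bound for comparison-based sorting.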

Otherwise: Quick Sort

Large inputs, comparison based method, stability not required (recall our stabilizer trick, though).

Quick Sort is quadratic in the worst case, so why should it be the default candidate?

On average, Quick Sort is O(n log n), and the constants are quite small.

Average ???

Average case analysis requires a probability distribution on the inputs: we have to average the running times.

$t(n) = \sum_{x} p_x \, t(x)$

where the sum is over all instances $x$ of size $n$ and $p_x$ is the probability of getting instance $x$.

Often simply assume uniform distribution: every instance (of a certain size) is equally likely.

A Computation

Can we write down a recurrence equation?

Can we solve the equation?

At least approximately?

Is the solution (if any) practically relevant?

(see handout from last time)
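
The handout is not reproduced here. As a sketch under stated assumptions, the code below uses the standard textbook recurrence for the average number of comparisons, C(n) = (n - 1) + (2/n) (C(0) + ... + C(n-1)) with C(0) = C(1) = 0, and checks it against a brute-force average over all permutations (uniform distribution). The quicksort variant counted here is a simplified one that takes the first element as pivot and uses scratch lists.

```python
from fractions import Fraction
from itertools import permutations


def comparisons(seq):
    """Comparisons made by a simple quicksort with the first element as pivot.

    Each remaining element is counted as one (three-way) comparison with the pivot.
    """
    if len(seq) <= 1:
        return 0
    pivot, rest = seq[0], seq[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x > pivot]
    return len(rest) + comparisons(left) + comparisons(right)


def average_by_enumeration(n):
    """Exact average over all n! equally likely permutations of distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(comparisons(p) for p in perms), len(perms))


def average_by_recurrence(n):
    """Average from C(m) = (m - 1) + (2/m) * sum of C(0..m-1)."""
    c = [Fraction(0)] * (n + 1)
    for m in range(2, n + 1):
        c[m] = (m - 1) + Fraction(2, m) * sum(c[:m])
    return c[n]


for n in range(1, 7):
    print(n, average_by_enumeration(n), average_by_recurrence(n))
```

Enumeration is only feasible for very small n, which is the practical reason for wanting a recurrence and an (approximate) closed form.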

Implementing Quick Sort

Pivot Selection

Ideally, the pivot should be the median.

But finding the exact median is much too slow to be of practical value.

Instead either

- pick the pivot at random, or

- take the median of a small sample.
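
Both options can be sketched in a few lines. The function names are illustrative, and the sample of three (first, middle, last element) is one common choice, not something the slides prescribe.

```python
import random


def random_pivot_index(a, lo, hi):
    """Pick a pivot position uniformly at random from lo..hi (inclusive)."""
    return random.randint(lo, hi)


def median_of_three_index(a, lo, hi):
    """Return the position of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    # Order the three candidate positions by the values stored there.
    candidates = sorted([lo, mid, hi], key=lambda idx: a[idx])
    return candidates[1]


block = [85, 24, 63, 50, 17, 31, 96, 45]
idx = median_of_three_index(block, 0, len(block) - 1)
print(idx, block[idx])
```

On this example block the sample is 85, 50, 45, so the chosen pivot is 50.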

Partitioning

Partitioning is easy if we use extra scratch space. But we would like to partition in place.

Need to move elements within the same given block of the big array.

Basic idea: use two pointers, sweep across block from left and right till an out-of-place element is encountered. Swap them.

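1. Doing quicksort in place

85 24 63 50 17 31 96 45    (input; the pivot is 50)

85 24 63 45 17 31 96 50    (pivot swapped to the right end)

L moves right from the left end, R moves left from just before the pivot.

85 24 63 45 17 31 96 50    (L stops at 85, R skips 96 and stops at 31)

31 24 63 45 17 85 96 50    (85 and 31 swapped)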

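1. Doing quicksort in place

31 24 63 45 17 85 96 50    (L stops at 63, R stops at 17)

31 24 17 45 63 85 96 50    (63 and 17 swapped; the pointers then cross: R at 45, L at 63)

31 24 17 45 50 85 96 63    (pivot swapped with the element at L; 50 is in its final place)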

Pseudo Code

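// p is the pivot value, already moved to A[hi]
i = lo - 1; j = hi;
while( true ) {
    while( A[++i] < p );                        // scan right past elements < p
    while( p < A[--j] ) if( j == lo ) break;    // scan left past elements > p
    if( i >= j ) break;                         // pointers have crossed
    swap( i, j );
}
swap( i, hi );                                  // put the pivot in its final place
return i;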
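
A runnable Python rendering of this pseudo code, as a sketch: it assumes `hi` is the inclusive right end of the block, that `hi > lo`, and that the pivot has already been moved to `a[hi]`. The small `quicksort` driver is added only to exercise it and does no pivot selection.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] in place around the pivot a[hi]; return the pivot's final index."""
    p = a[hi]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < p:            # scan right; stops at a[hi] == p at the latest
            i += 1
        j -= 1
        while p < a[j] and j > lo:  # scan left; the j > lo test guards the left end
            j -= 1
        if i >= j:                 # pointers have crossed
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]      # put the pivot in its final place
    return i


def quicksort(a, lo, hi):
    """Plain recursive driver; always uses a[hi] as the pivot."""
    if hi <= lo:
        return
    i = partition(a, lo, hi)
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)


block = [85, 24, 63, 45, 17, 31, 96, 50]
pos = partition(block, 0, len(block) - 1)
print(pos, block)
```

Run on the block 85 24 63 45 17 31 96 50 from the slides, this performs the same two swaps and then places the pivot 50 at index 4.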

Getting Out

Using Quick Sort on very short arrays is a bad idea: the overhead becomes too large.

So, when the block becomes short we should exit Quick Sort and switch to Insertion Sort.

But not locally, as in:

quicksort( A, lo, hi ) {
    if( hi - lo < magic_number )
        insertionsort( A, lo, hi );
    else …

Getting Out

Just do nothing when the block is short. Then do one global cleanup with insertion sort.

quicksort( A, 0, n );
insertionsort( A, 0, n );

The cleanup is linear: every element already sits inside its final short block, so the number of inversions is at most magic_number times n.
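
A sketch of the global-cleanup scheme in Python. Assumptions: `CUTOFF` stands in for the magic number (its value here is arbitrary), block ends are inclusive (unlike the `0, n` call on the slide), and a random pivot is used to keep the example short.

```python
import random

CUTOFF = 10  # stand-in for the magic number; must be at least 1


def partition(a, lo, hi):
    """Partition a[lo..hi] in place around the pivot a[hi]; requires hi > lo."""
    p = a[hi]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while p < a[j] and j > lo:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort_coarse(a, lo, hi, cutoff=CUTOFF):
    """Quick Sort that simply leaves blocks shorter than cutoff unsorted."""
    if hi - lo < cutoff:
        return
    r = random.randint(lo, hi)     # random pivot, moved to the right end
    a[r], a[hi] = a[hi], a[r]
    i = partition(a, lo, hi)
    quicksort_coarse(a, lo, i - 1, cutoff)
    quicksort_coarse(a, i + 1, hi, cutoff)


def insertion_sort(a):
    """One global pass; cheap because elements are close to their final places."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_sort(a, cutoff=CUTOFF):
    quicksort_coarse(a, 0, len(a) - 1, cutoff)
    insertion_sort(a)


values = [random.randrange(1000) for _ in range(200)]
result = list(values)
hybrid_sort(result)
print(result[:10])
```

After `quicksort_coarse` returns, no element has to move further than the width of its own short block, which bounds the work of the final pass.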

Magic Number

The best way to determine the magic number is to run real-world tests.

It seems that for current architectures, some value in the range 5 to 20 will work best.

Equal Elements

Note that ideally pivoting should produce three sub-blocks:

left: < p

middle: == p

right: > p

Then the recursion could ignore the middle part, possibly omitting many elements.

Equal Elements

Three natural strategies:

- Both pointers stop.

- Only one pointer stops.

- Neither pointer stops.

Fact: The first strategy works best overall.

Equal Elements

There are clever implementations that partition into three sub-blocks.

This is amazingly hard to get both right and fast.

Try it!
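
As a starting point, here is a simple three-way partition in the style of Dijkstra's Dutch national flag scheme. It is a sketch that aims for correctness, not speed, and it is not claimed to be one of the clever implementations the slide has in mind. Names and the choice of `a[hi]` as pivot are assumptions.

```python
def partition3(a, lo, hi):
    """Rearrange a[lo..hi] into  < p | == p | > p  with p = a[hi].

    Returns (lt, gt) such that a[lt..gt] holds exactly the elements equal to p.
    """
    p = a[hi]
    lt, i, gt = lo, lo, hi
    # Invariant: a[lo..lt-1] < p, a[lt..i-1] == p, a[gt+1..hi] > p.
    while i <= gt:
        if a[i] < p:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > p:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1            # the element swapped in is still unexamined
        else:
            i += 1
    return lt, gt


def quicksort3(a, lo, hi):
    """Quick Sort whose recursion skips the middle block of equal elements."""
    if hi <= lo:
        return
    lt, gt = partition3(a, lo, hi)
    quicksort3(a, lo, lt - 1)
    quicksort3(a, gt + 1, hi)


block = [3, 1, 3, 5, 3, 2, 3]
bounds = partition3(block, 0, len(block) - 1)
print(bounds, block)
```

With many duplicates the middle block is large, so the two recursive calls receive much less work than with a two-way split.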

Application: Quick Select

Selection (Order Statistics)

A classical problem: given a list, find the k-th element in the ordered list.

The brute-force approach sorts the whole list first, and thus produces more information than required.

Can we get away with less than n log n work (in a comparison based world)?

Easy Cases

Needless to say, when k is small there are easy answers.

- Scan the array and keep track of the k smallest.

- Use a Selection Sort approach.

But how about general k?

Selection and Partitioning

qselect( A, lo, hi, k ) {
    if( hi <= lo ) return;
    i = partition( A, lo, hi );
    if( i > k ) qselect( A, lo, i-1, k );
    if( i < k ) qselect( A, i+1, hi, k );
}

This looks like a typo.

What’s really going on here?
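
One reading of the code: it returns nothing because the answer is delivered by side effect. After the call, position k of the array holds the element that would be there in sorted order. The Python sketch below assumes that k is a 0-based absolute index into the array, that block ends are inclusive, and that `partition` uses `a[hi]` as the pivot.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] in place around the pivot a[hi]; requires hi > lo."""
    p = a[hi]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while p < a[j] and j > lo:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]
    return i


def qselect(a, lo, hi, k):
    """Rearrange a[lo..hi] so that a[k] is the element of rank k; returns nothing."""
    if hi <= lo:
        return
    i = partition(a, lo, hi)
    if i > k:                      # rank k lies in the left block
        qselect(a, lo, i - 1, k)
    if i < k:                      # rank k lies in the right block
        qselect(a, i + 1, hi, k)
    # if i == k the pivot itself is the answer and is already in place


values = [85, 24, 63, 50, 17, 31, 96, 45]
work = list(values)
qselect(work, 0, len(work) - 1, 3)
print(work[3])
```

Unlike Quick Sort, only one of the two sub-blocks is ever processed, which is where the savings come from.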

Quick Select

What should we expect as running time?

As usual, if there is a ghost in the machine, it could force quadratic behavior.

But on average this algorithm is linear.

Don’t get any ideas about using this to find the median in the pivoting step of Quick Sort!

Some Timing Results

The Real World

Beyond asymptotic analysis, it is always a good idea to do some real world testing.

Construct a small test-bed:

- automate testing

- flexible but simple

- organize the data in a useful way
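
A very small test-bed along these lines might look as follows. This is a sketch, not the harness used for the lecture's timing results: the function names, input sizes, and the choice of random integer inputs are all assumptions.

```python
import random
import time


def insertion_sort(a):
    """Reference quadratic sort; sorts a in place and returns it."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def time_sort(sort_fn, data, trials=3):
    """Best-of-trials wall-clock time; also checks that the output is sorted."""
    best = float("inf")
    for _ in range(trials):
        work = list(data)                      # fresh copy for every run
        start = time.perf_counter()
        result = sort_fn(work)
        elapsed = time.perf_counter() - start
        assert result == sorted(data), "sort produced a wrong result"
        best = min(best, elapsed)
    return best


def run_testbed(algorithms, sizes, trials=3, seed=0):
    """Return {(name, n): seconds} for every algorithm and input size."""
    rng = random.Random(seed)                  # fixed seed for repeatable inputs
    table = {}
    for n in sizes:
        data = [rng.randrange(n) for _ in range(n)]
        for name, fn in algorithms.items():
            table[(name, n)] = time_sort(fn, data, trials)
    return table


def print_table(table, algorithms, sizes):
    """One row per input size, one column per algorithm."""
    print("n".rjust(8), *(name.rjust(12) for name in algorithms))
    for n in sizes:
        row = (f"{table[(name, n)]:.6f}".rjust(12) for name in algorithms)
        print(str(n).rjust(8), *row)


if __name__ == "__main__":
    algos = {"insertion": insertion_sort, "builtin": sorted}
    sizes = [100, 200, 400]
    print_table(run_testbed(algos, sizes), algos, sizes)
```

Adding another algorithm means adding one entry to the dictionary, and the results come back keyed by algorithm and size so they can be tabulated or plotted.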