Sorting and Lower bounds
15-211 Fundamental Data Structures and Algorithms
Ananda Guna
January 27, 2005
Recap
Sorting Comparison
We can categorize sorting algorithms into two major classes:
Fast Sorts: $O(N \log_2 N)$
Slow Sorts: $O(N^2)$
Slow sorts are easy to code and sufficient when the amount of data is small.
| N | N*N | N * log2(N) |
|---|---|---|
| 10 | 100 | 33 |
| 100 | 10,000 | 664 |
| 1,000 | 1,000,000 | 9,966 |
| 10,000 | 100,000,000 | 132,877 |
| 100,000 | 10,000,000,000 | 1,660,964 |
Basic Sorting Algorithms
Bubble Sort: fix flips (adjacent pairs that are out of order) until no more flips remain
Insertion Sort: insert a[i] into the sorted array a[0…i-1]
Advantages
Simple to implement
Good for small data sets
Disadvantages
$O(n^2)$ algorithms
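The insertion sort idea above can be sketched in a few lines of Java. This is a minimal illustration, not code from the lecture; the class name `InsertionSortSketch` is a hypothetical label chosen here.

```java
// Hypothetical helper class for illustration; not from the slides.
final class InsertionSortSketch {
    /** Sorts a in place in ascending order. */
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];          // element to insert into the sorted prefix a[0..i-1]
            int j = i - 1;
            // Shift larger elements one slot to the right
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;          // drop key into the gap
        }
    }
}
```

The two nested loops are where the quadratic worst case comes from: a reversed input shifts every earlier element on every insertion.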
Recursive Sorting Algorithms
QuickSort
Average case – O(n log n)
Worst Case – $O(n^2)$
Merge Sort
All cases O(n log n)
Extra Memory – O(n)
Comparison Chart
[Chart: bubble, insertion, merge, and quicksort compared on inputs that are almost sorted, in reverse order, in random order, and all equal]
Analysis of recursive sorting
Suppose it takes time T(N) to sort N elements.
Suppose also it takes time N to combine the two sorted arrays.
Then:
T(1) = 1
T(N) = 2T(N/2) + N, for N>1
Solving for T gives the running time for the recursive sorting algorithm.
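One way to solve the recurrence is repeated expansion. The sketch below assumes N is a power of two so that the halving comes out evenly.

```latex
\begin{align*}
T(N) &= 2\,T(N/2) + N \\
     &= 4\,T(N/4) + 2N \\
     &= 8\,T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + kN .
\end{align*}
Setting $k = \log_2 N$ and using $T(1) = 1$ gives
$T(N) = N + N \log_2 N = O(N \log N)$.
```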
QuickSort Example
Sort the following using qsort
Quicksort implementation
Implementation issues
Quick sort can be very fast in practice, but this depends on careful coding
Three major issues:
1. dividing the array in-place
2. picking the right pivot
3. avoiding quicksort on small arrays
2. Picking the pivot
In real life, inputs to a sorting routine are often not completely random
So, picking the first or last element to be the pivot is usually a bad choice
One common strategy is to pick the middle element
this is an OK strategy
2. Picking the pivot
A more sophisticated approach is to use random sampling
think about opinion polls
For example, the median-of-three strategy:
take the median of the first, middle, and last elements to be the pivot
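A minimal sketch of median-of-three selection follows. The class and method names are hypothetical, and returning an index rather than a value is a design choice made here so the caller can swap the pivot into place.

```java
// Hypothetical helper class for illustration; not from the slides.
final class MedianOfThreeSketch {
    /** Returns the index (left, middle, or right) holding the median of the three sampled values. */
    static int medianOfThreeIndex(int[] a, int left, int right) {
        int mid = left + (right - left) / 2;
        int x = a[left], y = a[mid], z = a[right];
        // y is the median if it lies between x and z (in either direction)
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        // otherwise x is the median if it lies between y and z
        if ((y <= x && x <= z) || (z <= x && x <= y)) return left;
        return right;
    }
}
```

On an already sorted range this picks the middle element, which is exactly the case where taking the first or last element as the pivot does badly.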
3. Avoiding small arrays
While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays
For small enough arrays, a simpler method such as insertion sort works better
The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements
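The cutoff idea can be sketched as a quicksort that hands small ranges to insertion sort. Everything named here is an assumption for illustration: the class name, the `CUTOFF` value of 10, the middle-element pivot, and the simple Lomuto-style partition, which is swapped in for brevity and is not the partition shown later in the lecture.

```java
// Hypothetical sketch; names and the CUTOFF value are assumptions, not from the slides.
final class HybridQuicksortSketch {
    // Assumed cutoff; the text suggests somewhere between 10 and 30.
    static final int CUTOFF = 10;

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    static void quicksort(int[] a, int left, int right) {
        if (right - left + 1 <= CUTOFF) {
            insertionSort(a, left, right);   // small range: use the simpler method
            return;
        }
        int p = partition(a, left, right);
        quicksort(a, left, p - 1);
        quicksort(a, p + 1, right);
    }

    // Lomuto-style partition around the middle element; returns the pivot's final index.
    static int partition(int[] a, int left, int right) {
        swap(a, left + (right - left) / 2, right);   // park the pivot at the right end
        int pivot = a[right];
        int store = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, right);
        return store;
    }

    // Insertion sort restricted to a[left..right].
    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= left && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

A reasonable way to choose the cutoff is to time the sort on representative data with several values and keep the fastest.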
A complication!
What should happen if we encounter an element that is equal to the pivot?
Four possibilities:
L stops, R keeps going (right list longer)
R stops, L keeps going (left list longer)
L and R stop (lists equal)
L and R keep going (left list longer)
Quick Sort Algorithm
Partitioning Step
• Choose a pivot element, say a = v[j]
• Determine its final position in the sorted array:
  a ≥ v[i] for all i < j
  a ≤ v[i] for all i > j
Recursive Step
• Perform the above step on the left array and the right array
An early look at quicksort code (incomplete)
void quicksort(int[] A, int left, int right) {
    if (right > left) {
        // Partition() (defined below) picks the pivot and returns its final index
        int i = Partition(A, left, right);
        quicksort(A, left, i - 1);   // sort the elements left of the pivot
        quicksort(A, i + 1, right);  // sort the elements right of the pivot
    }
}
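Quick Sort Code ctd.
// Suppose that the pivot is p
// Partition(): rearrange A into 2 sublists
// S1 = { x ∈ A | x ≤ p } and S2 = { x ∈ A | x ≥ p }
int Partition(int[] A, int left, int right) {
    // Ensure A[left] <= A[right] so that A[right] stops the i scan
    if (A[left] > A[right]) swap(A, left, right);
    int pivot = A[left];
    int i = left;
    int j = right + 1;
    do {
        do ++i; while (A[i] < pivot);   // scan right for an element >= pivot
        do --j; while (A[j] > pivot);   // scan left for an element <= pivot
        if (i < j) {
            swap(A, i, j);
        }
    } while (i < j);
    swap(A, j, left);   // move the pivot into its final position
    return j; // j is the position of the pivot after rearrangement
}

// Exchange A[i] and A[j]
void swap(int[] A, int i, int j) {
    int tmp = A[i];
    A[i] = A[j];
    A[j] = tmp;
}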
Quick Sort Analysis
Worst-case behavior
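[Figure: quicksort tree in which the smallest remaining element is the pivot at every level]
105 47 13 17 30 222 5 19
5 | 47 13 17 30 222 19 105
13 | 47 105 17 30 222 19
17 | 47 105 19 30 222
19 | ...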
If we always pick the smallest (or largest) possible pivot,
then quicksort takes $O(n^2)$ steps.
Best-case analysis
In the best case, the pivot is always the median element.
In that case, the splits are always “down the middle”.
Hence, same behavior as mergesort.
That is, O(N log N).
Average-case analysis
Consider the quicksort tree:
105 47 13 17 30 222 5 19 (pivot 19)
5 17 13 (pivot 13) | 47 30 222 105 (pivot 47)
5 | 17 | 30 | 222 105
105 | 222
Average-case analysis
At each level of the tree, the nodes together hold at most N elements.
So, time spent at each level is O(N).
On average, how many levels? That is, what is the expected height of the tree?
If on average there are O(log N) levels, then quicksort is O(N log N) on average.
Expected height of qsort tree
Assume that pivot is chosen randomly.
And that ½ the pivots are good, and ½ are bad.
Which elements in the list below are “good” pivots?
5 13 17 19 30 47 105 222
When is a pivot “good”? “Bad”?
5 13 17 19 30 47 105 222
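A pivot is “good” if it falls in the middle half of the sorted order (here 17, 19, 30, or 47), so the probability of a good pivot is 0.5.
After a good pivot, each partition is at most 3/4 the size of the original array.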
Expected height of qsort tree
So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is:
$N \cdot (3/4)(3/4) \cdots (3/4)$ (k times)
$= N(3/4)^k$
But on average, only half of the pivots will be good, so
the kth child has size $N(3/4)^{k/2}$
Expected height of qsort tree
But, if the kth child is a leaf, then
$N(3/4)^{k/2} = 1$
Thus, the expected height
$k = 2\log_{4/3} N = O(\log N)$
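The step from the leaf condition to the height can be written out as follows. The final conversion to base-2 logarithms is added here only to show the size of the constant.

```latex
\begin{align*}
N\left(\tfrac{3}{4}\right)^{k/2} = 1
  &\iff \left(\tfrac{4}{3}\right)^{k/2} = N \\
  &\iff \tfrac{k}{2} = \log_{4/3} N \\
  &\iff k = 2\log_{4/3} N = \frac{2\log_2 N}{\log_2(4/3)} \approx 4.8\,\log_2 N .
\end{align*}
```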
Summary of quicksort
A fast sorting algorithm in practice.
Can be implemented in-place.
But is $O(N^2)$ in the worst case.
O(N log N) average-case performance.
Shell Sort
Shellsort
Shellsort, like bubble sort and insertion sort, is based on performing exchanges on inverted pairs.
Start by picking a decrement sequence $h_k, h_{k-1}, \ldots, h_1$, where $h_1 = 1$ and $h_i > h_{i-1}$ for each $i > 1$.
Start with $h_k$ and exchange each pair of inverted array elements that are $h_k$ elements apart.
Continue with $h_{k-1}, \ldots, h_1$.
Shellsort
Example with sequence 3, 1.
105 47 13 99 30 222
99 47 13 105 30 222
99 30 13 105 47 222
99 30 13 105 47 222
30 99 13 105 47 222
30 13 99 105 47 222
...
Several inverted pairs fixed in one exchange.
Shellsort characteristics
The running time for shellsort depends on the decrement sequence chosen.
$h_k = N/2$, $h_{k-1} = h_k/2$:
Worst-case $O(N^2)$.
Let $h_k = 2^i - 1$, for the largest $2^i - 1 < N$, and $h_{k-1} = 2^{i-1} - 1$.
Example: 15, 7, 3, 1.
Worst-case $O(N^{3/2})$.
Other sequences achieve $O(N^{4/3})$.
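A minimal shellsort sketch using the 15, 7, 3, 1 style of decrement sequence is given below. It runs a gapped insertion sort for each decrement, which is the usual way to implement the exchange of inverted pairs a fixed distance apart. The class name is hypothetical.

```java
// Hypothetical helper class for illustration; not from the slides.
final class ShellsortSketch {
    /** Sorts a in place using decrements of the form 2^i - 1 (..., 15, 7, 3, 1). */
    static void shellsort(int[] a) {
        int n = a.length;
        // Find the largest decrement of the form 2^i - 1 that is below n
        int gap = 1;
        while (2 * gap + 1 < n) gap = 2 * gap + 1;
        // Integer division maps 2^i - 1 to 2^(i-1) - 1, so the sequence ends at 1
        for (; gap >= 1; gap /= 2) {
            // Insertion sort over elements that are gap positions apart
            for (int i = gap; i < n; i++) {
                int key = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > key) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = key;
            }
        }
    }
}
```

The final pass with decrement 1 is a plain insertion sort, so correctness never depends on the earlier passes. They only reduce how far elements still have to move.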
Non-Comparison based Sorting
Non-comparison-based sorting
If we can do more than just compare pairs of elements, we can sometimes sort more quickly
Two simple examples are bucket sort and radix sort
Bucket Sort
Bucket sort
In addition to comparing pairs of elements, we require these additional restrictions:
all elements are non-negative integers
all elements are less than a predetermined maximum value
Elements are usually keys paired with other data
Bucket sort
[Figure: the input 1 3 3 1 2 is distributed into buckets labeled 1, 2, 3]
Bucket sort characteristics
Runs in O(N) time (O(N + M) when the M buckets are counted, where M is the maximum key value).
Easy to implement each bucket as a linked list.
Is stable:
If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
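The sketch below shows a stable bucket sort with one linked list per bucket. The `Item` type, the `maxKey` parameter, and the class name are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class for illustration; not from the slides.
final class BucketSortSketch {
    /** A key paired with other data. */
    static final class Item {
        final int key;
        final String value;
        Item(int key, String value) { this.key = key; this.value = value; }
    }

    /** Stable bucket sort; every key must satisfy 0 <= key < maxKey. */
    static List<Item> bucketSort(List<Item> input, int maxKey) {
        List<List<Item>> buckets = new ArrayList<>();
        for (int k = 0; k < maxKey; k++) buckets.add(new LinkedList<>());
        // Appending at the tail keeps equal keys in their input order
        for (Item item : input) buckets.get(item.key).add(item);
        // Concatenate the buckets in key order
        List<Item> out = new ArrayList<>(input.size());
        for (List<Item> bucket : buckets) out.addAll(bucket);
        return out;
    }
}
```

For example, sorting the keys 1 3 3 1 2 with `maxKey` set to 4 keeps the two 1s and the two 3s in their original relative order.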
Work area
Radix Sort
Radix sort
If your integers are in a larger range then do bucket sort on each digit
Start by sorting with the low-order digit using a STABLE bucket sort.
Then, do the next-lowest, and so on.
Radix sort
Example:
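Input: 010 000 101 001 111 011 100 110 (decimal 2 0 5 1 7 3 4 6)
After sorting on the low-order bit: 010 000 100 110 101 001 111 011
After sorting on the middle bit: 000 100 101 001 010 110 111 011
After sorting on the high-order bit: 000 001 010 011 100 101 110 111 (decimal 0 1 2 3 4 5 6 7)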
Each sorting step must be stable.
Radix sort characteristics
Each sorting step can be performed via bucket sort, and is thus O(N).
If the numbers are all b bits long, then there are b sorting steps.
Hence, radix sort is O(bN).
Also, some variants of radix sort can be implemented in-place (just like quicksort).
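A minimal sketch of the b-pass binary radix sort follows. Each pass is a stable two-bucket distribution on one bit, starting from the low-order bit. For simplicity this version uses an O(N) scratch buffer, so it is not an in-place variant. The class name and the `bits` parameter are assumptions.

```java
// Hypothetical helper class for illustration; not from the slides.
final class RadixSortSketch {
    /** Sorts non-negative ints that fit in the given number of bits, one stable pass per bit. */
    static void radixSort(int[] a, int bits) {
        int[] buffer = new int[a.length];
        for (int b = 0; b < bits; b++) {
            // Count the elements whose bit b is 0 to find where the 1-bucket starts
            int zeros = 0;
            for (int x : a) if (((x >> b) & 1) == 0) zeros++;
            // Distribute in input order, which keeps the pass stable
            int zeroPos = 0, onePos = zeros;
            for (int x : a) {
                if (((x >> b) & 1) == 0) buffer[zeroPos++] = x;
                else buffer[onePos++] = x;
            }
            System.arraycopy(buffer, 0, a, 0, a.length);
        }
    }
}
```

Each pass touches every element a constant number of times, which is where the O(bN) total comes from.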
Not just for binary numbers
Radix sort can be used for decimal numbers and alphanumeric strings.
Input: 032 224 016 015 031 169 123 252
After sorting on the ones digit: 031 032 252 123 224 015 016 169
After sorting on the tens digit: 015 016 123 224 031 032 252 169
After sorting on the hundreds digit: 015 016 031 032 123 169 224 252
Why comparison-based?
Bucket and radix sort can be much faster than any comparison-based sorting algorithm when their restrictions are met
Unfortunately, we can’t always live with the restrictions imposed by these algorithms
In such cases, comparison-based sorting algorithms give us general solutions
Lower Bound for the Sorting Problem
How fast can we sort?
We have seen several sorting algorithms with O(N log N) running time.
In fact, Ω(N log N) is a general lower bound for the comparison-based sorting problem.
A proof appears in Weiss.
Informally…
Upper and lower bounds
[Figure: T(N) plotted against N, lying below c·f(N) and above d·g(N)]
T(N) = O(f(N))
T(N) = Ω(g(N))
Decision tree for sorting
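[Figure: decision tree for sorting three elements a, b, c. The root holds all six possible orderings (a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a). Each internal node compares two elements (a<b or b<a, a<c or c<a, b<c or c<b) and passes the orderings consistent with the outcome to the matching child. Each leaf holds a single ordering.]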
N! leaves.
So, the tree has height at least log(N!).
log(N!) = Ω(N log N).
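The last step, that log(N!) grows at least as fast as N log N, can be checked by keeping only the larger half of the terms in the sum:

```latex
\begin{align*}
\log_2(N!) = \sum_{i=1}^{N} \log_2 i
  \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
  \;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
  \;=\; \Omega(N \log N).
\end{align*}
```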
Summary on sorting bound
If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).
A decision tree is a representation of the possible comparisons required to solve a problem.
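To make the bound concrete, the sketch below computes the smallest possible height of a decision tree with N! leaves, which is the ceiling of log2(N!). The class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper class for illustration; not from the slides.
final class SortingLowerBoundSketch {
    /** Returns ceil(log2(n!)), the minimum height of a binary decision tree with n! leaves. */
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            factorial = factorial.multiply(BigInteger.valueOf(i));
        }
        // For x >= 1, ceil(log2(x)) equals the bit length of x - 1
        return factorial.subtract(BigInteger.ONE).bitLength();
    }
}
```

For three elements this gives 3, which matches the height of the decision tree for a, b, c: no comparison-based method can sort three items with only two comparisons in the worst case.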
Quickselect – finding median
World’s Fastest Sorters
Sorting competitions
There are several world-wide sorting competitions.
Unix CoSort has sorted 1GB in under one minute, on a single Alpha: http://www.cosort.com
Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations http://now.cs.berkeley.edu/
Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine