SORTING
TABLE OF CONTENTS
INTRODUCTION
CLASSIFICATION OF SORTING ALGORITHMS
BUBBLE SORT
INSERTION SORT
SHELL SORT
HEAP SORT
MERGE SORT
QUICK SORT
BUCKET SORT
RADIX SORT
INTRODUCTION
A sorting algorithm is an algorithm that puts the elements of a list in a specific order. Sorting is
a fundamental building block of many programs. It helps in optimizing the performance of a
program, and can also simplify its code considerably. This has made
sorting algorithms a lasting area of interest for computer scientists.
The list is an abstract data type that implements an ordered collection of values
where the values may occur more than once in the list. The order in which the list
is sorted can be numerical or lexicographical.
This report aims to give an overview of the various sorting algorithms in use. It
gives a detailed (but by no means exhaustive) explanation of the working and the uses of some of
the most widely used sorting algorithms. With regard to the different types of
sorting algorithms at hand, the report presents the bases on which the algorithms
are classified, among them complexity, stability, and the general method
followed.
For every sorting algorithm, a code fragment is provided for the
convenience of the reader. The report is intended as study material, not merely as
reading material.
CLASSIFICATION OF SORTING ALGORITHMS
Sorting algorithms are classified based on various factors. These factors also
influence the efficiency of the sorting algorithm. They are often classified by:
Computational Complexity – Comparisons can be made in terms of worst, average
and best behaviour using Big O notation. For a sorting algorithm
on a list of size n, good behaviour is O(n log n) and bad behaviour is
O(n²). Ideal behaviour for a sorting algorithm is O(n).
Memory Usage – There are in-place algorithms and out-of-place algorithms.
In-place algorithms require just O(1) or O(log n) extra memory, beyond
the items being sorted. Therefore they don’t need to create auxiliary locations
for data to be temporarily stored, as out-of-place sorting algorithms do.
Recursion – Some algorithms are either recursive or non-recursive, while
others may be both (e.g., merge sort).
Stability – Stable sorting algorithms maintain the relative order of records
with equal keys (i.e., values). As an example the following set of values is to be
sorted by their first component. ( 2 , 6 ) ( 2 , 1 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ) .
They can be sorted in two ways :
( 2 , 6 ) ( 2 , 1 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ) - Order Not Changed
( 2 , 1 ) ( 2 , 6 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ) - Order Changed
The algorithm that does not change the relative order is the stable one.
An unstable algorithm can always be specially implemented to be stable, by
keeping the original position of each record as the tie breaker for equal values,
though this requires additional computational cost and memory.
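The tie-breaking idea above can be sketched in Java. The stableByKey method, the int[]{key, value} pair representation, and the index-tagging scheme are our own illustration, not from any particular library:

```java
import java.util.Arrays;
import java.util.Comparator;

// Making any sort behave stably: tag each record with its original index
// and use that index as the tie breaker for equal keys.
int[][] stableByKey(int[][] pairs) {
    int[][] tagged = new int[pairs.length][];
    for (int i = 0; i < pairs.length; i++)
        tagged[i] = new int[]{pairs[i][0], pairs[i][1], i};  // append original index
    Arrays.sort(tagged, Comparator
            .<int[]>comparingInt(p -> p[0])   // primary key: the first component
            .thenComparingInt(p -> p[2]));    // tie breaker: original position
    int[][] out = new int[pairs.length][];
    for (int i = 0; i < pairs.length; i++)
        out[i] = new int[]{tagged[i][0], tagged[i][1]};      // drop the tag
    return out;
}
```

Here Arrays.sort happens to be stable already, so the tie breaker is redundant; the point is that the scheme makes stability independent of the underlying sort, at the cost of the extra index per record noted above.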
Comparison – Algorithms can be classified on whether the sorting algorithm
is a comparison sort. A comparison sort examines the data only by comparing
two elements with a comparison operator. Most of the widely used sorts are
comparison sorts.
General Method – According to the method followed by the sorting algorithm
it is classified as insertion, exchange, selection, merging, etc. Examples of
exchange sorts and selection sorts are bubble sort and heap sort respectively.
All these factors will be considered in the detailed study of the widely used and
popular sorting algorithms that follows.
BUBBLE SORT
Bubble sort is a simple and straightforward method of sorting data that is
widely used in computer science education. It works by repeatedly stepping through
the list to be sorted, comparing each pair of adjacent items and swapping
them if they are in the wrong order. The pass through the list is repeated until
no swaps are needed, which indicates that the list is sorted. The smaller
elements bubble their way to the top of the list, hence the name bubble sort. Bubble
sort is a comparison sort.
Bubble sort has a worst case and average complexity of O(n²), where n is
the number of items being sorted. For sorting 100 elements, this algorithm makes
on the order of 10,000 comparisons. There are a lot of other sorting algorithms that
perform substantially better, with worst or average case complexity of O(n log n). Even insertion
sort, which also has a worst case of O(n²), performs better than bubble sort in practice.
Therefore the use of bubble sort is not practical when n is large. The performance
of bubble sort also depends on the positions of the elements. Large elements at the
beginning of the list do not pose a problem, as they are quickly swapped towards the end. Small
elements towards the end, however, move to the beginning extremely slowly.
Cocktail sort is a variant of bubble sort that addresses this problem, but it still retains
the O(n²) worst case complexity.
Let us take the array of numbers ( 5 1 4 2 8 ) and sort it from the lowest
number to the greatest number using the bubble sort algorithm. In each step, the
elements being compared are described alongside.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements and
swaps them, since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5),
the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it is
completed. The algorithm needs one whole pass without any swap to know it is
sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Finally, the array is sorted, and the algorithm can terminate.
The performance of bubble sort can be improved marginally. After the first pass,
the greatest element has reached the last position in the array, i.e. index n-1;
after each subsequent pass, one more position at the end is final and need not be
compared again. Each pass can therefore be one step shorter than the previous one.
This roughly halves the number of comparisons, although the complexity still remains O(n²).
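A sketch of the algorithm with both improvements (the shrinking pass and an early-exit swapped flag) in Java; the method name bubbleSort and the flag are our own choices:

```java
void bubbleSort(int[] arr) {
    // After each pass, the last element of the shrinking range is final,
    // so 'end' moves one position to the left every time.
    for (int end = arr.length - 1; end > 0; end--) {
        boolean swapped = false;
        for (int i = 0; i < end; i++) {
            if (arr[i] > arr[i + 1]) {
                int tmp = arr[i];        // swap the adjacent out-of-order pair
                arr[i] = arr[i + 1];
                arr[i + 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped) break;  // a full pass with no swaps: the list is sorted
    }
}
```

Note the early exit: on an already sorted list the method makes a single pass and stops, giving a linear best case.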
Due to its simplicity and straightforwardness, bubble sort is often used to
introduce the concept of an algorithm to introductory computer science students.
The Jargon File, which famously calls bogosort ‘the archetypical perversely awful
algorithm’, also calls bubble sort ‘the generic bad algorithm’. Donald Knuth, in his
popular book ‘The Art of Computer Programming’, concludes that bubble sort
seems to have nothing to recommend it. Researchers such as Owen Astrachan have
shown experimentally that insertion sort performs better even on random
lists, and have gone as far as to recommend that bubble sort no longer be taught.
INSERTION SORT
Insertion sort is a simple sorting algorithm that is relatively efficient for small
lists and mostly sorted lists, and is often used as a part of more sophisticated
algorithms. It is a comparison sort, in which the sorted list is built one entry
at a time. It works by taking elements from the list one by one and inserting
them in their correct position in a new sorted list. The new list and the
remaining elements can share the array’s space, but insertion is expensive,
requiring all following elements to be shifted over by one.
Insertion sort has an average and worst case complexity of O(n²). The best case is
an already sorted list, for which the running time is linear, i.e. O(n); the worst
case is a list in reverse order. Since the running time is quadratic in the average
case as well, insertion sort is not considered suitable for large lists. However, it is
one of the fastest algorithms when the list contains fewer than about 10 elements. A variant of
insertion sort, Shell sort, is more efficient for larger lists; the next chapter
gives more detail on it. Insertion sort is stable, and is an in-place
algorithm, requiring only a constant amount of additional memory space. It can sort a
list as it receives it. Compared to more advanced algorithms like quick sort,
heap sort, or merge sort, insertion sort is much less efficient on large lists.
The following example shows the working of insertion sort. Let us consider
the array of numbers ( 5 1 4 2 8 ). In each step, the elements being compared are
described alongside.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), The algorithm takes the second element, 1, and moves it
towards the front until it meets an element smaller than itself; here it reaches the front.
Second Pass:
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Here 4 < 5, therefore 4 is moved before 5
( 1 4 5 2 8 ) → ( 1 4 5 2 8 ), Here 4 > 1, hence no further moving is done
Third Pass:
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Here 2 < 5, so 2 is moved before 5
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Here 2 < 4 but 2 > 1, so 2 comes to rest after 1 and the
pass ends
Fourth Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), Here 8 > 5, so 8 stays in place and the list is sorted
Instead of swapping element by element, the insertion point can be found with
binary search and the elements shifted into place directly.
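A sketch of this binary-search variant in Java; binaryInsertionSort is our own name for it, and the shifting is done with System.arraycopy:

```java
void binaryInsertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int key = arr[i];
        // Binary search for the insertion point in the sorted prefix arr[0..i-1].
        int lo = 0, hi = i;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (arr[mid] <= key) lo = mid + 1;  // '<=' keeps equal keys stable
            else hi = mid;
        }
        // Shift the tail of the prefix one slot right and drop the key in.
        System.arraycopy(arr, lo, arr, lo + 1, i - lo);
        arr[lo] = key;
    }
}
```

The number of comparisons drops to O(n log n), but the shifting still makes the overall worst case O(n²), as the surrounding text notes.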
Binary search only pays off when comparisons are more expensive than
moves. Since insertion into an array is tedious, one might consider a linked
list for the sort; but binary search cannot be performed on a linked list, as linked
lists do not allow random access. In 2004 Bender, Farach-Colton, and Mosteiro
produced a new variant of insertion sort, called library sort, that leaves a small
number of unused gaps spread throughout the array. The benefit is that elements
need be shifted only until a gap is reached.
void insertionSort(int[] arr) {
    int i, j, newValue;
    for (i = 1; i < arr.length; i++) {
        newValue = arr[i];          // element to insert into the sorted prefix
        j = i;
        while (j > 0 && arr[j - 1] > newValue) {
            arr[j] = arr[j - 1];    // shift larger elements one position right
            j--;
        }
        arr[j] = newValue;          // drop the element into its correct slot
    }
}
SHELL SORT
Shell sort was invented by Donald Shell in 1959 and is named after its
inventor. It is an improved version of insertion sort, combining ideas from
insertion sort and bubble sort to give much more efficiency than either of these
traditional algorithms. Shell sort improves insertion sort by comparing elements
separated by a gap of several positions. This lets an element take "bigger steps"
toward its expected position. Multiple passes over the data are taken with smaller
and smaller gap sizes. The last step of Shell sort is a plain insertion sort, but by
then, the array of data is guaranteed to be almost sorted. An implementation of
Shell sort can be pictured as arranging the data sequence in a two-
dimensional array and then sorting the columns of the array using insertion
sort. The effect is that the data sequence is partially sorted. The process above
is repeated, but each time with a smaller number of columns, until in the last step
the array consists of only one column. In practice, the data sequence is not held
in a two-dimensional array, but in a one-dimensional array that is indexed
appropriately.
Though Shell sort is a simple algorithm, determining its complexity is a laborious
task. The original Shell sort algorithm performs O(n²) comparisons and
exchanges in the worst case. The gap sequence is the major factor in Shell sort that improves or
degrades the performance of the algorithm. The gap sequence
originally suggested by Donald Shell was to begin with N/2 and halve the gap until it
reaches 1. With this gap sequence, the worst case running time is O(n²). Other
gap sequences in popular use, and their worst case running times, are as
follows: O(n^(3/2)) for Hibbard's increments 2^k − 1, O(n^(4/3)) for Sedgewick's
increments 4^k + 3·2^(k−1) + 1, and O(n log² n) for Pratt's
increments 2^i 3^j; better running times are possible but unproven. The existence of an
O(n log n) worst-case implementation of Shell sort (which would be optimal
among comparison sort algorithms) was ruled out by Plaxton, Poonen, and
Suel.
Let 3 7 9 0 5 1 6 8 4 2 0 6 1 5 7 3 4 9 8 2 be the data sequence to be sorted. First,
it is arranged in an array with 7 columns (first block below); then the columns are sorted (second block):
3 7 9 0 5 1 6
8 4 2 0 6 1 5
7 3 4 9 8 2
3 3 2 0 5 1 5
7 4 4 0 6 1 6
8 7 9 9 8 2
Data elements 8 and 9 have now already come to the end of the sequence, but a
small element (2) is also still there. In the next step, the sequence is arranged in 3
columns, which are again sorted:
3 3 2
0 5 1
5 7 4
4 0 6
1 6 8
7 9 9
8 2
0 0 1
1 2 2
3 3 4
4 5 6
5 6 8
7 7 9
8 9
Now the sequence is almost completely sorted. When arranging it in one column
in the last step, it is only a 6, an 8 and a 9 that have to move a little bit to their
correct position.
The best known sequence according to research by Marcin Ciura is 1, 4, 10, 23,
57, 132, 301, 701, 1750. This study also concluded that "comparisons rather than
moves should be considered the dominant operation in Shellsort." Another
sequence that performs very well on large arrays is the Fibonacci numbers
(leaving out one of the starting 1's) to the power of twice the golden ratio, which
gives the following sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481,
2034035, 9651787, 45806244, 217378076, 1031612713, ….
Algorithm Shellsort:
void shellsort(int[] a, int n) {
    int i, j, k, h, v;
    // Gap sequence, largest to smallest; the final gap of 1 is a plain insertion sort.
    int[] cols = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                  1968, 861, 336, 112, 48, 21, 7, 3, 1};
    for (k = 0; k < 16; k++) {
        h = cols[k];
        // Gapped insertion sort: sorts the subsequences a[0], a[h], a[2h], ...
        for (i = h; i < n; i++) {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v) {
                a[j] = a[j - h];
                j = j - h;
            }
            a[j] = v;
        }
    }
}
MERGE SORT
Merge sort is a comparison sort which is very effective on large lists, with a
worst case complexity of O(n log n). It was invented by John von Neumann in the
year 1945. Merge sort is an example of a divide and conquer algorithm. The
algorithm followed by merge sort is as follows –
1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
2. Divide the unsorted list into two sub-lists of about half the size.
3. Sort each sub-list recursively by re-applying merge sort.
4. Merge the two sub-lists back into one sorted list.
The two main ideas behind the algorithm are that sorting small lists takes fewer
steps than sorting long lists, and that creating a sorted list from two sorted lists is easier
than from two unsorted lists. Merge sort is a stable sort, i.e. the order of equal
inputs is preserved in the sorted list.
As mentioned above, merge sort has an average and worst case performance of
O(n log n) when sorting n objects. Compared with quick sort, merge sort’s
worst case matches quick sort’s best case asymptotically. In the worst case, merge sort
does about 39% fewer comparisons than quick sort does in the average case. The
main disadvantage of merge sort is that straightforward recursive implementations
incur method call overhead, costing time and memory. But it is not difficult to code
an iterative, non-recursive merge sort that avoids all method call overheads. Also,
merge sort does not sort in place, so it requires extra memory to be allocated
for the sorted output to be stored in. One of the main advantages of merge sort is
that a natural variant of it has O(n) complexity if the input is already sorted, which is
equivalent to running through the list and checking that it is presorted. Sorting in
place is possible using linked lists but is quite complicated; in such cases,
heap sort is often preferable. Merge sort remains stable as long as the merge
operation is implemented properly.
Consider the list ( 3 5 4 9 2 ) to be sorted using merge sort. First the list is
repeatedly divided into smaller lists: ( 3 5 4 9 2 ) splits into ( 3 5 ) and ( 4 9 2 );
( 3 5 ) splits into ( 3 ) and ( 5 ); ( 4 9 2 ) splits into ( 4 ) and ( 9 2 ); and ( 9 2 )
splits into ( 9 ) and ( 2 ).
Now let us consider the comparisons that take place in the algorithm. According to
the algorithm, if a list contains 0 or 1 elements it is already sorted,
and it is merged to form a larger sorted list. Accordingly, ( 3 ) and ( 5 ) are merged
as ( 3 5 ), and ( 9 ) and ( 2 ) are merged as ( 2 9 ).
Now ( 4 ) and ( 2 9 ) are two sorted lists to be merged; after comparisons the
new sorted list is ( 2 4 9 ). The two sorted lists left to merge are ( 3 5 )
and ( 2 4 9 ). The comparisons are shown below: at each step the smaller of the two
head elements is moved to the output list on the right.
( 3 5 ) ( 2 4 9 ) ( 2 )
( 3 5 ) ( 4 9 ) ( 2 3 )
( 5 ) ( 4 9 ) ( 2 3 4 )
( 5 ) ( 9 ) ( 2 3 4 5 9 )
Thus the merged list is ( 2 3 4 5 9 ) which is the required sorted output.
Various programming languages use either merge sort or a variant of the algorithm
as their in-built method for sorting.
public int[] mergeSort(int[] array) {
    if (array.length > 1) {
        // Split the array into two halves; the second half gets the
        // extra element when the length is odd.
        int elementsInA1 = array.length / 2;
        int elementsInA2 = elementsInA1;
        if ((array.length % 2) == 1)
            elementsInA2 += 1;
        int[] arr1 = new int[elementsInA1];
        int[] arr2 = new int[elementsInA2];
        for (int i = 0; i < elementsInA1; i++)
            arr1[i] = array[i];
        for (int i = elementsInA1; i < elementsInA1 + elementsInA2; i++)
            arr2[i - elementsInA1] = array[i];
        // Sort each half recursively.
        arr1 = mergeSort(arr1);
        arr2 = mergeSort(arr2);
        // Merge the two sorted halves back into the original array.
        int i = 0, j = 0, k = 0;
        while (arr1.length != j && arr2.length != k) {
            if (arr1[j] <= arr2[k]) {   // '<=' takes the left half first, keeping the sort stable
                array[i] = arr1[j];
                i++;
                j++;
            } else {
                array[i] = arr2[k];
                i++;
                k++;
            }
        }
        // Copy whatever remains of either half.
        while (arr1.length != j) {
            array[i] = arr1[j];
            i++;
            j++;
        }
        while (arr2.length != k) {
            array[i] = arr2[k];
            i++;
            k++;
        }
    }
    return array;
}
HEAPSORT
Heapsort is a much more efficient version of selection sort. It works similar to
selection sort, by determining the largest (or smallest) element of the list, placing
that at the end ( or beginning ) of the list, the continuing with the rest of the list,
but accomplishes the task more efficiently with the use of a data structure called a
heap, which is a special type of binary tree. It is always guaranteed that the root
element of the heap is always the largest element in max_heap ( or smallest
element in min_heap ). When the largest element is removed from the heap, there
is no need to find the next largest element, as the heap rearranges itself so that the
next largest element becomes the root. For the heap data structure to find the next
largest element and to move it to the top, it takes only O(log n) time. Therefore the
whole Heapsort algorithm takes just O(n log n) time.
Heap is a specialized tree based data structure, where if B is a child node of A,
then key (A) ≥ key (B). Therefore in a heap, the root element is always the largest
element. The various operations that can be done on a heap data structure are
insert new element, delete an element from the root, and so on. For elementary
Heapsort algorithms, the binary heap data structure is widely used. The operations
that can be done and their algorithms are given in Appendix B.
As Heapsort has O(n log n) complexity, it is often compared with quick sort and
merge sort. Quick sort has a worst case of O(n²), which is risky and inefficient
for large lists, although quick sort usually works better in practice because of cache
behaviour and other factors. Since Heapsort's O(n log n) bound holds even in the worst
case, it is preferred in embedded and real-time systems where a guaranteed running
time is a great concern. Compared with merge sort, the main
advantage Heapsort has is that it requires only a constant amount of
auxiliary storage space, in contrast to merge sort, which requires O(n) auxiliary
space. Merge sort has several advantages over Heapsort, among them that merge sort
is stable and is easily adapted to linked lists and to lists on slow media such as disks.
Let us study an example which demonstrates the working of Heapsort. For the list
( 11 9 34 25 17 109 53 ), building a max-heap with the standard bottom-up procedure
yields the array ( 109 25 53 9 17 34 11 ): the largest element, 109, sits at the root,
and every parent is at least as large as its children.
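A compact sketch of the full algorithm in Java; heapSort and siftDown are our own names, and the bottom-up build followed by repeated root extraction follows the description above:

```java
void heapSort(int[] a) {
    int n = a.length;
    // Build a max-heap bottom-up: sift down every internal node.
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(a, i, n);
    // Repeatedly move the root (largest element) to the end of the
    // unsorted region, then restore the heap property on the rest.
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        siftDown(a, 0, end);
    }
}

void siftDown(int[] a, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                     // pick the larger of the two children
        if (a[i] >= a[child]) break;     // heap property already holds
        int tmp = a[i];                  // otherwise swap and descend
        a[i] = a[child];
        a[child] = tmp;
        i = child;
    }
}
```

The two loops mirror the analysis above: building the heap, then n - 1 extractions of O(log n) each, for O(n log n) overall.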
An interesting alternative to Heapsort is introsort, which combines quick sort and
Heapsort, keeping the average-case speed of quick sort and the worst-case
guarantee of Heapsort.
QUICK SORT
Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot then it works in O(n log n).
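A minimal sketch in Java, using the simple Lomuto partition with the last element as pivot; this is just one of many possible pivot choices, not the scheme any particular library uses:

```java
void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;              // 0 or 1 elements: already sorted
    // Lomuto partition: the last element is the pivot; everything smaller
    // ends up to its left, everything greater to its right.
    int pivot = a[hi];
    int p = lo;
    for (int i = lo; i < hi; i++) {
        if (a[i] < pivot) {
            int tmp = a[i]; a[i] = a[p]; a[p] = tmp;
            p++;
        }
    }
    int tmp = a[p]; a[p] = a[hi]; a[hi] = tmp;  // pivot into its final place
    quickSort(a, lo, p - 1);           // recursively sort the smaller elements
    quickSort(a, p + 1, hi);           // recursively sort the greater elements
}
```

Calling quickSort(a, 0, a.length - 1) sorts the whole array. This naive pivot choice degrades to O(n²) on already sorted input, which is why practical implementations choose pivots more carefully, as the text above notes.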
BUCKET SORT
Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. Thus this is most effective on data whose values are limited (e.g. a sort of a million integers ranging from 1 to 1000). A variation of this method called the single buffered count sort is faster than quicksort and takes about the same time to run on any set of data.
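A sketch of the bounded-integer case in Java. With one bucket per possible value, each bucket reduces to a counter and the algorithm degenerates into a counting sort; the names bucketSort and maxValue are our own, and a general bucket sort would instead sort each bucket with another algorithm:

```java
int[] bucketSort(int[] a, int maxValue) {
    // One bucket per possible value 0..maxValue; since all elements of a
    // bucket are equal, each bucket only needs to count its occupants.
    int[] count = new int[maxValue + 1];
    for (int v : a)
        count[v]++;
    // Emit the buckets in order to produce the sorted output.
    int[] out = new int[a.length];
    int i = 0;
    for (int v = 0; v <= maxValue; v++)
        for (int c = 0; c < count[v]; c++)
            out[i++] = v;
    return out;
}
```

This runs in O(n + maxValue) time, which is why the technique suits data whose values are limited, as described above.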
RADIX SORT
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving their relative order using a stable sort. Then we sort them by the next bit, and so on from right to left, and the list will end up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the number of values a bit can have is minimal - only '1' or '0'.
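A sketch of this bitwise least-significant-digit radix sort in Java for non-negative ints; radixSort is our own name, and each pass is a stable split on one bit rather than a full counting sort:

```java
int[] radixSort(int[] a) {
    int[] cur = a.clone();
    // Process bits from least to most significant (bit 31 is the sign
    // bit, so non-negative ints need at most 31 passes).
    for (int bit = 0; bit < 31; bit++) {
        int[] zeros = new int[cur.length];
        int[] ones = new int[cur.length];
        int nz = 0, no = 0;
        for (int v : cur) {              // stable split by the current bit
            if (((v >> bit) & 1) == 0) zeros[nz++] = v;
            else ones[no++] = v;
        }
        int i = 0;                       // zeros first, then ones,
        for (int k = 0; k < nz; k++) cur[i++] = zeros[k];
        for (int k = 0; k < no; k++) cur[i++] = ones[k];
    }
    return cur;
}
```

Each pass preserves the relative order of elements with equal bits, which is exactly the stability the correctness argument above relies on.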
Sorting algorithms at a glance:
Theory – Computational complexity theory | Big O notation | Total order | Lists | Stability | Comparison sort
Exchange sorts – Bubble sort | Cocktail sort | Odd-even sort | Comb sort | Gnome sort | Quicksort
Selection sorts – Selection sort | Heapsort | Smoothsort | Cartesian tree sort | Tournament sort
Insertion sorts – Insertion sort | Shell sort | Tree sort | Library sort | Patience sorting
Merge sorts – Merge sort | Strand sort | Timsort
Non-comparison sorts – Radix sort | Bucket sort | Counting sort | Pigeonhole sort | Burstsort | Bead sort
Others – Topological sorting | Sorting network | Bitonic sorter | Batcher odd-even mergesort | Pancake sorting
Ineffective/humorous sorts – Bogosort | Stooge sort
In mathematics, computer science, and related fields, big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation allows its users to simplify functions in order to concentrate on their growth rates: different functions with the same growth rate may be represented using the same O notation.
Notation | Name | Example
O(1) | constant | Determining if a number is even or odd; using a constant-size lookup table or hash table
O(α(n)) | inverse Ackermann | Amortized time per operation using a disjoint set
O(log* n) | iterated logarithmic | The find algorithm of Hopcroft and Ullman on a disjoint set
O(log log n) | log-logarithmic | Amortized time per operation using a bounded priority queue
O(log n) | logarithmic | Finding an item in a sorted array with a binary search
O((log n)^c) | polylogarithmic | Deciding if n is prime with the AKS primality test
O(n^c), 0 < c < 1 | fractional power | Searching in a kd-tree
O(n) | linear | Finding an item in an unsorted list; adding two n-digit numbers
O(n log n) | linearithmic, loglinear, or quasilinear | Performing a Fast Fourier transform; heapsort, quicksort (best case), or merge sort
O(n²) | quadratic | Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort (worst case or naive implementation), Shell sort, quicksort (worst case), or insertion sort
O(n³) | cubic | Multiplying two n×n matrices by a simple algorithm; finding the shortest paths on a weighted digraph with the Floyd-Warshall algorithm; inverting a (dense) n×n matrix using LU or Cholesky decomposition
O(n^c), c > 1 | polynomial or algebraic | Tree-adjoining grammar parsing; maximum matching for bipartite graphs (grows faster than cubic if and only if c > 3)
L-notation | L-notation | Factoring a number using the special or general number field sieve
O(c^n), c > 1 | exponential or geometric | Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force
O(n!) | factorial or combinatorial | Solving the traveling salesman problem via brute-force search; finding the determinant with expansion by minors
O(2^(2^n)) | double exponential | Deciding the truth of a given statement in Presburger arithmetic