analysis of algorithms cs 477/677 lecture 8 instructor: monica nicolescu
DESCRIPTION
CS 477/677 - Lecture 83 How Fast Can We Sort? Insertion sort, Bubble Sort, Selection Sort Merge sort Quicksort What is common to all these algorithms? –These algorithms sort by making comparisons between the input elements To sort n elements, comparison sorts must make (nlgn) comparisons in the worst case (n 2 ) (nlgn)TRANSCRIPT
Analysis of AlgorithmsCS 477/677
Lecture 8Instructor: Monica Nicolescu
CS 477/677 - Lecture 8 2
Quick Announcement
• Hand imaging collection (hand shape)
• Reza Amayeh: [email protected]
• Lab address: LME building room 314
• Time:
– Mon, Wed, Fri: 10:00 am until 5:00pm
– Thu, Thr: 3:00 pm until 5:00pm
CS 477/677 - Lecture 8 3
How Fast Can We Sort?
• Insertion sort, Bubble Sort, Selection Sort
• Merge sort
• Quicksort
• What is common to all these algorithms?
– These algorithms sort by making comparisons between the
input elements
• To sort n elements, comparison sorts must make
(nlgn) comparisons in the worst case
(n2)(nlgn)(nlgn)
CS 477/677 - Lecture 8 4
Decision Tree Model• Represents the comparisons made by a sorting algorithm on an
input of a given size: models all possible execution traces
• Control, data movement, other operations are ignored
• Count only the comparisons
• Decision tree for insertion sort on three elements:
node
leaf:
one execution trace
CS 477/677 - Lecture 8 5
Counting Sort
• Assumption: – The elements to be sorted are integers in the range 0
to k• Idea:
– Determine for each input element x, the number of elements smaller than x
– Place element x into its correct position in the output array
303203521 2 3 4 5 6 7 8
A 774221 2 3 4 5
C 80
533322001 2 3 4 5 6 7 8
B
CS 477/677 - Lecture 8 6
Analysis of Counting SortAlg.: COUNTING-SORT(A, B, n, k)1. for i ← 0 to k2. do C[ i ] ← 03. for j ← 1 to n4. do C[A[ j ]] ← C[A[ j ]] + 15. C[i] contains the number of elements equal to i6. for i ← 1 to k7. do C[ i ] ← C[ i ] + C[i -1]8. C[i] contains the number of elements ≤ i9. for j ← n downto 110. do B[C[A[ j ]]] ← A[ j ]11. C[A[ j ]] ← C[A[ j ]] - 1
(k)
(n)
(k)
(n)
Overall time: (n + k)
CS 477/677 - Lecture 8 7
Analysis of Counting Sort
• Overall time: (n + k)• In practice we use COUNTING sort when k = O(n)
running time is (n)• Counting sort is stable
– Numbers with the same value appear in the same order in
the output array
– Important when satellite data is carried around with the
sorted keys
CS 477/677 - Lecture 8 8
Radix Sort
• Considers keys as numbers in a base-R number– A d-digit number will occupy a field of d
columns
• Sorting looks at one column at a time– For a d digit number, sort the least significant
digit first– Continue sorting on the next least significant
digit, until all digits have been sorted– Requires only d passes through the list
CS 477/677 - Lecture 8 9
RADIX-SORTAlg.: RADIX-SORT(A, d)
for i ← 1 to ddo use a stable sort to sort array A on digit i
• 1 is the lowest order digit, d is the highest-order digit
CS 477/677 - Lecture 8 10
Analysis of Radix Sort
• Given n numbers of d digits each, where each
digit may take up to k possible values, RADIX-
SORT correctly sorts the numbers in (d(n+k))
– One pass of sorting per digit takes (n+k) assuming
that we use counting sort
– There are d passes (for each digit)
CS 477/677 - Lecture 8 11
Correctness of Radix sort• We use induction on number of passes through each
digit• Basis: If d = 1, there’s only one digit, trivial• Inductive step: assume digits 1, 2, . . . , d-1 are sorted
– Now sort on the d-th digit– If ad < bd, sort will put a before b: correct
a < b regardless of the low-order digits– If ad > bd, sort will put a after b: correct
a > b regardless of the low-order digits– If ad = bd, sort will leave a and b in the
same order and a and b are already sorted on the low-order d-1 digits
CS 477/677 - Lecture 8 12
Bucket Sort• Assumption:
– the input is generated by a random process that distributes elements uniformly over [0, 1)
• Idea:– Divide [0, 1) into n equal-sized buckets– Distribute the n input values into the buckets– Sort each bucket– Go through the buckets in order, listing elements in each one
• Input: A[1 . . n], where 0 ≤ A[i] < 1 for all i • Output: elements ai sorted• Auxiliary array: B[0 . . n - 1] of linked lists, each list
initially empty
CS 477/677 - Lecture 8 13
BUCKET-SORT
Alg.: BUCKET-SORT(A, n)
for i ← 1 to n do insert A[i] into list B[nA[i]]for i ← 0 to n - 1
do sort list B[i] with insertion sort
concatenate lists B[0], B[1], . . . , B[n -1] together in order
return the concatenated lists
CS 477/677 - Lecture 8 14
Example - Bucket Sort
.78
.17
.39
.26
.72
.94
.21
.12
.23
.68
0
1
2
3
4
5
6
7
8
9
1
2
3
4
5
6
7
8
9
10
.21
.12 /
.72 /
.23 /
.78
.94 /
.68 /
.39 /
.26
.17
/
/
/
/
CS 477/677 - Lecture 8 15
Example - Bucket Sort
0
1
2
3
4
5
6
7
8
9
.23
.17 /
.78 /
.26 /
.72
.94 /
.68 /
.39 /
.21
.12
/
/
/
/
.17.12 .23 .26.21 .39 .68 .78.72 .94 /
Concatenate the lists from 0 to n – 1 together, in order
CS 477/677 - Lecture 8 16
Correctness of Bucket Sort
• Consider two elements A[i], A[ j]• Assume without loss of generality that A[i] ≤ A[j]• Then nA[i] ≤ nA[j]
– A[i] belongs to the same group as A[j] or to a group with a lower index than that of A[j]
• If A[i], A[j] belong to the same bucket:– insertion sort puts them in the proper order
• If A[i], A[j] are put in different buckets:– concatenation of the lists puts them in the proper order
CS 477/677 - Lecture 8 17
Analysis of Bucket Sort
Alg.: BUCKET-SORT(A, n)
for i ← 1 to n
do insert A[i] into list B[nA[i]]for i ← 0 to n - 1
do sort list B[i] with insertion sort
concatenate lists B[0], B[1], . . . , B[n -1] together in order
return the concatenated lists
O(n)
(n)
O(n)
(n)
CS 477/677 - Lecture 8 18
Conclusion
• Any comparison sort will take at least nlgn to sort an
array of n numbers
• We can achieve a better running time for sorting if we
can make certain assumptions on the input data:
– Counting sort: each of the n input elements is an integer in the
range 0 to k– Radix sort: the elements in the input are integers represented
with d digits
– Bucket sort: the numbers in the input are uniformly distributed
over the interval [0, 1)
CS 477/677 - Lecture 8 19
A Job Scheduling Application
• Job scheduling– The key is the priority of the jobs in the queue
– The job with the highest priority needs to be executed next
• Operations– Insert, remove maximum
• Data structures– Priority queues– Ordered array/list, unordered array/list
CS 477/677 - Lecture 8 20
Example
CS 477/677 - Lecture 8 21
PQ Implementations & Cost
Worst-case asymptotic costs for a PQ with N items
Insert Remove max
ordered arrayordered listunordered arrayunordered list
N
N
N
N
1
1
1
1
Can we implement both operations efficiently?
CS 477/677 - Lecture 8 22
Background on Trees• Def: Binary tree = structure composed of a finite
set of nodes that either:– Contains no nodes, or– Is composed of three disjoint sets of nodes: a root
node, a left subtree and a right subtree
2
14 8
1
16
4
3
9 10
root
Right subtreeLeft subtree
CS 477/677 - Lecture 8 23
Special Types of Trees
• Def: Full binary tree = a binary tree in which each node is either a leaf or has degree exactly 2.
• Def: Complete binary tree = a binary tree in which all leaves have the same depth and all internal nodes have degree 2.
Full binary tree
2
14 8
1
16
7
4
3
9 10
12
Complete binary tree
2
1
16
4
3
9 10
CS 477/677 - Lecture 8 24
The Heap Data Structure
• Def: A heap is a nearly complete binary tree with the following two properties:– Structural property: all levels are full, except
possibly the last one, which is filled from left to right– Order (heap) property: for any node x
Parent(x) ≥ x
Heap
5
7
8
4It doesn’t matter that 4 in level 1 is smaller than 5 in level 22
CS 477/677 - Lecture 8 25
Definitions• Height of a node = the number of edges on a longest
simple path from the node down to a leaf• Depth of a node = the length of a path from the root to
the node• Height of tree = height of root node
= lgn, for a heap of n elements
2
14 8
1
16
4
3
9 10
Height of root = 3
Height of (2)= 1 Depth of (10)= 2
CS 477/677 - Lecture 8 26
Array Representation of Heaps• A heap can be stored as an
array A.– Root of tree is A[1]– Left child of A[i] = A[2i]– Right child of A[i] = A[2i + 1]– Parent of A[i] = A[ i/2 ]– Heapsize[A] ≤ length[A]
• The elements in the subarray A[(n/2+1) .. n] are leaves
• The root is the maximum element of the heap
A heap is a binary tree that is filled in order
CS 477/677 - Lecture 8 27
Heap Types
• Max-heaps (largest element at root), have the max-heap property: – for all nodes i, excluding the root:
A[PARENT(i)] ≥ A[i]
• Min-heaps (smallest element at root), have the min-heap property:– for all nodes i, excluding the root:
A[PARENT(i)] ≤ A[i]
CS 477/677 - Lecture 8 28
Operations on Heaps
• Maintain the max-heap property– MAX-HEAPIFY
• Create a max-heap from an unordered array– BUILD-MAX-HEAP
• Sort an array in place– HEAPSORT
• Priority queue operations
CS 477/677 - Lecture 8 29
Operations on Priority Queues
• Max-priority queues support the following
operations:
– INSERT(S, x): inserts element x into set S– EXTRACT-MAX(S): removes and returns element of
S with largest key
– MAXIMUM(S): returns element of S with largest key
– INCREASE-KEY(S, x, k): increases value of element
x’s key to k (Assume k ≥ x’s current key value)
CS 477/677 - Lecture 8 30
Maintaining the Heap Property
• Suppose a node is smaller than a child– Left and Right subtrees of i are max-heaps
• Invariant: – the heap condition is violated only at that
node
• To eliminate the violation:– Exchange with larger child– Move down the tree– Continue until node is not smaller than
children
CS 477/677 - Lecture 8 31
Maintaining the Heap Property
• Assumptions:– Left and Right
subtrees of i are max-heaps
– A[i] may be smaller than its children
Alg: MAX-HEAPIFY(A, i, n)1. l ← LEFT(i)2. r ← RIGHT(i)3. if l ≤ n and A[l] > A[i]4. then largest ←l5. else largest ←i6. if r ≤ n and A[r] > A[largest]7. then largest ←r8. if largest i9. then exchange A[i] ↔ A[largest]10. MAX-HEAPIFY(A, largest, n)
CS 477/677 - Lecture 8 32
ExampleMAX-HEAPIFY(A, 2, 10)
A[2] violates the heap property
A[2] A[4]
A[4] violates the heap property
A[4] A[9]
Heap property restored
CS 477/677 - Lecture 8 33
MAX-HEAPIFY Running Time
• Intuitively:
– A heap is an almost complete binary tree must
process O(lgn) levels, with constant work at each
level
• Running time of MAX-HEAPIFY is O(lgn) • Can be written in terms of the height of the heap,
as being O(h)– Since the height of the heap is lgn
CS 477/677 - Lecture 8 34
Building a Heap
Alg: BUILD-MAX-HEAP(A)1. n = length[A]
2. for i ← n/2 downto 13. do MAX-HEAPIFY(A, i, n)
• Convert an array A[1 … n] into a max-heap (n = length[A])
• The elements in the subarray A[(n/2+1) .. n] are leaves
• Apply MAX-HEAPIFY on elements between 1 and n/2
2
14 8
1
16
7
4
3
9 10
1
2 3
4 5 6 7
8 9 10
4 1 3 2 16 9 10 14 8 7A:
CS 477/677 - Lecture 8 35
Readings
• Chapter 8, 6