CSC 2300 Data Structures & Algorithms March 27, 2007 Chapter 7. Sorting

Page 1: CSC 2300 Data Structures & Algorithms March 27, 2007 Chapter 7. Sorting

CSC 2300 Data Structures & Algorithms

March 27, 2007

Chapter 7. Sorting

Page 2:

Today – Sorting

Quicksort Implementation

Quickselect Algorithm

Decision Trees

Bucket Sort

Summary

Page 3:

Quicksort – Efficient Implementation

Is recursion good for all values of N? No.

What should we do for small values of N?
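One common answer is to stop recursing once a subarray is small and finish it with insertion sort. The sketch below assumes a cutoff of 10 elements and median-of-three pivot selection; the names CUTOFF, insertion_sort and median3 are illustrative choices, not taken from the slides.

```python
CUTOFF = 10  # assumed threshold: subarrays this small go to insertion sort


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place (inclusive bounds)."""
    for i in range(lo + 1, hi + 1):
        tmp = a[i]
        j = i
        while j > lo and a[j - 1] > tmp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp


def median3(a, lo, hi):
    """Order a[lo], a[mid], a[hi]; park the median at a[hi - 1] and return it."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    return a[hi - 1]


def quicksort(a, lo=0, hi=None):
    """Sort list a in place; recursion stops at small subarrays."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    pivot = median3(a, lo, hi)
    i, j = lo, hi - 1
    while True:
        i += 1
        while a[i] < pivot:   # a[hi - 1] == pivot acts as a sentinel
            i += 1
        j -= 1
        while a[j] > pivot:   # a[lo] <= pivot acts as a sentinel
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            break
    a[i], a[hi - 1] = a[hi - 1], a[i]  # move the pivot to its final position
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

Because the cutoff guarantees at least 11 elements before partitioning, median3 never has to handle one- or two-element subarrays.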

Page 4:

Worst Case Bound

Quicksort: O(N^2).

Heapsort: O(N log N).

Can we combine the two algorithms to achieve a worst-case O(N log N) bound?

Problem 7.27. Modify quicksort to call heapsort if the level of recursion has reached 2 log N.

Why would this work? Consider worst case analysis.
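A minimal sketch of the idea in Problem 7.27, under some assumptions: the pivot is deliberately the naive last element (so sorted input triggers the fallback), the heapsort step uses Python's heapq on a copy of the subarray rather than sorting in place, and the names hybrid_sort, heapsort_range and partition are hypothetical. Each level of recursion does O(N) partitioning work in total, there are at most 2 log N levels, and the heapsort calls at the depth limit run on disjoint subarrays, so the total stays O(N log N).

```python
import heapq
import math


def heapsort_range(a, lo, hi):
    """Sort a[lo..hi] in O(n log n) worst case using a binary heap (on a copy)."""
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    for i in range(lo, hi + 1):
        a[i] = heapq.heappop(heap)


def partition(a, lo, hi):
    """Partition a[lo..hi] around a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def _sort(a, lo, hi, depth_left):
    if lo >= hi:
        return
    if depth_left == 0:
        heapsort_range(a, lo, hi)   # recursion too deep: switch algorithms
        return
    p = partition(a, lo, hi)
    _sort(a, lo, p - 1, depth_left - 1)
    _sort(a, p + 1, hi, depth_left - 1)


def hybrid_sort(a):
    """Quicksort that falls back to heapsort at recursion depth 2 log N."""
    n = len(a)
    if n > 1:
        _sort(a, 0, n - 1, 2 * math.floor(math.log2(n)))
```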

Page 5:

Quicksort Code

Lines 19 and 20 show why Quicksort is so fast.

Page 6:

Selection Problem

Page 1 of text.

Given a set of N numbers, determine the kth largest number.

How to solve this problem?

Sort the numbers in decreasing order, and return the number in the kth position.

Can you improve on this scheme?

Page 7:

Selection Problem – Modified Sort

Problem: Given a set of N numbers, determine the kth largest number.

How to do better than just sorting the numbers?

Read the first k numbers into an array, and sort the numbers in decreasing order.

Next, each remaining number is read one by one.

Do you know how to complete the algorithm?
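One way to complete the algorithm: ignore a new number if it is no larger than the kth element of the array; otherwise insert it in its correct spot and bump the old kth element out. The sketch below assumes 1 <= k <= N; the function name kth_largest_running is illustrative.

```python
def kth_largest_running(numbers, k):
    """Return the kth largest value, keeping only the k largest seen so far."""
    top = sorted(numbers[:k], reverse=True)  # first k numbers, decreasing order
    for x in numbers[k:]:
        if x <= top[-1]:
            continue  # not larger than the current kth largest: ignore it
        # Shift smaller elements one slot right; the old kth element falls off.
        i = k - 1
        while i > 0 and top[i - 1] < x:
            top[i] = top[i - 1]
            i -= 1
        top[i] = x
    return top[-1]
```

Each of the remaining N - k numbers may shift up to k elements, so the running time is O(N k).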

Page 8:

Selection Problem – Heapsort

Problem: Given a set of N numbers, determine the kth largest number.

What is the time bound if we use a heap?

Furthermore, what is the bound if k = N/2 (i.e., if we want to find the median)?
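A sketch of the heap approach, assuming 1 <= k <= N: build a max-heap in O(N) time, then remove the maximum k times at O(log N) each, for O(N + k log N) overall. With k = N/2 this becomes O(N log N), no better than sorting. Python's heapq is a min-heap, so the sketch stores negated values; the function name is illustrative.

```python
import heapq


def kth_largest_heap(numbers, k):
    """Build a max-heap in O(N), then remove the maximum until the kth is on top."""
    heap = [-x for x in numbers]  # negate: heapq only provides a min-heap
    heapq.heapify(heap)           # O(N)
    for _ in range(k - 1):
        heapq.heappop(heap)       # O(log N) per removal
    return -heap[0]               # the kth largest is now at the root
```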

Page 9:

Selection Problem – Quicksort

Problem: Given a set of N numbers, determine the kth largest number.

Quicksort is very fast in sorting N numbers.

Thus, quicksort should be very fast in selecting the kth largest number.

What is the work that we can save?

Page 10:

Quicksort – Algorithm

1. If the number of elements in S is 0 or 1, then return.

2. Pick any element v in S. This is called the pivot.

3. Partition S – {v} into two disjoint groups:

S1 = { x ∈ S – {v} | x ≤ v}

and

S2 = { x ∈ S – {v} | x ≥ v}.

4. Return { quicksort(S1) followed by v followed by quicksort(S2)}.

Where can we save work if we just want to find the kth largest number?

Quickselect makes only one recursive call (instead of two).

Page 11:

Quickselect – Code
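A minimal sketch of quickselect for the kth largest number, not the textbook's listing. It assumes 1 <= k <= N, picks the pivot at random, and builds new lists instead of partitioning in place; splitting out the elements equal to the pivot is an added choice that keeps duplicates from causing trouble.

```python
import random


def quickselect(s, k):
    """Return the kth largest element of the list s (1 <= k <= len(s))."""
    if len(s) == 1:
        return s[0]
    v = random.choice(s)                   # the pivot
    larger = [x for x in s if x > v]
    equal = [x for x in s if x == v]
    smaller = [x for x in s if x < v]
    if k <= len(larger):
        return quickselect(larger, k)      # answer is among the larger elements
    if k <= len(larger) + len(equal):
        return v                           # the pivot itself is the answer
    # Otherwise skip past the larger and equal elements.
    return quickselect(smaller, k - len(larger) - len(equal))
```

Whichever branch is taken, only one recursive call is made, and it is on the group that must contain the answer.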

Page 12:

Quicksort – Analysis

Time Bounds:

Worst case

Best case

Average case

What would be the corresponding bounds for Quickselect?

In particular, the average case for Quickselect?

Page 13:

Average Case Analysis

Assume that each of the sizes for S1 is equally likely and thus has probability 1/N.

The average value of T(i) is thus (1/N) ∑_{j=0}^{N-1} T(j).

Quicksort recurrence becomes T(N) = (2/N) ∑_{j=0}^{N-1} T(j) + cN.

What is the recurrence for Quickselect?

Quickselect recurrence is T(N) = (1/N) ∑_{j=0}^{N-1} T(j) + cN.

See Problem 7.30. What is the answer?
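One way to check a guess of O(N) for the quickselect recurrence is induction. This is a sketch rather than the textbook's solution: the constants a and b are introduced here for illustration, with b chosen so that T(0) ≤ b, and the hypothesis is T(j) ≤ aj + b for all j < N.

```latex
\begin{align*}
T(N) &= \frac{1}{N}\sum_{j=0}^{N-1} T(j) + cN \\
     &\le \frac{1}{N}\sum_{j=0}^{N-1} (aj + b) + cN \\
     &= \frac{a(N-1)}{2} + b + cN \\
     &\le aN + b \quad \text{whenever } a \ge 2c .
\end{align*}
```

So the average running time of quickselect is O(N), compared with O(N log N) for quicksort.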

Page 14:

Decision Tree

A decision tree is an abstraction used to prove lower bounds.

In our context, it is a binary tree.

Each node represents a set of possible orderings.

The results of the comparisons are the tree edges.

Page 15:

Decision Tree – Example

Sorting three numbers.
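A decision tree for three numbers can be written out as nested comparisons, where each if is an internal node and each return is a leaf. The sketch below is one possible tree, not necessarily the one drawn on the slide; sort3 is an illustrative name, and it also reports how many comparisons were made on the path taken.

```python
from itertools import permutations


def sort3(a, b, c):
    """Sort three values; return (sorted tuple, number of comparisons used)."""
    if a <= b:
        if b <= c:
            return (a, b, c), 2
        if a <= c:
            return (a, c, b), 3
        return (c, a, b), 3
    if a <= c:
        return (b, a, c), 2
    if b <= c:
        return (b, c, a), 3
    return (c, b, a), 3


# Each of the 3! = 6 input orderings reaches its own leaf.
for p in permutations((1, 2, 3)):
    print(p, sort3(*p))
```

The tree has six leaves, one per ordering. Two leaves sit at depth 2 and four at depth 3, so the worst case is 3 comparisons and the average is 16/6.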

Page 16:

Decision Tree

Every algorithm that sorts by using only comparisons can be represented by a decision tree.

The number of comparisons used by the sorting algorithm in the worst case is equal to the depth of the deepest leaf.

The average number of comparisons used is equal to the average depth of the leaves.

Page 17:

Theory

Lemma 7.1. Let T be a binary tree of depth d. Then T has at most 2^d leaves.

Lemma 7.2. A binary tree with L leaves must have depth at least ⌈log L⌉.

Theorem 7.6. Any sorting algorithm that uses only comparisons between elements requires at least ⌈log(N!)⌉ comparisons in the worst case.

Here ⌈ ⌉ denotes the ceiling.

Stirling’s formula in Problem 7.34.

Theorem 7.7. Any sorting algorithm that uses only comparisons between elements requires Ω(N log N) comparisons.
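To get a feel for the bound in Theorem 7.6, the sketch below evaluates the ceiling of log2(N!) exactly with integer arithmetic. The function name comparison_lower_bound is illustrative, and logarithms are taken base 2.

```python
import math


def comparison_lower_bound(n):
    """Ceiling of log2(n!): minimum worst-case comparisons to sort n items."""
    # For an integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)) exactly.
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10, 100):
    print(n, comparison_lower_bound(n), round(n * math.log2(n), 1))
```

The second column grows at the same rate as N log N in the third, which is what Theorem 7.7 states.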

Page 18:

Linear Time Sorting

We have shown that any general sorting algorithm that uses only comparisons requires Ω(N log N) time in the worst case.

Now we describe bucket sort, which is a linear time algorithm.

Is this a contradiction?

Page 19:

Bucket Sort

Say the input A1, A2, …, AN consists of only positive integers smaller than M.

Keep an array called count, of size M, which is initialized to all 0’s.

Thus, count has M cells, or buckets, which are initially empty.

When Ai is read, increment count[Ai] by 1.

After all the input is read, scan the count array, printing out a representation of the sorted list.

How much time does the algorithm require?

What if M = O(N)?

Have we violated the Ω(N log N) lower bound?
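The steps above can be sketched as follows. The function returns the sorted list instead of printing it, and the parameter name m stands for the bound M; both are choices made for this example.

```python
def bucket_sort(a, m):
    """Sort a list of positive integers smaller than m in O(M + N) time."""
    count = [0] * m                 # M buckets, all initially empty
    for x in a:
        count[x] += 1               # one increment per input value
    result = []
    for value in range(m):          # scan the count array in increasing order
        result.extend([value] * count[value])
    return result


print(bucket_sort([3, 1, 4, 1, 5, 9, 2, 6], 10))
```

No two elements are ever compared with each other; the values are used directly as array indices. That extra information about the input is why the Ω(N log N) bound for comparison-based sorting does not apply.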

Page 20:

Summary