csc401 – analysis of algorithms lecture notes 8 comparison-based sorting

25
1 CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting Objectives Objectives Introduce different known sorting Introduce different known sorting algorithms algorithms Analyze the running of diverse Analyze the running of diverse sorting algorithms sorting algorithms Induce the lower bound of running Induce the lower bound of running time on comparison-based sorting time on comparison-based sorting

Upload: hye

Post on 29-Jan-2016

36 views

Category:

Documents


0 download

DESCRIPTION

CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting. Objectives Introduce different known sorting algorithms Analyze the running of diverse sorting algorithms Induce the lower bound of running time on comparison-based sorting. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

11

CSC401 – Analysis of Algorithms Lecture Notes 8

Comparison-Based Sorting

ObjectivesObjectives

Introduce different known sorting Introduce different known sorting algorithmsalgorithms

Analyze the running of diverse sorting Analyze the running of diverse sorting algorithmsalgorithms

Induce the lower bound of running Induce the lower bound of running time on comparison-based sortingtime on comparison-based sorting

Page 2: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

22

Divide-and-ConquerDivide-and-Conquer Divide-and conquerDivide-and conquer is a is a

general algorithm general algorithm design paradigm:design paradigm:– DivideDivide: divide the input : divide the input

data data SS in two disjoint in two disjoint subsets subsets SS11 and and SS22

– RecurRecur: solve the : solve the subproblems associated subproblems associated with with SS11 and and SS22

– ConquerConquer: combine the : combine the solutions for solutions for SS11 and and SS22 into a solution for into a solution for SS

The base case for the The base case for the recursion are recursion are subproblems of size 0 subproblems of size 0 or 1or 1

Merge-sortMerge-sort is a sorting is a sorting algorithm based on the algorithm based on the divide-and-conquer divide-and-conquer paradigm paradigm

Like heap-sortLike heap-sort– It uses a comparatorIt uses a comparator– It has It has OO((nn log log nn) ) running running

timetime Unlike heap-sortUnlike heap-sort

– It does not use an It does not use an auxiliary priority queueauxiliary priority queue

– It accesses data in a It accesses data in a sequential manner sequential manner (suitable to sort data on (suitable to sort data on a disk)a disk)

Page 3: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

33

Merge-SortMerge-Sort Merge-sort on an Merge-sort on an

input sequence input sequence SS with with nn elements elements consists of three consists of three steps:steps:– DivideDivide: partition : partition SS into into

two sequences two sequences SS11 and and SS22 of about of about nn22 elements eachelements each

– RecurRecur: recursively sort : recursively sort SS11 and and SS22

– ConquerConquer: merge : merge SS11 and and SS2 2 into a unique into a unique sorted sequencesorted sequence

Algorithm mergeSort(S, C)Input sequence S with n

elements, comparator C Output sequence S sorted

according to Cif S.size() > 1

(S1, S2) partition(S, n/2)

mergeSort(S1, C)

mergeSort(S2, C)

S merge(S1, S2)

Page 4: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

44

Merging Two Sorted SequencesMerging Two Sorted Sequences The conquer step of The conquer step of

merge-sort consists merge-sort consists of merging two of merging two sorted sequences sorted sequences A A and and BB into a sorted into a sorted sequence sequence S S containing the union containing the union of the elements of of the elements of A A and and BB

Merging two sorted Merging two sorted sequences, each sequences, each with with nn22 elements elements and implemented by and implemented by means of a doubly means of a doubly linked list, takes linked list, takes OO((nn)) timetime

Algorithm merge(A, B)Input sequences A and B with

n2 elements each

Output sorted sequence of A B

S empty sequence

while A.isEmpty() B.isEmpty()if A.first().element() <

B.first().element()S.insertLast(A.remove(A.first()))

elseS.insertLast(B.remove(B.first()))

while A.isEmpty()S.insertLast(A.remove(A.first()))while B.isEmpty()S.insertLast(B.remove(B.first()))return S

Page 5: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

55

Merge-Sort TreeMerge-Sort Tree An execution of merge-sort is depicted by a binary An execution of merge-sort is depicted by a binary

treetree– each node represents a recursive call of merge-sort and each node represents a recursive call of merge-sort and

storesstores unsorted sequence before the execution and its partitionunsorted sequence before the execution and its partition sorted sequence at the end of the executionsorted sequence at the end of the execution

– the root is the initial call the root is the initial call – the leaves are calls on subsequences of size 0 or 1the leaves are calls on subsequences of size 0 or 1

7 2 9 4 2 4 7 9

7 2 2 7 9 4 4 9

7 7 2 2 9 9 4 4

Page 6: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

66

Execution ExampleExecution Example

7 2 9 4 2 4 7 9 3 8 6 1 1 3 6 8

7 2 2 7 9 4 4 9 3 8 3 8 6 1 1 6

7 7 2 2 9 9 4 4 3 3 8 8 6 6 1 1

7 2 9 4 3 8 6 1 1 2 3 4 6 7 8 9

Page 7: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

77

Analysis of Merge-SortAnalysis of Merge-Sort The height The height hh of the merge-sort tree is of the merge-sort tree is OO(log (log nn))

– at each recursive call we divide in half the sequence, at each recursive call we divide in half the sequence,

The overall amount or work done at the nodes of The overall amount or work done at the nodes of depth depth i i is is OO((nn)) – we partition and merge we partition and merge 22ii sequences of size sequences of size nn22ii – we make we make 22ii11 recursive calls recursive calls

Thus, the total running time of merge-sort is Thus, the total running time of merge-sort is OO((nn log log nn))

sizesize#seqs#seqsdepthdepth

………………

nn22ii22iiii

nn222211

nn1100

Page 8: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

88

Set OperationsSet Operations We represent a set by We represent a set by

the sorted sequence of the sorted sequence of its elementsits elements

By specializing the By specializing the auxliliary methods he auxliliary methods he generic merge generic merge algorithm can be used algorithm can be used to perform basic set to perform basic set operations:operations:– unionunion– intersectionintersection– subtractionsubtraction

The running time of an The running time of an operation on sets operation on sets A A and and B B should be at most should be at most OO((nnAAnnBB))

Set union:Set union:– aIsLessaIsLess((a, Sa, S))

S.insertFirstS.insertFirst((aa))– bIsLessbIsLess((b, Sb, S))

S.insertLastS.insertLast((bb))– bothAreEqualbothAreEqual((a, b, Sa, b, S))

S. insertLastS. insertLast((aa)) Set intersection:Set intersection:

– aIsLessaIsLess((a, Sa, S)) { { do nothing do nothing }}

– bIsLessbIsLess((b, Sb, S)) { { do nothing do nothing }}

– bothAreEqualbothAreEqual((a, b, Sa, b, S))S. insertLastS. insertLast((aa))

Page 9: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

99

Storing a Set in a ListStoring a Set in a List We can implement a set with a listWe can implement a set with a list Elements are stored sorted according to some Elements are stored sorted according to some

canonical orderingcanonical ordering The space used is The space used is OO((nn))

List

Nodes storing set elements in order

Set elements

Page 10: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1010

Generic MergingGeneric Merging Generalized merge Generalized merge

of two sorted listsof two sorted lists A A and and BB

Template method Template method genericMergegenericMerge

Auxiliary methodsAuxiliary methods– aIsLessaIsLess– bIsLessbIsLess– bothAreEqualbothAreEqual

Runs in Runs in OO((nnAAnnBB)) time provided the time provided the auxiliary methods auxiliary methods run in run in OO(1)(1) time time

Algorithm genericMerge(A, B)S empty sequencewhile A.isEmpty() B.isEmpty()

a A.first().element(); b B.first().element()

if a < baIsLess(a, S); A.remove(A.first())

else if b < abIsLess(b, S); B.remove(B.first())

else { b = a } bothAreEqual(a, b, S)A.remove(A.first()); B.remove(B.first())

while A.isEmpty()aIsLess(a, S); A.remove(A.first())

while B.isEmpty()bIsLess(b, S); B.remove(B.first())

return S

Page 11: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1111

Using Generic Merge for Set OperationsUsing Generic Merge for Set Operations Any of the set operations can be Any of the set operations can be

implemented using a generic mergeimplemented using a generic merge For example:For example:

– For For intersectionintersection: only copy elements that : only copy elements that are duplicated in both listare duplicated in both list

– For For unionunion: copy every element from both : copy every element from both lists except for the duplicateslists except for the duplicates

All methods run in linear time.All methods run in linear time.

Page 12: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1212

Quick-SortQuick-Sort Quick-sortQuick-sort is a is a

randomized sorting randomized sorting algorithm based on the algorithm based on the divide-and-conquer divide-and-conquer paradigm:paradigm:– DivideDivide: pick a random : pick a random

element element xx (called (called pivotpivot) ) and partition and partition SS into into L L elements less than elements less than xx E E elements equal elements equal xx G G elements greater than elements greater than xx

– RecurRecur: sort : sort L L and and GG– ConquerConquer: join : join LL, , EE and and GG

x

x

L GE

x

Page 13: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1313

PartitionPartition We partition an input We partition an input

sequence as follows:sequence as follows:– We remove, in turn, each We remove, in turn, each

element element yy from from SS and and – We insert We insert yy into into LL, , EE or or GG,,

depending on the result depending on the result of the comparison with of the comparison with the pivot the pivot xx

Each insertion and Each insertion and removal is at the removal is at the beginning or at the end beginning or at the end of a sequence, and of a sequence, and hence takes hence takes OO(1)(1) time time

Thus, the partition step Thus, the partition step of quick-sort takes of quick-sort takes OO((nn)) timetime

Algorithm partition(S, p)Input sequence S, position p of pivot Output subsequences L, E, G of the

elements of S less than, equal to,or greater than the pivot, resp.

L, E, G empty sequencesx S.remove(p) while S.isEmpty()

y S.remove(S.first())if y < x

L.insertLast(y)else if y = x

E.insertLast(y)else { y > x }

G.insertLast(y)return L, E, G

Page 14: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1414

Quick-Sort TreeQuick-Sort Tree An execution of quick-sort is depicted by a An execution of quick-sort is depicted by a

binary treebinary tree– Each node represents a recursive call of quick-sort Each node represents a recursive call of quick-sort

and storesand stores Unsorted sequence before the execution and its pivotUnsorted sequence before the execution and its pivot Sorted sequence at the end of the executionSorted sequence at the end of the execution

– The root is the initial call The root is the initial call – The leaves are calls on subsequences of size 0 or 1The leaves are calls on subsequences of size 0 or 1

7 4 9 6 2 2 4 6 7 9

4 2 2 4 7 9 7 9

2 2 9 9

Page 15: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1515

Execution ExampleExecution Example

7 9 7 17 7 9

8 8

7 2 9 4 3 7 6 1 1 2 3 4 6 7 7 9

2 4 3 1 1 2 3 4

1 1 4 3 3 4

9 9 4 4

9 9

Page 16: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1616

Worst-case Running TimeWorst-case Running Time The worst case for quick-sort occurs when the pivot The worst case for quick-sort occurs when the pivot

is the unique minimum or maximum elementis the unique minimum or maximum element One of One of LL and and GG has size has size n n 1 1 and the other has size and the other has size 00 The running time is proportional to the sumThe running time is proportional to the sum

nn ( (nn 1) 1) … … 2 2 Thus, the worst-case running time of quick-sort is Thus, the worst-case running time of quick-sort is

OO((nn22))timetimedepthdepth

11nn 1 1

…………

nn 1 111

nn00

Page 17: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1717

Expected Running TimeExpected Running Time Consider a recursive call of quick-sort on a sequence of size Consider a recursive call of quick-sort on a sequence of size ss

– Good callGood call:: the sizes of the sizes of LL and and GG are each less than are each less than 33ss44– Bad callBad call:: one of one of LL and and GG has size greater than has size greater than 33ss44

A call is A call is goodgood with probability with probability 1122– 1/2 of the possible pivots cause good calls:1/2 of the possible pivots cause good calls:7 9 7 1 1

7 2 9 4 3 7 6 1 9

2 4 3 1 7 2 9 4 3 7 61

7 2 9 4 3 7 6 1

Good call Bad call

Good pivotsBad pivots Bad pivots

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Page 18: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1818

Expected Running Time, Part 2Expected Running Time, Part 2 Probabilistic Fact:Probabilistic Fact: The expected number of coin tosses The expected number of coin tosses

required in order to get required in order to get kk heads is heads is 22kk For a node of depth For a node of depth ii, we expect, we expect

– ii2 2 ancestors are good callsancestors are good calls– The size of the input sequence for the current call is at The size of the input sequence for the current call is at

most (most (3344))ii22nn

s(r)

s(a) s(b)

s(c) s(d) s(f)s(e)

time per levelexpected height

O(log n)

O(n)

O(n)

O(n)

total expected time: O(n log n)

Therefore, we haveTherefore, we have– For a node of depth For a node of depth 2log2log4433nn, ,

the expected input size is the expected input size is oneone

– The expected height of the The expected height of the quick-sort tree is quick-sort tree is OO(log (log nn))

The amount or work done at The amount or work done at the nodes of the same depth is the nodes of the same depth is OO((nn))

Thus, the expected running Thus, the expected running time of quick-sort is time of quick-sort is OO((n n log log nn))

Page 19: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

1919

In-Place Quick-SortIn-Place Quick-Sort Quick-sort can be Quick-sort can be

implemented to run in-implemented to run in-placeplace

In the partition step, we In the partition step, we use replace operations to use replace operations to rearrange the elements rearrange the elements of the input sequence of the input sequence such thatsuch that– the elements less than the the elements less than the

pivot have rank less than pivot have rank less than hh– the elements equal to the the elements equal to the

pivot have rank between pivot have rank between hh and and kk

– the elements greater than the elements greater than the pivot have rank the pivot have rank greater than greater than kk

Algorithm inPlaceQuickSort(S, l, r)Input sequence S, ranks l and rOutput sequence S with the

elements of rank between l and rrearranged in increasing order

if l r return

i a random integer between l and r x S.elemAtRank(i) (h, k) inPlacePartition(x)inPlaceQuickSort(S, l, h 1)inPlaceQuickSort(S, k 1, r)

The recursive calls The recursive calls considerconsider– elements with rank less elements with rank less

than than hh– elements with rank elements with rank

greater than greater than kk

Page 20: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2020

In-Place PartitioningIn-Place Partitioning Perform the partition using two indices to split S Perform the partition using two indices to split S

into L and E U G (a similar method can split E U G into L and E U G (a similar method can split E U G into E and G).into E and G).

Repeat until j and k cross:Repeat until j and k cross:– Scan j to the right until finding an element Scan j to the right until finding an element >> x. x.– Scan k to the left until finding an element < x.Scan k to the left until finding an element < x.– Swap elements at indices j and kSwap elements at indices j and k

3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 6 9

j k

(pivot = 6)

3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 6 9

j k

Page 21: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2121

Summary of Sorting AlgorithmsSummary of Sorting Algorithms

in-place, randomizedin-place, randomized fastest (good for large inputs)fastest (good for large inputs)

OO((nn log log nn))expectedexpected

quick-sortquick-sort

sequential data accesssequential data access fast (good for huge inputs)fast (good for huge inputs)OO((nn log log nn))merge-sortmerge-sort

in-placein-place fast (good for large inputs)fast (good for large inputs)OO((nn log log nn))heap-sortheap-sort

OO((nn22))

OO((nn22))

TimeTime

insertion-sortinsertion-sort

selection-sortselection-sort

AlgorithmAlgorithm NotesNotes

in-placein-place slow (good for small inputs)slow (good for small inputs)

in-placein-place slow (good for small inputs)slow (good for small inputs)

Page 22: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2222

Comparison-Based SortingComparison-Based Sorting Many sorting algorithms are comparison based.Many sorting algorithms are comparison based.

– They sort by making comparisons between pairs of They sort by making comparisons between pairs of objectsobjects

– Examples: bubble-sort, selection-sort, insertion-sort, Examples: bubble-sort, selection-sort, insertion-sort, heap-sort, merge-sort, quick-sort, ...heap-sort, merge-sort, quick-sort, ...

Let us therefore derive a lower bound on the Let us therefore derive a lower bound on the running time of any algorithm that uses running time of any algorithm that uses comparisons to sort n elements, xcomparisons to sort n elements, x11, x, x22, …, x, …, xnn..

Is xi < xj?

yes

no

Page 23: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2323

Counting ComparisonsCounting Comparisons Let us just count comparisons then.Let us just count comparisons then. Each possible run of the algorithm Each possible run of the algorithm

corresponds to a root-to-leaf path in a corresponds to a root-to-leaf path in a decision treedecision tree

xi < xj ?

xa < xb ?

xm < xo ? xp < xq ?xe < xf ? xk < xl ?

xc < xd ?

Page 24: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2424

Decision Tree HeightDecision Tree Height The height of this decision tree is a lower bound on the The height of this decision tree is a lower bound on the

running timerunning time Every possible input permutation must lead to a separate leaf Every possible input permutation must lead to a separate leaf

output. output. – If not, some input …4…5… would have same output If not, some input …4…5… would have same output

ordering as …5…4…, which would be wrong.ordering as …5…4…, which would be wrong. Since there are n!=1*2*…*n leaves, the height is at least log Since there are n!=1*2*…*n leaves, the height is at least log

(n!)(n!) minimum height (time)

log (n!)

xi < xj ?

xa < xb ?

xm < xo ? xp < xq ?xe < xf ? xk < xl ?

xc < xd ?

n!

Page 25: CSC401 – Analysis of Algorithms Lecture Notes 8 Comparison-Based Sorting

2525

The Lower BoundThe Lower Bound Any comparison-based sorting algorithms takes at least log (n!) Any comparison-based sorting algorithms takes at least log (n!)

timetime Therefore, any such algorithm takes time at leastTherefore, any such algorithm takes time at least

That is, any comparison-based sorting algorithm must run in Ω(n That is, any comparison-based sorting algorithm must run in Ω(n log n) time.log n) time.

).2/(log)2/(2

log)!(log2

nnn

n

n