HT 2006 2.1

TTIT33 Algorithms and Optimization

Lecture 2: Algorithms

Sorting: [GT] 3.1.2, 11; [LD] 11.1-11.2, 11.4-11.8; [CL] 2.1, 7.1-7.4, 8.1, 8.3-8.4

HT 2006 2.2

Content:

Focus: complexity analysis

• Intro: aspects of sorting

• Comparison-based sorting:

– Insertion sort, Selection sort, Quick sort, Merge sort

– Theoretical lower bound for comparison-based sorting

• Digital sorting:

– bucket sort, radix sort

• Selection, median finding, quick select

HT 2006 2.3

The Sorting Problem

Input:
• A list L of data items with keys (the part of each data item we base our sorting on).

Output:
• A list L’ of the same data items placed in order of non-decreasing keys.

HT 2006 2.4

Aspects of Sorting:

• Internal vs. external sorting: Can the data be kept in fast, randomly accessed internal memory, or must it reside on external storage?

• Sorting in place vs. sorting with auxiliary data structures: Does the sorting algorithm need extra data structures beyond the input table?

• Worst-case vs. expected-case performance: How does the algorithm behave in different situations?

• Sorting by comparison vs. sorting digitally: Compare keys, or use the binary representation of the data?

• Stable vs. unstable sorting: What happens with multiple occurrences of the same key?

HT 2006 2.5

Insertion sort

”In each iteration, insert the first item from the unsorted part at its proper place in the sorted part.”

An in-place sorting algorithm! Data stored in A[0..n-1].

Iterate i from 1 to n-1:
• The table consists of:
  – Sorted data in A[0..i-1]
  – Unsorted data in A[i..n-1]
• Scan the sorted part for the index s where the selected item A[i] should be inserted
• Insert A[i] at position s (shifting A[s..i-1] one step to the right)
• Increase i


HT 2006 2.6

Analysis of Insertion Sort

• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes
• t3: I = worst-case no. of iterations of the inner loop:
  I = 1 + 2 + … + (n-1) = (n-1)n/2 = (n² - n)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + (n² - n) = 3n - 3 + n² - n = n² + 2n - 3

...thus we have an algorithm in O(n²) in the worst case, but it is good if the file is almost sorted.

Procedure InsertionSort (table A[0..n-1]):
1  for i from 1 to n-1 do
2    j ← i; x ← A[i]
3    while j ≥ 1 and A[j-1] > x do
4      A[j] ← A[j-1]; j ← j-1
5    A[j] ← x
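
A minimal runnable sketch of the same procedure in Python (the function name insertion_sort and the test data are illustrative, not from the slides):

def insertion_sort(a):
    """Illustrative sketch: sort the list a in place, in non-decreasing order."""
    for i in range(1, len(a)):          # A[0..i-1] is already sorted
        x = a[i]                        # item to insert
        j = i
        while j >= 1 and a[j - 1] > x:  # shift larger items one step right
            a[j] = a[j - 1]
            j -= 1
        a[j] = x                        # drop x into its proper place

data = [25, 17, 19, 32, 41, 43, 31, 12]
insertion_sort(data)
print(data)   # [12, 17, 19, 25, 31, 32, 41, 43]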

HT 2006 2.7

Selection sort

”In each iteration, search the unsorted set for the smallest remaining item to add to the end of the sorted set”

An in-place sorting algorithm! Data stored in A[0..n-1].

Iterate i from 0 to n-2 (as in the pseudocode below):
• The table consists of:
  – Sorted data in A[0..i-1]
  – Unsorted data in A[i..n-1]
• Scan the unsorted part for the index s of the smallest remaining item
• Swap places for A[i] and A[s]


HT 2006 2.8

Analysis of Selection Sort

• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes
• t3: I = no. of iterations of the inner loop:
  I = (n-2) + (n-3) + … + 1 = (n-2)(n-1)/2 = (n² - 3n + 2)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + 2·(n² - 3n + 2)/2 = 3n - 3 + n² - 3n + 2 = n² - 1

...thus we have an algorithm in O(n²) ...rather bad!

Procedure StraightSelection (table A[0..n-1]):
1  for i from 0 to n-2 do
2    s ← i
3    for j from i+1 to n-1 do
4      if A[j] < A[s] then s ← j
5    A[i] ↔ A[s]
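
A minimal runnable sketch of selection sort in Python (the function name selection_sort and the test data are illustrative, not from the slides):

def selection_sort(a):
    """Illustrative sketch: sort a in place by repeatedly selecting the minimum."""
    n = len(a)
    for i in range(n - 1):              # A[0..i-1] holds the i smallest items
        s = i                           # index of the smallest item seen so far
        for j in range(i + 1, n):       # scan the unsorted part A[i..n-1]
            if a[j] < a[s]:
                s = j
        a[i], a[s] = a[s], a[i]         # swap the minimum into position i

data = [25, 17, 19, 32, 41, 43, 31, 12]
selection_sort(data)
print(data)   # [12, 17, 19, 25, 31, 32, 41, 43]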

HT 2006 2.9

Quick-Sort

• Quick-sort is a randomized sorting algorithm based on the divide-and-conquer paradigm:
  – Divide: pick a random element x (called the pivot) and partition S into
    • L: elements less than x
    • E: elements equal to x
    • G: elements greater than x
  – Recur: sort L and G
  – Conquer: join L, E and G

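A minimal Python sketch of this three-way L/E/G scheme (the function name quick_sort is illustrative; unlike the in-place partitioning shown later on slide 2.13, this version builds new lists):

import random

def quick_sort(s):
    """Illustrative sketch: return a sorted copy of s, randomized three-way quick-sort."""
    if len(s) <= 1:                             # sequences of size 0 or 1 are sorted
        return list(s)
    x = random.choice(s)                        # random pivot
    less    = [v for v in s if v < x]           # L: elements less than the pivot
    equal   = [v for v in s if v == x]          # E: elements equal to the pivot
    greater = [v for v in s if v > x]           # G: elements greater than the pivot
    return quick_sort(less) + equal + quick_sort(greater)   # recur on L and G, join

print(quick_sort([7, 4, 9, 6, 2]))              # [2, 4, 6, 7, 9]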

HT 2006 2.10

Quick-Sort Tree

• An execution of quick-sort is depicted by a binary tree
  – Each node represents a recursive call of quick-sort and stores
    • the unsorted sequence before the execution, and its pivot
    • the sorted sequence at the end of the execution
  – The root is the initial call
  – The leaves are calls on subsequences of size 0 or 1

Example: the root call on 7 4 9 6 2 (pivot 6) returns 2 4 6 7 9;
its children are the calls 4 2 → 2 4 and 7 9 → 7 9;
the leaves are the calls 2 → 2 and 9 → 9.

HT 2006 2.11

Quick Sort – Analysis – worst case...

If the pivot element happens to be the min or max element of A in each call to quicksort (e.g., pre-sorted data when the pivot is chosen deterministically, say as the first element):
– Unbalanced recursion tree
– Recursion depth becomes n, and the total work becomes quadratic, O(n²)

Sorting nearly-sorted arrays occurs frequently in practice.

HT 2006 2.12

Quick Sort – Analysis – best case...

• Best case – a balanced recursion tree! How...?
• If the pivot is the median of the data set! Then each level halves the subproblem size, so the recursion depth is O(log n) and the total time O(n log n).

HT 2006 2.13

In-Place Partitioning

• Perform the partition using two indices j and k to split S into L and E ∪ G (a similar method can split E ∪ G into E and G).
• Repeat until j and k cross:
  – Scan j to the right until finding an element ≥ x.
  – Scan k to the left until finding an element < x.
  – Swap the elements at indices j and k.

Example (pivot x = 6):

3 2 5 1 0 7 3 5 9 2 7 9 8 9 7 6 9

j stops at the first element ≥ 6 (the 7), k stops at the rightmost element < 6 (the 2); they are swapped, and the scans continue until j and k cross.
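
A runnable Python sketch of this two-index partitioning step, assuming the pivot value x is supplied separately (function and variable names are illustrative, not from the slides):

def partition(a, lo, hi, x):
    """Illustrative sketch: partition a[lo..hi] around the value x.
    Afterwards every element left of the returned index is < x,
    and every element at or right of it is >= x."""
    j, k = lo, hi
    while j <= k:
        while j <= k and a[j] < x:    # scan j right until an element >= x
            j += 1
        while j <= k and a[k] >= x:   # scan k left until an element < x
            k -= 1
        if j < k:                     # j and k have not crossed: swap
            a[j], a[k] = a[k], a[j]
    return j                          # first index of the ">= x" part

a = [3, 2, 5, 1, 0, 7, 3, 5, 9, 2, 7, 9, 8, 9, 7, 6, 9]
m = partition(a, 0, len(a) - 1, 6)
print(a[:m], a[m:])   # elements < 6, then elements >= 6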

HT 2006 2.14

Merge Sort

Example: 25 17 19 32 41 43 31 12

25 17 19 32 41 43 31 12           (input)
25 17 19 32  |  41 43 31 12       (split into halves, then into pairs)
17 25 | 19 32 | 41 43 | 12 31     (pairs merged)
17 19 25 32  |  12 31 41 43       (halves merged)
12 17 19 25 31 32 41 43           (final merge)

Procedure MergeSort (table T[a..b]):
1  if a ≥ b then return else
2    Middle ← (a+b)/2
3    MergeSort(T[a..Middle])
4    MergeSort(T[Middle+1..b])
5    Merge(T[a..Middle], T[Middle+1..b])
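
A minimal Python sketch of merge sort, with the merge written inline (function name and test data are illustrative, not from the slides):

def merge_sort(t):
    """Illustrative sketch: return a sorted copy of t (top-down merge sort)."""
    if len(t) <= 1:                      # base case: size 0 or 1 is already sorted
        return list(t)
    middle = len(t) // 2
    left = merge_sort(t[:middle])        # sort the two halves recursively
    right = merge_sort(t[middle:])
    merged, i, j = [], 0, 0              # merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps the merge stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([25, 17, 19, 32, 41, 43, 31, 12]))
# [12, 17, 19, 25, 31, 32, 41, 43]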

HT 2006 2.15

Comparison-based sorting

• A comparison at every step
• Each possible run of the algorithm corresponds to a root-to-leaf path in a decision tree

[Figure: a decision tree whose internal nodes are comparisons of the form ”xi < xj ?”]

HT 2006 2.16

Lower bound on running time

• The height of this decision tree is a lower bound on the running time.
• Every input permutation leads to a separate leaf.
  – If not, some input …4…5… would get the same output ordering as …5…4…, which would be wrong.
• n! = 1·2·…·n leaves  ⇒  the minimum height (and thus the minimum time) is log(n!), which is Ω(n log n).

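A short justification of the final Ω(n log n) step (a standard argument, added here for completeness):

\[
\log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n \log n)
\]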

HT 2006 2.17

Bucket-Sort – not comparison based...

• Don’t compare keys – use them as indices into a table B
• If we know that all keys are in {0..255}:
  ...create a table B with room for 256 entries,
  ...”move” all items into B, using the key as index,
  ...copy them back to A in order

Procedure BucketSort (table <K> A[0:m]):
  table <integer> B[0:|K|]
  for i from 0 to |K| do B[i] ← 0
  for i from 0 to m do B[A[i]]++
  j ← 0
  for i from 0 to |K| do
    while B[i]-- > 0 do A[j++] ← i

|K| is the max number of different keys.

Running time: O(m + |K|)  ...for fixed size |K|

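A runnable Python sketch of this counting variant, assuming integer keys in {0..255} (function name and parameters are illustrative, not from the slides):

def bucket_sort(a, key_range=256):
    """Illustrative sketch: sort a list of small integer keys in place by counting."""
    b = [0] * key_range                  # B[k] = number of items with key k
    for key in a:
        b[key] += 1                      # "move" each item into its bucket
    j = 0
    for k in range(key_range):           # copy the keys back to a in order
        for _ in range(b[k]):
            a[j] = k
            j += 1

data = [5, 2, 0, 2, 6]
bucket_sort(data)
print(data)   # [0, 2, 2, 5, 6]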

HT 2006 2.18

A variant of Bucket Sort

• Organize the items as lists, one list per key (bucket)

Input: 7,a ; 1,b ; 3,c ; 7,d ; 3,e ; 7,f

Output: 1,b ; 3,c ; 3,e ; 7,a ; 7,d ; 7,f

Buckets 1..9 after insertion:
  1: (1,b)    3: (3,c) (3,e)    7: (7,a) (7,d) (7,f)
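
A Python sketch of this list-of-buckets variant for (key, value) pairs, reproducing the example above (function name and the key range are illustrative, not from the slides):

def bucket_sort_pairs(items, key_range=10):
    """Illustrative sketch: stably sort (key, value) pairs with keys in 0..key_range-1."""
    buckets = [[] for _ in range(key_range)]    # one list per possible key
    for key, value in items:
        buckets[key].append((key, value))       # appending keeps equal keys in input order
    result = []
    for bucket in buckets:                      # concatenate the buckets in key order
        result.extend(bucket)
    return result

items = [(7, 'a'), (1, 'b'), (3, 'c'), (7, 'd'), (3, 'e'), (7, 'f')]
print(bucket_sort_pairs(items))
# [(1, 'b'), (3, 'c'), (3, 'e'), (7, 'a'), (7, 'd'), (7, 'f')]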

HT 2006 2.19

If keys are large – Radix Sort

• Divide each key into a number of smaller parts ...each part suitable for Bucket Sort
• Focusing on the last part of every key, sort using Bucket Sort.
• Redo the sort using the second-to-last part.
• ...
• Finally, sort using the first part of each key.

Since Bucket Sort is stable, each pass preserves the order produced by the previous passes – and Radix Sort as a whole is stable as well!
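
A Python sketch of least-significant-part-first radix sort for fixed-length lowercase keys, matching the example on the next slide (function names and the 26-bucket pass are illustrative, not from the slides):

def stable_bucket_pass(words, pos):
    """Illustrative sketch: stably sort words by the character at index pos."""
    buckets = [[] for _ in range(26)]          # one bucket per lowercase letter
    for w in words:
        buckets[ord(w[pos]) - ord('a')].append(w)
    return [w for bucket in buckets for w in bucket]

def radix_sort(words, width):
    """Sort fixed-width lowercase strings: last part first, then second-to-last, ..."""
    for pos in range(width - 1, -1, -1):
        words = stable_bucket_pass(words, pos)
    return words

print(radix_sort(["mary", "anna", "bill", "adam", "mona"], 4))
# ['adam', 'anna', 'bill', 'mary', 'mona']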

HT 2006 2.20

Radix Sort – Example

Example: alphabetic keys, split by character, buckets on letters

mary anna bill adam mona     (input data)
anna mona bill adam mary     (sorted by 4th letter)
adam bill anna mona mary     (sorted by 3rd letter, keeping relative order)
mary adam bill anna mona     (sorted by 2nd letter, keeping relative order)
adam anna bill mary mona     (sorted by 1st letter – final, fully sorted order)

HT 2006 2.21

The Selection Problem

• Find the k-th smallest element in x1, x2, …, xn.

• Sort in O(n log n) time and index the k-th element?

• Can we solve the selection problem faster?

Example: 7 4 9 6 2 with k = 3  →  sorted order 2 4 6 7 9, so the answer is 6.

HT 2006 2.22

Quick-Select

• Quick-select is a randomized selection algorithm based on the prune-and-search paradigm:
  – Prune: pick a random element x (called the pivot) and partition S into
    • L: elements less than x
    • E: elements equal to x
    • G: elements greater than x
  – Search: depending on k, either the answer is in E, or we need to recurse in either L or G

The three cases:
– k ≤ |L|:               recurse on L, with the same k
– |L| < k ≤ |L| + |E|:   the answer is x
– k > |L| + |E|:         recurse on G, with k’ = k - |L| - |E|
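
A minimal Python sketch of quick-select following this case analysis (the function name is illustrative; k is 1-indexed as on the slides):

import random

def quick_select(s, k):
    """Illustrative sketch: return the k-th smallest element of s (1-indexed)."""
    x = random.choice(s)                     # random pivot
    less    = [v for v in s if v < x]        # L
    equal   = [v for v in s if v == x]       # E
    greater = [v for v in s if v > x]        # G
    if k <= len(less):                       # answer lies in L
        return quick_select(less, k)
    if k <= len(less) + len(equal):          # answer is the pivot itself
        return x
    return quick_select(greater, k - len(less) - len(equal))   # search G with k'

print(quick_select([7, 4, 9, 3, 2, 6, 5, 1, 8], 5))   # 5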

HT 2006 2.23

Quick-Select Visualization

• An execution of quick-select can be visualized by a recursion path
  – Each node represents a recursive call of quick-select, and stores k and the remaining sequence

k=5, S=(7 4 9 3 2 6 5 1 8)
k=2, S=(7 4 9 6 5 8)
k=2, S=(7 4 6 5)
k=1, S=(7 6 5)   →   answer: 5

HT 2006 2.25

Proving time complexity of QuickSelect (cont.)

t(n) ≤ n·g(n) + t(3n/4)

...where g(n) is the number of times we scan all the data (bad partitions), with E(g(n)) = 2,
plus the time to search the good partition, at most t(3n/4).

E(t(n)) ≤ E(n·g(n) + t(3n/4)) = n·E(g(n)) + E(t(3n/4)) = 2n + E(t(3n/4))

Reapply:

E(t(n)) ≤ 2n + 2·(3n/4) + E(t(n·(3/4)²))

Geometric series:

E(t(n)) ≤ 2n + 2n·(3/4) + 2n·(3/4)² + … = 2n · 1/(1 - 3/4) = 8n

Hence the expected running time is O(n).

[Figure: e.g., 5 1 8 3 7 2 4 6 is partitioned (pivot 5) into 2 1 4 3 | 5 | 7 8 6; each partitioning attempt scans all n remaining items, g(n) times if the first attempts give bad partitions.]

We recurse into a partition of size between n·(1/4) and n·(3/4) ...assume the worst (largest) size, 3n/4.