Correctness of Huffman Codes Introduction To Randomized Algorithms QuickSelect & QuickSort Monday, July 14th 1

Upload: arabella-turner

Post on 18-Dec-2015


Correctness of Huffman Codes

Introduction To Randomized Algorithms

QuickSelect & QuickSort

Monday, July 14th


Outline For Today

1. Correctness of Huffman Codes

2. QuickSelect

3. Probability Review

4. QuickSelect Runtime Analysis

5. QuickSort


Recap: Encoding-Decoding

encoder -> 010010100010010100011110110010010101010110100001110100010011000010010101011010100010 -> decoder

Goal: Make the binary blob as small as possible, satisfying the protocol.

Ex: Variable Length Prefix-free Encoding

Ex: A = {a, b, c, d}

a -> 0, b -> 10, c -> 110, d -> 111

decode 110010: 110 -> c, 0 -> a, 10 -> b, giving cab
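The decoding walk in this example can be sketched in Python. This is a minimal illustration, not part of the original slides: the table CODE is read off the slide (a = 0, b = 10, c = 110, d = 111), and the function names encode and decode are hypothetical.

```python
# Code table read off the slide: a=0, b=10, c=110, d=111.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(text, code=CODE):
    """Concatenate the codeword of every letter."""
    return "".join(code[letter] for letter in text)


def decode(bits, code=CODE):
    """Read bits left to right and emit a letter as soon as a codeword matches.

    This greedy rule is only safe because the code is prefix-free:
    no codeword is a prefix of another one.
    """
    inverse = {word: letter for letter, word in code.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            letters.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(letters)


print(decode("110010"))  # cab
```

Because no codeword is a prefix of another, the decoder never needs to look ahead or backtrack.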

Recap: Prefix-free Encodings ⇔ Binary Trees

We can represent each prefix-free code Ɣ as a binary tree T and vice-versa.

Code 1: a -> 0, b -> 10, c -> 110, d -> 111

[Figure: binary tree for Code 1 with edges labeled 0 and 1. a is a leaf at depth 1, b is a leaf at depth 2, and c and d are sibling leaves at depth 3.]

Encoding of letter x = path from the root to the leaf with x

# bits for x = depth_T(x)

Recap: Formal Problem Statement

Input: An alphabet A, and frequencies 𝓕 of letters in A

Output: A binary tree T, where letters of A are the leaves of T, that has the minimum average bit length (ABL):

$ABL(T) = \sum_{x \in A} f(x) \cdot \mathrm{depth}_T(x)$
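As a small worked example, the ABL of a code is the frequency-weighted sum of its codeword lengths (the codeword length of x equals the depth of x in the tree). The frequencies below are made up for illustration and are not from the slides; the function name average_bit_length is also hypothetical.

```python
def average_bit_length(freq, code):
    """Sum of f(x) * (number of bits used for x) over all letters x."""
    return sum(freq[x] * len(code[x]) for x in freq)


# Hypothetical frequencies (they sum to 1); not taken from the slides.
freq = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

prefix_free = {"a": "0", "b": "10", "c": "110", "d": "111"}
fixed_length = {"a": "00", "b": "01", "c": "10", "d": "11"}

print(average_bit_length(freq, prefix_free))   # 1.75
print(average_bit_length(freq, fixed_length))  # 2.0
```

With these frequencies the variable-length code beats the 2-bit fixed-length code because the frequent letter a gets the shortest codeword.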

Recap: Observations About Optimal T

Observation 1: The optimal binary tree T is full, i.e., each non-leaf vertex u has exactly 2 children

[Figure: two trees over the leaves a, b, c, e. In T, the leaf e hangs below a vertex that has only one child; T` is the same tree with that vertex removed.]

Why? Removing a vertex with a single child shortens the codes of the leaves below it, so T` is at least as good as T.

Recap: Observation 2 About Optimal T

Claim: In any optimal tree T, if leaf x has depth i and leaf y has depth j with i < j, then f(x) ≥ f(y)

Exchange Argument: If f(x) < f(y), swap x and y to get a better tree T`, contradicting the optimality of T.


Corollary

In any optimal tree T the two lowest

frequency letters are both in the lowest

level of the tree!

Recap: Huffman’s Key Insight

Observation 1 => optimal Ts are full => each leaf has a sibling

Corollary => 2 lowest freq. letters x, y are at the same level

Changing letters across the same level does not change the cost of T

[Figure: binary tree over the letters a, b, c, d, with c and d as sibling leaves at the lowest level.]

There is an optimal tree T in which the two lowest frequency letters are siblings (in the lowest level of the tree).

Possible Greedy Algorithm

Let x, y be the two lowest frequency letters.

1. x, y can be taken to be siblings, so treat them as a single meta-letter xy
2. Find an optimal tree T* for A - {x, y} + {xy}
3. Expand xy back into x and y in T*

Possible Greedy Algorithm (Example)

Ex: A = {x, y, z, t}, and let x, y be the two lowest freq. letters

Let A` = {xy, z, t}

[Figure: left, the tree T* for A` with leaves xy, z, t; right, the tree T obtained from T* by replacing the leaf xy with an internal vertex whose children are x and y.]

Huffman’s Algorithm (1951)

procedure Huffman(A, 𝓕):
  if (|A| = 2):
    return T where branch 0, 1 point to A[0] and A[1], respectively
  let x, y be lowest two frequency letters
  let A` = A - {x, y} + {xy}
  let 𝓕` = 𝓕 - {x, y} + {xy: f(x) + f(y)}
  T* = Huffman(A`, 𝓕`)
  expand xy in T* into x, y to get T
  return T
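A minimal Python sketch of the recursive procedure, under assumptions that are not in the slides: letters are strings, frequencies are given as a dict, and the meta-letter xy is represented by the tuple (x, y), so the "expand" step happens implicitly. The names huffman and codewords are illustrative.

```python
def huffman(freq):
    """Return a tree as nested 2-tuples; leaves are the letters (assumed strings)."""
    if len(freq) < 2:
        raise ValueError("need at least two letters")
    if len(freq) == 2:
        a, b = freq  # branch 0 points to a, branch 1 points to b
        return (a, b)
    # x, y: the two lowest-frequency letters (ties broken by dict order).
    x, y = sorted(freq, key=freq.get)[:2]
    # The meta-letter xy is the tuple (x, y); because it already is the
    # subtree with children x and y, no separate expansion step is needed.
    meta = (x, y)
    reduced = {k: f for k, f in freq.items() if k != x and k != y}
    reduced[meta] = freq[x] + freq[y]
    return huffman(reduced)


def codewords(tree, prefix=""):
    """Read the code off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table


# Hypothetical frequencies, not taken from the slides.
freq = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(codewords(huffman(freq)))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

This version re-sorts the alphabet at every level of the recursion, so it is simple rather than fast.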

Huffman’s Algorithm Correctness (1)

By induction on |A|

Base case: |A| = 2 => return simple full tree with 2 leaves

IH: Assume true for all alphabets of size k-1

Huffman will get a T_opt^{k-1} with meta-letter xy and expand xy

Huffman’s Algorithm Correctness (2)

[Figure: left, T_opt^{k-1} with the meta-letter xy as a leaf; right, T with xy expanded into the sibling leaves x and y.]

Cost of xy in T_opt^{k-1}: f(xy)*depth(xy) = (f(x) + f(y))*depth(xy)

Cost of x and y in T: (f(x) + f(y))*(depth(xy) + 1)

Total diff = f(x) + f(y)

Huffman’s Algorithm Correctness (3)

Take any optimal Z, we’ll argue ABL(T) ≤ ABL(Z)

By corollary, we can assume in Z, x, y are also siblings at the lowest level.

Consider Z` obtained by merging x, y into xy in Z => Z` is a valid prefix-code for A` of size k-1

Correctness

[Figure: left, Z with x and y as sibling leaves at the lowest level; right, Z` with x and y merged into the single leaf xy.]

ABL(Z) = ABL(Z`) + f(x) + f(y)  (total diff is again f(x) + f(y)!)

ABL(T) = ABL(T`) + f(x) + f(y), where T` = T_opt^{k-1} is the tree Huffman found for A`

By IH: ABL(T`) ≤ ABL(Z`) => ABL(T) ≤ ABL(Z)

Q.E.D.

Huffman’s Algorithm Runtime

Exercise: Can you make Huffman run in O(|A| log(|A|))?
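One possible approach to this exercise (a sketch, not necessarily the intended solution) is to keep the letters in a min-heap keyed by frequency, so that each "take the two lowest frequency letters and merge them" step costs O(log |A|). The function name huffman_tree and the tuple-based tree representation are assumptions made for illustration.

```python
import heapq
from itertools import count


def huffman_tree(freq):
    """Build a Huffman tree with |A| - 1 merges, each using O(log |A|) heap work.

    Trees are nested 2-tuples; leaves are the letters themselves.
    """
    tiebreak = count()  # unique counter so the heap never compares two trees
    heap = [(f, next(tiebreak), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    return heap[0][2]


# Hypothetical frequencies, not taken from the slides.
freq = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(huffman_tree(freq))  # ('a', ('b', ('c', 'd')))
```

The loop is the iterative form of the recursion: every pass replaces x and y by the meta-letter xy with frequency f(x) + f(y).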


Randomized Algorithms

Input -> Randomized Algorithm (flip coins) -> Output

Given a fixed input, they may give:

1. Different outputs

2. Different runtimes

depending on the outcomes of the coins

Compared to their deterministic counterparts: often simpler, more practical, elegant

Problem of Selection

Input: An array A of n integers, and an integer 1 ≤ k ≤ n

Output: Find the rank-k element in A: kth smallest element

If k = 1 find min

If k = n find max

If k = n/2 find median

Naïve Solution: Sort A, return kth element: O(n log(n))

Can we do better? Maybe O(n)?

QuickSelect

Given A, k

Pick a pivot p from A uniformly at random

Partition A into AL: those < p and AR: those > p

If |AL| = k-1, then p is the rank-k element: return p

Otherwise recurse on either AL or AR, depending on how |AL| compares with k

QuickSelect Simulation

Input: 105 7 13 8 14 1 19 11 4 10 98 16, k = 6

Round 1: pivot = 8

Partitioned: 7 1 4 | 8 | 16 98 10 11 19 14 13 105

8 is the 4th smallest, so the 6th smallest element is to the right of 8

Round 2: recurse on 16 98 10 11 19 14 13 105 with k = 2, pivot = 11

Partitioned: 10 | 11 | 105 13 14 19 98 16

11 is the 2nd smallest: return 11

QuickSelect Pseudocode

procedure QuickSelect(A, k):
  pick a pivot element p from A uniformly at random
  for i from 1 to n:
    if A[i] < p: put A[i] into AL
    else if A[i] > p: put A[i] into AR
  if |AL| = k-1: return p  // p is the kth element
  else if |AL| > k-1: return QuickSelect(AL, k)
  else: return QuickSelect(AR, k-1-|AL|)
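A runnable Python sketch of QuickSelect. It assumes the elements are distinct (so the pivot is the only element left out of AL and AR), and the function name quickselect and the rng parameter are illustrative choices rather than anything from the slides.

```python
import random


def quickselect(a, k, rng=random):
    """Return the k-th smallest element (1-indexed) of a list of distinct numbers."""
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    p = rng.choice(a)                  # pivot picked uniformly at random
    left = [x for x in a if x < p]     # AL: elements smaller than the pivot
    right = [x for x in a if x > p]    # AR: elements larger than the pivot
    if len(left) == k - 1:
        return p                       # p is the k-th smallest element
    if len(left) > k - 1:
        return quickselect(left, k, rng)
    # Skip the |AL| smaller elements and the pivot itself.
    return quickselect(right, k - 1 - len(left), rng)


a = [105, 7, 13, 8, 14, 1, 19, 11, 4, 10, 98, 16]
print(quickselect(a, 6))  # 11
```

The output does not depend on the coin flips; only the running time does.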

Correctness of QuickSelect

Informally:

If p is the kth smallest element we correctly return it

Otherwise the kth smallest element is in either AL or AR.

We pick the correct subarray according to # elements < p and update the rank k correctly

Can be made formal with an inductive proof.

Runtime of QuickSelect

Assume we are selecting the median (rank = n/2)

Question 1: What’s the best scenario?

First pivot is the median: O(n) runtime

Question 2: What’s the worst scenario?

We iteratively pick the max or min element as pivot

End up having n-1 iterations: O(n^2) runtime

Runtime of QuickSelect is fundamentally a probability question!

How long does QuickSelect take on average?

Or what’s the expected runtime of QuickSelect?

Terminology Clarification: Worst-Case vs Average-Case

Worst-case/Average-case refer to assumptions about the input

Worst-case: Under any input (or the worst input).

Average-case: Under an “average” input according to some distribution.

Our analyses of randomized algorithms will be worst-case: no assumptions about the input distribution.

Given any input (or worst input) what’s the average or expected run-time of the algorithm, over its own coin flips?


Sample Space Ω

Definition (Sample Space Ω): Set of all possible outcomes

Ex 1: Rolling two dice

Ω = {(1,1), (1,2), …, (6,6)}

Ex 2: QuickSelect: Ω is the set of all possible sequences of pivot picks

Ω = {(kth), (nth, kth), …, (nth, (n-1)st, 1st, kth), …}

Each outcome i ∈ Ω has a probability p(i) ≥ 0, and the probabilities sum to 1

Event S ⊆ Ω

Definition (Event S ⊆ Ω): a set of outcomes from Ω

Ex: Rolling a 10 or more with 2 dice

S = {(4,6), (6,4), (5,5), (5,6), (6,5), (6,6)}

Ex: Picking the kth element as pivot in at most 2 picks

S = {(kth), (nth, kth), ((n-1)st, kth), …, (1st, kth)}

The probability of each event S: $\Pr(S) = \sum_{i \in S} p(i)$

Pr(rolling a 10 or more) = 6/36

Random Variable

Definition (Random Variable): a function X: Ω -> ℝ (real numbers)

Ex: X: the sum of the two dice

X((1,1)) = 2, X((1,2)) = 3, …, X((6,6)) = 12

Ex: Y: run-time of QuickSelect

Y((kth)) ≈ n, Y((nth, kth)) ≈ 2n, …, Y((nth, (n-1)st, …, kth)) ≈ n^2

Indicator Random Variable

Definition (Indicator Random Variable): A RV X from Ω -> {0, 1}

With probability p, X=1

With probability 1-p, X=0

Examples Later

Expectation

Definition (Expectation E[X]): average value of X

$E[X] = \sum_{i \in \Omega} X(i) \cdot p(i)$, where X(i) is the value of X under outcome i and p(i) is the probability of i

Equivalently, assuming X takes non-negative integer values: $E[X] = \sum_{j \ge 0} j \cdot \Pr(X = j)$

Expectation Examples

1. Let X be the sum of 2 dice: $E[X] = \sum_{j=2}^{12} j \cdot \Pr(X = j) = 7$

2. Let Y be an indicator random variable: Y=1 with prob. p, Y=0 with prob. 1-p. Then $E[Y] = 1 \cdot p + 0 \cdot (1-p) = p$

3. Assume we have a coin that comes up heads with prob. p. Let Z be # times we have to flip the coin to get a head. Due to independence of consecutive coin flips, $\Pr(Z = j) = (1-p)^{j-1} p$, so $E[Z] = \sum_{j \ge 1} j (1-p)^{j-1} p = 1/p$
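The dice examples can be checked directly from the definition of expectation by enumerating the sample space. This is a small illustrative sketch assuming two fair dice; the helper name expectation is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))


def expectation(X):
    """E[X] = sum over outcomes i of X(i) * p(i)."""
    return sum(X(outcome) * p for outcome in omega)


# X = sum of the two dice.
print(expectation(lambda o: o[0] + o[1]))  # 7

# Indicator of the event "rolling a 10 or more": its expectation is Pr(event).
print(expectation(lambda o: 1 if o[0] + o[1] >= 10 else 0))  # 1/6
```

The second line illustrates that the expectation of an indicator random variable equals the probability of its event, here 6/36.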

Facts About Expectation

**Linearity of Expectation**

Let $Z = \sum_j X_j$. Then $E[Z] = \sum_j E[X_j]$,

even if the Xj depend on each other (i.e., are not independent).

Extremely useful when trying to understand the average value of a complicated random variable Z!

We express Z as a sum of simpler random variables.

Ex: Birthday Paradox

If there are k people in a room, on average, how many pairs of people have the same birthday?

Let Z be # pairs of people with the same birthday.

Z = 0, if no one shares birthdays

Z = 1, if exactly one pair of people have the same birthday

…

E[Z] is difficult to compute from the definition of expectation.

Ex: Birthday Paradox

Let X(i,j) be an indicator random variable

X(i,j) = 1 if i, j have the same birthday,

X(i,j) = 0 otherwise

Then $Z = \sum_{i<j} X(i,j)$ and, assuming 365 equally likely birthdays, $E[X(i,j)] = 1/365$, so by linearity $E[Z] = \binom{k}{2} / 365$

When k = 28, E[Z] > 1!
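The k = 28 claim can be verified with a few lines of Python. The sketch assumes 365 equally likely, independent birthdays (so each pair matches with probability 1/365); the function name expected_shared_pairs is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_shared_pairs(k, days=365):
    """E[Z] by linearity: (number of pairs) * Pr(a fixed pair shares a birthday)."""
    return comb(k, 2) * Fraction(1, days)


print(expected_shared_pairs(27))  # 351/365, still below 1
print(expected_shared_pairs(28))  # 378/365, just above 1
```

So 28 is the smallest k for which the expected number of matching pairs exceeds 1 under these assumptions.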


Back to Expected Runtime of QuickSelect

Let Z be the runtime of QuickSelect

Run time question is equivalent to: What’s E[Z]?

Trick: Try to break Z into simpler random variables.

QuickSelect Execution

Recursion 1: work done = n

Recursion 2: work done = r_2

Recursion 3: work done = r_3

…

Recursion k: work done = r_k

Recursion k+1: work done = r_{k+1}

Termination: work done = 0

Total Work: Sum of work done across all recursions

Phases of QuickSelect

Phase 1: calls when the array size is in (0.75n, n]

Phase 2: calls when the array size is in (0.75^2 n, 0.75n]

…

Phase j: calls when the array size is in (0.75^j n, 0.75^{j-1} n]

…

Phase log_{4/3} n

Termination

Expected Runtime In Terms of Phases

$Z = \sum_{j=1}^{\log_{4/3} n} X_j$, where Xj is the work done in phase j.

By linearity of expectation, $E[Z] = \sum_j E[X_j]$

Can we bound E[Xj]?

Bounding the Work Of Each Phase

Consider phase j

At each recursion during phase j, the work done ≤ $(3/4)^{j-1} n$

Let Yj be the # recursive calls made during phase j. Then $X_j \le Y_j \cdot (3/4)^{j-1} n$

Let’s try to bound the expected number of recursive calls during phase j.

Bounding the # Calls Of Each Phase

Let’s say phase j starts with an array of size k, where $(3/4)^j n < k \le (3/4)^{j-1} n$

Guaranteed to exit the phase when we cut k down to at most ¾k!

Let e1, e2, …, ek be our elements in increasing order:

e1 e2 … ek/4 … e3k/4 … ek-1 ek

Observation: A phase is guaranteed to end when the pivot p is in [ek/4, e3k/4], irrespective of the rank of the item we’re searching for, because then both AL and AR have at most ¾k elements.

Bounding the Work Of Each Phase

e1 e2 … ek/4 … e3k/4 … ek-1 ek

Q: What’s the probability of picking p from [ek/4, e3k/4]?

A: 50%

Expected # picks to pick a central pivot is ≤ 2.

Therefore, expected # recursions to end phase j: $E[Y_j] \le 2$

Final Calculations

$E[Z] = \sum_j E[X_j] \le \sum_j E[Y_j] \cdot (3/4)^{j-1} n \le 2n \sum_{j \ge 1} (3/4)^{j-1} = 8n = O(n)$

Q.E.D.
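As a sanity check on the O(n) bound, the total work (sum of array sizes over all recursive calls) can be simulated. This sketch rests on one assumption worth stating: for distinct elements only ranks matter, so the current subarray can be tracked as a range of ranks instead of an actual list. The name quickselect_work, the seed, and the trial count are illustrative.

```python
import random


def quickselect_work(n, k, rng):
    """Sum of subarray sizes over all calls when selecting rank k out of n.

    The current subarray always consists of the ranks lo+1..hi, so it is
    enough to track the pair (lo, hi) and draw the pivot's rank uniformly.
    """
    lo, hi = 0, n
    work = 0
    while True:
        work += hi - lo                  # partitioning costs the subarray size
        p = rng.randrange(lo, hi) + 1    # rank of the uniformly random pivot
        if p == k:
            return work
        if p > k:
            hi = p - 1                   # recurse on AL
        else:
            lo = p                       # recurse on AR


rng = random.Random(0)
n, trials = 1000, 200
average = sum(quickselect_work(n, n // 2, rng) for _ in range(trials)) / trials
print(average / n)  # average work per element; the analysis says at most 8 in expectation
```

The constant 8 from the phase argument is not tight, so the printed ratio is typically well below it.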

Summary: QuickSelect’s Runtime Analysis

1. Defined r.v. Z as the run-time of QuickSelect

2. Broke the executions into log_{4/3} n phases, according to the sizes of the arrays in the recursions

3. Defined Xj as the runtime during phase j

4. Expressed Z as $Z = \sum_j X_j$

5. Bounded Xj: $X_j \le Y_j \cdot (3/4)^{j-1} n$, where Yj is # recursions in phase j

6. Bounded E[Yj] by 2

7. Using (5) and (6) bounded E[Z] by O(n).



Back To Sorting: QuickSort

Input: Given an array A of size n

Output: Elements of A in increasing order

Pick a pivot p from A uniformly at random,

Partition A into AL: those < p and AR: those > p

Sort AL and AR recursively

Output: [AL, p, AR]

QuickSort Pseudocode

procedure QuickSort(A):
  if |A| ≤ 1: return A
  pick a pivot element p from A uniformly at random
  for i from 1 to n:
    if A[i] < p: put A[i] into AL
    else if A[i] > p: put A[i] into AR
  return [QuickSort(AL), p, QuickSort(AR)]
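A runnable Python sketch of randomized QuickSort. It assumes distinct elements (duplicates of the pivot would be dropped by this partition), and it returns a new list rather than sorting in place; the function name quicksort and the rng parameter are illustrative.

```python
import random


def quicksort(a, rng=random):
    """Return the elements of a (assumed distinct) in increasing order."""
    if len(a) <= 1:
        return list(a)                 # empty or single-element lists are sorted
    p = rng.choice(a)                  # pivot picked uniformly at random
    left = [x for x in a if x < p]     # AL
    right = [x for x in a if x > p]    # AR
    return quicksort(left, rng) + [p] + quicksort(right, rng)


a = [105, 7, 13, 8, 14, 1, 19, 11, 4, 10, 98, 16]
print(quicksort(a))
# [1, 4, 7, 8, 10, 11, 13, 14, 16, 19, 98, 105]
```

Note that the base case has to cover the empty list as well, since AL or AR is empty whenever the pivot is the minimum or maximum.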

QuickSort Simulation

Input: 105 7 13 8 14 1 19 11 4 10 98 16

pivot = 8

Partitioned: 7 1 4 | 8 | 16 98 10 11 19 14 13 105

Recurse on 7 1 4 and recurse on 16 98 10 11 19 14 13 105

Total work done at a recursive call with m elements: m-1 comparisons with the pivot + m copies = O(m).

** Run-time of QuickSort = O(# comparisons made) **

Analysis Roadmap

1. Let Z be the runtime of QuickSort, i.e. # comparisons made by QuickSort

2. Let X(i, j) be the # times (i, j) gets compared, where (i, j) stands for the ith and jth smallest elements, i < j.

3. Express Z as: $Z = \sum_{i<j} X(i, j)$

4. Compute E[X(i, j)] and sum them up to get E[Z].

Counting # Comparisons Made By QuickSort

Let X(i, j) be the # times (i, j) gets compared.

Fix a particular recursive call:

Before partitioning: 105 7 13 8 14 1 19 11 4 10 98 16

After partitioning around pivot 8: 7 1 4 | 8 | 16 98 10 11 19 14 13 105

Observation 1: All comparisons are made against the pivot!

Observation 2: After this call, the pivot is never compared to anything again!

Counting # (i, j) Comparisons

Condition for (i, j) comparison: i, j are compared only when

1. They are in the same recursive call together

2. One of them is the pivot

And they can never be compared again.

Therefore X(i, j) can only be 0 or 1 (i.e., is an indicator r.v.)

E[X(i,j)]

Recall: Expected value of indicator R.V.: E[X] = Pr(X=1).

Question: What’s Pr(X(i,j) = 1)?

(i, j) Comparison Simulation (1)

Recursive Call 1: e1 e2 … ei … ej … em-1 em, pivot outside [ei, ej]: No Comparison

Recursive Call 2: … ei … ej … em-1 em, pivot outside [ei, ej]: No Comparison

Recursive Call 3: … ei … ej …, pivot strictly between ei and ej: No Comparison

i and j are at different recursions

Cannot be compared again! Total 0 comparisons!

(i, j) Comparison Simulation (2)

Recursive Call 1: e1 e2 … ei … ej … em-1 em, pivot outside [ei, ej]: No Comparison

Recursive Call 2: e1 e2 … ei … ej …, pivot outside [ei, ej]: No Comparison

Recursive Call 3: … ei … ej …, pivot is ei or ej: 1 (ei, ej) comparison! (and only 1)

Observation: (ei, ej) is compared if and only if ei or ej is the first pivot to be picked among the [ei, ej] block.

Pr(X(i,j) = 1)

Pr(X(i,j) = 1) = probability that ei or ej is the first element to be picked amongst [ei, ei+1, …, ej]:

$\Pr(X(i,j) = 1) = \frac{2}{j - i + 1}$

Final Calculations

$E[Z] = \sum_{i<j} E[X(i,j)] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}$

For fixed i: 1/2 + 1/3 + ... + 1/(n-i+1) ≤ ln(n)

Therefore $E[Z] \le 2 n \ln(n) = O(n \log n)$

Q.E.D.
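The sum in this calculation can be evaluated exactly for small n and compared with the 2n ln(n) bound. The sketch assumes the pair probability 2/(j-i+1) from the analysis and groups pairs by their gap d = j - i; the function name expected_comparisons is illustrative.

```python
from fractions import Fraction
from math import log


def expected_comparisons(n):
    """Sum of 2/(j-i+1) over all pairs i < j, grouped by the gap d = j - i.

    There are n - d pairs with gap d, each compared with probability 2/(d+1).
    """
    return sum(Fraction(2 * (n - d), d + 1) for d in range(1, n))


print(expected_comparisons(2))  # 1
print(expected_comparisons(3))  # 8/3

n = 12
print(float(expected_comparisons(n)), 2 * n * log(n))  # exact value vs. the upper bound
```

For n = 3 the value 8/3 can be checked by hand: a middle pivot (probability 1/3) costs 2 comparisons, and an extreme pivot (probability 2/3) costs 2 + 1 = 3.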


On Monday: Min Cut and Max Cut