Greedy Algorithms CSc 4520/6520 Fall 2013


Page 1: Greedy Algorithms CSc  4520/6520

Greedy Algorithms, CSc 4520/6520

Fall 2013

Page 2: Greedy Algorithms CSc  4520/6520

Problems Considered

• Activity Selection Problem
• Knapsack Problem
  – 0-1 Knapsack
  – Fractional Knapsack
• Huffman Codes

Page 3: Greedy Algorithms CSc  4520/6520

CS3381 Design & Analysis of Algorithms (2001-2002 Sem A), City University of Hong Kong / Dept of CS / Helena Wong, http://www.cs.cityu.edu.hk/~helena

Greedy Algorithms

2 techniques for solving optimization problems:

1. Dynamic Programming

2. Greedy Algorithms (“Greedy Strategy”)

[Diagram: among the optimization problems, those that Dynamic Programming can solve and those that the Greedy Approach can solve.]

For some optimization problems, Dynamic Programming is "overkill"; the Greedy Strategy is simpler and more efficient.

Page 4: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem

For a set of proposed activities that wish to use a lecture hall, select a maximum-size subset of “compatible activities”.

Set of activities: S={a1,a2,…an}

Duration of activity ai: [start_timei, finish_timei)

Activities sorted in increasing order of finish time:

i              1  2  3  4  5  6  7   8   9   10  11
start_time_i   1  3  0  5  3  5  6   8   8   2   12
finish_time_i  4  5  6  7  8  9  10  11  12  13  14

Page 5: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem

i              1  2  3  4  5  6  7   8   9   10  11
start_time_i   1  3  0  5  3  5  6   8   8   2   12
finish_time_i  4  5  6  7  8  9  10  11  12  13  14

Compatible activities: {a3, a9, a11}, {a1, a4, a8, a11}, {a2, a4, a9, a11}

[Timeline figure: activities a1 to a11 plotted against time 0 to 14.]

Page 6: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Dynamic Programming Solution (Step 1)

Step 1. Characterize the structure of an optimal solution.

S:
i              1  2  3  4  5  6  7   8   9   10  11 (= n)
start_time_i   1  3  0  5  3  5  6   8   8   2   12
finish_time_i  4  5  6  7  8  9  10  11  12  13  14

Definition:

S_i,j = { a_k ∈ S : finish_time_i ≤ start_time_k < finish_time_k ≤ start_time_j }

That is, let S_i,j be the set of activities that start after a_i finishes and finish before a_j starts.

e.g. S_2,11 = {a4, a6, a7, a8, a9}

[Timeline figure: activities a1 to a11 on the time axis, with the members of S_2,11 marked "ok".]

Page 7: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Dynamic Programming Solution (Step 1)

S:
i              1  2  3  4  5  6  7   8   9   10  11 (= n)
start_time_i   1  3  0  5  3  5  6   8   8   2   12
finish_time_i  4  5  6  7  8  9  10  11  12  13  14

Add fictitious activities: a0 and an+1:

S:
i              0  1  2  3  4  5  6  7   8   9   10  11  12
start_time_i   -  1  3  0  5  3  5  6   8   8   2   12  -
finish_time_i  0  4  5  6  7  8  9  10  11  12  13  14  -

i.e. S_0,n+1 = {a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11} = S

Note: if i ≥ j then S_i,j = Ø.

[Timeline figure: activities a0 to a12 on the time axis from 0 to 14.]

Page 8: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Dynamic Programming Solution (Step 1)

The problem: for a set of proposed activities that wish to use a lecture hall, select a maximum-size subset of "compatible activities" = select a maximum-size subset of compatible activities from S_0,n+1.

Substructure: suppose a solution to S_i,j includes activity a_k. Then 2 subproblems are generated: S_i,k and S_k,j. For example, suppose a solution to S_0,n+1 contains a7; then the 2 subproblems S_0,7 and S_7,n+1 are generated.

The maximum-size subset A_i,j of compatible activities is:

A_i,j = A_i,k ∪ {a_k} ∪ A_k,j

Page 9: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Dynamic Programming Solution (Step 2)

Step 2. Recursively define an optimal solution

Let c[i,j] = number of activities in a maximum-size subset of compatible activities in Si,j.

If i ≥ j, then S_i,j = Ø, i.e. c[i,j] = 0.

c[i,j] = 0                                        if S_i,j = Ø
c[i,j] = max_{i<k<j} { c[i,k] + c[k,j] + 1 }      if S_i,j ≠ Ø

Step 3. Compute the value of an optimal solution in a bottom-up fashion

Step 4. Construct an optimal solution from computed information.
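As a concrete illustration of Steps 2-3, here is a minimal memoized Python sketch of the c[i,j] recurrence (not from the slides); it assumes the activity data from the earlier table, with the fictitious activities a0 and a_{n+1}.

from functools import lru_cache

# Activity data from the earlier slides; index k holds a_k (k = 1..11).
# a_0 and a_12 are the fictitious activities: finish_time_0 = 0, start_time_12 = infinity.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12, float("inf")]     # s[0] is a dummy value
f = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, float("inf")]
n = 11

@lru_cache(maxsize=None)
def c(i, j):
    """Size of a maximum subset of mutually compatible activities in S_i,j."""
    best = 0
    for k in range(i + 1, j):
        if f[i] <= s[k] and f[k] <= s[j]:            # a_k is in S_i,j
            best = max(best, c(i, k) + c(k, j) + 1)
    return best

print(c(0, n + 1))   # 4, e.g. the subset {a1, a4, a8, a11}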

Page 10: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Greedy Strategy Solution

Consider any nonempty subproblem Si,j, and let am be the activity in Si,j with the earliest finish time.


e.g. S_2,11 = {a4, a6, a7, a8, a9}. Among {a4, a6, a7, a8, a9}, a4 will finish earliest.

1. a4 is used in the solution.
2. After choosing a4, there are 2 subproblems: S_2,4 and S_4,11. But S_2,4 is empty, so only S_4,11 remains as a subproblem.

Then, in general:

1. a_m is used in some maximum-size subset of compatible activities of S_i,j.
2. The subproblem S_i,m is empty, so that choosing a_m leaves the subproblem S_m,j as the only one that may be nonempty.

(Compare with the DP recurrence:
c[i,j] = 0                                        if S_i,j = Ø
c[i,j] = max_{i<k<j} { c[i,k] + c[k,j] + 1 }      if S_i,j ≠ Ø)

Page 11: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Greedy Strategy Solution

That is,

To solve S0,12, we select a1 that will finish earliest, and solve for S1,12.

To solve S1,12, we select a4 that will finish earliest, and solve for S4,12.

To solve S4,12, we select a8 that will finish earliest, and solve for S8,12.

... and so on. These are greedy choices (locally optimal choices): each one leaves as much opportunity as possible for the remaining activities to be scheduled. The problem is solved in a top-down fashion.

Hence, to solve S_i,j:

1. Choose the activity a_m with the earliest finish time.
2. Solution of S_i,j = {a_m} ∪ solution of subproblem S_m,j


Page 12: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Greedy Strategy Solution

Recursive-Activity-Selector(i, j)
1  m = i + 1                  // Find first activity in S_i,j
2  while m < j and start_time_m < finish_time_i
3      do m = m + 1
4  if m < j
5      then return {a_m} ∪ Recursive-Activity-Selector(m, j)
6      else return Ø

Order of calls (and the set each call returns):
Recursive-Activity-Selector(0,12)  returns {a1, a4, a8, a11}
Recursive-Activity-Selector(1,12)  returns {a4, a8, a11}
Recursive-Activity-Selector(4,12)  returns {a8, a11}
Recursive-Activity-Selector(8,12)  returns {a11}
Recursive-Activity-Selector(11,12) returns Ø

For example, in the call Recursive-Activity-Selector(1,12) the while loop passes m = 2 and m = 3 (both start before a1 finishes) and breaks the loop at m = 4.
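For reference, a runnable Python sketch of the recursive selector (my own transcription, not from the slides); it solves S_i,n+1 directly, so the parameter j is replaced by n, and finish_time_0 = 0 stands in for the fictitious activity a0.

def recursive_activity_selector(start, finish, i, n):
    """Greedily select activities compatible with a_i from a_{i+1}..a_n (1-based indices)."""
    m = i + 1
    while m <= n and start[m] < finish[i]:   # skip activities that start before a_i finishes
        m += 1
    if m <= n:
        return [m] + recursive_activity_selector(start, finish, m, n)
    return []

start  = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]       # index k holds start_time_k
finish = [0,    4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   # finish[0] = 0 is the fictitious a0
print(recursive_activity_selector(start, finish, 0, 11))   # [1, 4, 8, 11]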

Page 13: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Greedy Strategy Solution

Iterative-Activity-Selector()
1  Answer = {a1}
2  last_selected = 1
3  for m = 2 to n
4      if start_time_m >= finish_time_last_selected
5          then Answer = Answer ∪ {a_m}
6               last_selected = m
7  return Answer
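The same idea as a short runnable Python sketch (not from the slides), again assuming the activities are already sorted by increasing finish time:

def iterative_activity_selector(start, finish):
    """Return the 1-based indices of a maximum-size set of compatible activities."""
    answer = [1]    # a1, the activity with the earliest finish time, is always chosen
    last = 0        # 0-based index of the last selected activity
    for m in range(1, len(start)):
        if start[m] >= finish[last]:   # a_{m+1} starts after the last chosen one finishes
            answer.append(m + 1)
            last = m
    return answer

start  = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
finish = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(iterative_activity_selector(start, finish))   # [1, 4, 8, 11]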


Page 14: Greedy Algorithms CSc  4520/6520


Activity-Selection Problem: Greedy Strategy Solution

For both Recursive-Activity-Selector and

Iterative-Activity-Selector,

the running times are Θ(n).

Reason: each a_m is examined exactly once.

Page 15: Greedy Algorithms CSc  4520/6520


Greedy Algorithm Design

Steps of Greedy Algorithm Design:

1. Formulate the optimization problem in the form: we make a choice and we are left with one subproblem to solve.

2. Show that the greedy choice can lead to an optimal solution, so that the greedy choice is always safe.

3. Demonstrate that an optimal solution to original problem = greedy choice + an optimal solution to the subproblem

Step 2 establishes the Greedy-Choice Property; Step 3 establishes the Optimal Substructure Property. Together they are a good clue that a greedy strategy will solve the problem.

Page 16: Greedy Algorithms CSc  4520/6520


Greedy Algorithm Design

Comparison:

Dynamic Programming:
• At each step, the choice is determined based on solutions of subproblems.
• Bottom-up approach: sub-problems are solved first.
• Can be slower, more complex.

Greedy Algorithms:
• At each step, we quickly make a choice that currently looks best: a locally optimal (greedy) choice.
• Top-down approach: the greedy choice can be made first, before solving further sub-problems.
• Usually faster, simpler.

Page 17: Greedy Algorithms CSc  4520/6520

Greedy Algorithms

• Similar to dynamic programming, but a simpler approach; also used for optimization problems.
• Idea: when we have a choice to make, make the one that looks best right now; make a locally optimal choice in the hope of getting a globally optimal solution.
• Greedy algorithms don't always yield an optimal solution.
• The algorithm makes the choice that looks best at the moment in order to get an optimal solution.

Page 18: Greedy Algorithms CSc  4520/6520

Fractional Knapsack Problem

Knapsack capacity: W

There are n items: the i-th item has value v_i and weight w_i.

Goal: find x_i such that

  0 ≤ x_i ≤ 1 for i = 1, 2, ..., n,
  Σ_i w_i · x_i ≤ W, and
  Σ_i x_i · v_i is maximum.

Page 19: Greedy Algorithms CSc  4520/6520

Fractional Knapsack - Example

E.g.: knapsack capacity W = 50 pounds.

  Item 1: 10 pounds, $60  ($6/pound)
  Item 2: 20 pounds, $100 ($5/pound)
  Item 3: 30 pounds, $120 ($4/pound)

Take all of Item 1 ($60), all of Item 2 ($100), and 20 of the 30 pounds of Item 3 ($80), for a total of $240.

Page 20: Greedy Algorithms CSc  4520/6520

Fractional Knapsack Problem: Greedy strategy 1

Pick the item with the maximum value.

E.g.: W = 1; w1 = 100, v1 = 2; w2 = 1, v2 = 1.

Taking from the item with the maximum value: total value taken = v1/w1 = 2/100.
This is smaller than what the thief can take by choosing the other item: total value (choose item 2) = v2/w2 = 1.

Page 21: Greedy Algorithms CSc  4520/6520

Fractional Knapsack Problem: Greedy strategy 2

Pick the item with the maximum value per pound v_i/w_i. If the supply of that item is exhausted and the thief can carry more, take as much as possible from the item with the next greatest value per pound. It is good to order the items by their value per pound:

v_1/w_1 ≥ v_2/w_2 ≥ ... ≥ v_n/w_n

Page 22: Greedy Algorithms CSc  4520/6520

Fractional Knapsack Problem

Alg.: Fractional-Knapsack(W, v[n], w[n])
1. while w > 0 and there are items remaining
2.     pick the item i with maximum v_i/w_i
3.     x_i ← min(1, w/w_i)
4.     remove item i from the list
5.     w ← w - x_i · w_i

(w is the amount of space remaining in the knapsack; initially w = W.)

Running time: Θ(n) if the items are already ordered by v_i/w_i; else Θ(n lg n).
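A minimal runnable Python sketch of the greedy strategy (not from the slides); it sorts the items by value per pound and fills the knapsack greedily.

def fractional_knapsack(W, items):
    """items: list of (value, weight). Return (total_value, fractions taken per item)."""
    order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)
    fractions = [0.0] * len(items)
    remaining = W
    total = 0.0
    for i in order:
        if remaining <= 0:
            break
        value, weight = items[i]
        x = min(1.0, remaining / weight)   # take the whole item, or the fraction that fits
        fractions[i] = x
        total += x * value
        remaining -= x * weight
    return total, fractions

# The example from the earlier slide: W = 50, items of $60/10 lb, $100/20 lb, $120/30 lb.
print(fractional_knapsack(50, [(60, 10), (100, 20), (120, 30)]))
# total value 240.0; fractions [1.0, 1.0, 2/3]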

Page 23: Greedy Algorithms CSc  4520/6520


Huffman Codes

• For compressing data (a sequence of characters)
• Widely used
• Very efficient (saving 20-90%)
• Uses a table keeping the frequencies of occurrence of characters
• Outputs a binary string

“Today’s weather is nice”

“001 0110 0 0 100 1000 1110”

Page 24: Greedy Algorithms CSc  4520/6520

Huffman Code Problem

Huffman's algorithm achieves data compression by finding the best variable-length binary encoding scheme for the symbols that occur in the file to be compressed.

Page 25: Greedy Algorithms CSc  4520/6520

Huffman Code Problem

The more frequently a symbol occurs, the shorter the Huffman binary word representing it should be.

The Huffman code is a prefix-free code: no codeword is a prefix of another codeword.

Page 26: Greedy Algorithms CSc  4520/6520

Overview

Huffman codes: compressing data (savings of 20% to 90%).
Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string.
C: the alphabet.

Page 27: Greedy Algorithms CSc  4520/6520


Huffman Codes

      Frequency   Fixed-length codeword   Variable-length codeword
'a'   45000       000                     0
'b'   13000       001                     101
'c'   12000       010                     100
'd'   16000       011                     111
'e'   9000        100                     1101
'f'   5000        101                     1100

Example: a file of 100,000 characters, containing only 'a' to 'f'.

Fixed-length code: 3 bits per character, 3 * 100,000 = 300,000 bits.  e.g. "abc" = "000 001 010"
Variable-length code: 1*45000 + 3*13000 + 3*12000 + 3*16000 + 4*9000 + 4*5000 = 224,000 bits.  e.g. "abc" = "0 101 100"
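A quick Python check of those two totals (not from the slides):

freq      = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
fixed_len = {c: 3 for c in freq}                                # 3 bits per character
var_len   = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}    # codeword lengths above

print(sum(freq[c] * fixed_len[c] for c in freq))   # 300000
print(sum(freq[c] * var_len[c] for c in freq))     # 224000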

Page 28: Greedy Algorithms CSc  4520/6520


Huffman Codes

The coding schemes can be represented by trees (for a file of 100,000 characters):

      Frequency (in thousands)   Fixed-length codeword
'a'   45                         000
'b'   13                         001
'c'   12                         010
'd'   16                         011
'e'   9                          100
'f'   5                          101

[Tree figure for the fixed-length code: root 100 with children 86 and 14; 86 has children 58 (a:45, b:13) and 28 (c:12, d:16); 14 has children e:9 and f:5. This is not a full binary tree.]

      Frequency (in thousands)   Variable-length codeword
'a'   45                         0
'b'   13                         101
'c'   12                         100
'd'   16                         111
'e'   9                          1101
'f'   5                          1100

[Tree figure for the variable-length code: root 100 with children a:45 and 55; 55 has children 25 (c:12, b:13) and 30; 30 has children 14 (f:5, e:9) and d:16. This is a full binary tree: every nonleaf node has 2 children.]

Page 29: Greedy Algorithms CSc  4520/6520


Huffman Codes

      Frequency   Codeword
'a'   45000       0
'b'   13000       101
'c'   12000       100
'd'   16000       111
'e'   9000        1101
'f'   5000        1100

[Tree figure: the optimal code tree from the previous slide, with root 100, children a:45 and 55, and so on.]

To find an optimal code for a file:

1. The coding must be unambiguous. Consider codes in which no codeword is also a prefix of another codeword => Prefix Codes. Prefix codes are unambiguous; once the codewords are decided, it is easy to compress (encode) and decompress (decode).

2. The file size must be smallest. => It can be represented by a full binary tree. => Usually the less frequent characters are at the bottom.

Let C be the alphabet (e.g. C = {'a','b','c','d','e','f'}). For each character c, the number of bits to encode all of c's occurrences = freq_c * depth_c.

File size B(T) = Σ_{c∈C} freq_c * depth_c.  E.g. "abc" is coded as "0 101 100".

Page 30: Greedy Algorithms CSc  4520/6520


Huffman Codes

How do we find the optimal prefix code? The Huffman code (1952) was invented to solve it. It is a greedy approach.

Q: a min-priority queue.  Initially: f:5  e:9  c:12  b:13  d:16  a:45

[Figure: the code tree is built bottom-up by repeatedly merging the two nodes with the smallest frequencies:]

1. Merge f:5 and e:9 into a node 14.   Q: c:12  b:13  14  d:16  a:45
2. Merge c:12 and b:13 into 25.        Q: 14  d:16  25  a:45
3. Merge 14 and d:16 into 30.          Q: 25  30  a:45
4. Merge 25 and 30 into 55.            Q: a:45  55
5. Merge a:45 and 55 into 100, the root of the final code tree.

Page 31: Greedy Algorithms CSc  4520/6520


Huffman Codes

HUFFMAN(C)
1  Build Q from C
2  for i = 1 to |C| - 1
3      Allocate a new node z
4      z.left = x = EXTRACT_MIN(Q)
5      z.right = y = EXTRACT_MIN(Q)
6      z.freq = x.freq + y.freq
7      Insert z into Q in the correct position
8  return EXTRACT_MIN(Q)


If Q is implemented as a binary min-heap:
  "Build Q from C" is O(n)
  "EXTRACT_MIN(Q)" is O(lg n)
  "Insert z into Q" is O(lg n)
so HUFFMAN(C) is O(n lg n).

How is it “greedy”?
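A runnable Python sketch of HUFFMAN(C) (not from the slides); it uses the standard heapq module as the min-priority queue and then reads the codewords off the finished tree:

import heapq
from itertools import count

def huffman(freq):
    """freq: dict character -> frequency. Returns a dict character -> codeword."""
    tiebreak = count()   # keeps heap entries comparable when frequencies are equal
    # Each entry is (frequency, tiebreak, tree); a tree is a character or a (left, right) pair.
    q = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(q)                              # "Build Q from C"
    for _ in range(len(freq) - 1):                # |C| - 1 merges
        fx, _, x = heapq.heappop(q)               # EXTRACT_MIN
        fy, _, y = heapq.heappop(q)               # EXTRACT_MIN
        heapq.heappush(q, (fx + fy, next(tiebreak), (x, y)))
    _, _, root = q[0]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):               # internal node: 0 to the left, 1 to the right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"           # a lone symbol still gets a 1-bit code
    walk(root, "")
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
print(huffman(freq))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}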

Page 32: Greedy Algorithms CSc  4520/6520

Cost of a Tree T

For each character c in the alphabet C:
  let f(c) be the frequency of c in the file,
  let d_T(c) be the depth of c in the tree (it is also the length of c's codeword. Why?).

Let B(T) be the number of bits required to encode the file (called the cost of T):

B(T) = Σ_{c∈C} f(c) · d_T(c)

Page 33: Greedy Algorithms CSc  4520/6520

Huffman Code Problem

In the HUFFMAN(C) pseudocode: we assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c]. The algorithm builds the tree T corresponding to the optimal code. A min-priority queue Q is used to identify the two least-frequent objects to merge together. The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

Page 34: Greedy Algorithms CSc  4520/6520

Running time of Huffman's algorithm

The analysis assumes that Q is implemented as a binary min-heap. For a set C of n characters, building Q can be performed in O(n) time using BUILD-MIN-HEAP. The for loop is executed exactly n - 1 times, and since each heap operation requires O(lg n) time, the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).

Page 35: Greedy Algorithms CSc  4520/6520

Prefix Code

Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous).
An optimal data compression achievable by a character code can always be achieved with a prefix code.
Prefix codes simplify encoding (compression) and decoding:
  Encoding: abc → 0 . 101 . 100 = 0101100
  Decoding: 001011101 = 0 . 0 . 101 . 1101 → aabe
Use a binary tree to represent prefix codes for easy decoding.
An optimal code is always represented by a full binary tree, in which every non-leaf node has two children; such a tree has |C| leaves and |C| - 1 internal nodes.
Cost: B(T) = Σ_{c∈C} f(c) · d_T(c), where f(c) is the frequency of c and d_T(c) is the depth of c (the length of its codeword).
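A tiny illustration of the encoding step in Python (not from the slides), using the codeword table from the example above:

codes = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def encode(text):
    """Concatenate the codewords of the characters in text."""
    return "".join(codes[ch] for ch in text)

print(encode("abc"))    # 0101100
print(encode("aabe"))   # 001011101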

Page 36: Greedy Algorithms CSc  4520/6520

Huffman Code

Reduces the size of data by 20%-90% in general. If no characters occur more frequently than others, there is no advantage over ASCII.

Encoding: given the characters and their frequencies, run the algorithm to generate a code, then write out the characters using that code.

Decoding: given the Huffman tree, figure out what each character is (possible because of the prefix property).

Page 37: Greedy Algorithms CSc  4520/6520

How to Decode?

With a fixed-length code, decoding is easy: break the bit string up into 3's, for instance.

For a variable-length code, ensure that no character's code is the prefix of another, so there is no ambiguity.

E.g. 101111110100 = 101 . 111 . 1101 . 0 . 0 → b d e a a
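A minimal decoder sketch in Python (not from the slides); it scans the bit string and emits a character as soon as the accumulated bits match a codeword, which is unambiguous precisely because of the prefix property:

codes = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def decode(bits):
    """Decode a bit string produced with the prefix code above."""
    by_codeword = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_codeword:     # first match is the only possible match
            out.append(by_codeword[current])
            current = ""
    return "".join(out)

print(decode("101111110100"))   # bdeaa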

Page 38: Greedy Algorithms CSc  4520/6520

Huffman Algorithm correctness: we need to prove two things.

Greedy Choice Property: there exists a minimum-cost prefix tree where the two smallest-frequency characters are indeed siblings with the longest path from the root.

This means that the greedy choice does not hurt finding the optimum.

Page 39: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

Optimal Substructure Property: once we choose the two least-frequent elements and combine them to produce a smaller problem, an optimal solution to that smaller problem, with the two elements added back, is indeed an optimal solution to the original problem.

Page 40: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

There exists a minimum-cost tree where the minimum-frequency elements are siblings on the longest path:

Assume that is not the situation. Then some other two elements are the siblings on the longest path.

Say a, b are the elements with the smallest frequencies and x, y are the elements on the longest path.

Page 41: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: code tree CT, with a at depth d_a and the siblings x, y at depth d_y on the longest path.]

We know about depth and frequency:
  d_a ≤ d_y
  f_a ≤ f_y

Page 42: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

We also know that the code tree CT has the smallest possible cost Σ_σ f_σ · d_σ.

Now exchange a and y.

Page 43: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: tree CT', obtained from CT by exchanging a and y; now y is at depth d_a and a is at depth d_y.]

(Since d_a ≤ d_y and f_a ≤ f_y, we have (f_y - f_a)(d_y - d_a) ≥ 0, i.e. f_a·d_a + f_y·d_y ≥ f_y·d_a + f_a·d_y.)

cost(CT) = Σ_σ f_σ·d_σ
         = Σ_{σ≠a,y} f_σ·d_σ + f_a·d_a + f_y·d_y
         ≥ Σ_{σ≠a,y} f_σ·d_σ + f_y·d_a + f_a·d_y
         = cost(CT')

Page 44: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: after the exchange, a and x are siblings on the longest path at depth d_x, and b is at depth d_b.]

Now do the same thing for b and x.

Page 45: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: tree CT'', obtained by exchanging b and x; now a and b are siblings at the bottom of the longest path.]

We get an optimal code tree where a and b are siblings with the longest path.

Page 46: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

Optimal substructure property: let a, b be the symbols with the smallest frequencies. Let x be a new symbol whose frequency is f_x = f_a + f_b. Delete the characters a and b, and find the optimal code tree CT for the reduced alphabet.

Then CT' = CT ∪ {a, b}, the tree obtained by giving the leaf x the two children a and b, is an optimal tree for the original alphabet.

Page 47: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: CT contains a leaf x with frequency f_x = f_a + f_b; CT' is obtained from CT by expanding x into an internal node with children a and b.]

Page 48: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

cost(CT’)=∑fσd’σ = ∑fσd’σ + fad’a + fbd’b= σ σ≠a,b

∑fσd’σ + fa(dx+1) + fb (dx+1) =σ≠a,b

∑fσd’σ+(fa + fb)(dx+1)=σ≠a,b

∑fσdσ+fx(dx+1)+fx = cost(CT) + fxσ≠a,b

Page 49: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

[Figure: CT with the leaf x (f_x = f_a + f_b) and CT' with x expanded into children a and b.]

cost(CT) + f_x = cost(CT')

Page 50: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

Assume CT’ is not optimal.

By the previous lemma there is a tree CT”that is optimal, and where a and b are siblings. So

cost(CT”) < cost(CT’)

Page 51: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

CT’’’

x

a b

CT”

x

fx = fa + fb

By a similar argument:cost(CT’’’)+fx = cost(CT”)

Consider

Page 52: Greedy Algorithms CSc  4520/6520

Algorithm correctness:

We get:

cost(CT’’’) = cost(CT”) – fx < cost(CT’) – fx = cost(CT)

and this contradicts the minimality of cost(CT).

Page 53: Greedy Algorithms CSc  4520/6520

Application of the Huffman code

Both the .mp3 and .jpg file formats use Huffman coding at one stage of their compression.

Page 54: Greedy Algorithms CSc  4520/6520

Dynamic Programming vs. Greedy Algorithms

Dynamic programming:
• We make a choice at each step.
• The choice depends on solutions to subproblems.
• Bottom-up solution, from smaller to larger subproblems.

Greedy algorithm:
• Make the greedy choice and THEN solve the subproblem arising after the choice is made.
• The choice we make may depend on previous choices, but not on solutions to subproblems.
• Top-down solution, problems decrease in size.

Page 55: Greedy Algorithms CSc  4520/6520

Looking Ahead

• More greedy algorithms to come when considering graph algorithms

– Minimum spanning tree
  • Kruskal
  • Prim
– Dijkstra's algorithm for shortest paths from a single source