Analysis & Design of Algorithms (CSCE 321)


Post on 20-Jan-2016


Page 1: Analysis & Design of Algorithms (CSCE  321)

Prof. Amr Goneid, AUC 1

Analysis & Design of Algorithms (CSCE 321)

Prof. Amr Goneid, Department of Computer Science, AUC

Part 8. Greedy Algorithms

Greedy Algorithms

Microsoft Interview

From: http://www.cs.pitt.edu/~kirk/cs1510/

Greedy Algorithms

1. Greedy Algorithms
2. The General Method
3. Continuous Knapsack Problem
4. Optimal Merge Patterns

1. Greedy Algorithms

Methodology:

- Start with a solution to a small sub-problem.
- Build up to the whole problem.
- Make choices that look good in the short term but not necessarily in the long term.

Greedy Algorithms

Disadvantages:

- They do not always work.
- Short-term choices may be disastrous in the long term.
- Correctness is hard to prove.

Advantages:

- When they work, they work fast.
- Simple and easy to implement.
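As a small illustration of the first disadvantage, consider making change with as few coins as possible. This example is chosen here for illustration and is not taken from the slides; the denominations {1, 3, 4}, the target 6, and the name `greedy_coin_change` are all assumptions.

```python
def greedy_coin_change(coins, amount):
    """Repeatedly take the largest coin that still fits (the greedy choice)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    # None signals that the greedy choices could not reach the amount exactly.
    return picked if amount == 0 else None


greedy = greedy_coin_change([1, 3, 4], 6)  # greedy takes 4, then 1, then 1
better = [3, 3]                            # hand-checked alternative using fewer coins
print(greedy, "uses", len(greedy), "coins; a better answer uses", len(better))
```

The locally best first choice (the coin 4) rules out the two-coin answer, which is the kind of short-term decision the slide warns about.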

2. The General Method

Let a[ ] be an array of elements that may contribute to a solution, and let S be a solution.

    Greedy (a[ ], n)
    {
        S = empty;
        for each element (i) from a[ ], i = 1:n
        {
            x = Select (a, i);
            if (Feasible (S, x)) S = Union (S, x);
        }
        return S;
    }

The General Method (continued)

- Select: selects an element from a[ ] and removes it. Selection is optimized to satisfy an objective function.
- Feasible: true if the selected value can be included in the solution vector, false otherwise.
- Union: combines the value with the solution and updates the objective function.
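The general method can be sketched in Python by passing Select, Feasible, and Union in as functions. This is only one possible rendering: the toy problem (pick numbers, largest first, without exceeding a budget of 10) and the helper names `select_largest`, `fits_budget`, and `add_item` are hypothetical.

```python
def greedy(candidates, select, feasible, union):
    """Generic greedy loop: select, test feasibility, then extend the solution."""
    remaining = list(candidates)   # work on a copy; select() removes from it
    solution = []
    while remaining:
        x = select(remaining)
        if feasible(solution, x):
            solution = union(solution, x)
    return solution


def select_largest(remaining):
    """Select: pick the largest remaining element and remove it."""
    best = max(remaining)
    remaining.remove(best)
    return best


def fits_budget(solution, x, budget=10):
    """Feasible: the chosen numbers must not sum to more than the budget."""
    return sum(solution) + x <= budget


def add_item(solution, x):
    """Union: add the selected element to the solution."""
    return solution + [x]


result = greedy([7, 5, 4, 3], select_largest, fits_budget, add_item)
print(result)
```

Each candidate is examined exactly once, matching the single pass over a[ ] in the pseudocode.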

3. Continuous Knapsack Problem

Environment:

- Object (i): total weight $w_i$ and total profit $p_i$. The fraction $x_i$ of object (i) is continuous ($0 \le x_i \le 1$).
- A number of objects, $1 \le i \le n$.
- A knapsack of capacity $m$.

[Figure: objects 1, 2, ..., n and a knapsack of capacity m]

The Problem

Problem statement: For n objects with weights $w_i$ and profits $p_i$, obtain the set of fractions of objects $x_i$ that maximizes the total profit without exceeding a total weight $m$.

Formally: obtain the set $X = (x_1, x_2, \ldots, x_n)$ that maximizes $\sum_{1 \le i \le n} p_i x_i$ subject to the constraints

$$\sum_{1 \le i \le n} w_i x_i \le m, \qquad 0 \le x_i \le 1, \qquad 1 \le i \le n$$

Optimal Solution

- Feasible solution: any set X that satisfies the constraints.
- Optimal solution: a feasible solution that maximizes the profit.
- Lemma 1: if $\sum_{1 \le i \le n} w_i = m$, then $x_i = 1$ for all i is optimal.
- Lemma 2: an optimal solution will give $\sum_{1 \le i \le n} w_i x_i = m$, i.e. the knapsack is filled exactly, provided the total weight of all objects is at least m.

Greedy Algorithm

- To maximize profit, choose the highest p first.
- Also choose the highest x, i.e., the smallest w first.
- In other words, define the "value" of an object (i) to be the ratio $v_i = p_i / w_i$, and choose first the object with the highest $v_i$.

Algorithm

    GreedyKnapsack (p[ ], w[ ], m, n, x[ ])
    {
        insert indices (i) of items in a maximum heap on value vi = pi / wi;
        zero the vector x; Rem = m;
        for k = 1..n
        {
            remove top of heap to get index (i);
            if (w[i] > Rem) then break;
            x[i] = 1.0; Rem = Rem - w[i];
        }
        if (k <= n) x[i] = Rem / w[i];
    }
    // T(n) = O(n log n)

Example

n = 3 objects, m = 20

P = (25, 24, 15), W = (18, 15, 10), V = (1.39, 1.6, 1.5)

Objects in decreasing order of V are {2, 3, 1}.

Set X = (0, 0, 0) and Rem = m = 20.

- k = 1, choose object i = 2: $w_2 <$ Rem, set $x_2 = 1$, $w_2 x_2 = 15$, Rem = 5.
- k = 2, choose object i = 3: $w_3 >$ Rem, break; k < n, so $x_3 = \text{Rem} / w_3 = 0.5$.

Optimal solution is X = (0, 1.0, 0.5).

Total profit is $\sum_{1 \le i \le n} p_i x_i = 31.5$.

Total weight is $\sum_{1 \le i \le n} w_i x_i = m = 20$.
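The GreedyKnapsack pseudocode can be sketched in Python and checked against the three-object example above. This is a minimal sketch, not the course's implementation: it assumes 0-based indices, uses the standard `heapq` module (a min-heap, so ratios are negated), and the function name `greedy_knapsack` is an assumption.

```python
import heapq


def greedy_knapsack(p, w, m):
    """Return the fractions x[i] chosen greedily by profit/weight ratio."""
    n = len(p)
    # heapq is a min-heap, so store negated ratios to pop the largest first.
    heap = [(-p[i] / w[i], i) for i in range(n)]
    heapq.heapify(heap)
    x = [0.0] * n
    rem = m
    while heap and rem > 0:
        _, i = heapq.heappop(heap)
        if w[i] > rem:
            x[i] = rem / w[i]   # take only the fraction that still fits
            break
        x[i] = 1.0              # the whole object fits
        rem -= w[i]
    return x


p = [25, 24, 15]
w = [18, 15, 10]
x = greedy_knapsack(p, w, 20)
profit = sum(pi * xi for pi, xi in zip(p, x))
weight = sum(wi * xi for wi, xi in zip(w, x))
print(x, profit, weight)
```

Building the heap and popping up to n entries gives the O(n log n) bound stated with the pseudocode.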

4. Optimal Merge Patterns

(a) Definitions

- Binary merge tree: a binary tree with external nodes representing entities and internal nodes representing merges of these entities.
- Optimal binary merge tree: the sum of paths from the root to the external nodes is optimal (e.g. minimum). Assuming that node (i) contributes to the cost by $p_i$ and the path from the root to that node has length $L_i$, optimality requires a pattern that minimizes

$$L = \sum_{i=1}^{n} p_i L_i$$

Optimal Binary Merge Tree

If the items {A, B, C} contribute to the merge cost by $P_A$, $P_B$, $P_C$, respectively, then the following 3 different patterns will cost:

- $P_1 = 2(P_A + P_B) + P_C$
- $P_2 = P_A + 2(P_B + P_C)$
- $P_3 = 2P_A + P_B + 2P_C$

Which of these merge patterns is optimal?

[Figure: the three merge trees ((A, B), C), (A, (B, C)), and (B, (A, C)), each with root ABC]

(b) Optimal Merging of Lists

Lists {A, B, C} have lengths 30, 25, 10, respectively. The cost of merging two lists of lengths n and m is n + m. The following 3 different merge patterns will cost:

- P1 = 2(30 + 25) + 10 = 120
- P2 = 30 + 2(25 + 10) = 100
- P3 = 25 + 2(30 + 10) = 105

P2 is optimal, so the merge order is {{B, C}, A}.

[Figure: the three merge trees ((A, B), C), (A, (B, C)), and (B, (A, C)), each with root ABC]

The Greedy Method

Insert lists and their lengths in a minimum heap of lengths.

Repeat

    Remove the two lowest-length lists (pi, pj) from the heap.
    Merge the lists with lengths (pi, pj) to form a new list with length pij = pi + pj.
    Insert pij and its symbols into the heap.

until all symbols are merged into one final list.

| Initial heap | After first merge | After second merge |
|---|---|---|
| C 10 | | |
| B 25 | A 30 | |
| A 30 | BC 35 | BCA 65 |

The Greedy Method (continued)

- Notice that lists B (25 elements) and C (10 elements) have each been merged (moved) twice.
- List A (30 elements) has been merged (moved) only once.
- Hence the total number of element moves is 2(25 + 10) + 30 = 100. This is the minimum among the three merge patterns.
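The greedy merge procedure can be sketched with Python's `heapq` and checked against the list lengths 30, 25, 10 used in these slides. The function name `optimal_merge_cost` is an assumption, and the sketch tracks only lengths, not the list contents.

```python
import heapq


def optimal_merge_cost(lengths):
    """Total number of element moves when always merging the two shortest lists."""
    heap = list(lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)          # shortest list
        b = heapq.heappop(heap)          # second shortest list
        total += a + b                   # merging costs the sum of the lengths
        heapq.heappush(heap, a + b)      # the merged list goes back into the heap
    return total


cost = optimal_merge_cost([30, 25, 10])
print(cost)
```

The two merges cost 35 and 65, which reproduces the total of 100 moves found for pattern P2.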

(c) Huffman Coding

Terminology:

- Symbol: a one-to-one representation of a single entity.
- Alphabet: a finite set of symbols.
- Message: a sequence of symbols.
- Encoding: translating symbols to a string of bits.
- Decoding: the reverse.

Example: Coding Tree for 4-Symbol Alphabet (a, b, c, d)

Encoding: a = 00, b = 01, c = 10, d = 11

Decoding: 0110001100 gives b c a d a

This is fixed-length coding.

[Figure: coding tree. The root abcd has children ab (branch 0) and cd (branch 1); ab has leaves a (0) and b (1); cd has leaves c (0) and d (1).]

Coding Efficiency & Redundancy

- $L_i$ = length of the path from the root to symbol (i) = number of bits representing that symbol.
- $P_i$ = probability of occurrence of symbol (i) in the message.
- n = size of the alphabet.
- $\langle L \rangle$ = average symbol length = $\sum_{1 \le i \le n} P_i L_i$ bits/symbol (bps).
- For fixed-length coding, $L_i = L$ = constant, so $\langle L \rangle = L$ (bps).
- Is this optimal (minimum)? Not necessarily.

Coding Efficiency & Redundancy (continued)

- The absolute minimum $\langle L \rangle$ in a message is called the entropy.
- The concept of entropy as a measure of the average information content of a message was introduced by Claude Shannon (1948).

Coding Efficiency & Redundancy (continued)

Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication. It is computed as:

$$H = -\sum_{i=1}^{n} P_i \log_2 P_i = \sum_{i=1}^{n} P_i \log_2 \frac{1}{P_i} \quad \text{(bps)}$$

Coding Efficiency & Redundancy (continued)

- Coding efficiency: $\eta = H / \langle L \rangle$, $0 \le \eta \le 1$
- Coding redundancy: $R = 1 - \eta$, $0 \le R \le 1$

[Figure: H compared with the perfect, optimal, and actual values of <L>]

Example: Fixed Length Coding

4-symbol alphabet (a, b, c, d). All symbols have the same length L = 2 bits.

Message: abbcaada

$\langle L \rangle$ = 2 (bps)

| Symbol (i) | $p_i$ | $-\log_2 p_i$ | $-p_i \log_2 p_i$ | Code | $L_i$ |
|---|---|---|---|---|---|
| a | 0.5 | 1 | 0.5 | 00 | 2 |
| b | 0.25 | 2 | 0.5 | 01 | 2 |
| c | 0.125 | 3 | 0.375 | 10 | 2 |
| d | 0.125 | 3 | 0.375 | 11 | 2 |

H = 1.75

Example (continued)

- Entropy: H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps)
- Coding efficiency: $\eta = H / \langle L \rangle$ = 1.75 / 2 = 0.875
- Coding redundancy: R = 1 - 0.875 = 0.125

This is not optimal.
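The figures in this example can be reproduced with a short Python sketch. It assumes symbol probabilities are estimated as relative frequencies in the message abbcaada and that logarithms are base 2; the function name `entropy` is an assumption.

```python
from collections import Counter
from math import log2


def entropy(message):
    """Shannon entropy in bits per symbol, from relative symbol frequencies."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())


h = entropy("abbcaada")
avg_len = 2.0                      # fixed-length code: every symbol uses 2 bits
efficiency = h / avg_len           # coding efficiency H / <L>
redundancy = 1 - efficiency        # coding redundancy R
print(h, efficiency, redundancy)
```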

Result

Fixed-length coding is optimal (perfect) only when all symbol probabilities are equal.

To prove this: with $n = 2^m$ symbols, $L = m$ bits and $\langle L \rangle = m$ (bps). If all probabilities are equal,

$$p_i = \frac{1}{n} = 2^{-m}, \qquad \log_2 p_i = -m$$

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i = m \sum_{i=1}^{n} p_i = m$$

Hence $H / \langle L \rangle = 1$.

Variable Length Coding (Huffman Coding)

The problem:

- Given a set of symbols and their probabilities,
- find a set of binary codewords that minimizes the average length of the symbols.

Variable Length Coding (Huffman Coding) (continued)

Formally:

- Input: a message M(A, P) with a symbol alphabet $A = \{a_1, a_2, \ldots, a_n\}$ of size n and a set of probabilities for the symbols $P = \{p_1, p_2, \ldots, p_n\}$.
- Output: a set of binary codewords $C = \{c_1, c_2, \ldots, c_n\}$ with bit lengths $L = \{L_1, L_2, \ldots, L_n\}$.
- Condition: minimize

$$\langle L \rangle = \sum_{i=1}^{n} p_i L_i$$

Variable Length Coding (Huffman Coding) (continued)

- To achieve optimality, we use optimal binary merge trees to code symbols of unequal probabilities.
- Huffman coding: more frequent symbols occur nearer to the root (shorter code lengths); less frequent symbols occur at deeper levels (longer code lengths).

The Greedy Method

Store each symbol in a parentless node of a binary tree.

Insert symbols and their probabilities in a minimum heap of probabilities.

Repeat

    Remove the lowest two probabilities (pi, pj) from the heap.
    Merge the symbols with (pi, pj) to form a new symbol (aiaj) with probability pij = pi + pj.
    Store symbol (aiaj) in a parentless node with two children ai and aj.
    Insert pij and its symbols into the heap.

until all symbols are merged into one final alphabet (root).

Trace the path from the root to each leaf (symbol) to form the bit string for that symbol. Concatenate "0" for a left branch and "1" for a right branch.
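The procedure above can be sketched in Python with `heapq`. This is a sketch under stated assumptions, not the course's code: trees are represented as nested `(left, right)` tuples, symbols are assumed not to be tuples themselves, a counter breaks ties between equal probabilities, and the name `huffman_codes` is an assumption. Different tie-breaking can give different codewords with the same lengths.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Map each symbol to a bit string; probabilities is {symbol: probability}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Heap entries are (probability, tiebreak, tree); a tree is a symbol or a pair.
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    while len(heap) > 1:
        p_i, _, left = heapq.heappop(heap)    # lowest probability
        p_j, _, right = heapq.heappop(heap)   # second lowest probability
        heapq.heappush(heap, (p_i + p_j, next(tiebreak), (left, right)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf: record the symbol's codeword
            codes[tree] = prefix or "0"       # a one-symbol alphabet still needs a bit

    walk(heap[0][2], "")
    return codes


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg_len)
```

For these probabilities the code lengths come out as 1, 2, 3, 3 bits, so the average length is 1.75 bits per symbol.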

Example (1)

4-symbol alphabet A = {a, b, c, d} of size 4. Message M(A, P): abbcaada, P = {0.5, 0.25, 0.125, 0.125}, H = 1.75

| Symbol (i) | $p_i$ | $-\log_2 p_i$ | $-p_i \log_2 p_i$ |
|---|---|---|---|
| a | 0.5 | 1 | 0.5 |
| b | 0.25 | 2 | 0.5 |
| c | 0.125 | 3 | 0.375 |
| d | 0.125 | 3 | 0.375 |

Building the Optimal Merge Table

| $s_i$ | $p_i$ | $s_i$ | $p_i$ | $s_i$ | $p_i$ | $s_i$ | $p_i$ |
|---|---|---|---|---|---|---|---|
| d | 0.125 | | | | | | |
| c | 0.125 | cd | 0.25 | | | | |
| b | 0.25 | b | 0.25 | bcd | 0.5 | | |
| a | 0.5 | a | 0.5 | a | 0.5 | abcd | 1.0 |

Optimal Merge Tree for Example (1)

Example: a (50%), b (25%), c (12.5%), d (12.5%)

[Figure: merge tree built bottom-up over four slides. c and d merge into cd, b and cd merge into bcd, and a and bcd merge into the root abcd; left branches are labeled 0 and right branches 1.]

| $a_i$ | $c_i$ | $L_i$ (bits) |
|---|---|---|
| a | 0 | 1 |
| b | 10 | 2 |
| c | 110 | 3 |
| d | 111 | 3 |

Coding Efficiency for Example (1)

- $\langle L \rangle$ = (1 × 0.5 + 2 × 0.25 + 3 × 0.125 + 3 × 0.125) = 1.75 (bps)
- H = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 (bps), $\eta = H / \langle L \rangle$ = 1.75 / 1.75 = 1.00, R = 0.0

Notice that symbols exist at leaves, i.e., no symbol code is the prefix of another symbol code. This is why the method is also called "prefix coding".
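The prefix property is what makes decoding unambiguous: bits can be read left to right and a symbol emitted as soon as the bits read so far match a codeword. The sketch below uses the Example (1) codewords a = 0, b = 10, c = 110, d = 111; the function names `encode` and `decode` are assumptions.

```python
def encode(message, codes):
    """Concatenate the codeword of every symbol in the message."""
    return "".join(codes[symbol] for symbol in message)


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a codeword is complete."""
    lookup = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:       # safe because no codeword is a prefix of another
            decoded.append(lookup[current])
            current = ""
    return "".join(decoded)


codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
bits = encode("abbcaada", codes)
print(bits, len(bits), decode(bits, codes))
```

The 8-symbol message needs 14 bits, i.e. 1.75 bits per symbol, compared with 16 bits under the 2-bit fixed-length code.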

Analysis

- The cost of inserting the n symbols in a minimum heap is O(n log n).
- The repeat loop is executed (n - 1) times.
- In each iteration, the worst-case removal of the two least elements costs 2 log n and the insertion of the merged element costs log n.
- Hence, the complexity of the Huffman algorithm is O(n log n).

Example (2)

4-symbol alphabet A = {a, b, c, d} of size 4. P = {0.4, 0.25, 0.18, 0.17}, H = 1.909

| Symbol (i) | $p_i$ | $-\log_2 p_i$ | $-p_i \log_2 p_i$ |
|---|---|---|---|
| a | 0.40 | 1.322 | 0.5288 |
| b | 0.25 | 2 | 0.5 |
| c | 0.18 | 2.474 | 0.4453 |
| d | 0.17 | 2.556 | 0.4345 |

Example (2): Merge Table

| $s_i$ | $p_i$ | $s_i$ | $p_i$ | $s_i$ | $p_i$ | $s_i$ | $p_i$ |
|---|---|---|---|---|---|---|---|
| d | 0.17 | | | | | | |
| c | 0.18 | b | 0.25 | | | | |
| b | 0.25 | cd | 0.35 | a | 0.40 | | |
| a | 0.40 | a | 0.40 | cdb | 0.60 | cdba | 1.0 |

Optimal Merge Tree for Example (2)

[Figure: merge tree with root cdba. cdb and a are the children of the root, cd and b are the children of cdb, and c and d are the children of cd; branches are labeled 0 and 1.]

| $a_i$ | $c_i$ | $L_i$ (bits) |
|---|---|---|
| a | 1 | 1 |
| b | 01 | 2 |
| c | 001 | 3 |
| d | 000 | 3 |

Coding Efficiency for Example (2)

a (40%), b (25%), c (18%), d (17%)

- $\langle L \rangle$ = 1.95 bps (optimal)
- H = 1.909, $\eta$ = 97.9 %, R = 2.1 %
- Coding is optimal (97.9 %) but not perfect.

Important result: perfect coding ($\eta$ = 100 %) can be achieved only for probability values of the form $2^{-m}$ (1/2, 1/4, 1/8, ... etc.).

File Compression

Variable-length codes can be used to compress files. Symbols are initially coded using ASCII (8-bit) fixed-length codes.

Steps:

1. Determine the probabilities of the symbols in the file.
2. Build the merge tree (or table).
3. Assign variable-length codes to the symbols.
4. Encode the symbols using the new codes.
5. Save the coded symbols in another file together with the symbol code table.

The compression ratio = $\langle L \rangle$ / 8
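As a worked instance of this ratio, the average lengths from Examples (1) and (2) in these slides can be plugged in directly. The figures below assume the stored symbol code table is small enough to ignore, which the slides do not state.

```latex
\[
\text{Example (1): } \frac{\langle L \rangle}{8} = \frac{1.75}{8} \approx 0.219,
\qquad
\text{Example (2): } \frac{\langle L \rangle}{8} = \frac{1.95}{8} \approx 0.244
\]
```

Under that assumption, the compressed file would be roughly 22 % and 24 % of the original size, respectively.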

Huffman Coding Animations

For examples of animations of Huffman coding, see:

http://www.cs.pitt.edu/~kirk/cs1501/animations

Huffman.html

http://peter.bittner.it/tugraz/huffmancoding.html