Lecture Greedy Dynamic - TRANSCRIPT
7/27/2019
http://slidepdf.com/reader/full/lecture-greedy-dynamic
Greedy Algorithms and Dynamic Programming
Lectures on Greedy Algorithms and Dynamic Programming
COMP 523: Advanced Algorithmic Techniques
Lecturer: Dariusz Kowalski
Overview
Previous lectures:
• Algorithms based on recursion: a call to the same procedure solves the problem for smaller-size sub-input(s)
• Graph algorithms: searching, with applications
These lectures:
• Greedy algorithms
• Dynamic programming
Greedy algorithm’s paradigm
An algorithm is greedy if:
• it builds up a solution in small consecutive steps
• at each step it makes its decision myopically, to optimize some underlying criterion
Greedy algorithms are proved optimal by showing that:
• at every step the algorithm is not worse than any other algorithm, or
• every algorithm can be gradually transformed into the greedy one without hurting the quality of its solution
Interval scheduling
Input: a set of intervals on the line, represented by pairs of points (the ends of the intervals)
Output: the largest set of intervals such that no two of them overlap
Generic greedy solution:
• Consider the intervals one after another, using some rule
Rule 1
Select the interval that starts earliest
(but does not overlap the already chosen intervals)
Suboptimal solution!
[figure: instance where the optimal solution beats this rule's selection]
Rule 2
Select the shortest interval
(but not overlapping the already chosen intervals)
Suboptimal solution!
[figure: instance where the optimal solution beats this rule's selection]
Rule 3
Select the interval intersecting the smallest number of remaining intervals
(but still not overlapping the already chosen intervals)
Suboptimal solution!
[figure: instance where the optimal solution beats this rule's selection]
Rule 4
Select the interval that ends first
(but still not overlapping the already chosen intervals)
Hurray! Exact solution!
Analysis - exact solution
The algorithm gives non-overlapping intervals: obvious, since we always choose an interval that does not overlap the previously chosen intervals.
The solution is exact:
Let:
• A be the set of intervals obtained by the algorithm,
• Opt be the largest set of pairwise non-overlapping
intervals
We show that A must be as large as Opt
Analysis - exact solution cont.
Let A = {A1,…,Ak} and Opt = {B1,…,Bm} be sorted by finishing time.
By the definition of Opt we have k ≤ m.
Fact: for every i ≤ k, Ai finishes not later than Bi.
Proof: by induction.
For i = 1: by the definition of the first step of the algorithm.
From i-1 to i: suppose that Ai-1 finishes not later than Bi-1. By the definition of a single step of the algorithm, Ai is the first interval that finishes after Ai-1 and does not overlap it. If Bi finished before Ai, then it would overlap some of the previous A1,…,Ai-1 and consequently - by the inductive assumption - it would overlap or end before Bi-1, which would be a contradiction.
[figure: Ai-1 and Ai drawn against Bi-1 and Bi on the line]
Analysis - exact solution cont.
Theorem: A is the exact solution.
Proof: we show that k = m.
Suppose to the contrary that k < m.
We already know that Ak finishes not later than Bk.
Hence we could add Bk+1 to A, and the algorithm would have obtained a bigger solution - a contradiction.
[figure: the algorithm finishes its selection at Ak, yet Bk+1 still fits after it]
Implementation & time complexity
Efficient implementation:
• Sort the intervals according to their right-most ends
• For every consecutive interval:
  - If its left-most end is after the right-most end of the last selected interval, then select this interval
  - Otherwise skip it and go to the next interval
Time complexity: O(n log n + n) = O(n log n)
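The implementation above can be sketched as follows (a minimal sketch; the representation of intervals as (start, end) pairs and the convention that touching endpoints do not overlap are assumptions, not from the slides):

```python
def schedule(intervals):
    """Earliest-finish-time greedy: returns a largest set of pairwise
    non-overlapping intervals, given as (start, end) pairs."""
    selected = []
    last_end = float("-inf")
    # sort by the right-most end
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # left end after the last selected right end
            selected.append((start, end))
            last_end = end
        # otherwise skip this interval
    return selected
```

The single pass after sorting takes O(n) time, matching the O(n log n) bound above.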
Textbook and Exercises
READING:
• Chapter 4 “Greedy Algorithms”, Section 4.1
EXERCISE:
• All Interval Scheduling problem from Section
4.1
Properties of minimum spanning trees (MST)
Properties of spanning trees:
• n nodes
• n - 1 edges
• at least 2 leaves (a leaf is a node with only one neighbor)
MST cycle property:
• after adding an edge to an MST we obtain exactly one cycle, and each MST edge in this cycle has weight no bigger than the weight of the added edge
[figure: example graph with edge weights, showing the cycle created by the added edge]
Crucial observation about MST
Consider sets of nodes A and V - A:
• Let F be the set of edges between A and V - A
• Let a be the smallest weight of an edge in F
Theorem: Every MST must contain at least one edge of weight a from the set F.
[figure: example graph cut into A and V - A]
Proof of the Theorem
Let e be the edge in F with the smallest weight; for simplicity assume that such an edge is unique. Suppose to the contrary that e is not in some MST, and consider one such MST.
Add e to this MST: a cycle is obtained in which, by the MST cycle property, e has weight not smaller than any other edge weight in the cycle. Since the two ends of e are in different sets A and V - A, there is another edge f in the cycle that also belongs to F. By the definition of e, such an f must have a bigger weight than e - a contradiction.
[figure: the cycle created by adding e, with f crossing the cut between A and V - A]
Greedy algorithms finding MST
Kruskal’s algorithm:
• Sort all edges according to their weights
• Choose n - 1 edges, one after another, as follows:
  - If a newly added edge does not create a cycle with the previously selected edges, then keep it in the (partial) solution; otherwise discard it
Remark: the partial solution is always a forest
[figure: three stages of Kruskal’s algorithm on an example graph]
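Kruskal’s rule can be sketched with a union-find structure for the cycle test (a sketch; the edge-list format (weight, u, v) and 0-based node labels are assumptions):

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with nodes 0..n-1.
    Returns the list of MST edges in the order they are selected."""
    parent = list(range(n))

    def find(x):
        # find the representative of x's component, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):  # edges in order of increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:               # the edge creates no cycle: keep it
            parent[ru] = rv
            mst.append((w, u, v))
            if len(mst) == n - 1:
                break
    return mst
```

The union-find test replaces the "creates a cycle" check: two endpoints in the same component means a cycle.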
Why do the algorithms work?
This follows from the crucial observation:
Kruskal’s algorithm:
• Suppose we add edge {v,w}
• This edge has the smallest weight among the edges between the set of nodes already connected with v (by a path in the already selected subgraph) and the other nodes
Prim’s algorithm:
• Always chooses an edge with the smallest weight among the edges between the set of already connected nodes and the free nodes (i.e., nodes not yet connected)
Textbook and Exercises
READING:
• Chapter 4 “Greedy Algorithms”, Section 4.5
EXERCISES:
• Solved Exercise 3 from Chapter 4
• Generalize the proof of the Theorem to the case where there may be more than one edge of smallest weight in F
Priority Queues (PQ)
Implementation of Prim’s algorithm using PQ
Greedy algorithm finding MST
Prim’s algorithm:
• Select an arbitrary node as the root
• Choose n - 1 edges, one after another, as follows:
  - Consider all edges incident to the currently built (partial) solution that do not create a cycle in it, and select the one with the smallest weight
Remark: the partial solution is always a connected tree
[figure: three stages of Prim’s algorithm growing from the root]
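Prim’s rule can be sketched with Python’s heapq as the priority queue (a sketch; the adjacency-list format adj[u] = [(weight, v), …] is an assumption):

```python
import heapq

def prim(adj, root=0):
    """adj: adjacency list, adj[u] = list of (weight, v).
    Returns the MST edges as (weight, u, v) triples."""
    n = len(adj)
    in_tree = [False] * n
    in_tree[root] = True
    # priority queue of candidate edges leaving the current tree
    pq = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(pq)
    mst = []
    while pq and len(mst) < n - 1:
        w, u, v = heapq.heappop(pq)   # smallest-weight candidate edge
        if in_tree[v]:
            continue                  # would create a cycle: skip
        in_tree[v] = True
        mst.append((w, u, v))
        for w2, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(pq, (w2, v, x))
    return mst
```

Each edge is pushed and popped at most once, giving the O(m log n) time mentioned in the conclusions.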
Priority queue
A set of n elements, each with a priority value (key)
- the smaller the key, the higher the priority of the element
Operations provided in time O(log n):
• Adding a new element to the PQ
• Removing an element from the PQ
• Taking the element with the smallest key
Implementation of PQ based on heaps
Heap: a rooted, (almost) complete binary tree; each node has its
• value
• key
• 3 pointers: to the parent and to the children (or nil(s) if the parent or child(ren) are not available)
Required property: in each subtree, the smallest key is always in the root
[figure: heap with keys 2, 3, 4, 7, 5, 6 in level order, and its array representation 2 3 4 7 5 6]
Operations on the heap
PQ operations:
• Add
• Remove
• Take
Additional supporting operation:
• Last leaf: updating the pointer to the right-most leaf on the lowest level of the tree, after each operation (take, add, remove)
Implementing operations on heap
Smallest-key element: trivially read from the root
Adding a new element:
• find the next last-leaf location in the heap
• put the new element there as the last leaf
• recursively compare its key with its parent’s key:
  - if the element has the smaller key, then swap the element with its parent and continue; otherwise stop
Remark: finding the next last-leaf location may require a search along a path up and then down the tree (exercise)
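Using the array representation of the heap (children of index i at 2i+1 and 2i+2 - an assumption, since the slides use pointers), adding may be sketched as:

```python
def heap_add(heap, key):
    """Append key as the last leaf, then sift it up while it is
    smaller than its parent (min-heap stored as a list)."""
    heap.append(key)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            # smaller key than the parent: swap and continue upwards
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break
```

On the slides’ example heap 2 3 4 7 5 6, adding key 1 yields 1 3 2 7 5 6 4 after two swaps.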
Implementing operations on heap
Removing an element:
• remove it from the tree
• move the value from the last leaf into its place
• update the last leaf
• recursively compare the moved element either
  - “up”, if its key is smaller than its current parent’s: swap the elements and continue going up until reaching a smaller parent or the root,
  or
  - “down”, if its key is bigger than one of its children’s: swap it with the smallest of its children and continue going down until reaching a node with no smaller child, or a leaf
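A sketch of removal in the same array representation (index-based rather than pointer-based - an assumption):

```python
def heap_remove(heap, i):
    """Remove heap[i]: move the last leaf into its place, then sift
    the moved key up or down as needed (min-heap stored as a list)."""
    last = heap.pop()          # take the last leaf
    if i == len(heap):
        return                 # we removed the last leaf itself
    heap[i] = last
    # sift up while smaller than the parent
    while i > 0 and heap[i] < heap[(i - 1) // 2]:
        p = (i - 1) // 2
        heap[i], heap[p] = heap[p], heap[i]
        i = p
    # sift down while bigger than the smallest child
    n = len(heap)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
```

On the slides’ example heap 2 3 4 7 5 6, removing the root 2 yields 3 5 4 7 6.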
Examples - adding
[figure: three heaps showing the insertion of key 1 into the heap 2 3 4 7 5 6 - add 1 at the end (array 2 3 4 7 5 6 1), swap 1 and 4 (array 2 3 1 7 5 6 4), then swap 1 and 2 (array 1 3 2 7 5 6 4)]
Examples - removing
[figure: removing key 2 from the heap 2 3 4 7 5 6 - swap 2 with the last element 6 and remove 2, swap 6 and 3 (array 3 6 4 7 5), then swap 6 and 5 (array 3 5 4 7 6)]
Textbook and Exercises
READING:
• Chapters 2 and 4, Sections 2.5 and 4.5
EXERCISES:
• Solved Exercises 1 and 2 from Chapter 4
• Prove that a spanning tree of an n-node graph has n - 1 edges
• Prove that an n-node connected graph has at least n - 1 edges
• Show how to implement the update of the last leaf in time O(log n)
Dynamic programming
Two problems:
• Weighted interval scheduling
• Sequence alignment
Dynamic Programming paradigm
Dynamic Programming (DP):
• Decompose the problem into a series of sub-problems
• Build up correct solutions to larger and larger sub-problems
Comparison with other techniques:
• Recursive programming vs. DP: in DP the sub-problems may strongly overlap
• Exhaustive search vs. DP: in DP we try to find redundancies and reduce the search space
• Greedy algorithms vs. DP: sometimes DP orders the sub-problems and processes them one after another, as a greedy algorithm does
(Weighted) Interval scheduling
(Weighted) Interval scheduling:
Input: a set of intervals (with weights) on the line, represented by pairs of points - the ends of the intervals
Output: the largest set (maximum sum of weights) of intervals such that no two of them overlap
The greedy algorithm doesn’t work for the weighted case!
Example
Greedy algorithm:
• Repeatedly select the interval that ends first (but still not overlapping the already chosen intervals)
Exact solution of the unweighted case.
[figure: three intervals of weights 1, 3, 1 - the weight-3 interval overlaps both weight-1 intervals]
The greedy algorithm gives total weight 2 instead of the optimal 3.
Basic structure and definition
• Sort the intervals according to their right ends
• Define the function p as follows:
  - p(j) is the number of intervals which finish before the jth interval starts (so in particular p(1) = 0)
[figure: four intervals of weights 1, 3, 1, 2 with p(1)=0, p(2)=1, p(3)=0, p(4)=2]
Basic property
• Let wj be the weight of the jth interval
• The optimal solution for the set of the first j intervals satisfies
  OPT(j) = max{ wj + OPT(p(j)) , OPT(j-1) }
Proof:
If the jth interval is in the optimal solution O, then the other intervals in O are among the intervals 1,…,p(j). Otherwise, search for the solution among the first j-1 intervals.
Sketch of the algorithm
• Use an additional array M[0…n], with M[0] = 0 and the values p(1),…,p(n) precomputed
  (intuitively, M[j] stores the optimal value OPT(j))
Algorithm
• For j = 1,…,n do
  - Set M[j] := max{ wj + M[p(j)] , M[j-1] }
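The sketch above can be written out with p(j) computed by binary search over the sorted right ends (a sketch; the (start, end, weight) triples and the convention that touching endpoints do not overlap are assumptions):

```python
from bisect import bisect_right

def max_weight_schedule(intervals):
    """intervals: (start, end, weight) triples.
    Returns the maximum total weight of pairwise non-overlapping intervals."""
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by right ends
    ends = [end for _, end, _ in ivs]
    n = len(ivs)
    M = [0] * (n + 1)                              # M[0] = 0
    for j in range(1, n + 1):
        start, _, w = ivs[j - 1]
        pj = bisect_right(ends, start)  # p(j): intervals finishing before j starts
        M[j] = max(w + M[pj], M[j - 1])
    return M[n]
```

On the slides’ bad example for greedy (weights 1, 3, 1 with the weight-3 interval overlapping both others), the DP returns the optimal 3.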
Complexity of solution
Time: O(n log n)
• Sorting: O(n log n)
• Computing p(1),…,p(n) and initializing M[0…n]: O(n log n)
• Main loop: n iterations, each taking constant time - total O(n)
Memory: O(n) - the additional array M
Sequence alignment problem
A popular problem from word processing and computational biology
• Input: two words X = x1x2…xn and Y = y1y2…ym
• Output: the largest alignment
Alignment A:
a set of pairs (i1, j1),…,(ik, jk) such that
• if (i,j) is in A then xi = yj
• if (i,j) is before (i’,j’) in A then i < i’ and j < j’ (no crossing matches)
Example
• Input: X = c t t t c t c c, Y = t c t t c c
[figure: two different largest alignments A and A’ of X and Y, each matching 5 letter pairs]
Finding the size of max alignment
The optimal alignment OPT(i,j) for the prefixes of X and Y of lengths i and j respectively satisfies:
OPT(i,j) = max{ d_ij + OPT(i-1, j-1) , OPT(i, j-1) , OPT(i-1, j) }
where d_ij equals 1 if xi = yj, and -infinity otherwise (so unequal letters are never matched).
Proof:
If xi = yj in the optimal solution O, then the optimal alignment contains the match (xi, yj) together with the optimal solution for the prefixes of lengths i-1 and j-1 respectively.
Otherwise at most one of the two last letters is matched. It follows that either x1x2…xi-1 is matched only with letters from y1y2…yj, or y1y2…yj-1 is matched only with letters from x1x2…xi. Hence the optimal solution is the same as OPT(i-1, j) or OPT(i, j-1).
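The recurrence translates directly into code (a minimal sketch; float("-inf") stands for the -infinity value that forbids matching unequal letters):

```python
def max_alignment(X, Y):
    """Size of the largest alignment of words X and Y
    (the longest common subsequence)."""
    n, m = len(X), len(Y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # d_ij = 1 if the letters match, -infinity otherwise
            d = 1 if X[i - 1] == Y[j - 1] else float("-inf")
            M[i][j] = max(d + M[i - 1][j - 1], M[i][j - 1], M[i - 1][j])
    return M[n][m]
```

On the slides’ example X = ctttctcc, Y = tcttcc the largest alignment has 5 matches.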
Algorithm finding max alignment
• Initialize the matrix M[0..n, 0..m] to zeros
Algorithm
• For i = 1,…,n do
  - For j = 1,…,m do
    • Compute d_ij
    • Set M[i,j] := max{ d_ij + M[i-1, j-1] , M[i, j-1] , M[i-1, j] }
Complexity
Time: O(nm)
• Initialization of the matrix M[0..n, 0..m]: O(nm)
• Main loop: O(nm)
Memory: O(nm)
Reconstruction of optimal alignment
Input: the matrix M[0..n, 0..m] containing the OPT values
Algorithm
• Set i = n, j = m
• While both i, j > 0 do
  • Compute d_ij
  • If M[i,j] = d_ij + M[i-1, j-1], then match xi and yj and set i = i - 1, j = j - 1; else
  • If M[i,j] = M[i, j-1], then set j = j - 1 (skip letter yj); else
  • If M[i,j] = M[i-1, j], then set i = i - 1 (skip letter xi)
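Combining the table fill with this trace-back gives a self-contained sketch (returning 1-based index pairs, matching the slides’ indexing):

```python
def alignment_with_pairs(X, Y):
    """Fill the DP matrix, then trace back from M[n][m] to recover
    the matched pairs (i, j) of a largest alignment, 1-based."""
    n, m = len(X), len(Y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1 if X[i - 1] == Y[j - 1] else float("-inf")
            M[i][j] = max(d + M[i - 1][j - 1], M[i][j - 1], M[i - 1][j])
    # trace back from M[n][m]
    i, j, pairs = n, m, []
    while i > 0 and j > 0:
        d = 1 if X[i - 1] == Y[j - 1] else float("-inf")
        if M[i][j] == d + M[i - 1][j - 1]:
            pairs.append((i, j))      # match xi and yj
            i, j = i - 1, j - 1
        elif M[i][j] == M[i][j - 1]:
            j -= 1                    # skip letter yj
        else:
            i -= 1                    # skip letter xi
    return list(reversed(pairs))
```

The trace-back costs only O(n + m) extra time on top of the O(nm) fill.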
Distance between words
A generalization of the alignment problem
• Input:
  - two words X = x1x2…xn and Y = y1y2…ym
  - mismatch costs α_pq, for every pair of letters p and q
  - a gap penalty δ
• Output:
  - the (smallest) distance between the words X and Y
Example
• Input: X = c t t t c t c c, Y = t c t t c c
Alignment A (4 gaps of cost δ each, 1 mismatch of cost α_ct):
[figure: alignment matching 4 equal pairs and mismatching one c with a t]
Largest alignment A’ (4 gaps):
[figure: alignment matching 5 equal pairs, as in the earlier example]
Finding the distance between words
The optimal alignment cost OPT(i,j) for the prefixes of X and Y of lengths i and j respectively satisfies:
OPT(i,j) = min{ α_ij + OPT(i-1, j-1) , δ + OPT(i, j-1) , δ + OPT(i-1, j) }
where α_ij is the mismatch cost of the pair (xi, yj).
Proof:
If xi and yj are (mis)matched in the optimal solution O, then the optimal alignment contains the (mis)match (xi, yj) of cost α_ij together with the optimal solution for the prefixes of lengths i-1 and j-1 respectively.
Otherwise at most one of the two last letters is (mis)matched. It follows that either x1x2…xi-1 is (mis)matched only with letters from y1y2…yj, or y1y2…yj-1 is (mis)matched only with letters from x1x2…xi. Hence the optimal solution is the same as for OPT(i-1, j) or OPT(i, j-1), plus the gap penalty δ.
The algorithm and its complexity remain the same.
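A sketch of the distance computation (the base row and column carry gap penalties, unlike the zero initialization of the maximization version; passing the mismatch costs as a function alpha(p, q) is an assumed interface):

```python
def distance(X, Y, alpha, delta):
    """Smallest alignment cost between words X and Y:
    alpha(p, q) is the mismatch cost, delta the gap penalty."""
    n, m = len(X), len(Y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        M[i][0] = i * delta        # aligning a prefix against the empty word
    for j in range(m + 1):
        M[0][j] = j * delta
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = min(alpha(X[i - 1], Y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i][j - 1],     # gap: skip yj
                          delta + M[i - 1][j])     # gap: skip xi
    return M[n][m]
```

With alpha = 0 for equal letters and 1 otherwise, and delta = 1, this is the classic edit (Levenshtein) distance.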
Textbook and Exercises
READING:
• Chapter 6 “Dynamic Programming”, Sections 6.1 and 6.6
EXERCISES:
• All Shortest Paths problem, Section 6.8
Conclusions
• Greedy algorithms: algorithms constructing solutions step after step using a local rule
• An exact greedy algorithm for the interval selection problem, in time O(n log n), illustrating the “greedy stays ahead” rule
• Greedy algorithms for finding a minimum spanning tree in a graph:
  - Kruskal’s algorithm
  - Prim’s algorithm
• Priority Queues - Prim’s greedy algorithm for finding a minimum spanning tree in a graph in time O(m log n)