Transcript
Page 1: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Greedy Algorithms
CS 6030

by Savitha Parur Venkitachalam

Page 2: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Outline

• Greedy approach to motif searching

• Genome rearrangements

• Sorting by reversals

• Greedy algorithms for sorting by reversals

• Approximation algorithms

• Breakpoint reversal sort

Page 3: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Greedy motif searching

• Developed by Gerald Hertz and Gary Stormo in 1989

• CONSENSUS is a tool based on this greedy algorithm

• Faster than the brute-force and simple motif search algorithms

• An approximation algorithm with an unknown approximation ratio

Page 4: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Greedy motif search – Pseudocode

Page 5: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Greedy motif search – Steps

• Input – DNA sequences, t (# sequences), n (length of one sequence), l (length of motif to search)

• Output – set of starting points of l-mers

• Performs an exhaustive search using Hamming distance on the first two sequences of the DNA

• Forms a 2 x l seed matrix with the two closest l-mers

• Scans the remaining t-2 sequences, one at a time, to find the l-mer that best matches the seed and adds it to the next row of the seed matrix
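
The steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the CONSENSUS implementation: the names `consensus_score` and `greedy_motif_search` are made up here, positions are 0-based, all sequences are assumed to have the same length, and "best matches the seed" is assumed to mean the highest consensus score (per column, the count of the most frequent base). For two l-mers that score equals 2l minus their Hamming distance, so maximizing it picks the two closest l-mers.

```python
from collections import Counter


def consensus_score(lmers):
    """Sum, over columns, of the count of the most frequent base."""
    return sum(max(Counter(column).values()) for column in zip(*lmers))


def greedy_motif_search(dna, l):
    """Return 0-based starting positions, one per sequence in dna."""
    n = len(dna[0])  # assumption: every sequence has length n
    positions = range(n - l + 1)

    # Exhaustive search over the first two sequences for the closest pair.
    best_pair, best_score = (0, 0), -1
    for s1 in positions:
        for s2 in positions:
            score = consensus_score([dna[0][s1:s1 + l], dna[1][s2:s2 + l]])
            if score > best_score:
                best_pair, best_score = (s1, s2), score

    starts = list(best_pair)
    seed = [dna[0][starts[0]:starts[0] + l], dna[1][starts[1]:starts[1] + l]]

    # Scan each remaining sequence and append its best-matching l-mer.
    for seq in dna[2:]:
        best = max(positions, key=lambda s: consensus_score(seed + [seq[s:s + l]]))
        starts.append(best)
        seed.append(seq[best:best + l])
    return starts


print(greedy_motif_search(["TTACGTTT", "GACGTGGG", "CCCCACGT"], 4))  # [2, 1, 4]
```

Because each later sequence is matched only against the rows already chosen, an unlucky seed from the first two sequences is never revisited.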

Page 6: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Complexity

• Exhaustive search on the first two sequences requires $l(n-l+1)^2$ operations, which is $O(ln^2)$

• The sequential scan on the remaining t-2 sequences requires $l(n-l+1)(t-2)$ operations, which is $O(lnt)$

• Thus the running time of greedy motif search is $O(ln^2 + lnt)$

• If t is small compared to n, the algorithm behaves as $O(ln^2)$
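
As a rough illustration of why the first term dominates, take hypothetical sizes l = 10, n = 1000, t = 20 (these numbers are assumptions chosen for the arithmetic, not values from the slides):

```latex
\begin{aligned}
l\,(n-l+1)^2 &= 10 \cdot 991^2 = 9{,}820{,}810 \\
l\,(n-l+1)\,(t-2) &= 10 \cdot 991 \cdot 18 = 178{,}380
\end{aligned}
```

Under these assumptions the exhaustive search on the first two sequences costs about 55 times more than the scan over the remaining 18 sequences.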

Page 7: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Consensus tool

• Greedy motif algorithm may miss the optimal motif

• Consensus tool saves a large number of seed matrices

• Consensus tool can check sequences in random order

• Consensus tool is less likely to miss the optimal motif

Page 8: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Genome rearrangements

• Gene rearrangements result in a change of gene ordering

• A series of gene rearrangements can alter the genomic architecture of a species

• 99% similarity between cabbage and turnip genes

• Fewer than 250 genomic rearrangements have occurred since the divergence of humans and mice

Page 9: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam
Page 10: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam
Page 11: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

History of Chromosome X

Rat Consortium, Nature, 2004

Page 12: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Types of Rearrangements

Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6

Translocation: 1 2 3 4 5 6 → 1 2 6 4 5 3

Fusion / Fission: 1 2 3 4 5 6 ↔ 1 2 3 4 5 6 (two chromosomes joined into one, or one split into two)

Page 13: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Greedy algorithms in Gene Rearrangements

• Biologists are interested in finding the smallest number of reversals in an evolutionary sequence

• This number gives a lower bound on the number of rearrangements and a measure of the similarity between two species

• Two greedy algorithms are used:
  - Simple reversal sort
  - Breakpoint reversal sort

Page 14: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Gene Order

• Gene order is represented by a permutation p:

$p = p_1 \ldots p_{i-1} \, p_i \, p_{i+1} \ldots p_{j-1} \, p_j \, p_{j+1} \ldots p_n$

• Reversal r(i, j) reverses (flips) the elements from i to j in p:

$p \cdot r(i, j) = p_1 \ldots p_{i-1} \, p_j \, p_{j-1} \ldots p_{i+1} \, p_i \, p_{j+1} \ldots p_n$

Page 15: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Reversal example

p = 1 2 3 4 5 6 7 8

r(3,5) ↓

1 2 5 4 3 6 7 8

r(5,6) ↓

1 2 5 4 6 3 7 8
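
The two reversals in this example can be reproduced with a small helper. The function name `apply_reversal` is illustrative; it assumes the permutation is stored as a Python list and that i and j are 1-based positions, as on the slide.

```python
def apply_reversal(p, i, j):
    """Return p with the elements at 1-based positions i..j reversed."""
    return p[:i - 1] + p[i - 1:j][::-1] + p[j:]


p = [1, 2, 3, 4, 5, 6, 7, 8]
p = apply_reversal(p, 3, 5)
print(p)  # [1, 2, 5, 4, 3, 6, 7, 8]
p = apply_reversal(p, 5, 6)
print(p)  # [1, 2, 5, 4, 6, 3, 7, 8]
```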

Page 16: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Reversal distance problem

• Goal: Given two permutations, find the shortest series of reversals that transforms one into the other

• Input: Permutations p and s

• Output: A series of reversals r1, …, rt transforming p into s, such that t is minimum

• The minimum t is the reversal distance between p and s

• d(p, s) denotes this smallest possible value of t, given p and s

Page 17: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Sorting by reversal

• Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation.

• Input: Permutation p

• Output: A series of reversals r1, …, rt transforming p into the identity permutation, such that t is minimum
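
For very small permutations the minimum t can be found by brute force. The sketch below is not one of the greedy algorithms from these slides: it is a plain breadth-first search over all permutations reachable by reversals, with the hypothetical name `reversal_distance`, and is only practical for short inputs because the search space grows as n!.

```python
from collections import deque


def reversal_distance(p):
    """Smallest number of reversals that sorts p (breadth-first search)."""
    start = tuple(p)
    target = tuple(sorted(start))
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        # Try every reversal of a segment with at least two elements.
        for i in range(n - 1):
            for j in range(i + 1, n):
                nxt = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))


print(reversal_distance([6, 1, 2, 3, 4, 5]))  # 2
```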

Page 18: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Sorting by reversal - Greedy algorithm

• If sorting permutation p = 1 2 3 6 4 5, the first three elements are already in order, so it does not make any sense to break them.

• The length of the already sorted prefix of p is denoted prefix(p)
  – prefix(p) = 3

• This results in an idea for a greedy algorithm: increase prefix(p) at every step

Page 19: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Simple Reversal sort – Pseudocode

• A very general approach leads to an algorithm that sorts by moving the ith element to the ith position

SimpleReversalSort(p)
  for i ← 1 to n – 1
    j ← position of element i in p (i.e., p_j = i)
    if j ≠ i
      p ← p · r(i, j)
      output p
    if p is the identity permutation
      return
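
The pseudocode above translates fairly directly into Python. In this sketch the function name `simple_reversal_sort` and the choice to return the list of intermediate permutations (instead of printing them) are assumptions made for illustration.

```python
def simple_reversal_sort(p):
    """Return the permutations output by SimpleReversalSort, in order."""
    p = list(p)
    n = len(p)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):                    # i = 1 .. n-1, 1-based
        j = p.index(i) + 1                   # 1-based position of element i
        if j != i:
            p[i - 1:j] = p[i - 1:j][::-1]    # p <- p . r(i, j)
            steps.append(list(p))
        if p == identity:
            break
    return steps


for step in simple_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```

On the input 6 1 2 3 4 5 this produces five reversals, ending at the identity permutation.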

Page 20: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Example – SimpleReversalSort not optimal

Input – 612345

612345 -> 162345 -> 126345 -> 123645 -> 123465 -> 123456

• Greedy SimpleReversalSort takes 5 steps, whereas the optimal solution takes only 2 steps:

612345 -> 543216 -> 123456

• A closely related sorting-by-reversals problem is the ‘Pancake Flipping problem’, where only prefixes may be reversed

Page 21: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Approximation Ratio

• These algorithms produce an approximate solution rather than an optimal one

• The approximation ratio of an algorithm A on input p is given by A(p) / OPT(p)

  – For an algorithm A that minimizes the objective function (minimization algorithm): $\max_{|p| = n} A(p) / OPT(p)$

  – For a maximization algorithm: $\min_{|p| = n} A(p) / OPT(p)$
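
As a worked instance, take the SimpleReversalSort example from the earlier slide, where the greedy algorithm needs 5 reversals and the optimum is 2. The same pattern extends to the permutation n 1 2 … (n-1) for any n ≥ 3, which SimpleReversalSort sorts in n-1 steps while two reversals suffice:

```latex
\frac{A(p)}{OPT(p)} = \frac{5}{2} \quad \text{for } p = 6\,1\,2\,3\,4\,5,
\qquad
\frac{A(p)}{OPT(p)} = \frac{n-1}{2} \quad \text{for } p = n\,1\,2\,\ldots\,(n-1)
```

So the approximation ratio of SimpleReversalSort is at least (n-1)/2, which grows with n.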

Page 22: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Breakpoints – A different face of greed

• In a permutation $p = p_1 \ldots p_n$

  - if p_i and p_{i+1} are consecutive numbers, the pair is an adjacency

  - if p_i and p_{i+1} are not consecutive numbers, the pair is a breakpoint

Example: p = 1 | 9 | 3 4 | 7 8 | 2 | 6 5

• Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints

• Pairs (3,4), (7,8) and (6,5) form adjacencies

• b(p) - # breakpoints in permutation p

• Our goal is to eliminate all breakpoints and thus form the identity permutation
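
Counting breakpoints is a one-line scan over neighbouring pairs. The helper name `breakpoints` is illustrative, and "consecutive" is taken to mean that the two numbers differ by exactly 1 in either order, which is how (6,5) counts as an adjacency in the example.

```python
def breakpoints(p):
    """Return the neighbouring pairs of p that are not consecutive numbers."""
    return [(a, b) for a, b in zip(p, p[1:]) if abs(a - b) != 1]


pairs = breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5])
print(pairs)       # [(1, 9), (9, 3), (4, 7), (8, 2), (2, 6)]
print(len(pairs))  # 5
```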

Page 23: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Breakpoint Reversal Sort – Steps

• Put two elements p_0 = 0 and p_{n+1} = n+1 at the ends of p

• Eliminate breakpoints using reversals

• Each reversal eliminates at most 2 breakpoints

• This implies reversal distance ≥ #breakpoints / 2

p = 2 3 1 4 6 5

0 2 3 1 4 6 5 7    b(p) = 5

0 1 3 2 4 6 5 7    b(p) = 4

0 1 2 3 4 6 5 7    b(p) = 2

0 1 2 3 4 5 6 7    b(p) = 0

• As stated, the greedy procedure is not guaranteed to terminate: it may run forever if no reversal reduces b(p)

Page 24: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Pseudocode – Breakpoint reversal sort

BreakPointReversalSort(p)
  while b(p) > 0
    Among all possible reversals, choose reversal r minimizing b(p · r)
    p ← p · r
    output p
  return
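
A direct Python sketch of this pseudocode follows. The names `count_breakpoints` and `breakpoint_reversal_sort` are illustrative, ties between equally good reversals are broken by taking the first one found, and the `max_steps` cap is an added safeguard (not part of the pseudocode) because the plain greedy loop has no termination guarantee.

```python
def count_breakpoints(ext):
    """Number of neighbouring pairs in ext that are not consecutive numbers."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def breakpoint_reversal_sort(p, max_steps=100):
    """Return the permutations output by the greedy loop, in order."""
    n = len(p)
    ext = [0] + list(p) + [n + 1]        # add p_0 = 0 and p_{n+1} = n+1
    steps = []
    while count_breakpoints(ext) > 0 and len(steps) < max_steps:
        best = None
        # Try every reversal r(i, j) with 1 <= i < j <= n; the ends stay fixed.
        for i in range(1, n):
            for j in range(i + 1, n + 1):
                cand = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                if best is None or count_breakpoints(cand) < count_breakpoints(best):
                    best = cand
        ext = best
        steps.append(ext[1:-1])
    return steps


for step in breakpoint_reversal_sort([2, 3, 1, 4, 6, 5]):
    print(step)
```

For p = 2 3 1 4 6 5 this sketch first reverses 6 5, which removes two breakpoints at once, and reaches the identity in three reversals. The intermediate permutations differ from the ones on the slide, since several reversal sequences of the same length exist.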

Page 25: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Using strips

A strip is an interval between two consecutive breakpoints in a permutation

• Decreasing strip: strip of elements in decreasing order

• Increasing strip: strip of elements in increasing order

0 1 9 4 3 7 8 2 5 6 10

• A single-element strip can be declared either increasing or decreasing. We will choose to declare them as decreasing, with the exception of the strips with 0 and n+1
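
The strip decomposition of the example can be computed as sketched below. The function name `strips` and the string labels are illustrative; the input is assumed to be the extended permutation, already including 0 and n+1, and single-element strips follow the convention just stated.

```python
def strips(ext):
    """Split an extended permutation into (strip, kind) pairs."""
    result = []
    start = 0
    for k in range(1, len(ext) + 1):
        # A strip ends at the end of the list or at a breakpoint.
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            block = ext[start:k]
            if len(block) > 1:
                kind = "increasing" if block[1] > block[0] else "decreasing"
            elif block[0] in (0, len(ext) - 1):
                kind = "increasing"      # the strips holding 0 and n+1
            else:
                kind = "decreasing"      # other single-element strips
            result.append((block, kind))
            start = k
    return result


for block, kind in strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(block, kind)
```

For the slide's example this yields the strips 0 1, 7 8, 5 6 and 10 as increasing, and 9, 4 3 and 2 as decreasing.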

Page 26: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Reducing breakpoints

• Choose the decreasing strip with the smallest element k in p

• Find k-1 in the permutation

• Reverse the segment between k and k-1

Eg: p = 1 4 6 5 7 8 3 2

0 1 4 6 5 7 8 3 2 9    b(p) = 5

0 1 2 3 8 7 5 6 4 9    b(p) = 4

0 1 2 3 4 6 5 7 8 9    b(p) = 2

0 1 2 3 4 5 6 7 8 9    b(p) = 0
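
One step of this rule can be sketched as follows. The name `reduce_breakpoints_step` is hypothetical, the input is assumed to be the extended permutation (with 0 and n+1), and single-element strips are treated as decreasing. An interior element belongs to a decreasing strip exactly when it does not continue an increasing run on either side.

```python
def reduce_breakpoints_step(ext):
    """Apply one strip-based reversal; return None if no decreasing strip exists."""
    n = len(ext) - 2
    # Interior elements that lie in a decreasing strip.
    decreasing = [ext[i] for i in range(1, n + 1)
                  if ext[i - 1] != ext[i] - 1 and ext[i + 1] != ext[i] + 1]
    if not decreasing:
        return None
    k = min(decreasing)
    # Reverse the segment strictly after the leftmost of (k-1, k) up to and
    # including the rightmost, which places k-1 and k next to each other.
    lo, hi = sorted((ext.index(k - 1), ext.index(k)))
    return ext[:lo + 1] + ext[lo + 1:hi + 1][::-1] + ext[hi + 1:]


ext = [0, 1, 4, 6, 5, 7, 8, 3, 2, 9]
while ext is not None:
    print(ext)
    ext = reduce_breakpoints_step(ext)
```

Run on the example, this prints the same four permutations as the slide and then stops, because the identity has no decreasing strip left.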

Page 27: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

ImprovedBreakpointReversalSort

• Sometimes a permutation may not contain any decreasing strips

• So an increasing strip has to be reversed so that it becomes a decreasing strip

• Taking this into consideration we have an improved algorithm

ImprovedBreakpointReversalSort(p)
  while b(p) > 0
    if p has a decreasing strip
      Among all possible reversals, choose reversal r that minimizes b(p · r)
    else
      Choose a reversal r that flips an increasing strip in p
    p ← p · r
    output p
  return
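
A possible Python rendering of the improved algorithm is sketched below. All function names are illustrative. The pseudocode leaves open which increasing strip to flip, so this sketch assumes the first one that contains neither 0 nor n+1; ties among minimizing reversals go to the first one found.

```python
def count_breakpoints(ext):
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    # Single-element interior strips count as decreasing.
    return any(ext[i - 1] != ext[i] - 1 and ext[i + 1] != ext[i] + 1
               for i in range(1, len(ext) - 1))


def first_interior_increasing_strip(ext):
    """Index range (i, j) of the first increasing strip without 0 or n+1."""
    n = len(ext) - 2
    i = 1
    while i <= n:
        j = i
        while j <= n and ext[j + 1] == ext[j] + 1:
            j += 1
        if ext[i - 1] != ext[i] - 1 and j <= n:
            return i, j
        i = j + 1
    return None


def improved_breakpoint_reversal_sort(p):
    """Return the permutations output by the improved algorithm, in order."""
    n = len(p)
    ext = [0] + list(p) + [n + 1]
    steps = []
    while count_breakpoints(ext) > 0:
        if has_decreasing_strip(ext):
            candidates = [ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                          for i in range(1, n) for j in range(i + 1, n + 1)]
            ext = min(candidates, key=count_breakpoints)
        else:
            i, j = first_interior_increasing_strip(ext)
            ext = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
        steps.append(ext[1:-1])
    return steps


for step in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(step)
```

Whenever a decreasing strip exists, some reversal lowers b(p), so the minimizing choice makes progress; a flip leaves b(p) unchanged but creates a decreasing strip for the next iteration.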

Page 28: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Example – ImprovedBreakPointSort

• There are no decreasing strips in p, for:

p = 0 1 2 | 5 6 7 | 3 4 | 8    b(p) = 3

p · r(6,7) = 0 1 2 | 5 6 7 | 4 3 | 8    b(p) = 3

• r(6,7) does not change the # of breakpoints

• r(6,7) creates a decreasing strip, thus guaranteeing that the next step will decrease the # of breakpoints.

Page 29: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Approximation Ratio - ImprovedBreakpointReversalSort

• Approximation ratio is 4

  – It eliminates at least one breakpoint in every two steps; at most 2b(p) steps

  – Approximation ratio: 2b(p) / d(p)

  – Optimal algorithm eliminates at most 2 breakpoints in every step: $d(p) \geq b(p) / 2$

  – Performance guarantee:

    • $\frac{2b(p)}{d(p)} \leq \frac{2b(p)}{b(p)/2} = 4$

Page 30: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

References

• An Introduction to Bioinformatics Algorithms - Neil C. Jones and Pavel A. Pevzner

• http://bix.ucsd.edu/bioalgorithms/slides.php#Ch5

Page 31: Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

Questions

