Transcript
Page 1: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Greedy Algorithms Greedy Algorithms And And

Genome Rearrangements Genome Rearrangements

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 2: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs. Cabbage: Look and Taste DifferentTurnip vs. Cabbage: Look and Taste Different

Although cabbages and turnips share a Although cabbages and turnips share a recent common ancestor, they look and recent common ancestor, they look and taste differenttaste different

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 3: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 4: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Almost Identical Turnip vs Cabbage: Almost Identical mtDNA gene sequencesmtDNA gene sequences

In 1980s Jeffrey Palmer studied In 1980s Jeffrey Palmer studied evolution of plant organelles by evolution of plant organelles by comparing mitochondrial genomes of the comparing mitochondrial genomes of the cabbage and turnipcabbage and turnip

99% similarity between genes99% similarity between genesThese surprisingly identical gene These surprisingly identical gene

sequences differed in gene ordersequences differed in gene orderThis study helped pave the way to This study helped pave the way to

analyzing genome rearrangements in analyzing genome rearrangements in molecular evolutionmolecular evolution

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 5: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Different mtDNA Gene OrderTurnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:Gene order comparison:

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 6: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Different mtDNA Gene OrderTurnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:Gene order comparison:

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 7: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Different mtDNA Gene OrderTurnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:Gene order comparison:

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 8: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Different mtDNA Gene OrderTurnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:Gene order comparison:

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 9: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Turnip vs Cabbage: Different mtDNA Gene OrderTurnip vs Cabbage: Different mtDNA Gene Order

Gene order comparison:Gene order comparison:

Before

After

Evolution is manifested as the divergence in gene order

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 10: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Transforming Cabbage into Transforming Cabbage into TurnipTurnip

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 11: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

What are the similarity blocks and how to What are the similarity blocks and how to find them?find them?

What is the architecture of the ancestral What is the architecture of the ancestral genome?genome?

What is the evolutionary scenario for What is the evolutionary scenario for transforming one genome into the other?transforming one genome into the other?

Unknown ancestor~ 75 million years ago

Mouse (X chrom.)

Human (X chrom.)

Genome rearrangementsGenome rearrangements

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 12: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

History of Chromosome XHistory of Chromosome X

Rat Consortium, Nature, 2004An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 13: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

A Genetic Anomaly…A Genetic Anomaly…

Page 14: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

ReversalsReversals

Blocks represent conserved genes.Blocks represent conserved genes.

1 32

4

10

56

8

9

7

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 15: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

ReversalsReversals1 32

4

10

56

8

9

7

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks

1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 16: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversals and BreakpointsReversals and Breakpoints1 32

4

10

56

8

9

7

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

The reversion introduced two breakpoints(disruptions in order).

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 17: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversals: ExampleReversals: Example

5’ ATGCCTGTACTA 3’

3’ TACGGACATGAT 5’

5’ ATGTACAGGCTA 3’

3’ TACATGTCCGAT 5’

Break and Invert

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 18: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Types of RearrangementsTypes of Rearrangements

Reversal1 2 3 4 5 6 1 2 -5 -4 -3 6

Translocation1 2 3 44 5 6

1 2 6 4 5 3

1 2 3 4 5 6

1 2 3 4 5 6

Fusion

FissionAn Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 19: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Comparative Genomic Architectures: Comparative Genomic Architectures: Mouse vs Human GenomeMouse vs Human Genome

Humans and mice Humans and mice have similar genomes, have similar genomes, but their genes are but their genes are ordered differentlyordered differently

~245 rearrangements~245 rearrangementsReversalsReversalsFusionsFusionsFissionsFissionsTranslocationTranslocation

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 20: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Waardenburg’s Syndrome: Mouse Provides Waardenburg’s Syndrome: Mouse Provides Insight into Human Genetic DisorderInsight into Human Genetic Disorder

Waardenburg’s syndrome is characterized by pigmentary Waardenburg’s syndrome is characterized by pigmentary dysphasiadysphasia

Gene implicated in the disease was linked to human Gene implicated in the disease was linked to human chromosome 2 but it was not clear where exactly it is located chromosome 2 but it was not clear where exactly it is located on chromosome 2 on chromosome 2

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 21: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Waardenburg’s syndrome and splotch miceWaardenburg’s syndrome and splotch mice

A breed of mice (with splotch gene) had A breed of mice (with splotch gene) had similar symptoms caused by the same similar symptoms caused by the same type of gene as in humanstype of gene as in humans

Scientists succeeded in identifying location Scientists succeeded in identifying location of gene responsible for disorder in miceof gene responsible for disorder in mice

Finding the gene in mice gives clues to Finding the gene in mice gives clues to where the same gene is located in humanswhere the same gene is located in humans

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 22: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Comparative Genomic Architecture of Comparative Genomic Architecture of Human and Mouse GenomesHuman and Mouse Genomes

To locate where To locate where corresponding corresponding gene is in gene is in humans, we humans, we have to analyze have to analyze the relative the relative architecture of architecture of human and human and mouse genomesmouse genomes

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 23: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Pevzner, P. and Tesler, G. (2003). Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. PNAS, 100 (13): 7672-7677.

Page 24: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversals: ExampleReversals: Example

= A B C D E F G H= A B C D E F G H

(3,5)(3,5)

A B A B E D CE D C F G H F G H

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Note: we will ignore directionality of synteny blocks for the sake of simplicity…

Page 25: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversals: ExampleReversals: Example

= A B C D E F G H= A B C D E F G H

(3,5)(3,5)

A B E D C F G HA B E D C F G H

(5,6)(5,6)

A B E D F C G HA B E D F C G H

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 26: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversals and Gene OrdersReversals and Gene Orders Gene order is represented by a permutation Gene order is represented by a permutation 11------ ------ i-1 i-1 i i i+1 ------i+1 ------j-1 j-1 j j j+1 -----j+1 -----nn

11------ ------ i-1i-1 j j j-1 ------j-1 ------i+1 i+1 ii j+1 -----j+1 -----nn

Reversal Reversal ( ( i, ji, j ) reverses (flips) the elements ) reverses (flips) the elements from from ii to to jj in in

,j)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 27: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reversal Distance ProblemReversal Distance Problem

GoalGoal: Given two permutations, find the shortest series : Given two permutations, find the shortest series of reversals that transforms one into anotherof reversals that transforms one into another

InputInput: Permutations : Permutations and and

OutputOutput: A series of reversals : A series of reversals 11,…,…tt transforming transforming into into such that such that tt is minimum is minimum

t t : reversal distance between : reversal distance between and and dd((, , ) : smallest possible value of ) : smallest possible value of tt, given , given and and

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 28: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting By Reversals ProblemSorting By Reversals Problem

GoalGoal: Given a permutation, find a shortest : Given a permutation, find a shortest series of reversals that transforms it into series of reversals that transforms it into the identity permutation (the identity permutation (1 21 2 … … nn ) )

InputInput: Permutation : Permutation

OutputOutput: A series of reversals : A series of reversals 11,, … … tt transforming transforming into the identity permutation into the identity permutation such that such that tt is minimum is minimum

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 29: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting By Reversals: ExampleSorting By Reversals: Example

tt = =dd(( ) = reversal distance of ) = reversal distance of Example :Example : = = 3 43 4 2 1 5 6 7 10 9 8 2 1 5 6 7 10 9 8 4 3 2 1 5 6 7 4 3 2 1 5 6 7 10 9 810 9 8 4 3 2 14 3 2 1 5 6 7 8 9 10 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 101 2 3 4 5 6 7 8 9 10 So So dd(( ) ) = 3 = 3

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 30: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting by reversals: 5 stepsSorting by reversals: 5 steps

Step 0: 2 -4 -3 5 -8 -7 -6 1Step 1: 2 3 4 5 -8 -7 -6 1Step 2: 2 3 4 5 6 7 8 1Step 3: 2 3 4 5 6 7 8 -1Step 4: -8 -7 -6 -5 -4 -3 -2 -1Step 5: g 1 2 3 4 5 6 7 8

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 31: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting by reversals: 4 stepsSorting by reversals: 4 steps

Step 0: 2 -4 -3 5 -8 -7 -6 1Step 1: 2 3 4 5 -8 -7 -6 1Step 2: -5 -4 -3 -2 -8 -7 -6 1Step 3: -5 -4 -3 -2 -1 6 7 8Step 4: g 1 2 3 4 5 6 7 8

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 32: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting by reversals: 4 stepsSorting by reversals: 4 steps

Step 0: 2 -4 -3 5 -8 -7 -6 1Step 1: 2 3 4 5 -8 -7 -6 1Step 2: -5 -4 -3 -2 -8 -7 -6 1Step 3: -5 -4 -3 -2 -1 6 7 8Step 4: g 1 2 3 4 5 6 7 8

What is the minimum reversal distance for this permutation? Can it be sorted in 3 steps?

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 33: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Pancake Flipping ProblemPancake Flipping Problem The chef is sloppy; he The chef is sloppy; he

prepares an unordered stack prepares an unordered stack of pancakes of different sizesof pancakes of different sizes

The waiter wants to The waiter wants to rearrange them (so that the rearrange them (so that the smallest winds up on top, smallest winds up on top, and so on, down to the and so on, down to the largest at the bottom)largest at the bottom)

He does it by flipping over He does it by flipping over several from the top, several from the top, repeating this as many times repeating this as many times as necessaryas necessary

Christos Papadimitrou and William Gates flip pancakes

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 34: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Pancake Flipping Problem: FormulationPancake Flipping Problem: Formulation

GoalGoal: Given a stack of : Given a stack of nn pancakes, what is pancakes, what is the minimum number of flips to rearrange the minimum number of flips to rearrange them into perfect stack?them into perfect stack?

InputInput: Permutation : Permutation OutputOutput: A series of prefix reversals : A series of prefix reversals 11,, … … tt

transforming transforming into the identity permutation into the identity permutation such that such that tt is minimum is minimum

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 35: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Pancake Flipping Problem: Greedy AlgorithmPancake Flipping Problem: Greedy Algorithm

Greedy approach: 2 prefix reversals at most to Greedy approach: 2 prefix reversals at most to place a pancake in its right position, place a pancake in its right position, 2(n – 1)2(n – 1) steps total at moststeps total at most

William Gates and Christos Papadimitriou William Gates and Christos Papadimitriou showed in the mid-1970s that this problem can showed in the mid-1970s that this problem can be solved by at most be solved by at most 5/3 (n + 1) prefix 5/3 (n + 1) prefix reversalsreversals

Note: our problem is a bit different in that we may not have to move (i.e., Note: our problem is a bit different in that we may not have to move (i.e., “flip”) some items in order to move others…“flip”) some items in order to move others…

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 36: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting By Reversals: A Greedy AlgorithmSorting By Reversals: A Greedy Algorithm

If sorting permutation If sorting permutation = = 1 2 31 2 3 6 4 5, the 6 4 5, the first three elements are already in order so first three elements are already in order so it does not make any sense to break them. it does not make any sense to break them.

The length of the already sorted prefix of The length of the already sorted prefix of is denoted is denoted prefixprefix(()) prefixprefix(() = 3) = 3

This results in an idea for a greedy This results in an idea for a greedy algorithm: increase algorithm: increase prefixprefix(() at every step) at every step

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 37: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Doing so, Doing so, can be sorted can be sorted

1 2 31 2 3 6 46 4 5 5

1 2 3 41 2 3 4 6 56 5 1 2 3 4 5 61 2 3 4 5 6

Greedy Algorithm: An Example

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 38: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Greedy Algorithm: PseudocodeGreedy Algorithm: Pseudocode

SimpleReversalSort(SimpleReversalSort())

1 1 forfor ii 11 to to n – 1n – 1

2 2 j j position of element position of element ii in in (i.e., (i.e., jj = = ii))

3 3 ifif j j ≠≠ii

4 4 * * ((i, ji, j))

5 5 outputoutput 6 6 ifif is the identity permutation is the identity permutation

7 7 returnreturn

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

3 8 2 5 1 6 7 4

i=1 j=5

*(i, j) = 1 5 2 8 3 6 7 4

Page 39: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Analyzing SimpleReversalSortAnalyzing SimpleReversalSortSimpleReversalSort does not guarantee SimpleReversalSort does not guarantee

the smallest number of reversals and the smallest number of reversals and takes five steps on takes five steps on = 6 1 2 3 4 5 : = 6 1 2 3 4 5 :

Step 1: 1 Step 1: 1 6 26 2 3 4 5 3 4 5Step 2: 1 2 Step 2: 1 2 6 36 3 4 5 4 5 Step 3: 1 2 3 Step 3: 1 2 3 6 46 4 5 5Step 4: 1 2 3 4 Step 4: 1 2 3 4 6 56 5Step 5: 1 2 3 4 5 6Step 5: 1 2 3 4 5 6

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

The number of reversal steps to sort a permutation of length n is at most (n – 1).

Page 40: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

But it can be sorted in two steps:But it can be sorted in two steps:

= = 6 1 2 3 4 56 1 2 3 4 5 Step 1: Step 1: 5 4 3 2 15 4 3 2 1 6 6 Step 2: 1 2 3 4 5 6Step 2: 1 2 3 4 5 6

So, SimpleReversalSort(So, SimpleReversalSort() is not optimal) is not optimal

Optimal algorithms are unknown for many Optimal algorithms are unknown for many problems; approximation algorithms are usedproblems; approximation algorithms are used

Analyzing SimpleReversalSort (cont’d)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 41: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Approximation AlgorithmsApproximation Algorithms

These algorithms find approximate These algorithms find approximate solutions rather than optimal solutionssolutions rather than optimal solutions

The The approximation ratio approximation ratio of an algorithm A of an algorithm A on input on input is: is: A(A() / OPT() / OPT())where where A(A() -solution produced by algorithm A ) -solution produced by algorithm A

OPT( OPT() - optimal solution of the ) - optimal solution of the problemproblem

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 42: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Approximation Ratio/Performance GuaranteeApproximation Ratio/Performance Guarantee

Approximation ratio (Approximation ratio (performance performance guaranteeguarantee) of algorithm A: max ) of algorithm A: max approximation ratio of all inputs of size approximation ratio of all inputs of size nnFor algorithm A that minimizes objective For algorithm A that minimizes objective

function (minimization algorithm):function (minimization algorithm):maxmax||| = | = nn A( A() / OPT() / OPT())

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 43: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Approximation Ratio (Performance Guarantee)Approximation Ratio (Performance Guarantee)

Approximation ratio (Approximation ratio (performance performance guaranteeguarantee) of algorithm A: max ) of algorithm A: max approximation ratio of all inputs of size approximation ratio of all inputs of size nnFor algorithm A that minimizes objective For algorithm A that minimizes objective

function (minimization algorithm):function (minimization algorithm):maxmax||| = | = nn A( A() / OPT() / OPT())

For maximization algorithm:For maximization algorithm:minmin||| = | = nn A( A() / OPT() / OPT())

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 44: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Recall: We are trying to minimize the number of reversals.Recall: We are trying to minimize the number of reversals.

In the worst case, we will make n-1 reversals for In the worst case, we will make n-1 reversals for nn elements when 2 would have been sufficient (e.g., consider elements when 2 would have been sufficient (e.g., consider again again = 6 1 2 3 4 5), so m = 6 1 2 3 4 5), so maxax||| = | = nn A( A() / OPT() / OPT() ) (n- (n-1)/2. For 1)/2. For nn=1001 the number of reversals =1001 the number of reversals undertaken by SimpleReversalSort could be as undertaken by SimpleReversalSort could be as much as 500 times the optimal.much as 500 times the optimal.

Goal: algorithms with better approximation ratios Goal: algorithms with better approximation ratios (i.e., better performance guarantees).(i.e., better performance guarantees).

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

What is the approximation ratio of SimpleReversalSort?

Page 45: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Key: Try using something else (i.e., besides prefix() as the basis for a greedy algorithm. How about “breakpoints”?

Page 46: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

= = 2233……n-1n-1nn

A pair of elements A pair of elements ii and and i + 1i + 1 are termed are termed

an an adjacencyadjacency if if

i+1i+1 = = i i ++ 1 1For example:For example:

= 1 9 3 4 7 8 2 6 5= 1 9 3 4 7 8 2 6 5 (3, 4) or (7, 8) and (6,5) are (3, 4) or (7, 8) and (6,5) are adjacenciesadjacencies

Adjacencies and BreakpointsAdjacencies and Breakpoints

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 47: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

There is a There is a breakpoint breakpoint between neighboring between neighboring elements that are non-consecutive:elements that are non-consecutive:

= 1 9 3 4 7 8 2 6 5= 1 9 3 4 7 8 2 6 5

Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints of permutation form breakpoints of permutation

bb(() - # breakpoints in permutation ) - # breakpoints in permutation

Breakpoints: An ExampleBreakpoints: An Example

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 48: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Adjacency & BreakpointsAdjacency & Breakpoints•An adjacency - a pair of neighboring elements that are consecutive

• A breakpoint - a pair of neighboring elements that are not consecutive

π = 5 6 2 1 3 4

0 5 6 2 1 3 4 7

adjacencies

breakpoints

Extend π with π0 = 0 and π7 = 7

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 49: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

= 0 1 9 3 4 7 8 2 6 5 10

We put two elements We put two elements 00 = 0= 0 and and n + 1 n + 1 = n+1= n+1 at at

the ends of the ends of

Example: Example:

Extending with 0 and 10

Notes:

1.In the example above, a new breakpoint was created after extending.

2.A permutation on n elements may have as many as n+1 breakpoints and as few as 0.

Extending PermutationsExtending Permutations

= 1 9 3 4 7 8 2 6 5

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 50: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Key: Because the non-consecutive elements i and i+1 forming a breakpoint must be separated in the process of transforming to the identity permutation, sorting by reversals can be thought of as the process of eliminating breakpoints.

Page 51: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Each reversal can eliminate at most 2 breakpoints.Each reversal can eliminate at most 2 breakpoints.

= 2 3 1 4 6 5= 2 3 1 4 6 5

0 0 2 3 12 3 1 4 6 5 4 6 5 77 bb(() = 5) = 5

00 1 1 3 23 2 4 6 5 4 6 5 77 bb(() = 4) = 4

00 1 2 3 4 1 2 3 4 6 56 5 77 bb(() = 2) = 2

00 1 2 3 4 5 6 1 2 3 4 5 6 77 b b(() = 0) = 0

Reversal Distance and BreakpointsReversal Distance and Breakpoints

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 52: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Each reversal eliminates at most 2 breakpoints.Each reversal eliminates at most 2 breakpoints. This implies: This implies: d(d() ) b( b()/2)/2

i.e., reversal distance ≥ #breakpoints / 2i.e., reversal distance ≥ #breakpoints / 2 = 2 3 1 4 6 5= 2 3 1 4 6 5

0 0 2 3 12 3 1 4 6 5 4 6 5 77 bb(() = 5) = 5

00 1 1 3 23 2 4 6 5 4 6 5 77 bb(() = 4) = 4

00 1 2 3 4 1 2 3 4 6 56 5 77 bb(() = 2) = 2

00 1 2 3 4 5 6 1 2 3 4 5 6 77 b b(() = 0) = 0

Reversal Distance and BreakpointsReversal Distance and Breakpoints

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 53: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting By Reversals: A Better Greedy AlgorithmSorting By Reversals: A Better Greedy Algorithm

BreakPointReversalSort(BreakPointReversalSort())

11 whilewhile bb(() > 0) > 0

22 Among all possible reversals,Among all possible reversals,

choose reversalchoose reversal minimizing minimizing bb(( • • ))

33 • • ((i, ji, j))

44 outputoutput 55 returnreturn

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 54: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Sorting By Reversals: A Better Greedy AlgorithmSorting By Reversals: A Better Greedy Algorithm

BreakPointReversalSort(BreakPointReversalSort())

11 whilewhile bb(() > 0) > 0

22 Among all possible reversals,Among all possible reversals,

choose reversalchoose reversal minimizing minimizing bb(( • • ))

33 • • ((i, ji, j))

44 outputoutput 55 returnreturn

Question: Will this algorithm terminate? (i.e., how do we know that eliminating one breakpoint doesn’t just introduce another?)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 55: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

StripsStrips StripStrip: an interval between two consecutive : an interval between two consecutive

breakpoints in a permutation breakpoints in a permutation Decreasing stripDecreasing strip: : stripstrip of elements in of elements in

decreasing order (e.g. 6 5 and 3 2 ).decreasing order (e.g. 6 5 and 3 2 ). Increasing stripIncreasing strip: : stripstrip of elements in of elements in

increasing order (e.g. 7 8)increasing order (e.g. 7 8) 00 1 9 4 3 7 8 2 5 6 1 9 4 3 7 8 2 5 6 1010

A single-element strip can be declared either increasing or A single-element strip can be declared either increasing or decreasing. We will choose to declare them as decreasing with decreasing. We will choose to declare them as decreasing with exception of the strips with exception of the strips with 00 and and n+1n+1

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 56: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reducing the Number of BreakpointsReducing the Number of Breakpoints

Theorem 1:Theorem 1:

If permutation If permutation contains at least one contains at least one decreasing strip, then there exists a decreasing strip, then there exists a reversal reversal which decreases the number of which decreases the number of breakpoints (i.e. breakpoints (i.e. bb((• • ) < ) < bb(() )) )

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 57: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Things To ConsiderThings To ConsiderForFor = 1 4 6 5 7 8 3 2 = 1 4 6 5 7 8 3 2

00 1 4 6 5 7 8 3 2 1 4 6 5 7 8 3 2 99 b b(() = 5) = 5Choose decreasing strip with the smallest Choose decreasing strip with the smallest

element element kk in in ( ( kk = 2 in this case) = 2 in this case)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 58: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Things To ConsiderThings To Consider (cont’d) (cont’d)

ForFor = 1 4 6 5 7 8 3 2 = 1 4 6 5 7 8 3 2

00 1 4 6 5 7 8 3 1 4 6 5 7 8 3 22 99 b b(() = 5) = 5Choose decreasing strip with the smallest Choose decreasing strip with the smallest

element element kk in in ( ( kk = 2 in this case) = 2 in this case)

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 59: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Things To ConsiderThings To Consider (cont’d) (cont’d)

ForFor = 1 4 6 5 7 8 3 2 = 1 4 6 5 7 8 3 2

00 11 4 6 5 7 8 3 4 6 5 7 8 3 22 99 b b(() = 5) = 5Choose decreasing strip with the smallest Choose decreasing strip with the smallest

element element kk in in ( ( kk = 2 in this case) = 2 in this case) Find Find k – 1k – 1 in the permutation in the permutation

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 60: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Things To ConsiderThings To Consider (cont’d) (cont’d)

ForFor = 1 4 6 5 7 8 3 2 = 1 4 6 5 7 8 3 2 00 11 4 6 5 7 8 3 4 6 5 7 8 3 22 99 b b(() = 5) = 5

Choose decreasing strip with the smallest Choose decreasing strip with the smallest element element kk in in ( ( kk = 2 in this case) = 2 in this case)

Find Find k – 1k – 1 in the permutation in the permutationReverse the segment between Reverse the segment between kk and and k-1k-1::00 11 4 6 5 7 8 3 4 6 5 7 8 3 22 99 bb(() = 5) = 5

00 11 22 3 8 7 5 6 4 3 8 7 5 6 4 99 bb(() = 4) = 4

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 61: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Reducing the Number of Reducing the Number of Breakpoints AgainBreakpoints Again

If there is no decreasing strip, there may be If there is no decreasing strip, there may be no reversal no reversal that reduces the number of that reduces the number of breakpoints (i.e. breakpoints (i.e. bb((••) ≥ ) ≥ bb(() for any ) for any reversal reversal ). ).

By reversing an increasing strip ( # of By reversing an increasing strip ( # of breakpoints stay unchanged ), we will create breakpoints stay unchanged ), we will create a decreasing strip at the next step. Then the a decreasing strip at the next step. Then the number of breakpoints will be reduced in the number of breakpoints will be reduced in the next step (theorem 1).next step (theorem 1).

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 62: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Things To ConsiderThings To Consider (cont’d) (cont’d)

There are noThere are no decreasing strips in decreasing strips in , for:, for:

= = 00 1 2 5 6 7 3 4 1 2 5 6 7 3 4 8 8 bb(() = 3) = 3

•• (6(6,7) = ,7) = 00 1 2 5 6 7 4 3 1 2 5 6 7 4 3 88 b b(() = 3) = 3

(6,7) does not change the # of breakpoints(6,7) does not change the # of breakpoints (6,7) creates a decreasing strip thus (6,7) creates a decreasing strip thus

guaranteeing that the next step will decrease the guaranteeing that the next step will decrease the # of breakpoints.# of breakpoints.

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 63: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

ImprovedBreakpointReversalSortImprovedBreakpointReversalSortImprovedBreakpointReversalSort(ImprovedBreakpointReversalSort())1 1 whilewhile bb(() > 0) > 02 2 ifif has a decreasing striphas a decreasing strip33 Among all possible reversals, choose reversalAmong all possible reversals, choose reversal

that minimizes that minimizes bb(( •• ))4 4 elseelse5 Choose a reversal5 Choose a reversal that flips an increasing strip in that flips an increasing strip in 6 6 •• 7 7 outputoutput 8 8 returnreturn

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 64: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

ImprovedBreakPointReversalSortImprovedBreakPointReversalSort is an is an approximation algorithm with a performance approximation algorithm with a performance guarantee of at most 4. Here’s why:guarantee of at most 4. Here’s why: It eliminates at least one breakpoint in every It eliminates at least one breakpoint in every

two steps, so there are at most two steps, so there are at most 2b2b(() steps) stepsApproximation ratio = Approximation ratio = 2b2b(() / ) / dd(())Optimal algorithm eliminates at most 2 Optimal algorithm eliminates at most 2

breakpoints in every step: breakpoints in every step: dd(() ) bb(() / 2) / 2Performance guarantee:Performance guarantee:

( ( 2b2b(() / ) / dd(() ) ?? [ ) ) ?? [ 2b2b(() / () / (bb(() / 2) ]) / 2) ]

ImprovedBreakpointReversalSort: ImprovedBreakpointReversalSort: Performance GuaranteePerformance Guarantee

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 65: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

ImprovedBreakPointReversalSortImprovedBreakPointReversalSort is an is an approximation algorithm with a performance approximation algorithm with a performance guarantee of at most 4. Here’s why:guarantee of at most 4. Here’s why: It eliminates at least one breakpoint in every It eliminates at least one breakpoint in every

two steps, so there are at most two steps, so there are at most 2b2b(() steps) stepsApproximation ratio = Approximation ratio = 2b2b(() / ) / dd(())Optimal algorithm eliminates at most 2 Optimal algorithm eliminates at most 2

breakpoints in every step: breakpoints in every step: dd(() ) bb(() / 2) / 2Performance guarantee:Performance guarantee:

( ( 2b2b(() / ) / dd(() ) ) ) [ [ 2b2b(() / () / (bb(() / 2) ] = 4) / 2) ] = 4

ImprovedBreakpointReversalSort: ImprovedBreakpointReversalSort: Performance GuaranteePerformance Guarantee

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 66: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

Signed PermutationsSigned PermutationsUp to this point, all permutations to sort Up to this point, all permutations to sort

were unsignedwere unsignedBut genes have directions… so we should But genes have directions… so we should

consider signed permutationsconsider signed permutations

5’ 3’

= 1 -2 - 3 4 -5

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Page 67: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

We should, but we won’t right now…

Page 68: Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)

GRIMM Web ServerGRIMM Web Server

http://nbcr.sdsc.edu/GRIMM/grimm.cgi

An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info

Real genome architectures are represented by signed permutations

Efficient algorithms to sort signed permutations have been developed

GRIMM web server computes the reversal distances between signed permutations


Top Related