Greedy Algorithms and Genome Rearrangements
An Introduction to Bioinformatics Algorithms (Jones and Pevzner) www.bioalgorithms.info
Turnip vs. Cabbage: Look and Taste Different
Although cabbages and turnips share a recent common ancestor, they look and taste different
Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information
Turnip vs Cabbage: Almost Identical mtDNA Gene Sequences
In the 1980s Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of the cabbage and turnip
99% similarity between genes
These surprisingly identical gene sequences differed in gene order
This study helped pave the way to analyzing genome rearrangements in molecular evolution
Turnip vs Cabbage: Different mtDNA Gene Order
Gene order comparison:
Before
After
Evolution is manifested as the divergence in gene order
Transforming Cabbage into Turnip
Genome rearrangements
What are the similarity blocks and how to find them?
What is the architecture of the ancestral genome?
What is the evolutionary scenario for transforming one genome into the other?
Unknown ancestor, ~75 million years ago
Mouse (X chrom.)
Human (X chrom.)
History of Chromosome X
Rat Consortium, Nature, 2004
A Genetic Anomaly…
Reversals
Blocks represent conserved genes.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Reversals
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
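The misreading above is a signed reversal: a segment is reversed and every block in it changes orientation. A minimal sketch, assuming blocks are stored as a Python list of signed integers and positions are 1-based and inclusive (the function name signed_reversal is an illustrative choice, not something from the slides):

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (1-based, inclusive) and flip the sign of each block."""
    flipped = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + flipped + perm[j:]


blocks = list(range(1, 11))            # 1, 2, ..., 10
print(signed_reversal(blocks, 4, 8))
# [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```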
Reversals and Breakpoints
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The reversal introduced two breakpoints (disruptions in order).
Reversals: Example
5’ ATGCCTGTACTA 3’
3’ TACGGACATGAT 5’
5’ ATGTACAGGCTA 3’
3’ TACATGTCCGAT 5’
Break and Invert
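On the DNA level, "break and invert" replaces a segment of one strand by its reverse complement. A small sketch, assuming the strand is a plain string over A, C, G, T and that the inverted segment is given by 0-based indices (invert_segment and COMPLEMENT are hypothetical names used only for illustration):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(strand, start, end):
    """Replace strand[start:end] (0-based, end exclusive) by its reverse complement."""
    segment = strand[start:end]
    return strand[:start] + segment.translate(COMPLEMENT)[::-1] + strand[end:]


top = "ATGCCTGTACTA"                     # 5' -> 3' strand before the reversal
inverted = invert_segment(top, 3, 9)     # invert the segment CCTGTA
print(inverted)                          # ATGTACAGGCTA
print(inverted.translate(COMPLEMENT))    # TACATGTCCGAT (the 3' -> 5' strand)
```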
Types of Rearrangements
Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3 (| separates two chromosomes)
Fusion: two chromosomes → one chromosome 1 2 3 4 5 6
Fission: one chromosome 1 2 3 4 5 6 → two chromosomes
Comparative Genomic Architectures: Mouse vs Human Genome
Humans and mice have similar genomes, but their genes are ordered differently
~245 rearrangements:
Reversals
Fusions
Fissions
Translocation
Waardenburg’s Syndrome: Mouse Provides Insight into Human Genetic Disorder
Waardenburg’s syndrome is characterized by pigmentary dysplasia
The gene implicated in the disease was linked to human chromosome 2, but it was not clear where exactly on chromosome 2 it is located
Waardenburg’s syndrome and splotch mice
A breed of mice (with splotch gene) had similar symptoms caused by the same type of gene as in humans
Scientists succeeded in identifying the location of the gene responsible for the disorder in mice
Finding the gene in mice gives clues to where the same gene is located in humans
Comparative Genomic Architecture of Human and Mouse Genomes
To locate where the corresponding gene is in humans, we have to analyze the relative architecture of human and mouse genomes
Pevzner, P. and Tesler, G. (2003). Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. PNAS, 100 (13): 7672-7677.
Reversals: Example
π = A B C D E F G H
ρ(3,5)
A B E D C F G H
Note: we will ignore directionality of synteny blocks for the sake of simplicity…
Reversals: Example
π = A B C D E F G H
ρ(3,5)
A B E D C F G H
ρ(5,6)
A B E D F C G H
Reversals and Gene Orders
Gene order is represented by a permutation π:
$\pi = \pi_1 \ldots \pi_{i-1}\ \pi_i\ \pi_{i+1} \ldots \pi_{j-1}\ \pi_j\ \pi_{j+1} \ldots \pi_n$
$\pi \cdot \rho(i, j) = \pi_1 \ldots \pi_{i-1}\ \pi_j\ \pi_{j-1} \ldots \pi_{i+1}\ \pi_i\ \pi_{j+1} \ldots \pi_n$
Reversal ρ(i, j) reverses (flips) the elements from i to j in π
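The definition of a reversal on an unsigned gene order translates almost directly into list slicing. A minimal sketch, assuming 1-based inclusive positions as on the slides (the function name reversal is an illustrative choice); it replays the A B C D E F G H example:

```python
def reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = list("ABCDEFGH")
step1 = reversal(pi, 3, 5)
step2 = reversal(step1, 5, 6)
print(" ".join(step1))   # A B E D C F G H
print(" ".join(step2))   # A B E D F C G H
```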
Reversal Distance Problem
Goal: Given two permutations, find the shortest series of reversals that transforms one into another
Input: Permutations π and σ
Output: A series of reversals $\rho_1, \ldots, \rho_t$ transforming π into σ such that t is minimum
t: reversal distance between π and σ
d(π, σ): smallest possible value of t, given π and σ
Sorting By Reversals Problem
Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n)
Input: Permutation π
Output: A series of reversals $\rho_1, \ldots, \rho_t$ transforming π into the identity permutation such that t is minimum
Sorting By Reversals: Example
t = d(π) = reversal distance of π
Example:
π = 3 4 2 1 5 6 7 10 9 8
    4 3 2 1 5 6 7 10 9 8
    4 3 2 1 5 6 7 8 9 10
    1 2 3 4 5 6 7 8 9 10
So d(π) = 3
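For small unsigned permutations the reversal distance can be checked by brute force. The sketch below is a plain breadth-first search over permutations, not an algorithm from the slides, and it is only practical for short inputs or small distances because the search space grows very quickly. The function name reversal_distance is an assumption made for illustration.

```python
from collections import deque


def reversal_distance(perm):
    """Exact unsigned reversal distance by breadth-first search (small inputs only)."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    if start == target:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        # Try every reversal of a segment with at least two elements.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = current[:i] + current[i:j][::-1] + current[j:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # not reached for a valid permutation


print(reversal_distance([3, 4, 2, 1, 5, 6, 7, 10, 9, 8]))  # 3
```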
Sorting by reversals: 5 steps
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: 2 3 4 5 6 7 8 1
Step 3: 2 3 4 5 6 7 8 -1
Step 4: -8 -7 -6 -5 -4 -3 -2 -1
Step 5: 1 2 3 4 5 6 7 8
Sorting by reversals: 4 steps
Step 0: 2 -4 -3 5 -8 -7 -6 1
Step 1: 2 3 4 5 -8 -7 -6 1
Step 2: -5 -4 -3 -2 -8 -7 -6 1
Step 3: -5 -4 -3 -2 -1 6 7 8
Step 4: 1 2 3 4 5 6 7 8
What is the minimum reversal distance for this permutation? Can it be sorted in 3 steps?
Pancake Flipping Problem
The chef is sloppy; he prepares an unordered stack of pancakes of different sizes
The waiter wants to rearrange them (so that the smallest winds up on top, and so on, down to the largest at the bottom)
He does it by flipping over several from the top, repeating this as many times as necessary
Christos Papadimitriou and William Gates flip pancakes
Pancake Flipping Problem: Formulation
Goal: Given a stack of n pancakes, what is the minimum number of flips to rearrange them into a perfect stack?
Input: Permutation π
Output: A series of prefix reversals $\rho_1, \ldots, \rho_t$ transforming π into the identity permutation such that t is minimum
Pancake Flipping Problem: Greedy Algorithm
Greedy approach: 2 prefix reversals at most to place a pancake in its right position, 2(n – 1) steps total at most
William Gates and Christos Papadimitriou showed in the mid-1970s that this problem can be solved by at most 5/3 (n + 1) prefix reversals
Note: our problem is a bit different in that we may not have to move (i.e., “flip”) some items in order to move others…
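The greedy pancake strategy can be sketched as follows: bring the largest misplaced pancake to the top with one flip, then flip it down into place with a second. This is one straightforward reading of the greedy approach, not the Gates and Papadimitriou method; the list layout (index 0 is the top of the stack) and the name pancake_sort are assumptions made for this example.

```python
def pancake_sort(stack):
    """Greedy pancake sort; stack[0] is the top. Returns (sorted_stack, flips)."""
    stack = list(stack)
    flips = []  # each entry k means "flip the top k pancakes"
    for size in range(len(stack), 1, -1):
        # Position of the largest pancake among the top `size` pancakes.
        biggest = stack.index(max(stack[:size]))
        if biggest == size - 1:
            continue  # already where it belongs
        if biggest != 0:
            stack[:biggest + 1] = stack[:biggest + 1][::-1]  # bring it to the top
            flips.append(biggest + 1)
        stack[:size] = stack[:size][::-1]                    # flip it down into place
        flips.append(size)
    return stack, flips


ordered, flips = pancake_sort([3, 1, 4, 2])
print(ordered)  # [1, 2, 3, 4]
print(flips)    # [3, 4, 2, 3]
```

Each pancake except the smallest costs at most two flips, which is where the 2(n – 1) bound comes from.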
Sorting By Reversals: A Greedy Algorithm
If sorting permutation π = 1 2 3 6 4 5, the first three elements are already in order so it does not make any sense to break them.
The length of the already sorted prefix of π is denoted prefix(π); here prefix(π) = 3
This results in an idea for a greedy algorithm: increase prefix(π) at every step
Greedy Algorithm: An Example
Doing so, π can be sorted:
1 2 3 6 4 5
1 2 3 4 6 5
1 2 3 4 5 6
Greedy Algorithm: Pseudocode
SimpleReversalSort(π)
  for i ← 1 to n – 1
    j ← position of element i in π (i.e., π_j = i)
    if j ≠ i
      π ← π · ρ(i, j)
      output π
    if π is the identity permutation
      return
π = 3 8 2 5 1 6 7 4
i = 1, j = 5
π · ρ(i, j) = 1 5 2 8 3 6 7 4
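A runnable sketch of SimpleReversalSort, assuming the input is a permutation of 1..n held in a Python list. Returning the list of intermediate permutations instead of printing them is a choice made for this example.

```python
def simple_reversal_sort(perm):
    """Greedy sort that grows the sorted prefix; returns the intermediate permutations."""
    pi = list(perm)
    steps = []
    for i in range(1, len(pi)):           # i = 1 .. n-1, as in the pseudocode
        j = pi.index(i) + 1               # 1-based position of element i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]   # apply the reversal rho(i, j)
            steps.append(list(pi))
        if pi == sorted(pi):              # identity permutation reached
            break
    return steps


steps = simple_reversal_sort([3, 8, 2, 5, 1, 6, 7, 4])
print(steps[0])   # [1, 5, 2, 8, 3, 6, 7, 4]
print(steps[-1])  # [1, 2, 3, 4, 5, 6, 7, 8]
```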
Analyzing SimpleReversalSort
SimpleReversalSort does not guarantee the smallest number of reversals and takes five steps on π = 6 1 2 3 4 5:
Step 1: 1 6 2 3 4 5
Step 2: 1 2 6 3 4 5
Step 3: 1 2 3 6 4 5
Step 4: 1 2 3 4 6 5
Step 5: 1 2 3 4 5 6
Analyzing SimpleReversalSort (cont’d)
The number of reversal steps SimpleReversalSort needs to sort a permutation of length n is at most (n – 1).
But π = 6 1 2 3 4 5 can be sorted in two steps:
π = 6 1 2 3 4 5
Step 1: 5 4 3 2 1 6
Step 2: 1 2 3 4 5 6
So, SimpleReversalSort(π) is not optimal
Optimal algorithms are unknown for many problems; approximation algorithms are used
Approximation Algorithms
These algorithms find approximate solutions rather than optimal solutions
The approximation ratio of an algorithm A on input π is: A(π) / OPT(π)
where
A(π) - solution produced by algorithm A
OPT(π) - optimal solution of the problem
Approximation Ratio (Performance Guarantee)
Approximation ratio (performance guarantee) of algorithm A: max approximation ratio of all inputs of size n
For algorithm A that minimizes objective function (minimization algorithm): $\max_{|\pi| = n} A(\pi) / OPT(\pi)$
For maximization algorithm: $\min_{|\pi| = n} A(\pi) / OPT(\pi)$
What is the approximation ratio of SimpleReversalSort?
Recall: We are trying to minimize the number of reversals.
In the worst case, we will make n – 1 reversals for n elements when 2 would have been sufficient (e.g., consider again π = 6 1 2 3 4 5), so $\max_{|\pi| = n} A(\pi) / OPT(\pi) \geq (n - 1)/2$. For n = 1001 the number of reversals undertaken by SimpleReversalSort could be as much as 500 times the optimal.
Goal: algorithms with better approximation ratios (i.e., better performance guarantees).
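The (n – 1)/2 ratio can be observed numerically on the family π = n 1 2 … n–1. The sketch below counts the reversals made by the prefix-growing greedy strategy and divides by 2. It assumes the optimum is 2, as in the two-step sort shown for 6 1 2 3 4 5 (reverse everything, then reverse the first n – 1 elements); the helper name greedy_reversal_count is hypothetical.

```python
def greedy_reversal_count(perm):
    """Number of reversals the prefix-growing greedy strategy performs on perm."""
    pi = list(perm)
    count = 0
    for i in range(1, len(pi)):
        j = pi.index(i) + 1                 # 1-based position of element i
        if j != i:
            pi[i - 1:j] = pi[i - 1:j][::-1]
            count += 1
    return count


for n in (6, 11, 101, 1001):
    worst = [n] + list(range(1, n))         # n 1 2 ... n-1
    greedy = greedy_reversal_count(worst)
    # Assumed optimum of 2: reverse the whole permutation, then its first n-1 elements.
    two_step = worst[::-1]
    two_step[:n - 1] = two_step[:n - 1][::-1]
    assert two_step == list(range(1, n + 1))
    print(n, greedy, greedy / 2)
```

For n = 1001 the last printed line shows 1000 greedy reversals, a ratio of 500.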
Key: Try using something other than prefix(π) as the basis for a greedy algorithm. How about “breakpoints”?
Adjacencies and Breakpoints
$\pi = \pi_1 \pi_2 \pi_3 \ldots \pi_{n-1} \pi_n$
A pair of elements $\pi_i$ and $\pi_{i+1}$ is termed an adjacency if $\pi_{i+1} = \pi_i \pm 1$
For example:
π = 1 9 3 4 7 8 2 6 5
(3, 4), (7, 8) and (6, 5) are adjacencies
Breakpoints: An Example
There is a breakpoint between neighboring elements that are non-consecutive:
π = 1 9 3 4 7 8 2 6 5
Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints of permutation π
b(π) - # breakpoints in permutation π
Adjacency & Breakpoints
• An adjacency - a pair of neighboring elements that are consecutive
• A breakpoint - a pair of neighboring elements that are not consecutive
π = 5 6 2 1 3 4
Extend π with $\pi_0 = 0$ and $\pi_7 = 7$:
0 5 6 2 1 3 4 7
adjacencies: (5, 6), (2, 1), (3, 4)
breakpoints: (0, 5), (6, 2), (1, 3), (4, 7)
Extending Permutations
We put two elements $\pi_0 = 0$ and $\pi_{n+1} = n + 1$ at the ends of π
Example:
π = 1 9 3 4 7 8 2 6 5
Extending with 0 and 10:
π = 0 1 9 3 4 7 8 2 6 5 10
Notes:
1. In the example above, a new breakpoint was created after extending.
2. A permutation on n elements may have as many as n+1 breakpoints and as few as 0.
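Counting breakpoints of an extended permutation is a one-pass scan over neighboring pairs. A minimal sketch, assuming an unsigned permutation of 1..n stored as a list (the names extend and breakpoints are illustrative):

```python
def extend(perm):
    """Add 0 in front and n + 1 at the end of an unsigned permutation of 1..n."""
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(perm):
    """Number of neighboring pairs in the extended permutation that are not consecutive."""
    ext = extend(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


print(breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5]))  # 6 (five inner breakpoints plus (5, 10))
print(breakpoints([5, 6, 2, 1, 3, 4]))           # 4
print(breakpoints([2, 4, 1, 3]))                 # 5, i.e. n + 1 for n = 4
```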
Key: Because the non-consecutive elements $\pi_i$ and $\pi_{i+1}$ forming a breakpoint must be separated in the process of transforming π to the identity permutation, sorting by reversals can be thought of as the process of eliminating breakpoints.
Reversal Distance and Breakpoints
Each reversal eliminates at most 2 breakpoints.
This implies: d(π) ≥ b(π) / 2, i.e., reversal distance ≥ #breakpoints / 2
π = 2 3 1 4 6 5
0 2 3 1 4 6 5 7   b(π) = 5
0 1 3 2 4 6 5 7   b(π) = 4
0 1 2 3 4 6 5 7   b(π) = 2
0 1 2 3 4 5 6 7   b(π) = 0
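The claim that one reversal removes at most two breakpoints can be checked exhaustively for small n, and the resulting lower bound is easy to compute. The brute-force check below is only a sanity test, not a proof, and the function names are assumptions made for this sketch.

```python
from itertools import permutations
from math import ceil


def breakpoints(perm):
    """Breakpoints of the permutation after extending it with 0 and n + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def max_breakpoint_drop(n):
    """Largest decrease in b achieved by any single reversal on any permutation of 1..n."""
    best = 0
    for perm in permutations(range(1, n + 1)):
        before = breakpoints(perm)
        for i in range(n):
            for j in range(i + 1, n + 1):          # reverse perm[i:j]
                after = breakpoints(perm[:i] + perm[i:j][::-1] + perm[j:])
                best = max(best, before - after)
    return best


def distance_lower_bound(perm):
    """Lower bound d(pi) >= b(pi) / 2, rounded up since d is an integer."""
    return ceil(breakpoints(perm) / 2)


print(max_breakpoint_drop(5))                     # 2
print(distance_lower_bound([2, 3, 1, 4, 6, 5]))   # 3
```

For π = 2 3 1 4 6 5 the bound of 3 is met by the three reversals listed above, so d(π) = 3 for this permutation.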
Sorting By Reversals: A Better Greedy Algorithm
BreakPointReversalSort(π)
  while b(π) > 0
    Among all possible reversals, choose reversal ρ minimizing b(π · ρ)
    π ← π · ρ(i, j)
    output π
  return
Question: Will this algorithm terminate? (i.e., how do we know that eliminating one breakpoint doesn’t just introduce another?)
Strips
Strip: an interval between two consecutive breakpoints in a permutation
Decreasing strip: strip of elements in decreasing order (e.g. 6 5 and 3 2).
Increasing strip: strip of elements in increasing order (e.g. 7 8)
0 1 9 4 3 7 8 2 5 6 10
A single-element strip can be declared either increasing or decreasing. We will choose to declare them as decreasing with exception of the strips with 0 and n+1
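Strips can be found by cutting the extended permutation at every breakpoint. The sketch below labels each strip using the convention just stated (single elements are decreasing, except 0 and n+1); the function name strips and the "inc"/"dec" labels are choices made for this example.

```python
def strips(perm):
    """Split the extended permutation into strips, each labelled 'inc' or 'dec'."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    pieces, current = [], [ext[0]]
    for prev, nxt in zip(ext, ext[1:]):
        if abs(nxt - prev) == 1:
            current.append(nxt)        # adjacency: the strip continues
        else:
            pieces.append(current)     # breakpoint: the strip ends here
            current = [nxt]
    pieces.append(current)

    labelled = []
    for piece in pieces:
        if len(piece) == 1:
            kind = "inc" if piece[0] in (0, n + 1) else "dec"
        else:
            kind = "inc" if piece[1] > piece[0] else "dec"
        labelled.append((kind, piece))
    return labelled


for kind, piece in strips([1, 9, 4, 3, 7, 8, 2, 5, 6]):
    print(kind, piece)
```

On 0 1 9 4 3 7 8 2 5 6 10 this reports the decreasing strips 9, 4 3 and 2, with every other strip increasing.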
Reducing the Number of Breakpoints
Theorem 1:
If permutation π contains at least one decreasing strip, then there exists a reversal ρ which decreases the number of breakpoints (i.e. b(π · ρ) < b(π))
Things To Consider
For π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9   b(π) = 5
Choose decreasing strip with the smallest element k in π (k = 2 in this case)
Find k – 1 in the permutation
Reverse the segment between k and k – 1:
0 1 4 6 5 7 8 3 2 9   b(π) = 5
0 1 2 3 8 7 5 6 4 9   b(π) = 4
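The construction behind Theorem 1 can be sketched as a single function: find the smallest element k that lies in a decreasing strip, locate k – 1, and reverse the segment that brings the two together. The code assumes the single-element convention from the Strips slide; the names breakpoints and decreasing_strip_reversal are hypothetical.

```python
def breakpoints(perm):
    """Breakpoints of the permutation after extending it with 0 and n + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def decreasing_strip_reversal(perm):
    """Apply the Theorem 1 reversal; return None if perm has no decreasing strip."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    in_decreasing = []
    for idx in range(1, n + 1):
        x = ext[idx]
        decreasing_link = ext[idx - 1] == x + 1 or ext[idx + 1] == x - 1
        increasing_link = ext[idx - 1] == x - 1 or ext[idx + 1] == x + 1
        # Inside a decreasing run, or a single element (declared decreasing).
        if decreasing_link or not increasing_link:
            in_decreasing.append(x)
    if not in_decreasing:
        return None
    k = min(in_decreasing)
    p_k, p_prev = ext.index(k), ext.index(k - 1)
    lo, hi = min(p_k, p_prev) + 1, max(p_k, p_prev)
    ext[lo:hi + 1] = ext[lo:hi + 1][::-1]    # makes k and k - 1 neighbors
    return ext[1:-1]


pi = [1, 4, 6, 5, 7, 8, 3, 2]
after = decreasing_strip_reversal(pi)
print(after)                                 # [1, 2, 3, 8, 7, 5, 6, 4]
print(breakpoints(pi), breakpoints(after))   # 5 4
```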
Reducing the Number of Breakpoints Again
If there is no decreasing strip, there may be no reversal ρ that reduces the number of breakpoints (i.e. b(π · ρ) ≥ b(π) for any reversal ρ).
By reversing an increasing strip (the # of breakpoints stays unchanged), we will create a decreasing strip at the next step. Then the number of breakpoints will be reduced in the next step (Theorem 1).
Things To Consider (cont’d)
There are no decreasing strips in π, for:
π = 0 1 2 5 6 7 3 4 8   b(π) = 3
π · ρ(6,7) = 0 1 2 5 6 7 4 3 8   b(π · ρ(6,7)) = 3
ρ(6,7) does not change the # of breakpoints
ρ(6,7) creates a decreasing strip thus guaranteeing that the next step will decrease the # of breakpoints.
ImprovedBreakpointReversalSort
ImprovedBreakpointReversalSort(π)
  while b(π) > 0
    if π has a decreasing strip
      Among all possible reversals, choose reversal ρ that minimizes b(π · ρ)
    else
      Choose a reversal ρ that flips an increasing strip in π
    π ← π · ρ
    output π
  return
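A runnable sketch of ImprovedBreakpointReversalSort for unsigned permutations of 1..n. The pseudocode leaves two choices open, and the code fills them with assumptions: ties among breakpoint-minimizing reversals are broken by enumeration order, and when no decreasing strip exists the first increasing strip that contains neither 0 nor n+1 is flipped. The helper names are illustrative.

```python
def count_breakpoints(ext):
    """Breakpoints of an already extended permutation."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def find_strips(ext):
    """(start, end) index pairs, inclusive, of the strips of an extended permutation."""
    strips, start = [], 0
    for idx in range(1, len(ext)):
        if abs(ext[idx] - ext[idx - 1]) != 1:
            strips.append((start, idx - 1))
            start = idx
    strips.append((start, len(ext) - 1))
    return strips


def is_decreasing(ext, start, end):
    if start == end:
        # Single elements count as decreasing, except 0 and n + 1.
        return 0 < start < len(ext) - 1
    return ext[start] > ext[end]


def improved_breakpoint_reversal_sort(perm):
    """Return the list of permutations produced after each reversal."""
    ext = [0] + list(perm) + [len(perm) + 1]
    last = len(ext) - 1
    steps = []
    while count_breakpoints(ext) > 0:
        strips = find_strips(ext)
        if any(is_decreasing(ext, s, e) for s, e in strips):
            # Try every reversal inside positions 1..n and keep the best one.
            candidates = (
                ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                for i in range(1, last)
                for j in range(i + 1, last)
            )
            ext = min(candidates, key=count_breakpoints)
        else:
            # Flip an increasing strip that does not touch 0 or n + 1.
            s, e = next((s, e) for s, e in strips if s > 0 and e < last)
            ext = ext[:s] + ext[s:e + 1][::-1] + ext[e + 1:]
        steps.append(ext[1:-1])
    return steps


for step in improved_breakpoint_reversal_sort([1, 4, 6, 5, 7, 8, 3, 2]):
    print(step)
```

Because the strip choice is arbitrary, on 1 2 5 6 7 3 4 this sketch flips 5 6 7 rather than 3 4 as in the slide example; either choice creates a decreasing strip.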
ImprovedBreakpointReversalSort: Performance Guarantee
ImprovedBreakpointReversalSort is an approximation algorithm with a performance guarantee of at most 4. Here’s why:
It eliminates at least one breakpoint in every two steps, so there are at most 2b(π) steps
Approximation ratio: 2b(π) / d(π)
Optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2
Performance guarantee: 2b(π) / d(π) ≤ 2b(π) / (b(π) / 2) = 4
Signed Permutations
Up to this point, all permutations to sort were unsigned
But genes have directions… so we should consider signed permutations
5’ 3’
π = 1 -2 -3 4 -5
We should, but we won’t right now…
GRIMM Web Server
http://nbcr.sdsc.edu/GRIMM/grimm.cgi
Real genome architectures are represented by signed permutations
Efficient algorithms to sort signed permutations have been developed
GRIMM web server computes the reversal distances between signed permutations