1
Computer Science Department, Technion – Israel Institute of Technology
Genomic Sorting with Length-Weighted Reversals
Ron Y. Pinter, Technion
Steve Skiena, SUNY Stony Brook
2
Genome Rearrangement
• events
  – duplication
  – translocation
  – reversal (inversion)
• occur primarily during reproduction
• allow large-scale genomic comparisons
3
Sorting by Reversals
• genome represented as a permutation on 1, 2, …, n
  – n = # homologous genes among species
• assumptions
  – can identify genes
  – genes are distinct
• operation: reversal of a subsequence (of genes)
  – models inversion (occurs during crossover)
• one of the permutations can be 1, 2, …, n
  – appropriately relabel others
4
Example
4 3 2 8 7 1 5 6 11 10 9
4 3 2 1 7 8 5 6 9 10 11
1 2 3 4 8 7 6 5 9 10 11
1 2 3 4 5 6 7 8 9 10 11
• 6 reversals
• in our model (for f(l) = l): cost = 18
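The example can be replayed in a few lines of Python. This is a minimal sketch: the helper names `reverse_segment` and `replay` are hypothetical, and the grouping of the six reversals (two, then three, then one between consecutive rows) is an assumed reading of the slide, chosen because it reproduces each displayed row and the stated cost.

```python
def reverse_segment(perm, start, end):
    """Return a copy of perm with perm[start..end] (0-based, inclusive) reversed."""
    return perm[:start] + perm[start:end + 1][::-1] + perm[end + 1:]


def replay(perm, reversals, f=lambda length: length):
    """Apply the reversals in order; return (final permutation, total cost)."""
    total = 0
    for start, end in reversals:
        perm = reverse_segment(perm, start, end)
        total += f(end - start + 1)  # each reversal is charged f(its length)
    return perm, total


start_perm = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]

# Assumed decomposition (0-based, inclusive index pairs):
#   row 1 -> row 2: reverse "8 7 1" and "11 10 9"
#   row 2 -> row 3: reverse "4 3 2 1", "7 8" and "5 6"
#   row 3 -> row 4: reverse "8 7 6 5"
reversals = [(3, 5), (8, 10), (0, 3), (4, 5), (6, 7), (4, 7)]

final_perm, linear_cost = replay(start_perm, reversals)
_, unit_cost = replay(start_perm, reversals, f=lambda length: 1)
print(final_perm, linear_cost, unit_cost)
```

With f(l) = l the six segment lengths 3 + 3 + 4 + 2 + 2 + 4 sum to 18; with unit cost the same sequence costs 6.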
5
Our Model
• unsigned
• cost of reversal of subsequence of length l is f(l)
• total sorting cost (or distance) is
  $\sum_{j} f(\mathrm{length}(s_j))$
  where the $s_j$ are the reversed subsequences
6
Cost Functions
• additive: f(x+y) = f(x) + f(y)
• subadditive: f(x+y) < f(x) + f(y)
• superadditive: f(x+y) > f(x) + f(y)
• other
  – e.g. bitonic
[Figure: two plots of the cost function f(l)]
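The three definitions can be checked numerically for a candidate cost function. The sketch below is only a finite test over lengths up to an assumed bound `max_len`, not a proof, and the name `classify` is hypothetical.

```python
import math


def classify(f, max_len=50):
    """Compare f(x + y) with f(x) + f(y) for all 1 <= x, y <= max_len."""
    signs = set()
    for x in range(1, max_len + 1):
        for y in range(1, max_len + 1):
            diff = f(x + y) - (f(x) + f(y))
            if abs(diff) < 1e-9:      # tolerance for floating-point cost functions
                signs.add(0)
            else:
                signs.add(1 if diff > 0 else -1)
    if signs == {0}:
        return "additive"
    if signs == {-1}:
        return "subadditive"
    if signs == {1}:
        return "superadditive"
    return "other"                    # mixed behaviour over the tested range


print(classify(lambda l: l))                   # f(l) = l
print(classify(lambda l: 1))                   # unit cost
print(classify(math.sqrt))                     # f(l) = sqrt(l)
print(classify(lambda l: l * l))               # f(l) = l^2
print(classify(lambda l: (l - 10) ** 2 + 1))   # V-shaped, neither sub- nor superadditive
```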
7
Problems
• algorithm to sort any permutation
  – worst-case min cost
• approximate min cost for a given permutation
8
Extremal Costs
• highly subadditive: e.g. unit cost, f(l) = 1
  – NP-complete [Caprara, ’97]
  – series of approximation ratios: 2, 1.75, 1.375
• highly superadditive: $f(l) > l^2$
  – essentially bubblesort
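To illustrate the superadditive extreme: if only reversals of length 2 are ever worth paying for, sorting reduces to swapping adjacent elements, which is bubble sort. The sketch below (function name hypothetical) counts those swaps; the count equals the number of inversions of the input, so the total cost is inversions × f(2).

```python
def bubble_sort_by_adjacent_reversals(perm, f):
    """Sort using only length-2 reversals; return (sorted list, #reversals, total cost)."""
    perm = list(perm)
    reversals = 0
    cost = 0
    for end in range(len(perm) - 1, 0, -1):
        for i in range(end):
            if perm[i] > perm[i + 1]:
                # A reversal of length 2 is exactly an adjacent swap.
                perm[i:i + 2] = perm[i:i + 2][::-1]
                reversals += 1
                cost += f(2)
    return perm, reversals, cost


def cubic(length):
    """An example cost function with f(l) > l^2 for l >= 2."""
    return length ** 3


print(bubble_sort_by_adjacent_reversals([4, 3, 2, 1], cubic))
```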
9
Our Results
• additive cost function
  – specifically f(l) = l
• QuickSort-like algorithm for worst-case
  – complexity: $O(n \lg^2 n)$
• min cost approximation ratio of $O(\lg^2 n)$
10
MedianEject(a,b)
• find r maximal blocks of wrong-sided elements with respect to median
• repeat lg r times: flip every other pair of blocks (wrong-sided blocks and the blocks adjacent to them)
• move wrong-sided blocks to median boundary
• reverse left and right blocks
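The first step can be sketched as follows. This is one plausible reading of "wrong-sided", stated as an assumption: within positions a..b, an element is wrong-sided if it sits in the left half but belongs to the larger values, or sits in the right half but belongs to the smaller values. The function name and the choice to split runs at the median boundary are also assumptions; the later steps of MedianEject are not implemented here.

```python
def wrong_sided_blocks(perm, a, b):
    """Return maximal runs (start, end), inclusive, of wrong-sided positions in perm[a..b]."""
    segment = perm[a:b + 1]
    half = len(segment) // 2          # the `half` smallest values belong on the left
    if half == 0:
        return []
    threshold = sorted(segment)[half - 1]   # largest value that belongs on the left

    blocks = []
    run_start = None
    for offset, value in enumerate(segment):
        on_left = offset < half
        wrong = on_left != (value <= threshold)
        at_boundary = offset == half
        # Close the current run when it ends or when it would cross the median boundary.
        if run_start is not None and (not wrong or at_boundary):
            blocks.append((a + run_start, a + offset - 1))
            run_start = None
        if wrong and run_start is None:
            run_start = offset
    if run_start is not None:
        blocks.append((a + run_start, a + len(segment) - 1))
    return blocks


example = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
blocks = wrong_sided_blocks(example, 0, len(example) - 1)
print(blocks, "r =", len(blocks))
```

On the permutation from the Example slide, the five smallest values belong in positions 0 to 4, so "8 7" (left side) and "1 5" (right side) are the two wrong-sided blocks.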
11
Sample Run
complexity: O((b-a) lg r)
12
ReversalSort(a,b)
  MedianEject(a, b);
  ReversalSort(a, (a+b)/2);
  ReversalSort((a+b)/2, b);
Complexity
$T(n) = 2\,T(n/2) + O(f(n) \lg n) \;\Rightarrow\; T(n) = O(f(n) \lg^2 n) = O(n \lg^2 n)$ for $f(n) \sim n$
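The same recursive shape can be tried out with a simpler partition step. The sketch below is not the slides' MedianEject: it swaps in a divide-and-conquer partition that partitions each half recursively and then rotates the middle with three reversals. All names are hypothetical, values are assumed distinct, and comparisons (including the `sorted` call used to find the median) are treated as free, since only reversals are charged. For f(l) = l this partition costs O(n lg n), so the overall recurrence again gives O(n lg² n).

```python
def reversal_sort(perm, f=lambda length: length):
    """Sort distinct values using only reversals; return (sorted list, total cost)."""
    perm = list(perm)
    total = 0

    def reverse(i, j):
        # Reverse perm[i:j] (half-open) and charge f(length); length < 2 is a no-op.
        nonlocal total
        if j - i >= 2:
            perm[i:j] = perm[i:j][::-1]
            total += f(j - i)

    def partition(lo, hi, pivot):
        # Rearrange perm[lo:hi] so values <= pivot come first; return the split index.
        if hi - lo == 1:
            return lo + 1 if perm[lo] <= pivot else lo
        mid = (lo + hi) // 2
        left_split = partition(lo, mid, pivot)    # [lo:left_split] small, [left_split:mid] large
        right_split = partition(mid, hi, pivot)   # [mid:right_split] small, [right_split:hi] large
        if left_split < mid < right_split:
            # Rotate "large | small" into "small | large" with three reversals.
            reverse(left_split, mid)
            reverse(mid, right_split)
            reverse(left_split, right_split)
        return left_split + (right_split - mid)

    def sort(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        median = sorted(perm[lo:hi])[mid - lo - 1]  # largest value belonging to the left half
        partition(lo, hi, median)
        sort(lo, mid)
        sort(mid, hi)

    sort(0, len(perm))
    return perm, total


result, cost = reversal_sort([4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9])
print(result, cost)
```

An already sorted input triggers no reversals at all, so its cost is 0.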
13
Algorithmic Improvements
I. simplify “short” phases
II. merge the last 2 steps of MedianEject when possible (2p+q vs. 3p+q)
III. apply II recursively
[Figure: blocks labeled p, q, p]
14
Approximation Ratio
• M(p) is the maximal total distance between pairs of out-of-order elements
Lemma 4: min cost is $\Omega(M(p))$
but
Lemma 6: # of out-of-order elements < 3 M(p)
+
Lemma 7: MedianEject touches only elements within linear range from out-of-order elements
yields:
• each round of MedianEject takes $O(M(p) \lg^2 n)$
• ReversalSort costs $O(M(p) \lg^2 n)$
• ReversalSort is at most $O(\lg^2 n)$ times optimal
15
Bioinformatic “Validation”
• use our cost (= distance) to build phylogenetic trees
• 4 plants (chloroplastic genes)
• consistent with [Martin et al., PNAS Sept ’02]
• work in progress [M. Shoham]
[Figure: phylogenetic tree over Cyanophora, Cyanidium, Guillardia, Porphyra]
16
Open Problems: Algorithmic
• weighted genes
• tighter approximation ratio
  – close to O(lg n)
  – can get to constant?
• other cost functions (incl. bitonic)
• the signed case
17
Open Problems: Modeling
• chromosomal ordering
• what is the right cost function?
  – consider cost(l) = $l^d$
• combine with constraint-based models
  – restricted regions
  – “undesired” reversal sequences
• deal with duplication and translocation events