efficient algorithms for multichromosomal genome rearrangements

EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL GENOME REARRANGEMENTS. Glenn Tesler. Journal of Computer and System Sciences 65 (2002). Presented by Liora LEVY, Seminar in BioInformatics, Technion, Spring 2005.


Page 1: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL GENOME REARRANGEMENTS

Glenn Tesler

Journal of Computer and System Sciences 65 (2002)

Presented by Liora LEVY

Seminar in BioInformatics

Technion – Spring 2005

Page 2: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

AGENDA

1. MOTIVATION

2. THE AUTHOR

3. THE PROBLEM

4. THE ALGORITHM OF G. TESLER

5. SOFTWARE TOOL: GRIMM

Page 3: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

MOTIVATION

Compute the distance between two multichromosomal genomes. (We'll define this distance later.)

Scientists are interested in this distance in order to establish phylogenetic trees of species.

WHY?

Page 4: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Until the 19th century, people believed that all species, and humans in particular, were created by God as they are today.

But then Darwin, in "The Origin of Species" (1859), developed the idea that all species evolved from a common ancestor.

Charles Darwin, British naturalist, 1809-1882

"I have called this principle, by which each slight variation, if useful, is preserved, by the term Natural Selection."

Page 5: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Well, if you do believe that, you may be surprised…

Which animal was believed to be the closest to humans (to share a common ancestor with us)…?

Maybe you need help…

Page 6: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

The work of Guillaume Bourque (Montreal), Pavel Pevzner and Glenn Tesler (San Diego, CA) makes it possible to reconstruct the genetic profile of the ancestor of mammals…

It is a rodent with fur that lived 90 million years ago.

They even established that humans and rats share about 90% of their genes.

Genome Research, April 2004

So why, or how, are we so different?

Even if most of the genes are the same, their order in the chromosome, and in the genome in general, is very important.

Page 7: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS
Page 8: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

THE AUTHOR(S)

Glenn Tesler, Assistant Professor, Department of Mathematics, University of California, San Diego.

The article is based on a previous article by S. Hannenhalli and P. Pevzner:

Transforming men into mice (polynomial algorithm for genomic distance problem), 1995

Sridhar Hannenhalli, Genetics Dept., University of Pennsylvania.

Left: Pavel Pevzner, CS Dept., University of California, San Diego

Right: Glenn Tesler

Page 9: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

THE PROBLEM

Distance between two multichromosomal genomes: the minimum number of reversals, translocations, fissions and fusions required to transform one genome into the other.

We've already seen algorithms for the unichromosomal problem.

So why do we need "multichromosomal"?

It is very simple: mammals have multichromosomal genomes, so we need a way to adapt the unichromosomal solution to the real biological problem.

Page 10: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Human karyotype:

22 pairs of chromosomes + 2 sex chromosomes.

Page 11: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

OLD ALGORITHM vs. NEW ALGORITHM

Hannenhalli and Pevzner already gave a polynomial algorithm, "genomic_sort", for computing that distance. Glenn Tesler added some details to fix problems in their construction.

He also improved the speed of the algorithm by combining it with the algorithm of Bader, Moret and Yan, which computes the reversal distance between permutations in linear time.

Page 12: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

MAIN IDEA OF THE ALGORITHMS

The main idea for computing the rearrangement distance between two multichromosomal genomes Π and Γ is to concatenate their chromosomes into two permutations π and γ. The purpose of these concatenated genomes is that every rearrangement in a multichromosomal genome Γ can be mimicked by a reversal in the permutation γ. In an optimal concatenate, sorting γ with respect to π actually corresponds to sorting Γ with respect to Π.

Tesler also showed that when such an optimal concatenate does not exist, a near-optimal concatenate exists such that sorting this concatenate mimics sorting the multichromosomal genomes and uses a single extra reversal, which corresponds to a reordering of the chromosomes.

IMPORTANT

Page 13: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

THE ALGORITHM OF G. TESLER

I – Improvements made to the old algorithm

1. There is a gap in Hannenhalli and Pevzner's reduction of the multichromosomal problem to the unichromosomal problem of "sorting by reversals" (where algorithms for efficient generation of such scenarios are known). It is sometimes necessary to reorder and flip certain chromosomes of both multichromosomal genomes to form the permutations used in the unichromosomal problem, but they do not reorder either one. We will close the gap and prove the following improvement to their algorithm:

Page 14: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Theorem 1. Let d = d(Π,Γ) denote the distance between two multichromosomal genomes, Π and Γ. There is a constructive algorithm to produce two permutations π*, γ* whose reversal distance is d_rev(π*, γ*) = d or d+1, such that optimal reversal scenarios between these permutations directly mimic optimal rearrangement scenarios between genomes Π and Γ. All of this takes polynomial time. When d_rev(π*, γ*) = d+1, one reversal step mimics flipping a block of consecutive whole chromosomes, which does not count as an operation in a multichromosomal rearrangement scenario; there are examples when such a step is required.

2. Although the distance is symmetric (d(Π,Γ) = d(Γ,Π)), when the genomes have different numbers of chromosomes their algorithm requires that it be computed as d(Π,Γ) where Π has fewer chromosomes than Γ.

Page 15: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

3. We combined this algorithm with the Bader, Moret, Yan linear-time algorithm for computing reversal distance in unichromosomal genomes.

Thus, we've reduced computation times:

Time to compute distance: O(n)

Time to compute a rearrangement scenario: O(n²)

(where n is the total number of "markers" in the reduction: the number of genes plus twice the number of chromosomes in the genome with more chromosomes).

4. We give a heuristic for selecting good reversals based on breakpoints. The heuristic is not theoretically optimal for producing pairwise rearrangement scenarios, but it is fast in practice and generalizes to phylogenetic trees involving more than two genomes. It is used by MGR, a program for constructing phylogenetic trees.
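
As a concrete reading of the definition of n in item 3, here is a minimal Python sketch. The function name `marker_count` and the list-of-lists genome encoding (one list of signed integers per chromosome) are illustrative assumptions, not taken from the paper.

```python
def marker_count(genome_a, genome_b):
    """n = number of genes + 2 * (chromosome count of the genome with more chromosomes)."""
    # Both genomes are assumed to contain the same genes, so counting one is enough.
    num_genes = sum(len(chrom) for chrom in genome_a)
    return num_genes + 2 * max(len(genome_a), len(genome_b))


# Hypothetical example: 3 genes, genomes with 2 and 3 chromosomes.
n = marker_count([[1, 2], [3]], [[1], [-2], [3]])
print(n)
```

With 3 genes and at most 3 chromosomes, the reduction works on 3 + 2·3 markers.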

Page 16: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

II - Some definitions and notations

Genes, chromosomes, genomes

We represent genes by numbers 1,…,Ng. The orientation (strand) of each gene is indicated by a ± sign.

A chromosome is a sequence of signed numbers π = <π_1,…,π_k>, and the flip of a chromosome is −π = <−π_k,…,−π_1>.

In studies of rearrangements on unichromosomal genomes, several types of chromosomes have been considered, but only the undirected linear chromosome type is biologically relevant for multichromosomal genomes: π and −π are regarded as equivalent.

A genome is a set Π = {π(1),…,π(Nc)} with Nc chromosomes. Chromosome i: π(i) = <π(i)_1,…,π(i)_{n_i}>.

Each gene j = 1,…,Ng occurs exactly once in the genome (as +j or −j).
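
These definitions translate directly into code. The following is a minimal sketch, assuming a genome is stored as a list of chromosomes and each chromosome as a list of signed integers; the helper names (`flip`, `canonical`, `same_genome`, `is_valid_genome`) are hypothetical.

```python
def flip(chrom):
    """Flip of a chromosome: reverse the order and negate every gene."""
    return [-g for g in reversed(chrom)]


def canonical(chrom):
    """Pick one representative of {chrom, flip(chrom)}, since the two are equivalent."""
    return min(tuple(chrom), tuple(flip(chrom)))


def same_genome(a, b):
    """Genomes are sets of undirected linear chromosomes: order and flips do not matter."""
    return sorted(canonical(c) for c in a) == sorted(canonical(c) for c in b)


def is_valid_genome(genome, num_genes):
    """Each gene 1..num_genes must occur exactly once, as +j or -j."""
    seen = sorted(abs(g) for chrom in genome for g in chrom)
    return seen == list(range(1, num_genes + 1))


print(flip([1, -2, 3]))
print(same_genome([[1, 2], [3]], [[-3], [-2, -1]]))
```

Comparing canonical forms is one simple way to treat π and −π as the same chromosome.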

Page 17: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Caps: C_k = Ng + k for k = 1,2,…,2Nc (two caps per chromosome).

Capping for a chromosome: π(i) = <π(i)_0, π(i)_1,…,π(i)_{n_i}, π(i)_{n_i+1}>, where π(i)_0 is the lcap and π(i)_{n_i+1} is the rcap.

Capping for a genome: every chromosome is capped, and each cap is used exactly once.

There are (2Nc)! possible cappings.

A concatenate of a capped genome is a signed permutation of 1,2,…,n formed by choosing one of the Nc! orderings and one of the 2^Nc flippings of the chromosomes, and concatenating them together.
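
To make the counting concrete, here is a small sketch that caps a genome and enumerates its concatenates. It assumes two caps per chromosome (2·Nc caps in total, numbered Ng+1 upward) and keeps all caps positive, which is a simplification: the slides' sign convention for lcaps and rcaps is not reproduced. The names `cap_genome` and `concatenates` are illustrative.

```python
from itertools import permutations, product
from math import factorial


def cap_genome(genome, num_genes):
    """Attach an lcap and an rcap to each chromosome, using caps Ng+1 .. Ng+2Nc in order."""
    caps = [num_genes + k for k in range(1, 2 * len(genome) + 1)]
    return [[caps[2 * i]] + list(chrom) + [caps[2 * i + 1]]
            for i, chrom in enumerate(genome)]


def concatenates(capped):
    """Yield every concatenate: Nc! chromosome orderings times 2^Nc flippings."""
    nc = len(capped)
    for order in permutations(range(nc)):
        for flips in product([False, True], repeat=nc):
            perm = []
            for idx, flipped in zip(order, flips):
                chrom = capped[idx]
                perm.extend([-g for g in reversed(chrom)] if flipped else chrom)
            yield perm


capped = cap_genome([[1, 2], [3]], num_genes=3)
all_concatenates = list(concatenates(capped))
print(capped)
print(len(all_concatenates), factorial(2) * 2 ** 2)
```

Enumerating every concatenate is only feasible for a handful of chromosomes; the point of the paper's construction is to build a good one directly.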

Page 18: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Mimicking multichromosomal rearrangement operations by reversals on a single permutation

The reversal ρ(i, j) on a signed permutation π = <π_1,…,π_k> (where 1 ≤ i ≤ j ≤ k) is <π_1,…,π_{i−1}, −π_j,…,−π_i, π_{j+1},…,π_k>.

Another representation: π = <A,B,C> → <A,−B,C>

Translocation: π = <A,B> and σ = <C,D> → <A,D> and <C,B>

Fusion: π = <A,B> and σ = <C,D> → <A,B,C,D>

Fission: π = <A,B> and σ = <Ø,Ø> → <A,Ø> and <Ø,B>
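
The following sketch shows how a translocation can be mimicked by a single reversal on a concatenate. It ignores caps for brevity, so it illustrates the idea rather than the paper's exact construction; the function names and the example blocks A, B, C, D are hypothetical.

```python
def reversal(perm, i, j):
    """Apply rho(i, j) with 1-indexed, inclusive positions."""
    return perm[:i - 1] + [-x for x in reversed(perm[i - 1:j])] + perm[j:]


def flip(chrom):
    """Reverse the order and negate every gene."""
    return [-x for x in reversed(chrom)]


def translocate_by_reversal(A, B, C, D):
    """Turn <A,B> and <C,D> into <A,D> and <C,B> using one reversal."""
    concat = (A + B) + flip(C + D)        # <A, B, -D, -C>
    i = len(A) + 1                        # first position of block B
    j = len(A) + len(B) + len(D)          # last position of block -D
    after = reversal(concat, i, j)        # <A, D, -B, -C>
    cut = len(A) + len(D)
    return after[:cut], flip(after[cut:])  # <A, D> and <C, B>


new_pi, new_sigma = translocate_by_reversal([1, 2], [3, 4], [5], [6, 7])
print(new_pi, new_sigma)
```

Flipping σ before concatenation is what lets the reversal exchange B and D; this is why the choice of flippings in a concatenate matters.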

Page 19: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Number of steps in a scenario:

d(Π,Γ) + # of block flips + # of cap exchanges.

# of block flips: at most 1 for optimal concatenates.

# of cap exchanges: not necessary for optimal cappings.

Page 20: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Convention for the signs of lcaps and rcaps:

Breakpoint graph

Page 21: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Hurdles and relatives

Interleaving graph

Page 22: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

The distance can be calculated as:

[formula shown as an image on the original slide]

where

b = number of black edges

c = number of cycles and paths

p_ΓΓ = number of ΓΓ-paths

(Other parameters are from the Bader et al. algorithm.)
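
As a partial illustration, the parameters b and c can be computed directly in the simpler unichromosomal case, where the graph contains only cycles and no paths. The sketch below builds the breakpoint graph of a signed permutation against the identity and counts black edges and alternating cycles; b − c is then the classical lower bound on the reversal distance. It is not the multichromosomal formula: paths, hurdles and the remaining parameters are not handled, and the function name is an assumption.

```python
def breakpoint_graph_counts(perm):
    """Return (b, c) for a signed permutation of 1..n compared with the identity."""
    n = len(perm)
    # Replace +x by (2x-1, 2x) and -x by (2x, 2x-1), framed by 0 and 2n+1.
    u = [0]
    for x in perm:
        u += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    u.append(2 * n + 1)

    # Black edges join neighbouring vertices that belong to different elements.
    black = {}
    for i in range(0, 2 * n + 2, 2):
        black[u[i]] = u[i + 1]
        black[u[i + 1]] = u[i]

    # Gray edges join 2k and 2k+1, so the gray partner of v is v ^ 1.
    seen = set()
    cycles = 0
    for start in u:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:          # follow black, then gray, until the cycle closes
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = w ^ 1
    return n + 1, cycles


b, c = breakpoint_graph_counts([1, -2, 3])
print(b, c, b - c)
```

For the permutation <1, −2, 3>, a single reversal of the middle element sorts it, and b − c matches that distance.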

Page 23: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

III – The new algorithm

Page 24: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

1. Joining and closing paths, simplified

Several steps of genomic_sort add an edge to the graph to join two paths into a larger path. The result is always a ΓΠ-path with an oriented or interchromosomal edge, and a subsequent iteration of the main loop of their algorithm closes that path. We simplify this by adding two edges simultaneously to join these paths into a cycle in a single loop iteration.

The first such steps join a ΠΠ-path with a ΓΓ-path. The resulting paths never interact with any other path in the main loop, so we separate this out into its own loop (B5–B7). It is also rephrased to account for the new distinction between p and p_ΓΓ.

The other path-joining steps (steps A8 and A13) join two Γ-paths. They proved that at least one of the two possible Γ-edges connecting them is oriented or interchromosomal, and they test the edges to add such an edge first. The other edge is guaranteed to be added in a later iteration. Since the order in which they are added does not affect the final output, we remove this test and just add them both at once (steps B10 and B13).

Page 25: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

2. Adaptation of the BMY algorithm

BMY: algorithm to compute the connected components of the interleaving graph. They implemented it in the file invdist.c of GRAPPA. We modified it to account for paths (instead of just cycles), deleted tails, and bare edges. The resulting procedure form_components runs in time Θ(n). It identifies the components and computes and stores certain structural information about them.

Θ(n)

Page 26: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

3. When Γ has fewer chromosomes than Π

The original construction of G(Π,Γ) assumes that Nc(Π) ≤ Nc(Γ); the solution is to add null chromosomes to Π.

However, that construction breaks down without that assumption: if Γ has fewer chromosomes and we pad it with nulls, then when we delete a gray edge corresponding to a null in Γ, the construction leaves unresolved how to classify the vertices of the edge into Π-caps and Γ-tails. We have said both vertices should be classified as Π-caps in this case.

Changes were made to make the construction truly symmetric, regardless of which genome has more chromosomes.

Page 27: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

4. From optimal cappings to optimal concatenates

The procedure genomic_sort produced a new capping of Γ to prove the distance formula. However, to compute the distance without building a proof certificate (i.e., a capping), it is only necessary to evaluate the distance formula.

It is possible to extend that procedure to algorithmically produce an optimal rearrangement scenario between two genomes, but they do not actually give the connection between the capping and the scenario; our added step B19 does this.

• Proper flipping

• Proper bonding

Procedure form_optimal_concatenate runs in O(n·Nc)

Page 28: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

5. Optimal scenarios

Mimicking a rearrangement scenario by a reversal scenario

Several algorithms for producing optimal scenarios between a pair of permutations:

• Hannenhalli and Pevzner: O(n⁵) and O(n⁴)

• Berman and Hannenhalli: O(n² α(n))

• Kaplan, Shamir and Tarjan: O(n²)

These are easily adapted to produce a multichromosomal rearrangement scenario, but must obey the following restriction:

A reversal starts at an lcap if and only if it ends at an rcap.
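
For very small permutations, an optimal reversal scenario can also be found by exhaustive breadth-first search, which is handy for checking a faster implementation. The sketch below is not any of the algorithms listed above: it explores every signed permutation reachable by reversals, so its cost grows exponentially with n, and it does not enforce the cap restriction. The function name is hypothetical.

```python
from collections import deque


def optimal_reversal_scenario(source, target):
    """Shortest sequence of signed permutations from source to target, one reversal per step."""
    source, target = tuple(source), tuple(target)
    parent = {source: None}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        if cur == target:
            path = []
            while cur is not None:      # walk back to the source
                path.append(list(cur))
                cur = parent[cur]
            return path[::-1]
        k = len(cur)
        for i in range(k):
            for j in range(i, k):
                # Reverse and negate the segment cur[i..j] (0-indexed, inclusive).
                nxt = cur[:i] + tuple(-x for x in reversed(cur[i:j + 1])) + cur[j + 1:]
                if nxt not in parent:
                    parent[nxt] = cur
                    queue.append(nxt)
    return None


scenario = optimal_reversal_scenario([2, 1], [1, 2])
print(len(scenario) - 1, scenario)
```

The number of reversals is the length of the returned list minus one.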

Page 29: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

6. Breakpoint heuristic for optimal scenarios and trees

Although the algorithms just named can quickly select good reversals for pairwise genomic rearrangement scenarios, selection of good reversals is NP-hard for even the simplest phylogenetic trees. We have integrated the algorithms in this paper into Guillaume Bourque's program MGR for constructing phylogenetic trees.

Let G = {Π_1,…,Π_m} be a set of genomes, either multichromosomal, or unichromosomal with circular, directed linear, or undirected linear chromosomes. A phylogenetic tree T on G is a tree whose vertices are genomes on a common set of genes, and whose leaves are the genomes in G.

A conserved adjacency (x,y) of G is a pair of genes such that every genome in G contains either (x,y) or (−y,−x) consecutively. Let A(Π_1,…,Π_m) denote the set of all conserved adjacencies.
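
The set of conserved adjacencies can be computed by intersecting the adjacency sets of the genomes. This is a minimal sketch for linear chromosomes only (circular wrap-around is not handled); the genome encoding and the names `adjacencies` and `conserved_adjacencies` are assumptions made for illustration.

```python
def adjacencies(genome):
    """Adjacencies of one genome, stored so that (x, y) and (-y, -x) coincide."""
    adj = set()
    for chrom in genome:
        for x, y in zip(chrom, chrom[1:]):
            adj.add(max((x, y), (-y, -x)))   # pick one fixed representative
    return adj


def conserved_adjacencies(genomes):
    """A(genomes): adjacencies present in every genome, in either reading direction."""
    return set.intersection(*(adjacencies(g) for g in genomes))


g1 = [[1, 2, 3], [4, 5]]
g2 = [[-3, -2, 4], [1, 5]]
print(conserved_adjacencies([g1, g2]))
```

Here genes 2 and 3 stay next to each other in both genomes (read backwards in the second), so (2, 3) is the only conserved adjacency.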

Page 30: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

A conserved strip (x_1,…,x_k) is a sequence of genes such that every genome contains either it or (−x_k,…,−x_1) consecutively. It comprises k−1 conserved adjacencies.
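
Maximal conserved strips can be found by walking along one genome and cutting wherever an adjacency is not conserved. The sketch below assumes linear chromosomes stored as lists of signed integers and reports single genes as strips of length 1; the names `canon` and `conserved_strips` are hypothetical.

```python
def canon(x, y):
    """One fixed representative of the equivalent adjacencies (x, y) and (-y, -x)."""
    return max((x, y), (-y, -x))


def conserved_strips(genomes):
    """Split the chromosomes of genomes[0] into maximal strips conserved in all genomes."""
    def adjs(genome):
        return {canon(x, y) for chrom in genome for x, y in zip(chrom, chrom[1:])}

    conserved = set.intersection(*(adjs(g) for g in genomes))
    strips = []
    for chrom in genomes[0]:
        if not chrom:
            continue
        current = [chrom[0]]
        for x, y in zip(chrom, chrom[1:]):
            if canon(x, y) in conserved:
                current.append(y)        # adjacency conserved: extend the strip
            else:
                strips.append(current)   # adjacency broken somewhere: start a new strip
                current = [y]
        strips.append(current)
    return strips


print(conserved_strips([[[1, 2, 3], [4, 5]], [[-3, -2, 4], [1, 5]]]))
```

Chaining works because each gene occurs once per genome, so two conserved adjacencies sharing a gene must be read in the same direction.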

Theorem 8

(a) Between any two genomes (Π,Γ), there is an optimal reversal or rearrangement scenario in which the pairs in A(Π,Γ) are adjacent at every step.

(b) For a set of genomes G = {Π_1,…,Π_m}, there is an optimal phylogenetic tree in which the pairs in A(Π_1,…,Π_m) are adjacencies in every node, and an optimal rearrangement scenario of form (a) exists on each edge.

Page 31: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

SOFTWARE TOOL

GRIMM: Genome Rearrangements in Man and Mouse

This is a web server combining rearrangement algorithms for unichromosomal and multichromosomal genomes, with either signed or unsigned gene data. In each case, it computes the minimum possible number of rearrangement steps, and determines a possible scenario taking this number of steps. This is integrated into a related project, MGR, for constructing optimal phylogenetic trees with multiple genomes.

Page 32: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Input: Two genomes

Page 33: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Result: A scenario

Page 34: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Input: Three genomes

Page 35: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Output:

1. A distance matrix

Page 36: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

2. A common ancestor

Page 37: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

3. A phylogenetic tree

Page 38: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

New issues for the future:

Recent work by Robert Pruitt and Susan Lolle (Purdue University, Indiana, USA) on the plant Arabidopsis thaliana (in particular the HotHead mutant) showed that genetic material may also be transmitted by RNA and not only by DNA. This contradicts Mendel's theory (1865) and implies that children can have genes that their parents don't have (but their grandparents do)…

Genetic studies may take a new direction after this discovery.

I read about it in Science et Vie, May 2005. Original publication: Nature, April 2005.

Subjects for a doctorate???

Page 39: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

Challenge: Genome rearrangements and cancer

We emphasized that genome rearrangements are used to study the evolution of a group of organisms. Now, because of the rapid increase in chromosomal mutations frequently observed in cancer cells, it is possible to study the cancer genome much as if it were a new organism that had recently diverged from the normal human genome.

The interest is that although cancer progression is frequently associated with genome rearrangements, the mechanisms behind these rearrangements are still poorly understood.

Source: Guillaume Bourque, Genome Institute of Singapore.

Page 40: EFFICIENT ALGORITHMS FOR MULTICHROMOSOMAL               GENOME REARRANGEMENTS

THANK YOU...