
Post on 31-Dec-2015


Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome

Yaw-Ling Lin (1), Tsan-Sheng Hsu (2)

(1) Dept. of Computer Science & Information Management, Providence University, Taichung, Taiwan

(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan

Yaw-Ling Lin, Providence, Taiwan 2

Motivation: Where Do the Problems Come From?

Two-Component System

• Two-component systems (2CS):
  – Sensor histidine kinase
  – Response regulator

• The major control machinery that enables bacteria to respond to diverse and often hostile environments

2CS in Pseudomonas aeruginosa PAO1

http://www.pseudomonas.com/

“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8. by Stover CK, Pham XQ, Erwin AL, et al.

• Genome: 6.3 Mbp

• Predicted genes: 5570

• 123 genes were classified as 2CSs.

2CS in PAO1


• There are 123 annotated 2CS genes in PAO1.

• Use a systematic analysis of the evolutionary relationships between the sensor kinase and the response regulator of each 2CS.

• Construct phylogenetic trees using Clustal-W for 54 sensor kinases and 59 response regulators.

2CS in PAO1: Sensor Tree

2CS in PAO1: Regulator Tree

Subtree Analysis of 2CS

Co-evolution Subtree Analysis

Sensor Tree versus Regulator Tree

Different Trees

• Different phylogenetic tree inference methods:

- Maximum parsimony

- Maximum likelihood

- Distance matrix fitting

- Quartet based methods

• Comparing the same set of species with respect to different biological sequences or genes yields different trees.

• How can we find the largest set of items on which the trees agree?

Previous Results

• Measuring the similarity/difference between trees:

- Symmetric difference [Robinson 1979]

- Robinson and Foulds (RF) metric [Robinson 1981]

- Nearest-neighbor interchange [Waterman 1978]

- Subtree transfer distance [Allen 2001]

- Quartet metric [Estabrook 1985]

• Inferring a consensus tree: the maximum agreement subtree (MAST) problem, a.k.a. the maximum homeomorphic agreement subtree problem
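The symmetric-difference idea behind the Robinson and Foulds (RF) metric can be sketched for rooted trees by comparing clusters, which are the leaf sets of internal nodes. This is a minimal sketch that rests on our own assumptions. Trees are encoded as nested Python tuples with string leaf labels, both trees share one leaf set, and the count is not halved, although some RF definitions halve it. The metric as usually stated works on unrooted trees through splits.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is its label (str); an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of the non-root internal nodes (leaves and root are shared by all trees)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # walk() fills `found`; then drop the root cluster
    return found


def rooted_rf(t1: Tree, t2: Tree) -> int:
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(rooted_rf(t1, t2))  # prints 2: {a,b} and {a,c} differ, {a,b,c} is shared
```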


MAST: Maximum Agreement Subtree

• Problem: given a set of rooted trees whose leaves are drawn from the same set of items of size n, find the largest subset of these items so that the portions of the trees restricted to the subset are isomorphic.

• [Amir and Keselman 1997]: NP-hard even for three trees of unbounded degree.

• [Hein 1995]: MAST for three trees of unbounded degree is hard to approximate.

• [Amir et al 1997]: polynomial-time algorithms for three or more trees of bounded degree, but the running time is exponential in the degree bound.

• [Farach and Thorup 1997]: an O(n^1.5 log n)-time algorithm for two trees of arbitrary degree.

• [Cole et al 2002]: the MAST of two binary trees can be found in O(n log n) time; the MAST of two trees of degree d can be found in O(min{n√d log^2 n, nd log n log d}) time.
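The binary case can be illustrated with a standard dynamic program that computes the MAST size of two rooted binary trees by recursing on pairs of subtrees. This is not the Farach-Thorup algorithm or the Cole et al. algorithm. It has O(n^2) subproblems. Memoizing on nested tuples adds hashing overhead on top of that. The tuple encoding is our own hypothetical choice, and leaf labels are assumed unique.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is its label; an internal node is a (left, right) pair.
BinTree = Union[str, Tuple["BinTree", "BinTree"]]


@lru_cache(maxsize=None)
def leaf_set(t: BinTree) -> FrozenSet[str]:
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


@lru_cache(maxsize=None)
def mast(u: BinTree, v: BinTree) -> int:
    """Size of a maximum agreement subtree of the rooted binary trees u and v."""
    if isinstance(u, str):                 # a single leaf agrees with v iff v contains it
        return int(u in leaf_set(v))
    if isinstance(v, str):
        return int(v in leaf_set(u))
    (u1, u2), (v1, v2) = u, v
    return max(
        mast(u1, v1) + mast(u2, v2),       # the roots correspond:
        mast(u1, v2) + mast(u2, v1),       # pair up their children either way
        mast(u1, v), mast(u2, v),          # or the agreement lies inside one child of u
        mast(u, v1), mast(u, v2),          # or inside one child of v
    )


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(mast(t1, t2))  # prints 3: e.g. {a, b, d} induces ((a,b),d) in both trees
```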


Problem Definition

• A phylogenetic tree with n leaves is a (rooted) tree whose leaf nodes are uniquely labelled 1 to n.

• The descendent subtree of a phylogenetic tree T at a vertex v is the subtree consisting of v and all nodes and edges of T descending from v.

• Given a set of n-leaf phylogenetic trees, we wish to explore the relationships among the descendent subtrees of these trees.
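Each descendent subtree is determined by its root vertex. A post-order traversal can therefore enumerate all of them by recording the leaf set, or cluster, below every vertex. This minimal sketch assumes a hypothetical encoding in which a leaf is its label string and an internal node is a tuple of children of any degree. Building explicit sets costs O(n^2) in the worst case. That is fine for illustration but does not meet the bounds claimed later.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is its label (str); an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def descendent_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every descendent subtree, listed in post-order."""
    result: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):          # a leaf is its own descendent subtree
            leaves = frozenset([node])
        else:                              # union of the children's leaf sets
            leaves = frozenset().union(*(walk(child) for child in node))
        result.append(leaves)
        return leaves

    walk(tree)
    return result


for cluster in descendent_clusters((("a", "b"), ("c", "d", "e"))):
    print(sorted(cluster))
```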


Normalized cluster distance between two sets

• Symmetric set difference: A Δ B = (A \ B) ∪ (B \ A)

• Normalized cluster distance:

All-Pairs Subtree Comparison: A naïve O(n^3) algorithm
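A brute-force version of the all-pairs comparison computes the symmetric difference for every pair of descendent subtrees, one from each tree. There are O(n^2) pairs and each set operation costs O(n), giving O(n^3) in total. The normalization used on the slides did not survive extraction, so this sketch returns only |A Δ B|. It uses the same hypothetical nested-tuple encoding. A normalizer built from |A|, |B| and |A ∩ B| could be applied afterwards.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is its label (str); an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every descendent subtree, in post-order."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def all_pairs_symdiff_naive(t1: Tree, t2: Tree) -> Dict[Tuple[int, int], int]:
    """|C(u) Δ C(v)| for every node u of t1 and v of t2 (post-order indices)."""
    c1, c2 = clusters(t1), clusters(t2)
    # O(n^2) pairs times an O(n) set operation per pair: O(n^3) overall.
    return {(i, j): len(a ^ b)
            for (i, a), (j, b) in product(enumerate(c1), enumerate(c2))}


d = all_pairs_symdiff_naive((("a", "b"), "c"), (("a", "c"), "b"))
print(d[(2, 2)])  # prints 2: {a, b} versus {a, c}
```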

All-Pairs Subtree Comparison: Property

All-Pairs Subtree Comparison: An O(n^2) algorithm
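The property stated on this slide did not survive extraction. The following is one way to reach O(n^2), offered as an illustrative sketch and not necessarily the authors' argument. It uses the identity |A Δ B| = |A| + |B| - 2|A ∩ B| and fills a table of intersection sizes bottom-up. Because an internal node's leaf set is the disjoint union of its children's leaf sets, each table entry is a sum over children. Nodes are indexed in post-order under a hypothetical nested-tuple encoding, and both trees are assumed to share one leaf set. The table has one entry per node pair, so the total work is proportional to |T1| · |T2|.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
Node = Tuple[Optional[str], List[int]]   # (leaf label or None, child indices)


def flatten(tree: Tree) -> List[Node]:
    """Post-order node list, so every child index precedes its parent's."""
    nodes: List[Node] = []

    def walk(node: Tree) -> int:
        if isinstance(node, str):
            nodes.append((node, []))
        else:
            kids = [walk(child) for child in node]
            nodes.append((None, kids))
        return len(nodes) - 1

    walk(tree)
    return nodes


def all_pairs_symdiff(t1: Tree, t2: Tree) -> List[List[int]]:
    """sd[i][j] = |C(i) Δ C(j)| for node i of t1 and node j of t2."""
    n1, n2 = flatten(t1), flatten(t2)
    inter = [[0] * len(n2) for _ in n1]          # inter[i][j] = |C(i) ∩ C(j)|
    for i, (label_i, kids_i) in enumerate(n1):
        row = inter[i]
        if label_i is None:
            # Internal node of t1: sum its children's rows (disjoint leaf sets).
            for j in range(len(n2)):
                row[j] = sum(inter[k][j] for k in kids_i)
        else:
            # Leaf of t1: one bottom-up pass over t2.
            for j, (label_j, kids_j) in enumerate(n2):
                row[j] = (int(label_j == label_i) if label_j is not None
                          else sum(row[k] for k in kids_j))
    size1 = [row[-1] for row in inter]   # t2's root (last in post-order) holds every leaf
    size2 = inter[-1]                    # likewise t1's root
    return [[size1[i] + size2[j] - 2 * inter[i][j] for j in range(len(n2))]
            for i in range(len(n1))]


sd = all_pairs_symdiff((("a", "b"), "c"), (("a", "c"), "b"))
print(sd[2][2])  # prints 2: {a, b} versus {a, c}
```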


Lowest Common Ancestor
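The simplest lowest common ancestor (LCA) query stores each node's parent and depth. It lifts the deeper node to the depth of the other, then lifts both until they meet, so each query costs O(depth). This sketch assumes a hypothetical nested-tuple encoding with preorder node ids. The linear-time bounds on these slides need an LCA structure that answers queries in constant time after linear preprocessing, such as the reduction to range-minimum queries by Bender and Farach-Colton.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


class LcaTree:
    """Parent/depth arrays over preorder ids; queries walk upward (hypothetical helper)."""

    def __init__(self, tree: Tree) -> None:
        self.parent: List[int] = []
        self.depth: List[int] = []
        self.leaf: Dict[str, int] = {}     # leaf label -> node id
        self._add(tree, -1, 0)

    def _add(self, node: Tree, par: int, d: int) -> None:
        me = len(self.parent)
        self.parent.append(par)
        self.depth.append(d)
        if isinstance(node, str):
            self.leaf[node] = me
        else:
            for child in node:
                self._add(child, me, d + 1)

    def lca(self, u: int, v: int) -> int:
        """Lift the deeper node to the same depth, then lift both until they meet."""
        while self.depth[u] > self.depth[v]:
            u = self.parent[u]
        while self.depth[v] > self.depth[u]:
            v = self.parent[v]
        while u != v:
            u, v = self.parent[u], self.parent[v]
        return u


t = LcaTree((("a", "b"), ("c", "d")))
print(t.lca(t.leaf["a"], t.leaf["b"]))  # prints 1, the node (a, b)
print(t.lca(t.leaf["a"], t.leaf["d"]))  # prints 0, the root
```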

Confluent subtree

Confluent subtree: Illustration

Constructing confluent subtree

Nearest subtree

Nearest subtree: reasoning

Nearest subtree: Algorithm

Leaf-agreement / Isomorphic Subtrees

Leaf-agreement: Two Trees

All-agreement: Illustration

[Figure: trees T1 and T2. In T1, node z lies above nodes x and y, whose leaf sets are X and Y; in T2, x' = Lca(X), y' = Lca(Y), and z' = Lca(x', y').]

All-agreement Method
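The illustration suggests one way to find every pair of descendent subtrees with identical leaf sets in two trees. Map each node z of T1 to z' = Lca in T2 of its leaf set. Compute this bottom-up as z' = Lca(x', y') from the images of z's children. Since C(z) is always contained in C(z'), the two sets are equal exactly when their leaf counts agree. This minimal sketch assumes a hypothetical nested-tuple encoding, a shared leaf set and no unary nodes, and it uses a naive upward-walking LCA. It may differ from the authors' method, and it becomes linear only with constant-time LCA queries.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


class Indexed:
    """Preorder ids with parent, depth, children, leaf count and label per node."""

    def __init__(self, tree: Tree) -> None:
        self.parent: List[int] = []
        self.depth: List[int] = []
        self.children: List[List[int]] = []
        self.size: List[int] = []                # number of leaves below the node
        self.label: List[Optional[str]] = []
        self.leaf: Dict[str, int] = {}
        self._add(tree, -1, 0)

    def _add(self, node: Tree, par: int, d: int) -> int:
        me = len(self.parent)
        is_leaf = isinstance(node, str)
        self.parent.append(par)
        self.depth.append(d)
        self.children.append([])
        self.size.append(1 if is_leaf else 0)
        self.label.append(node if is_leaf else None)
        if is_leaf:
            self.leaf[node] = me
        else:
            for child in node:
                c = self._add(child, me, d + 1)
                self.children[me].append(c)
                self.size[me] += self.size[c]
        return me

    def lca(self, u: int, v: int) -> int:
        while self.depth[u] > self.depth[v]:
            u = self.parent[u]
        while self.depth[v] > self.depth[u]:
            v = self.parent[v]
        while u != v:
            u, v = self.parent[u], self.parent[v]
        return u


def leaf_agreements(t1: Tree, t2: Tree) -> List[Tuple[int, int]]:
    """Pairs (z, z') of nodes of t1 and t2 whose descendent subtrees share a leaf set."""
    a, b = Indexed(t1), Indexed(t2)
    image = [0] * len(a.parent)                  # image[z] = Lca in t2 of C(z)
    for z in reversed(range(len(a.parent))):     # children have larger preorder ids
        if a.label[z] is not None:
            image[z] = b.leaf[a.label[z]]
        else:                                    # z' = Lca(x', y', ...)
            image[z] = reduce(b.lca, (image[c] for c in a.children[z]))
    # C(z) is always contained in C(image[z]); equal leaf counts mean equal sets.
    return [(z, image[z]) for z in range(len(a.parent)) if a.size[z] == b.size[image[z]]]


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("e", "d"))
print(leaf_agreements(t1, t2))
```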

Leaf-agreement: k Trees

Isomorphic Descendent Subtrees

Isomorphic Descendent Subtrees (2)
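A simple check for isomorphic descendent subtrees encodes each subtree canonically. A leaf is encoded as its label, and an internal node as the frozenset of its children's encodings, so child order is ignored. With uniquely labelled leaves, two descendent subtrees are isomorphic exactly when their encodings are equal. Intersecting the encodings of several trees then yields the subtrees they all share. This sketch uses a hypothetical nested-tuple encoding. Hashing nested frozensets is not linear-time, so it does not reproduce the linear bound stated on the slides.

```python
from typing import Hashable, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def canonical_forms(tree: Tree) -> Set[Hashable]:
    """Order-independent encoding of every descendent subtree of `tree`."""
    forms: Set[Hashable] = set()

    def encode(node: Tree) -> Hashable:
        if isinstance(node, str):
            form: Hashable = node                      # a leaf is its label
        else:                                           # child order is irrelevant
            form = frozenset(encode(child) for child in node)
        forms.add(form)
        return form

    encode(tree)
    return forms


def common_isomorphic_subtrees(trees: List[Tree]) -> Set[Hashable]:
    """Descendent subtrees that occur, up to isomorphism, in every input tree."""
    return set.intersection(*(canonical_forms(t) for t in trees))


t1 = ((("a", "b"), "c"), ("d", "e"))
t3 = ((("a", "c"), "b"), ("e", "d"))
print(len(common_isomorphic_subtrees([t1, t3])))  # prints 6: five leaves plus {d, e}
```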


Conclusion

• The normalized cluster distances between all pairs of subtrees of two trees can be computed in optimal O(n^2) time.

• Nearest subtrees for a collection of pairwise disjoint subsets of leaves can be found in O(n) time.

• All descendent subtrees consisting of the same set of leaves in a set of (unbounded-degree) trees can be found in time linear in the size of the input trees.

• All isomorphic descendent subtrees in a set of (unbounded-degree) trees can be found in time linear in the size of the input trees.

Future Research

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS

• Comparing (rooted/unrooted) phylogenetic trees whose edges are labelled with distances (likelihood, log-odds).

The End


What Date is Today?

• Magic numbers:
  – 4/4, 6/6, 8/8, 10/10, 12/12
  – 7/11, 9/5 [also 11/7, 5/9]
  – 3/0? [implying 2/28, 2/0 = 1/31]

• Extension:
  – 365 = 52 * 7 + 1
  – Leap year?

• 2003: 5 ; 2004: 7 ; 2005: 1 ; 2006: 2
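The slide appears to describe John Conway's Doomsday rule. In any year, 4/4, 6/6, 8/8, 10/10, 12/12, 9/5, 5/9, 7/11, 11/7 and the last day of February all fall on the same weekday. That weekday advances by one each year, or by two when the new year is a leap year. The sketch below assumes the Gregorian calendar and numbers weekdays 0 = Sunday to 6 = Saturday, whereas the slide writes Sunday as 7. It also uses 3/14 in place of the slide's 3/0.

```python
import calendar

# Conway's Doomsday rule (assumed to be what the slide describes).
# Weekdays: 0 = Sunday, 1 = Monday, ..., 6 = Saturday; the slide writes Sunday as 7.
# A day in each month that always falls on the year's doomsday (common years).
DOOMSDAY_DATE = {1: 3, 2: 28, 3: 14, 4: 4, 5: 9, 6: 6,
                 7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}


def year_doomsday(year: int) -> int:
    """Weekday shared by 4/4, 6/6, 8/8, ... in the given Gregorian year."""
    century_anchor = (5 * (year // 100 % 4) + 2) % 7   # e.g. 2 (Tuesday) for 2000-2099
    yy = year % 100
    # 365 = 52 * 7 + 1: one step per year, plus one extra step per leap year.
    return (century_anchor + yy + yy // 4) % 7


def weekday(year: int, month: int, day: int) -> int:
    """Weekday of a Gregorian date, counted from the doomsday date in its month."""
    ref = DOOMSDAY_DATE[month]
    if calendar.isleap(year) and month <= 2:
        ref += 1                                        # Jan 4 and Feb 29 in leap years
    return (year_doomsday(year) + day - ref) % 7


print([year_doomsday(y) for y in (2003, 2004, 2005, 2006)])  # prints [5, 0, 1, 2]
print(weekday(2003, 12, 31))                                  # prints 3 (Wednesday)
```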
