Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome

Yaw-Ling Lin (1), Tsan-Sheng Hsu (2)

1 Dept. of Computer Sci. & Info. Management, Providence University, Taichung, Taiwan
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan




Motivation – Where do the problems come from?


Two-Component System

• Two-component systems (2CS):
– Sensor histidine kinase
– Response regulator

• The major control machinery that lets bacteria cope with diverse and often hostile environments


2CS in Pseudomonas aeruginosa PAO1

http://www.pseudomonas.com/

“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8. by Stover CK, Pham XQ, Erwin AL, et al.

• Genome: 6.3M bp

• Predicted genes: 5570

• 123 genes were classified as 2CSs.


2CS in PAO1


• There are 123 annotated 2CS genes in PAO1.

• Use systematic analysis of the evolutionary relationships between the sensor kinase and response regulator of a 2CS.

• Construct phylogenetic trees using Clustal-W for 54 sensor kinases and 59 response regulators.


2CS in PAO1 -- Sensor Tree


2CS: Regulator Tree


Subtrees Analysis of 2CS


Co-evolution Subtree Analysis

[Figure: sensor tree versus regulator tree]


Different Trees

• Different phylogenetic tree inference methods:
- Maximum parsimony
- Maximum likelihood
- Distance matrix fitting
- Quartet-based methods

• Comparing the same set of species with respect to different biological sequences or different genes yields different trees.

• How can we find the largest set of items on which the trees agree?


Previous Results

• Measuring the similarity / difference between trees:
- Symmetric difference [Robinson 1979]
- Robinson and Foulds (RF) metric [Robinson 1981]
- Nearest-neighbor interchange [Waterman 1978]
- Subtree transfer distance [Allen 2001]
- Quartet metric [Estabrook 1985]

• Inferring the consensus tree: the maximum agreement subtree (MAST) problem, a.k.a. the maximum homeomorphic agreement subtree


MAST: Maximum Agreement Subtree

• Problem: given a set of rooted trees whose leaves are drawn from the same set of items of size n, find the largest subset of these items so that the portions of the trees restricted to the subset are isomorphic.

• [Amir and Keselman 1997]: NP-hard even for 3 unbounded degree trees.

• [Hein 1995]: the MAST for 3 trees with unbounded degree is hard to approximate.

• [Amir et al 1997]: polynomial-time algorithms for three or more bounded-degree trees, but the time complexity is exponential in the degree bound.


• [Farach and Thorup 1997]: $O(n^{1.5} \log n)$ time algorithm for two arbitrary degree trees.

• [Cole et al 2002]: MAST of two binary trees can be found in $O(n \log n)$ time; MAST of two degree d trees can be found in $O(\min\{n\sqrt{d}\,\log^2 n,\; nd \log n \log d\})$ time.


Problem Definition

• A phylogenetic tree with n leaves is a (rooted) tree such that all the leaf nodes are uniquely labelled from 1 to n.

• The descendent subtree of a phylogenetic tree T at a vertex v is the subtree composed of all edges and nodes of T descending from v.

• Given a set of n-leaf phylogenetic trees, we wish to explore the descendent-subtree relationships among these trees.
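
To make the definition concrete, here is a minimal sketch that collects the leaf set of every descendent subtree. The tree encoding (a `children` dictionary keyed by internal vertex, with integer leaf labels) and the names `leaf_sets` and `tree` are assumptions made for this illustration and do not come from the slides.

```python
def leaf_sets(children, root):
    """Return {vertex: frozenset of leaf labels in its descendent subtree}."""
    clusters = {}
    # Iterative post-order, so deep trees do not hit the recursion limit.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            clusters[node] = frozenset([node])  # a leaf is its own label
        elif expanded:
            # All children are done; merge their leaf sets.
            clusters[node] = frozenset().union(*(clusters[k] for k in kids))
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return clusters


# Hypothetical 4-leaf tree: r -> (a, 4), a -> (b, 3), b -> (1, 2)
tree = {"r": ["a", 4], "a": ["b", 3], "b": [1, 2]}
clusters = leaf_sets(tree, "r")
```

Storing an explicit set per vertex costs quadratic space in the worst case, so this is for illustration only.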


Normalized cluster distance between two sets

• Symmetric set difference:

• Normalized cluster distance:
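
The formulas on this slide did not survive extraction. The sketch below assumes the usual symmetric set difference, |A \ B| + |B \ A|, and assumes that the normalized cluster distance divides it by |A ∪ B|. The second definition is an assumption and is not confirmed by the transcript; the function names are hypothetical.

```python
def symmetric_difference_size(a, b):
    """Number of items that lie in exactly one of the two sets."""
    return len(set(a) ^ set(b))


def normalized_cluster_distance(a, b):
    """Assumed definition: |A symmetric-difference B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return len(a ^ b) / len(union)


d = normalized_cluster_distance({1, 2, 3}, {2, 3, 4})
```

Under this assumed definition the value ranges from 0 (identical sets) to 1 (disjoint sets).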


All Pairs Subtrees Comparison – A naïve $O(n^3)$ algorithm
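
The body of this slide was a figure. As a rough sketch of what a cubic-time approach could look like (not the authors' pseudocode), the code below visits every pair of vertices and rebuilds both leaf sets from scratch. The tree encoding, the function names, and the distance |A Δ B| / |A ∪ B| are assumptions.

```python
def leaf_set(children, node):
    """Leaf labels below `node`, collected by a plain traversal: O(n)."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    leaves = set()
    for k in kids:
        leaves |= leaf_set(children, k)
    return leaves


def vertices(children, root):
    """All vertices of the tree, internal nodes and leaves alike."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children.get(node, []))
    return out


def naive_all_pairs(t1, r1, t2, r2):
    """O(n^2) vertex pairs times O(n) set work per pair."""
    dist = {}
    for u in vertices(t1, r1):
        a = leaf_set(t1, u)
        for v in vertices(t2, r2):
            b = leaf_set(t2, v)  # recomputed for every pair on purpose
            dist[u, v] = len(a ^ b) / len(a | b)
    return dist


# Two hypothetical trees over the leaves 1..3.
t1 = {"r": ["a", 3], "a": [1, 2]}
t2 = {"s": [1, "b"], "b": [2, 3]}
dist = naive_all_pairs(t1, "r", t2, "s")
```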


All Pairs Subtrees Comparison – Property


All Pairs Subtrees Comparison – an $O(n^2)$ algorithm
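
The slide's own algorithm was shown as a figure. One standard way to reach quadratic time, sketched here under that caveat, is to tabulate the intersection sizes |L(u) ∩ L(v)| bottom-up: the value for an internal vertex is the sum of the values of its children. Sizes of unions and symmetric differences then follow by arithmetic. The encoding, names, and the distance |A Δ B| / |A ∪ B| are assumptions.

```python
def postorder(children, root):
    """Vertices ordered so that every child precedes its parent."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    order.reverse()  # reversed pre-order has descendants first
    return order


def leaf_counts(children, order):
    count = {}
    for node in order:
        kids = children.get(node, [])
        count[node] = sum(count[k] for k in kids) if kids else 1
    return count


def all_pairs_distance(t1, r1, t2, r2):
    post1, post2 = postorder(t1, r1), postorder(t2, r2)
    size1, size2 = leaf_counts(t1, post1), leaf_counts(t2, post2)
    inter = {}  # inter[u, v] = |L(u) & L(v)|
    for u in post1:
        ku = t1.get(u, [])
        for v in post2:
            kv = t2.get(v, [])
            if ku:    # split the leaves of u among its children
                inter[u, v] = sum(inter[c, v] for c in ku)
            elif kv:  # u is a leaf: split the leaves of v instead
                inter[u, v] = sum(inter[u, c] for c in kv)
            else:     # two leaves intersect only if they are the same label
                inter[u, v] = 1 if u == v else 0
    dist = {}
    for (u, v), i in inter.items():
        union = size1[u] + size2[v] - i
        dist[u, v] = (union - i) / union  # |sym. diff.| = |union| - |intersection|
    return dist


t1 = {"r": ["a", 3], "a": [1, 2]}
t2 = {"s": [1, "b"], "b": [2, 3]}
dist = all_pairs_distance(t1, "r", t2, "s")
```

Each table entry costs time proportional to the degree of one vertex, so the total work is quadratic in n even for unbounded-degree trees.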


Lowest Common Ancestor
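
For reference, a minimal lowest-common-ancestor sketch that climbs parent pointers. Each query takes time proportional to the tree height. Data structures with linear preprocessing and constant-time queries exist, but they are not shown here. The encoding and names are assumptions for this illustration.

```python
def parent_depth(children, root):
    """Parent pointer and depth for every vertex of a rooted tree."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for k in children.get(node, []):
            parent[k], depth[k] = node, depth[node] + 1
            stack.append(k)
    return parent, depth


def lca(parent, depth, u, v):
    """Lowest common ancestor of u and v."""
    while depth[u] > depth[v]:  # lift the deeper vertex first
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:               # then climb in lockstep
        u, v = parent[u], parent[v]
    return u


# Hypothetical tree: r -> (a, 4), a -> (b, 3), b -> (1, 2)
tree = {"r": ["a", 4], "a": ["b", 3], "b": [1, 2]}
parent, depth = parent_depth(tree, "r")
```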


Confluent subtree


Confluent subtree – Illustration


Constructing confluent subtree


Nearest subtree


Nearest subtree: reasoning


Nearest subtree: Algorithm


Leaf-agree / Isomorphic Subtrees


leaf-agreement – Two Trees


All-agreement: Illustration

[Figure: tree T1 with vertices x, y, z over leaf sets X and Y; tree T2 with x' = Lca(X), y' = Lca(Y), z' = Lca(x', y').]


All-agreement Method
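
The method itself was shown as a figure. The sketch below is one reading of the Lca-based idea and may differ from the authors' procedure: map each vertex u of the first tree to the lowest common ancestor, in the second tree, of the images of u's children, then report u when that image covers exactly as many leaves as u does. All names and the encoding are assumptions, and the simple climbing LCA used here does not give linear time.

```python
def postorder(children, root):
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    order.reverse()  # children before parents
    return order


def leaf_counts(children, root):
    count = {}
    for node in postorder(children, root):
        kids = children.get(node, [])
        count[node] = sum(count[k] for k in kids) if kids else 1
    return count


def parent_depth(children, root):
    parent, depth = {root: None}, {root: 0}
    for node in reversed(postorder(children, root)):  # parents first
        for k in children.get(node, []):
            parent[k], depth[k] = node, depth[node] + 1
    return parent, depth


def lca(parent, depth, u, v):
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def leaf_agreeing_pairs(t1, r1, t2, r2):
    """Pairs (u, v) of internal vertices whose leaf sets are equal."""
    size1, size2 = leaf_counts(t1, r1), leaf_counts(t2, r2)
    parent2, depth2 = parent_depth(t2, r2)
    image, pairs = {}, []
    for u in postorder(t1, r1):
        kids = t1.get(u, [])
        if not kids:
            image[u] = u  # a leaf maps to the leaf with the same label
            continue
        img = image[kids[0]]
        for c in kids[1:]:
            img = lca(parent2, depth2, img, image[c])
        image[u] = img
        # The image always covers L(u); equal counts mean equal leaf sets.
        if size1[u] == size2[img]:
            pairs.append((u, img))
    return pairs


t1 = {"r": ["a", "b"], "a": [1, 2], "b": [3, 4]}
t2 = {"s": ["c", 4], "c": ["d", 3], "d": [1, 2]}
pairs = leaf_agreeing_pairs(t1, "r", t2, "s")
```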


leaf-agreement – k Trees


Isomorphic Descendent Subtrees


Isomorphic Descendent Subtrees (2)
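
The slide content was a figure. As an illustration of the problem only, the sketch below swaps in a simple canonical-string technique: each subtree is encoded as the sorted list of its children's encodings, and two leaf-labelled descendent subtrees are treated as isomorphic when their encodings match. It is not linear-time and is not claimed to be the authors' method. The encoding and names are assumptions.

```python
def canonical_forms(children, root):
    """Map every vertex to a string that is identical for isomorphic subtrees."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    form = {}
    for node in reversed(order):  # children before parents
        kids = children.get(node, [])
        if not kids:
            form[node] = str(node)  # leaf: its label
        else:
            # Sorting makes the encoding independent of child order.
            form[node] = "(" + ",".join(sorted(form[k] for k in kids)) + ")"
    return form


def isomorphic_pairs(t1, r1, t2, r2):
    """Pairs (u, v) of internal vertices with matching canonical forms."""
    by_form = {}
    for v, f in canonical_forms(t2, r2).items():
        if t2.get(v):  # internal vertices only
            by_form.setdefault(f, []).append(v)
    return [(u, v)
            for u, f in canonical_forms(t1, r1).items() if t1.get(u)
            for v in by_form.get(f, [])]


t1 = {"r": ["a", "b"], "a": [1, 2], "b": [3, 4]}
t2 = {"s": ["c", 4], "c": ["d", 3], "d": [2, 1]}
pairs = isomorphic_pairs(t1, "r", t2, "s")
```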


Conclusion

• The normalized cluster distances between all pairs of subtrees of two trees can be computed in $O(n^2)$ time, which is optimal.

• Finding nearest subtrees for a collection of pairwise disjoint subsets of leaves can be done in $O(n)$ time.

• Finding all descendent subtrees consisting of the same set of leaves in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.

• Finding all isomorphic descendent subtrees in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.


Future Research

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS

• (Rooted / unrooted) phylogenetic trees comparison: when edges are labeled with (likelihood, log-odds) distances.


The End


What Date is Today?

• Magic numbers:
– 4/4, 6/6, 8/8, 10/10, 12/12
– 7/11, 9/5 [also 11/7, 5/9]
– 3/0? [implying 2/28; 2/0 = 1/31]

• Extension:
– 365 = 52 * 7 + 1
– Leap year?

• 2003: 5; 2004: 7; 2005: 1; 2006: 2
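
The listed dates appear to be the anchor dates of the Doomsday rule, which all fall on the same weekday within a given year. The sketch below checks that reading with the standard `datetime` module, assuming the per-year numbers are ISO weekdays (Monday = 1, Sunday = 7). The names `ANCHORS`, `doomsday`, and `weekday_from_doomsday` are hypothetical.

```python
import datetime

# (month, day) anchor dates taken from the slide.
ANCHORS = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12),
           (7, 11), (9, 5), (11, 7), (5, 9)]


def doomsday(year):
    """ISO weekday (Mon=1 .. Sun=7) shared by the anchor dates of `year`."""
    return datetime.date(year, 4, 4).isoweekday()


def last_day_of_february(year):
    """The slide's '3/0': the day before March 1."""
    return datetime.date(year, 3, 1) - datetime.timedelta(days=1)


def weekday_from_doomsday(year, month, day):
    """Weekday of a date in April..December, counted from that month's anchor."""
    anchor_day = dict(ANCHORS)[month]  # months 4..12 each have one anchor
    return (doomsday(year) - 1 + (day - anchor_day)) % 7 + 1


for y in (2003, 2004, 2005):
    print(y, doomsday(y))
```

The "365 = 52 * 7 + 1" line explains why the value advances by one in a common year and by two across a leap day.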