
Post on 19-Dec-2015

The Cobweb of Life Revealed by Genome-Scale Estimates of Horizontal Gene Transfer

Fan Ge, Li-San Wang, Junhyong Kim

Mourya Vardhan

Pallapotu, Naga Venkata Alekhya

Outline

• Controversy: the extent to which HGT affects the core genealogical history

• Examination of this controversy by assessing the extent of HGT among core orthologous genes

• A novel statistical method: to assess the extent of HGT based on comparisons of tree topology

Introduction

• Horizontal gene transfer (HGT) refers to the transfer of genes between organisms in a manner other than traditional reproduction.

• Whole genome analyses of different prokaryotes have been thought to indicate rampant HGTs

• There is an ongoing debate over the estimation of HGT frequency and its impact on phylogeny

• Inference of HGT from tree comparisons should be done under a proper statistical framework

Methodology to assess the extent

• New method to explicitly test for phylogenetic incongruence due to horizontal transfer versus statistical tree errors

• Used Clusters of Orthologous Groups (COG) from NCBI databases

• Extracted the most reliable COGs

• Built a gene tree for every COG and integrated the gene trees to construct the whole-genome (W-G) tree

• Compared each gene tree with the W-G tree to infer significant HGT

• Extended this method to pairwise comparisons of gene trees to detect conflicts

High-Quality Gene Groups and the W-G Tree

• The COG database was built by redoing sequence comparisons over 43 genomes

• This resulted in the retention of 297 high-quality COG entries out of 3,852

• To approximate the W-G tree, they used a median tree estimator

• The estimate used bootstrap values from bootstrap sampling

Detection of HGT events

• HGTs are inferred by comparing estimated gene trees against other gene trees or against trees that represent the history of genomes

• Discrepancies between the trees may be caused by HGT or by other errors

• Distance metrics are used to test discrepancies

• As an additional precaution, the paper explicitly tests whether the discrepancies are caused by HGT events

Comparison Metrics

• Maximum agreement subtree (MAST) - Two trees that differ in some branches still share a common subtree; MAST bounds the size of that shared subtree

• Symmetric Difference (SD) - Counts the branch splits that appear in one tree but not in the other
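The two metrics can be illustrated on toy trees. The following is a minimal sketch, not the authors' implementation (the slides state that PAUP was used). It assumes trees are written as nested tuples of taxon labels and treated as unrooted, that both trees carry the same taxa, and that the MAST metric is expressed as a distance: the number of taxa that must be removed before the two trees agree. The function names and the brute-force search are illustrative choices that only scale to small trees.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the tree, viewed as unrooted."""
    taxa = leaves(tree)
    ref = min(taxa)  # each split is stored as the side that lacks `ref`
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = taxa - below if ref in below else below
        if 1 < len(side) < len(taxa) - 1:  # both sides need >= 2 taxa
            found.add(side)
        return below

    walk(tree)
    return found

def prune(tree, keep):
    """Restrict the tree to the taxa in `keep`, collapsing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def symmetric_difference(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))

def mast_distance(t1, t2):
    """Taxa to remove so the trees agree (assumed form of the MAST metric)."""
    taxa = sorted(leaves(t1) & leaves(t2))
    for size in range(len(taxa), 0, -1):
        for keep in combinations(taxa, size):
            kept = set(keep)
            if splits(prune(t1, kept)) == splits(prune(t2, kept)):
                return len(taxa) - size
    return len(taxa)

# Hypothetical five-taxon example: B and C swap places.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(tree_a, tree_b))  # 2
print(mast_distance(tree_a, tree_b))         # 1
```

Storing each tree as a set of splits makes SD a plain set operation, and the same split comparison on pruned trees gives the agreement test that the MAST search needs.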

Interpretation of HGT events…

• Case 1: If both MAST and SD are low, the trees are most likely not different

• Case 2: If both metrics are large, the cause can be either HGT events or errors

• Case 3: If SD is large and MAST is low, the cause is most likely an HGT event

• Case 4: A large MAST with a low SD cannot occur, for algorithmic reasons

SD and MAST scores for Gene Tree 1 and the W-G tree are 2 and 2, while the scores for Gene Tree 2 and the W-G tree are 8 and 2
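The four cases can be written as a small decision function. This is only a sketch of the reasoning on the slide: the slides give no numeric boundary between "low" and "large", so the cutoffs `sd_cutoff` and `mast_cutoff` are hypothetical parameters, and the value 4 used in the example is arbitrary. The actual method relies on a hypothesis test rather than fixed cutoffs.

```python
def interpret(sd, mast, sd_cutoff, mast_cutoff):
    """Map a pair of SD and MAST scores to one of the four cases.

    The cutoffs are hypothetical; they are not taken from the paper.
    """
    sd_large = sd > sd_cutoff
    mast_large = mast > mast_cutoff
    if not sd_large and not mast_large:
        return "case 1: trees most likely not different"
    if sd_large and mast_large:
        return "case 2: HGT events or errors"
    if sd_large:
        return "case 3: most likely an HGT event"
    return "case 4: not expected to occur"

# Scores from the example, with an arbitrary cutoff of 4 for both metrics.
print(interpret(2, 2, sd_cutoff=4, mast_cutoff=4))  # Gene Tree 1 vs W-G tree
print(interpret(8, 2, sd_cutoff=4, mast_cutoff=4))  # Gene Tree 2 vs W-G tree
```

Under these assumed cutoffs, Gene Tree 1 falls into case 1 and Gene Tree 2 into case 3.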

The Hypothesis Test

• Test statistic Ɣ – the difference of the two metrics

• The null distribution is generated by bootstrapping gene trees

• HGT was inferred when the observed Ɣ was significant, with a p-value below the 5% level

• Simulation studies applied to each COG showed the test detecting HGT events in a COG tree at the following rates, using the 5% significance level

HGT Events Rates (%)

1 53.8

2 70

3 77.3

• ds is the SD metric

• dm is the MAST metric

• m and n are the numbers of branch splits

• X is the number of taxa

• PAUP software was used for the calculations
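The decision rule of the hypothesis test can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the statistic Ɣ has already been computed for the observed tree pair and for each bootstrap replicate, that larger values are the more extreme ones, and that an add-one correction is applied to the empirical p-value. The function names are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Share of the bootstrap null distribution at least as large as `observed`."""
    if not null_values:
        raise ValueError("null distribution is empty")
    exceed = sum(1 for value in null_values if value >= observed)
    # Add-one correction: the estimate can never be exactly zero.
    return (exceed + 1) / (len(null_values) + 1)

def infer_hgt(observed, null_values, alpha=0.05):
    """Flag putative HGT when the p-value falls below the significance level."""
    return empirical_p_value(observed, null_values) < alpha

# Made-up null distribution of 99 bootstrap statistics.
null = list(range(99))
print(empirical_p_value(200, null))  # 0.01
print(infer_hgt(200, null))          # True
print(infer_hgt(50, null))           # False
```

Passing a different `alpha` makes it easy to check how sensitive the set of flagged COGs is to the chosen significance level.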

HGT Estimation via Comparisons between Each Gene Tree and the W-G Tree

• The hypothesis test was applied to each COG

• The number of COGs detected did not vary significantly with the p-value threshold

• At the 5% level, 33 of the 297 COGs (11.1%) showed putative HGTs

• These COGs are termed hCOGs

The Relationship between Detecting COG entries with HGT and the p-Values

HGT Estimation via Comparisons among Gene Trees

• The problem with comparing each gene tree against the W-G tree is that the results are sensitive to the W-G tree

• COG entries do not all share the same taxa

• If a COG is an hCOG, it should test as different in all of its comparisons

• 14,004 pairs of gene trees that contained at least six shared taxa were compared

• At the 5% level, 1,764 of the 14,004 pairs (12.6%) were significant
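Because COG entries do not all cover the same taxa, the pairwise analysis first has to select the pairs of gene trees with enough taxa in common. The following sketch shows that selection step. The mapping `cog_taxa`, the COG labels, and the taxon names are hypothetical; only the threshold of six shared taxa comes from the slides.

```python
from itertools import combinations

def comparable_pairs(cog_taxa, min_shared=6):
    """Return the pairs of COGs whose gene trees share at least `min_shared` taxa."""
    pairs = []
    for first, second in combinations(sorted(cog_taxa), 2):
        shared = cog_taxa[first] & cog_taxa[second]
        if len(shared) >= min_shared:
            pairs.append((first, second))
    return pairs

# Hypothetical COG entries with made-up taxon labels.
cog_taxa = {
    "COG_A": {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"},
    "COG_B": {"t1", "t2", "t3", "t4", "t5", "t6", "t9"},
    "COG_C": {"t1", "t2", "t3"},
}
print(comparable_pairs(cog_taxa))  # [('COG_A', 'COG_B')]
```

Each selected pair would then be restricted to its shared taxa before the two gene trees are compared.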

Identification of Transferred Branches in Gene Trees

• For each COG that tested positive for HGT events, transferred branches were found by exhaustive enumeration of possible subtree matches

• Searched for all combinations of branch prunings to find the ‘‘troublesome’’ branches

• If there is only one way to prune the trees to make them congruent, the pruned branch is taken as an HGT event
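The enumeration idea can be sketched on small trees. This is a simplified illustration, not the authors' procedure: it prunes single taxa rather than whole branches, assumes trees are nested tuples of taxon labels treated as unrooted, and uses brute force, so it only suits small examples. The function names and the example trees are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the tree, viewed as unrooted."""
    taxa = leaves(tree)
    ref = min(taxa)  # each split is stored as the side that lacks `ref`
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = taxa - below if ref in below else below
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return below

    walk(tree)
    return found

def prune(tree, keep):
    """Restrict the tree to the taxa in `keep`, collapsing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def minimal_prunings(t1, t2):
    """All smallest sets of taxa whose removal makes the two trees congruent."""
    taxa = sorted(leaves(t1) & leaves(t2))
    for k in range(len(taxa) + 1):
        hits = []
        for drop in combinations(taxa, k):
            keep = set(taxa) - set(drop)
            if splits(prune(t1, keep)) == splits(prune(t2, keep)):
                hits.append(set(drop))
        if hits:
            return hits
    return []

# Hypothetical six-taxon example: taxon A has moved next to F in the gene tree.
reference = ((((("A", "B"), "C"), "D"), "E"), "F")
gene_tree = ((("B", "C"), "D"), ("E", ("A", "F")))
candidates = minimal_prunings(reference, gene_tree)
print(candidates)            # [{'A'}]
print(len(candidates) == 1)  # True: a unique pruning, so A is the candidate
```

When several equally small prunings exist, the result list has more than one entry and the transferred branch cannot be pinned down from this comparison alone.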

Color HGT Rates

Red >4%

Yellow 3%–4%

Pink 2%–3%

Blue 1%–2%

Green 1%


Thank You!
