A network framework to explore phylogenetic structure in genomic data

Guifang Zhou 1, Jeremy Ash 2, Wen Huang 3, Melissa Marchand 4, David Morris 1, Paul Van Dooren 3, James C. Wilgenbusch 5, Jeremy M. Brown 1, Kyle A. Gallivan 4

1 Department of Biological Sciences, Louisiana State University
2 Bioinformatics Research Center, North Carolina State University
3 ICTEAM Institute, Université catholique de Louvain
4 Department of Mathematics, Florida State University
5 Minnesota Supercomputing Institute, University of Minnesota

June 20, 2016

Motivations

Phylogenetic analyses often produce large sets of competing trees

Summarize interesting evolutionary history: hybridization, recombination, horizontal gene transfer, incomplete lineage sorting

Identify Systematic Error

Shortcomings of Current Approaches

Consensus tree: discards information concerning competing trees

Dimensionality reduction: may be difficult to interpret

Clustering: based on pairwise tree-to-tree distances; considers only nonnegative links

Our Approaches

Apply graph-based methods to understand relationships among:

Tree topologies
Bipartitions within tree topologies
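
To make the two kinds of objects concrete, here is a minimal sketch of how the bipartitions of a tree topology can be extracted and compared. The nested-tuple tree encoding and the function names are illustrative assumptions, not TreeScaper's API. The Robinson-Foulds distance is used as one common tree-to-tree distance; the slides do not say which distance the authors used.

```python
def clades(tree, out):
    """Return the leaf set under `tree`, appending each internal clade to `out`."""
    if isinstance(tree, str):  # a leaf is a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions of a tree given as nested tuples (hypothetical encoding)."""
    found = []
    taxa = clades(tree, found)
    ref = min(taxa)  # fixed reference taxon, so each split is stored by one canonical side
    splits = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of bipartitions found in exactly one tree."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Two made-up topologies on five taxa.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(sorted(sorted(s) for s in bipartitions(t1)))
print(rf_distance(t1, t2))
```

Both example trees share the split separating D and E from the rest and differ in their second split, so they disagree in one bipartition each.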

Application

Yeast dataset with 5 species and 106 loci

106 gene trees were reconstructed using maximum parsimony

Topology-based Network Analysis

Affinity matrix: reciprocal of pairwise distances

Detect communities: discovered 11 communities

Consensus trees for each community

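The top 2 communities recover the top 2 candidate species trees

[Figure: consensus tree for each community, labeled with the fraction of gene trees it contains: 62/106, 17/106, 11/106, 4/106, 3/106, 2/106, 2/106, 2/106, ...]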
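
The pipeline on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The `offset` parameter is a hypothetical guard against division by zero for identical topologies, since the slides do not say how zero distances are handled. The thresholded connected-components step is a deliberately simple stand-in for the community-detection method used in the talk. The distance matrix is made up.

```python
def affinity_matrix(dist, offset=1.0):
    """Affinity as a reciprocal of distance; `offset` is an assumed guard against 1/0."""
    n = len(dist)
    return [
        [0.0 if i == j else 1.0 / (dist[i][j] + offset) for j in range(n)]
        for i in range(n)
    ]


def threshold_communities(aff, threshold):
    """Stand-in for community detection: connected components of the graph
    that keeps only edges with affinity >= threshold."""
    n = len(aff)
    seen, communities = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for other in range(n):
                if other not in seen and aff[node][other] >= threshold:
                    seen.add(other)
                    stack.append(other)
        communities.append(sorted(members))
    # Largest community first, mirroring the ordering on the slide.
    return sorted(communities, key=len, reverse=True)


# Made-up pairwise distances between five gene trees.
dist = [
    [0, 0, 2, 4, 4],
    [0, 0, 2, 4, 4],
    [2, 2, 0, 4, 4],
    [4, 4, 4, 0, 2],
    [4, 4, 4, 2, 0],
]
aff = affinity_matrix(dist)
communities = threshold_communities(aff, threshold=0.3)
print(communities)
```

A consensus tree would then be computed separately for the gene trees in each community.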

Bipartition-based Network Analysis

Covariance matrix based on presence or absence of bipartitions in the gene trees
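
As a rough sketch of such a matrix, each gene tree can be encoded as a 0/1 row recording which bipartitions it contains, and the covariance computed between columns. The function name, the sample (n - 1) normalization, and the toy data are assumptions for illustration; the slides do not specify these details.

```python
def bipartition_covariance(presence):
    """Covariance between bipartitions, where presence[t][b] is 1 if
    bipartition b occurs in gene tree t and 0 otherwise."""
    n_trees = len(presence)
    n_bips = len(presence[0])
    means = [sum(row[b] for row in presence) / n_trees for b in range(n_bips)]
    cov = [[0.0] * n_bips for _ in range(n_bips)]
    for a in range(n_bips):
        for b in range(n_bips):
            total = sum((row[a] - means[a]) * (row[b] - means[b]) for row in presence)
            cov[a][b] = total / (n_trees - 1)  # sample covariance (assumed normalization)
    return cov


# Made-up data: four gene trees, two bipartitions that never co-occur.
presence = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
]
cov = bipartition_covariance(presence)
print(cov)
```

Bipartitions that tend to appear in the same gene trees get positive covariance, and bipartitions that tend to exclude each other get negative covariance, so a network built on this matrix can carry both positive and negative links.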

TreeScaper Software

Available on GitHub

https://github.com/whuang08/TreeScaper

Acknowledgements

Computing support from FSU’s Research Computing Center and HPC@LSU

The National Science Foundation for funding to support some of this work (ABI-1262476)
