Computing the Tree of Life The University of Texas at Austin Department of Computer Sciences Tandy Warnow

Post on 24-Dec-2015


Page 1

Computing the Tree of Life

The University of Texas at Austin

Department of Computer Sciences

Tandy Warnow

Page 2

Phylogeny

From the Tree of Life website, University of Arizona

[Figure: tree with leaves Orangutan, Gorilla, Chimpanzee, Human]

Page 3

DNA Sequence Evolution

[Figure: DNA sequences evolving down a tree. Time axis: -3 mil yrs, -2 mil yrs, -1 mil yrs, today. Ancestral sequence: AAGACTT. Later sequences shown: TGGACTT, AAGGCCT, AGGGCAT, TAGCCCT, AGCACTT. Sequences at the leaves: TAGCCCA, TAGACTT, AGCGCTT, AGCACAA, AGGGCAT.]
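The process in the figure can be illustrated with a small simulation. This is a minimal sketch, not the model behind the slide: it assumes a complete binary tree, independent sites, and a single per-site substitution probability `p` on every edge. The names `mutate` and `evolve` are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, p, rng):
    """Return a copy of seq in which each site is substituted with probability p."""
    out = []
    for base in seq:
        if rng.random() < p:
            # Replace the base with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def evolve(root, depth, p, rng):
    """Evolve root down a complete binary tree and return the leaf sequences."""
    if depth == 0:
        return [root]
    leaves = []
    for _ in range(2):  # assumption: every node has exactly two children
        child = mutate(root, p, rng)
        leaves.extend(evolve(child, depth - 1, p, rng))
    return leaves


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sequences
    for leaf in evolve("AAGACTT", depth=3, p=0.1, rng=rng):
        print(leaf)
```

Phylogeny reconstruction is the inverse problem: given only the sequences at the leaves, estimate the tree that produced them.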

Page 4

Molecular Phylogenetics

Sequences: TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT

Taxa: U, V, W, X, Y

[Figure: tree with leaves U, V, W, X, Y]

(Tree is unrooted)
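Distance-based reconstruction methods start from pairwise distances between the input sequences. The sketch below computes Hamming distances (the number of differing sites) for the five sequences on this slide. Hamming distance is an assumption chosen for simplicity, and `hamming` and `distance_table` are hypothetical names; the slide does not say which distance or method was used.

```python
from itertools import combinations

# The five sequences from the slide, in the order they appear in the transcript.
SEQUENCES = ["TAGCCCA", "TAGACTT", "TGCACAA", "TGCGCTT", "AGGGCAT"]


def hamming(a, b):
    """Count the sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_table(seqs):
    """Map each unordered pair of sequences to its Hamming distance."""
    return {(a, b): hamming(a, b) for a, b in combinations(seqs, 2)}


if __name__ == "__main__":
    for (a, b), d in distance_table(SEQUENCES).items():
        print(a, b, d)
```

A table like this is the input to a distance-based method; the method then returns an unrooted tree whose leaves are the input taxa.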

Page 5

Evolutionary trees and the pharmaceutical industry

• Big genome sequencing projects just produce data -- so what? Evolutionary history relates all organisms and genes, and evolutionary trees are used to make important biological discoveries.

• The pharmaceutical industry uses phylogenies for many applications, such as the development of influenza vaccines!

• Inaccuracies in the phylogenies lead to inaccurate predictions (e.g., vaccines that don’t work, drugs that don’t have the required properties). Current software isn’t accurate enough, or fast enough!

• This means $$$!

Page 6

We are world leaders in research in Computational Phylogenetics

• “DCM-boosting” for phylogeny reconstruction - improves accuracy and speeds up heuristics for NP-hard problems (Warnow, UT-Austin)

• GRAPPA -- software for whole genome phylogeny (Moret, UNM)

• Visualization of large trees, and sets of trees (Amenta, UC Davis)

• Phylogenetic databases (Miranker, UT-Austin)

Page 7

DCM-boosting phylogenetic reconstruction methods [Nakhleh et al. ISMB 2001]

• DCM-boosting makes fast methods more accurate

• DCM-boosting speeds up heuristics for hard optimization problems

[Figure: error rate (y-axis, 0 to 0.8) against number of taxa (x-axis, 0 to 1600) for NJ and DCM-NJ]

Page 8

Whole-Genome Phylogenetics

[Figure: diagram with nodes labeled A, B, C, D, E, F and W, X, Y, Z]

Page 11

Benchmark gene order dataset: Campanulaceae

• 12 genomes + 1 outgroup (Tobacco), 105 gene segments

• NP-hard optimization problems: breakpoint and inversion phylogenies

1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)

2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)

2003: Using latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
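The slide names breakpoint and inversion phylogenies without defining the underlying distances. As a rough illustration, the sketch below computes the standard pairwise breakpoint distance between two signed gene orders and applies a single inversion. It is not GRAPPA's code: the integer encoding of genes, the default circular layout, and the function names are all assumptions made for the example.

```python
def canonical(a, b):
    """Return one fixed representative of an adjacency.

    Reading the genome on the opposite strand turns (a, b) into (-b, -a),
    so both forms describe the same adjacency.
    """
    return min((a, b), (-b, -a))


def adjacencies(genome, circular=True):
    """Return the set of adjacencies of a signed gene order."""
    pairs = list(zip(genome, genome[1:]))
    if circular and len(genome) > 1:
        pairs.append((genome[-1], genome[0]))  # close the circle
    return {canonical(a, b) for a, b in pairs}


def breakpoint_distance(g1, g2, circular=True):
    """Count the adjacencies of g1 that do not occur in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same genes")
    return len(adjacencies(g1, circular) - adjacencies(g2, circular))


def invert(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


if __name__ == "__main__":
    original = [1, 2, 3, 4, 5]
    rearranged = invert(original, 1, 3)
    print(rearranged, breakpoint_distance(original, rearranged))
```

The pairwise distance is cheap to compute. The hard part named on the slide is the tree problem: choosing a tree and gene orders for its internal nodes so that the total distance over all edges is minimized.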

Page 12

GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms)

http://www.cs.unm.edu/~moret/GRAPPA/

• Heuristics for NP-hard optimization problems

• Fast polynomial-time distance-based methods

• Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna, Italy

• Fastest and most accurate software for whole genome phylogeny worldwide

Page 13

Opportunities

• New phylogenetic reconstruction software can improve pharmaceutical R&D (making more accurate solutions achievable in hours or days, rather than months or years)

• Software for researchers is freely available (open source), but users need the latest tools now, with proper interfaces: a business opportunity.

Page 14

Participants and Funding

• University of Texas computer scientists: Warnow, Dhillon, Hunt, and Miranker

• University of Texas biologists: Jansen, Linder, and Hillis

• Other institutions: UNM, UC Davis, Central Washington, CUNY, JGI

• Funding: Three NSF ITR grants, NSF Biocomplexity, David and Lucile Packard Foundation

Page 15

Phylolab, U. Texas

Please visit us at http://www.cs.utexas.edu/users/phylo/