Posted on 14-Dec-2014

Towards a Simple, Standards Compliant, and Generic Phylogenetic Database Module

Hilmar Lapp and Todd Vision

National Evolutionary Synthesis Center (NESCent)

Rich diversity of online data repositories

Most data is not online

Clark J.R. et al. (2008) A Comparative Study in Ancestral Range Reconstruction Methods: Retracing the Uncertain Histories of Insular Lineages. Systematic Biology, 57(5), 693-707.

Syst. Biol. Data Archive

Little standards support

Accelerating knowledge dissemination: A Story

• Jane and her lab have accumulated molecular data to resolve the phylogeny of a certain clade of frogs, many of which are endangered species.

• Her group assembles a multiple alignment and reconstructs the phylogeny using a variety of methods, some developed by her lab, resulting in thousands of trees.

• The results show overwhelming support for several new branch points. The results are interesting and solid enough to be useful for others working on those species.

• Jane downloads and installs PhyloDOM, a freely available open source software package. The software creates a database and Jane uses the programs that come with it to import all her data.

• As a result, Jane’s lab now has a web interface to her results that others can use to query for novel topologies and to explore her data.

• Her lab also updates the database from their ongoing work, and uses it to add provenance data and links to protocols, publications, and taxonomic concepts.

• Other researchers easily download and integrate her results in their own analyses.

• Even where Jane used new methods, other software understands the meaning of the metadata and can take advantage of it.

• Before long, her results appear in data aggregators such as iSpecies, EOL, or Scratchpads, along with those from other labs.

• Jane herself uses the LifeMap widget to map her trees onto geo-coordinates and to link branches to ecological and biodiversity parameters of the respective areas.

How to get there?

Architecture diagram (components):

• Phylogenetic Database (PhyloDB / BioSQL) supporting ontologies and arbitrary metadata

• Precompute Query Optimization

• Data loading tools (BioSQL)

• Language binding for database model (BioPerl, Biojava, Biopython, Bioruby)

• Topology-oriented Queries

• Embeddable Tools (PhyloWidget, GBrowse TreeWidget)

• Phylogenetic Trees (Gene, Species)

• ITIS, NCBI Taxonomies

• Parser libraries for data and semantics standards (NeXML, CDAO)

• Middleware: Query & Persistence Management

• Data and other services API (PhyloWS) supporting exchange standards (NeXML, CDAO)

• Taxonomies

• Character Data

• Metadata (Evolutionary, Biodiversity, Computational)

• Client-based Query Interfaces

• Data Aggregators, Mash-up Applications

• Molecular Data (Sequences, Annotation)

• Ontologies

• Data Management Tools
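The precompute and topology-oriented query components can be illustrated with a small sketch. It assumes that "precompute" means materializing every ancestor/descendant pair of a tree together with its distance, so that a topology question such as "what is the last common ancestor of two nodes?" becomes a lookup rather than a tree walk. The toy tree, the function names, and the dictionary layout are illustrative assumptions, not part of PhyloDB or BioSQL.

```python
# Hypothetical toy tree ((A,B)n1,C)root, given as child -> parent edges.
PARENT = {
    "A": "n1",
    "B": "n1",
    "n1": "root",
    "C": "root",
}


def precompute_paths(parent):
    """Return {(ancestor, descendant): distance} for all such pairs.

    Every node is also paired with itself at distance 0, which keeps
    the lookup below free of special cases.
    """
    nodes = set(parent) | set(parent.values())
    paths = {}
    for node in nodes:
        paths[(node, node)] = 0
        current, distance = node, 0
        # Walk up towards the root, recording one row per ancestor.
        while current in parent:
            current = parent[current]
            distance += 1
            paths[(current, node)] = distance
    return paths


def last_common_ancestor(paths, a, b):
    """Find the shared ancestor nearest to `a` using only the precomputed rows."""
    ancestors_a = {anc: d for (anc, desc), d in paths.items() if desc == a}
    ancestors_b = {anc for (anc, desc) in paths if desc == b}
    shared = ancestors_a.keys() & ancestors_b
    return min(shared, key=lambda anc: ancestors_a[anc])


PATHS = precompute_paths(PARENT)
print(last_common_ancestor(PATHS, "A", "B"))
print(last_common_ancestor(PATHS, "A", "C"))
```

The trade-off in this sketch is the usual one for precomputation: the table grows with the number of ancestor/descendant pairs, in exchange for queries that need no recursion at query time.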

Achieving the Vision: Coordinated & open development, nurturing & harnessing existing efforts

Database: PhyloDB module

Schema diagram (tables and their attributes):

• Tree: Name, Identifier, Is_Rooted

• Node: Label, Left_Idx, Right_Idx

• Edge

• Node_Path: Distance

• Tree_Root: Is_Alternate, Significance

• Tree_Qualifier_Value: Value, Rank

• Node_Qualifier_Value: Value, Rank

• Edge_Qualifier_Value: Value, Rank

• Tree_Dbxref

• Node_Dbxref

• Node_Taxon: Rank

• Node_Bioentry: Rank

• BioSQL tables shown: Biodatabase, Term, Taxon, Bioentry, Ontology, Dbxref
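The Left_Idx and Right_Idx attributes on Node suggest a nested-set encoding, in which each node's interval encloses the intervals of all its descendants. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. Only the Tree, Node, and Edge table names and the Name, Identifier, Is_Rooted, Label, Left_Idx, and Right_Idx attributes come from the slide; the key columns, the edge columns, the column types, and the helper functions are assumptions made for illustration and may differ from the actual PhyloDB DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (
    tree_id    INTEGER PRIMARY KEY,          -- assumed surrogate key
    name       TEXT NOT NULL,
    identifier TEXT,
    is_rooted  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,           -- assumed surrogate key
    tree_id   INTEGER NOT NULL REFERENCES tree(tree_id),
    label     TEXT,
    left_idx  INTEGER,
    right_idx INTEGER
);
CREATE TABLE edge (                          -- column names are assumptions
    edge_id        INTEGER PRIMARY KEY,
    parent_node_id INTEGER NOT NULL REFERENCES node(node_id),
    child_node_id  INTEGER NOT NULL REFERENCES node(node_id)
);
""")

# Hypothetical toy tree ((A,B)n1,C)root.
conn.execute("INSERT INTO tree (tree_id, name) VALUES (1, 'toy tree')")
conn.executemany(
    "INSERT INTO node (node_id, tree_id, label) VALUES (?, 1, ?)",
    [(1, "root"), (2, "n1"), (3, "A"), (4, "B"), (5, "C")],
)
conn.executemany(
    "INSERT INTO edge (parent_node_id, child_node_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 4), (1, 5)],
)


def assign_indices(conn, node_id, counter=1):
    """Depth-first walk: left_idx on entry, right_idx on exit.

    Returns the next unused counter value.
    """
    left = counter
    counter += 1
    children = conn.execute(
        "SELECT child_node_id FROM edge WHERE parent_node_id = ? "
        "ORDER BY child_node_id",
        (node_id,),
    ).fetchall()
    for (child_id,) in children:
        counter = assign_indices(conn, child_id, counter)
    conn.execute(
        "UPDATE node SET left_idx = ?, right_idx = ? WHERE node_id = ?",
        (left, counter, node_id),
    )
    return counter + 1


def subtree_labels(conn, label):
    """Labels of a node and all its descendants, via one interval join."""
    rows = conn.execute(
        """
        SELECT d.label
        FROM node AS a
        JOIN node AS d
          ON d.tree_id = a.tree_id
         AND d.left_idx BETWEEN a.left_idx AND a.right_idx
        WHERE a.label = ?
        ORDER BY d.left_idx
        """,
        (label,),
    ).fetchall()
    return [row[0] for row in rows]


assign_indices(conn, 1)
print(subtree_labels(conn, "n1"))
```

With the indices in place, "all descendants of a node" is a single range comparison rather than a recursive traversal, which is one plausible reason for storing them alongside the edges.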

Semantics: CDAO

http://www.evolutionaryontology.org

Service API: PhyloWS

http://evoinfo.nescent.org/PhyloWS

Nurturing the community

Phyloinformatics Hackathon, Dec 2006

• James Estill (U. Georgia): “A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes”

Acknowledgments

• Phyloinformatics Hackathon participants

• BioHackathon 2008 participants

• EvoInformatics Working Group participants

• Google Summer of Code Students: Jamie Estill

• Sponsors & support:

• NESCent

• BioSynC

• TDWG

• DBCLS, CBRC (Japan)
