Post on 21-Dec-2015
We are developing a web database for plant comparative genomics, named Phytome, that, when complete, will integrate organismal phylogenies, genetic maps, and gene phylogenies for over a dozen model plant species. Phytome’s aim is to help users (i) explore relationships among genes/proteins and chromosome segments within and between species and (ii) predict gene content in uncharacterized chromosomal regions. Phytome is implemented as a relational database that currently allows users to search and retrieve protein sequences, gene families, multiple alignments, and phylogenetic trees for nine species. The web interface lets users obtain customized displays of multiple alignments and phylogenetic trees.
Why a plant comparative genomics database? Comparisons of the composition, organization, and functional components of genomes are needed to answer many basic and applied research questions (in areas such as gene prediction, functional sequence annotation, and candidate gene identification) [3].
Genomic maps and large sequence datasets exist for a wide variety of plant taxa, but these data are not currently centralized.
Many comparative genomic analyses require intensive computation and use relatively arcane computational methods.
Phytome: A Plant Comparative Genomics Database
Dihui Lu (1,2), Jason Phillips (3), Todd Vision (2,3)
(1) School of Information and Library Science, (2) Program in Bioinformatics and Computational Biology, (3) Dept. of Biology; University of North Carolina at Chapel Hill
Why include phylogenetic information? Phylogenetics provides a framework for making predictions about poorly characterized species and genes on the basis of their relationships to better-known species and genes [3].
Phylogenetic information has not yet been incorporated into major genomic database resources, despite its acknowledged utility to the user community. Few phylogenetic database resources exist at all (the major exceptions being TreeBase [4] and the Tree of Life Project [1]).
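To make the phylogenomic idea concrete, one simple rule predicts a function for an unannotated gene from the smallest clade of the gene tree that also contains annotated genes. The sketch below is a simplification for illustration only: the tree, gene names, and annotations are hypothetical, branch lengths are ignored, and the more careful procedures discussed by Eisen and Wu [3] (for example, distinguishing orthologs from paralogs) are not attempted.

```python
from typing import Dict, List, Tuple, Union

# A rooted gene tree as nested tuples: a leaf is a gene name (str) and an
# internal node is a tuple of child subtrees. Branch lengths are ignored.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(node: Tree) -> List[str]:
    """Return the gene names at the tips of a (sub)tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def lineage(node: Tree, target: str) -> List[Tree]:
    """Return the subtrees on the path from `node` down to the leaf `target`."""
    if node == target:
        return [node]
    if isinstance(node, tuple):
        for child in node:
            path = lineage(child, target)
            if path:
                return [node] + path
    return []


def predict_function(tree: Tree, query: str, annotations: Dict[str, str]) -> List[str]:
    """Predict annotations for `query` from the smallest enclosing clade that
    contains at least one other annotated gene (a nearest-relative rule)."""
    path = lineage(tree, query)
    if not path:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    # Walk upward from the query's parent clade toward the root.
    for clade in reversed(path[:-1]):
        found = {annotations[g] for g in leaves(clade) if g != query and g in annotations}
        if found:
            return sorted(found)
    return []  # no annotated relative anywhere in the tree


# Hypothetical gene tree and annotations, for illustration only.
tree = ((("geneA", "geneB"), "geneC"), ("geneD", "geneE"))
annotations = {"geneA": "function X", "geneD": "function Y"}

print(predict_function(tree, "geneB", annotations))  # ['function X']
print(predict_function(tree, "geneE", annotations))  # ['function Y']
```

Returning every annotation found in the nearest annotated clade, rather than picking one, leaves conflicting predictions visible to the user.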
Target Users
We are designing Phytome for the following classes of users:
• Plant breeders who wish to predict the possible location and function of an unknown marker or DNA/protein sequence.
• Molecular biologists who are interested in knowing the relationships among members of a particular gene/protein family.
• Molecular evolutionists who are interested in genome and chromosomal evolution.
• Further develop tools for searching, browsing, and data retrieval and for data analysis/mining, and enable users to contribute content.
• Refine the analysis pipeline for phylogenetics and comparative mapping.
• Increase interconnectivity with related databases.
• Incorporate genomic maps, together with analysis and visualization tools for comparative mapping [e.g. 2].
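Comparative mapping looks for chromosome segments in which homologous markers occur in the same order in two species. The sketch below is a deliberately naive illustration and not the statistical method of Calabrese et al. [2]: given hypothetical positions of homologous marker pairs on one chromosome from each species, it reports maximal runs in which both positions increase together. The `min_len` and `max_gap` thresholds are assumptions, and inverted segments are not detected.

```python
from typing import List, Tuple

# (position on a chromosome of species 1, position on a chromosome of species 2)
# for one pair of homologous markers; units are arbitrary (e.g. cM or gene rank).
Pair = Tuple[float, float]


def collinear_runs(pairs: List[Pair], min_len: int = 3, max_gap: float = 10) -> List[List[Pair]]:
    """Return maximal runs of marker pairs that stay in the same order in both
    maps, allowing at most `max_gap` between consecutive species-2 positions."""
    runs: List[List[Pair]] = []
    current: List[Pair] = []
    for pair in sorted(pairs):  # order by species-1 position
        step = pair[1] - current[-1][1] if current else None
        if step is not None and not (0 < step <= max_gap):
            # Order broken or gap too large: close the current run.
            if len(current) >= min_len:
                runs.append(current)
            current = []
        current.append(pair)
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Hypothetical homologous marker positions, for illustration only.
pairs = [(1, 10), (2, 11), (3, 12), (4, 40), (5, 5), (6, 6), (7, 7), (8, 8)]
print(collinear_runs(pairs))
# [[(1, 10), (2, 11), (3, 12)], [(5, 5), (6, 6), (7, 7), (8, 8)]]
```

A real analysis would also ask whether such a run is longer than expected by chance, which is the question the cited method addresses.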
1) Anonymous (2000) Assembling the Tree of Life II: Research Needs in Phyloinformatics. http://research.amnh.org/biodiversity/center/features/tol.html
2) Calabrese PP, Chakravarty S, Vision TJ (2003) Fast identification and statistical evaluation of segmental homologies in comparative maps. Bioinformatics 19, i74-i80.
3) Eisen JA, Wu M (2002) Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theoretical Population Biology 61, 481-489.
4) Piel WH, Donoghue M, Sanderson M (2000) TreeBase: A database of phylogenetic information. Proceedings of the 2nd International Workshop of Species 2000, Tsukuba, Japan.
Acknowledgments
We thank Dr. Brad Hemminger for useful guidance. This work is supported by NSF grant DBI-0227314 to TJV.
• Search proteins and families by sequence similarity (BLAST), keywords, or database IDs.
• Retrieve raw data (proteins, protein families, multiple alignments, and phylogenetic trees) from Phytome for local analysis.
• Dynamically display multiple alignments and phylogenetic trees for selected proteins within a family.
• Cross-reference Phytome proteins with other genomic DNA and EST data sources such as GenBank, TIGR, and TAIR.
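The keyword search and raw-data retrieval listed above can be pictured with a small in-memory sketch: a case-insensitive keyword match over protein descriptions, and export of one family's proteins as FASTA text for local analysis. The record fields, IDs, and sequences are hypothetical toy values, not Phytome's actual data model or output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Protein:
    protein_id: str   # hypothetical identifier format
    species: str
    family_id: int
    description: str
    sequence: str


def keyword_search(proteins: Iterable[Protein], keyword: str) -> List[Protein]:
    """Case-insensitive substring search over protein descriptions."""
    kw = keyword.lower()
    return [p for p in proteins if kw in p.description.lower()]


def family_to_fasta(proteins: Iterable[Protein], family_id: int, width: int = 60) -> str:
    """Export every protein in one family as FASTA text, wrapping sequences at `width`."""
    lines: List[str] = []
    for p in proteins:
        if p.family_id != family_id:
            continue
        lines.append(f">{p.protein_id} [{p.species}] {p.description}")
        lines.extend(p.sequence[i:i + width] for i in range(0, len(p.sequence), width))
    return "\n".join(lines) + "\n" if lines else ""


# Toy records; IDs and sequences are made up and are not Phytome data.
proteins = [
    Protein("P0001", "Arabidopsis thaliana", 300, "putative nucleotide sugar epimerase", "MKILVTGGAGFIG"),
    Protein("P0002", "Oryza sativa", 300, "putative nucleotide sugar epimerase", "MKVLVTGGAGYIG"),
    Protein("P0003", "Oryza sativa", 234, "putative zinc finger protein", "MCKHCGKAF"),
]

print([p.protein_id for p in keyword_search(proteins, "zinc")])  # ['P0003']
print(family_to_fasta(proteins, 300, width=8), end="")
```

FASTA is used here only because it is a widely accepted plain-text exchange format for sequences.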
• An authoritative plant organismal phylogeny
• Gene family information:
  • Protein-coding DNA sequences from a variety of species
  • Pre-computed multiple sequence alignments
  • Pre-computed gene family phylogenies
• Genetic and physical maps for diverse species (giving the locations of genes and other markers along the chromosomes), to be added in the future
• Gene Ontology terms, other protein functional annotations, and database cross-references
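The abstract describes Phytome as a relational database, but its schema is not given here. As a hypothetical sketch (every table and column name is an assumption), the data listed above could be laid out in SQLite roughly as follows, with pre-computed alignments and trees stored per family and Gene Ontology terms linked per protein; genetic maps are left out because they are still to be added.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE family (
    family_id   INTEGER PRIMARY KEY,
    description TEXT,
    alignment   TEXT,   -- pre-computed multiple alignment (e.g. FASTA text)
    tree_newick TEXT    -- pre-computed gene family phylogeny
);
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species (species_id),
    family_id  INTEGER REFERENCES family (family_id),
    sequence   TEXT NOT NULL
);
CREATE TABLE go_annotation (
    protein_id TEXT NOT NULL REFERENCES protein (protein_id),
    go_term    TEXT NOT NULL,
    PRIMARY KEY (protein_id, go_term)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO species VALUES (?, ?)",
                     [(1, "Arabidopsis thaliana"), (2, "Oryza sativa")])
    conn.execute("INSERT INTO family (family_id, description) VALUES (?, ?)",
                 (300, "putative nucleotide sugar epimerases"))
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)",
                     [("P0002", 2, 300, "MKVLVTGG"), ("P0001", 1, 300, "MKILVTGG")])
    return conn


def family_members(conn: sqlite3.Connection, family_id: int) -> List[Tuple[str, str]]:
    """Return (protein_id, species name) for every protein in a family."""
    return conn.execute(
        """SELECT p.protein_id, s.name
           FROM protein AS p JOIN species AS s ON p.species_id = s.species_id
           WHERE p.family_id = ?
           ORDER BY p.protein_id""",
        (family_id,),
    ).fetchall()


print(family_members(build_demo_db(), 300))
# [('P0001', 'Arabidopsis thaliana'), ('P0002', 'Oryza sativa')]
```

Keeping alignments and trees as text columns on the family row is one simple choice for pre-computed results; a production design might store them in separate tables or files.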
Phylogenetic tree generated using Drawtree (from the PHYLIP package) for selected proteins from family number 300 (putative nucleotide sugar epimerases)
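Which PHYLIP programs and settings Phytome uses to estimate its family phylogenies is not stated here. As an assumption for illustration only, the sketch below shows one of the simplest distance-based methods, UPGMA, applied to p-distances computed from a toy protein alignment, and writes the resulting topology in Newick format (the tree format PHYLIP programs read and write). Branch lengths are omitted to keep it short.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap ('-') in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment: Dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return the topology as a Newick string."""
    size: Dict[str, int] = {name: 1 for name in alignment}  # cluster label -> number of leaves
    dist: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(alignment, 2):
        dist[a, b] = dist[b, a] = p_distance(alignment[a], alignment[b])
    while len(size) > 1:
        # Merge the closest pair of clusters (ties broken by insertion order).
        a, b = min(combinations(size, 2), key=lambda pair: dist[pair])
        merged = f"({a},{b})"
        n = size[a] + size[b]
        for c in size:
            if c not in (a, b):
                # UPGMA: size-weighted average of the two merged clusters' distances.
                dist[merged, c] = dist[c, merged] = (size[a] * dist[a, c] + size[b] * dist[b, c]) / n
        del size[a], size[b]
        size[merged] = n
    return next(iter(size)) + ";"


# Toy aligned sequences; names and residues are made up.
toy_alignment = {
    "seqA": "MKILVTGG",
    "seqB": "MKILVTGA",
    "seqC": "MRVLATGG",
    "seqD": "MRVLATAG",
}
print(upgma(toy_alignment))  # ((seqA,seqB),(seqC,seqD));
```

UPGMA assumes a roughly constant rate of evolution across lineages; methods such as neighbor-joining or maximum likelihood relax that assumption and are generally preferred for real gene families.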
We have developed a user-friendly web interface for Phytome. Text search, sequence similarity search, and other web applications allow users to search for individual proteins and families and to download data from Phytome.
Figure: Phytome data pipeline. Data sources: TIGR Gene Index (GenBank IDs, component ESTs, Gene Ontology terms). Processing steps: protein sequence prediction (ESTWise); identification of protein sequence matches (BLAST); clustering of proteins into families (TRIBE-MCL); alignment of proteins within families (CLUSTALW); phylogeny estimation (PHYLIP). Stored in Phytome: protein sequences, family clusters, multiple alignments, and phylogenetic trees.
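The family-clustering step relies on TRIBE-MCL, which applies the Markov Cluster (MCL) algorithm to a protein similarity graph built from all-against-all BLAST comparisons. The sketch below implements only the core MCL loop (add self-loops, then alternate expansion and inflation until the matrix stops changing) in plain Python on a tiny hypothetical similarity matrix. How Phytome converted BLAST results into edge weights and which inflation value it used are not stated here, so the weights and the default inflation of 2.0 are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (columns that sum to 0 stay 0)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, prune: float = 1e-6) -> List[List[int]]:
    """Markov Cluster algorithm on a symmetric, non-negative similarity matrix.
    Returns clusters as lists of node indices."""
    n = len(similarity)
    # Add self-loops, then turn the matrix into column-stochastic transition probabilities.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: random-walk flow over two steps
        # Inflation: raise entries to a power and renormalize, which strengthens
        # strong flows and weakens weak ones; tiny values are pruned to zero.
        inflated = normalize_columns([[x ** inflation if x > prune else 0.0 for x in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep non-zero mass are attractors; the columns they hold form a cluster.
    clusters: List[List[int]] = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > prune]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical similarity scores for five proteins: 0-1-2 are similar, 3-4 are similar.
similarity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(similarity))  # [[0, 1, 2], [3, 4]]
```

Higher inflation values tend to split the graph into more, smaller families; dense pure-Python matrices are only practical for toy inputs, whereas real MCL implementations use sparse matrices.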
Multiple alignments generated using Jalview for selected proteins from family number 234 (putative zinc finger proteins)
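Displaying an alignment for only selected proteins within a family raises a small practical point: columns that were opened by gaps for other family members become all-gap columns in the subset and can be dropped before display. A minimal sketch, assuming the family alignment is held as a mapping from hypothetical protein IDs to equal-length gapped strings; this illustrates the idea and is not Phytome's display code.

```python
from typing import Dict, Iterable


def sub_alignment(alignment: Dict[str, str], selected: Iterable[str]) -> Dict[str, str]:
    """Keep only the selected rows of a multiple alignment and drop columns
    that contain nothing but gaps ('-') in those rows."""
    rows = {name: alignment[name] for name in selected}
    if not rows:
        return {}
    length = len(next(iter(rows.values())))
    keep = [i for i in range(length) if any(seq[i] != "-" for seq in rows.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}


# Toy family alignment; IDs and residues are made up.
family = {"p1": "MK--LVG", "p2": "MKR-LVG", "p3": "M-RQL-G"}
print(sub_alignment(family, ["p1", "p2"]))  # {'p1': 'MK-LVG', 'p2': 'MKRLVG'}
```

Reusing the pre-computed family alignment this way avoids re-running the aligner for every user selection, at the cost of keeping the column structure that the full family imposed.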