

Phytome: A Plant Comparative Genomics Database

Dihui Lu 1,2, Jason Phillips 3, Todd Vision 2,3

1 School of Information and Library Science, 2 Program in Bioinformatics and Computational Biology, 3 Dept. of Biology, University of North Carolina at Chapel Hill

Abstract

We are developing Phytome, a web database for plant comparative genomics that, when complete, will integrate organismal phylogenies, genetic maps and gene phylogenies for over a dozen model plant species. Phytome aims to help users (i) explore relationships among genes/proteins and chromosome segments within and between species and (ii) predict gene content in uncharacterized chromosomal regions. Phytome is implemented as a relational database that currently allows users to search and retrieve protein sequences, gene families, multiple alignments and phylogenetic trees for nine species. The interface lets users obtain customized displays of multiple alignments and phylogenetic trees.

Overview

Why a plant comparative genomics database?

• Comparisons of the composition, organization and functional components of genomes are needed to answer many basic and applied research questions, in areas such as gene prediction, functional sequence annotation and candidate gene identification [3].

• Genomic maps and large sequence datasets exist for a wide variety of plant taxa, but these data are not currently centralized.

• Many comparative genomic analyses are computationally intensive and rely on relatively arcane methods.

Why include phylogenetic information?

• Phylogenetics provides a framework to make predictions about poorly known species and genes by virtue of their relationship to better known species and genes [3].

• Phylogenetic information has not yet been incorporated into major genomic database resources despite its acknowledged utility to the user community. Few phylogenetics database resources exist at all (major exceptions being TreeBase [4] and the Tree of Life Project [1]).

Target Users

We are designing Phytome for the following classes of users:

• Plant breeders who wish to predict the possible location and function of an unknown marker or DNA/protein sequence.

• Molecular biologists who are interested in knowing the relationships among members of a particular gene/protein family.

• Molecular evolutionists who are interested in genome and chromosomal evolution.

Future work

• Further develop tools for searching, browsing, data retrieval and data analysis/mining, and enable users to contribute content.

• Refine the analysis pipeline for phylogenetics and comparative mapping.

• Increase interconnectivity with related databases.

• Incorporate genomic maps together with analysis and visualization tools for comparative mapping [e.g. 2].

References

1) Anonymous (2000) Assembling the Tree of Life II: Research Needs in Phyloinformatics. http://research.amnh.org/biodiversity/center/features/tol.html

2) Calabrese PP, Chakravarty S, Vision TJ (2003) Fast identification and statistical evaluation of segmental homologies in comparative maps. Bioinformatics 19, i74-i80.

3) Eisen JA, Wu M (2002) Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theoretical Population Biology 61, 481-9.

4) Piel WH, Donoghue M, Sanderson M (2000) TreeBase: A database of Phylogenetic Information. Proceedings of the 2nd International Workshop of Species 2000, Tsukuba, Japan.

Acknowledgments

We thank Dr. Brad Hemminger for useful guidance. This work is supported by NSF grant DBI-0227314 to TJV.

Current functionality of Phytome

• Search proteins and families by sequence similarity (BLAST), keywords and database IDs.

• Retrieve raw data (proteins, protein families, multiple alignments and phylogenetic trees) from Phytome for local analysis.

• Dynamically display multiple alignments and phylogenetic trees for selected proteins within a family.

• Cross-reference Phytome proteins with other genomic DNA and EST data sources such as GenBank, TIGR and TAIR.
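To make the "selected proteins within a family" display concrete, here is a minimal sketch of one way a family alignment could be cut down to a chosen subset of proteins. It assumes the alignment is available as aligned FASTA text; the function names `parse_fasta` and `subset_alignment` and the demo identifiers are hypothetical and are not taken from Phytome itself.

```python
def parse_fasta(text):
    """Parse aligned FASTA text into {identifier: aligned sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def subset_alignment(alignment, selected):
    """Keep the selected rows, then drop columns that are gaps in every kept row."""
    rows = {name: alignment[name] for name in selected if name in alignment}
    if not rows:
        return {}
    length = len(next(iter(rows.values())))
    if any(len(seq) != length for seq in rows.values()):
        raise ValueError("aligned sequences must have equal length")
    # A column survives if at least one selected protein has a residue there.
    keep = [i for i in range(length)
            if any(seq[i] != "-" for seq in rows.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}


# Made-up three-protein alignment, for illustration only.
demo = ">p1\nMK-A-\n>p2\nM--AL\n>p3\nMKTA-\n"
alignment = parse_fasta(demo)
print(subset_alignment(alignment, ["p1", "p2"]))
```

Removing all-gap columns after subsetting keeps the display compact, since columns that only held residues from unselected family members would otherwise appear as empty space.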

Data stored in Phytome

• An authoritative plant organismal phylogeny

• Gene family information
    • Protein-coding DNA sequences from a variety of species
    • Pre-computed multiple sequence alignments
    • Pre-computed gene family phylogenies

• Genetic and physical maps for diverse species (giving the locations of genes and other markers along the chromosomes), to be added in the future

• Gene Ontology terms, other protein functional annotations, and database cross-references
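As an illustration of how data of this kind might be laid out relationally, the sketch below builds a small in-memory SQLite database. Every table name, column name and demo row is an assumption made for this example; the poster does not describe Phytome's actual schema or database engine.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Phytome's real design.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE family (
    family_id   INTEGER PRIMARY KEY,
    description TEXT,
    alignment   TEXT,   -- pre-computed multiple sequence alignment
    tree_newick TEXT    -- pre-computed gene family phylogeny
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    family_id  INTEGER REFERENCES family(family_id),
    sequence   TEXT NOT NULL
);
CREATE TABLE xref (
    protein_id  INTEGER NOT NULL REFERENCES protein(protein_id),
    source      TEXT NOT NULL,   -- external resource, e.g. GenBank
    external_id TEXT NOT NULL
);
"""


def build_demo_db():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO species VALUES (1, 'Species A')")
    conn.execute(
        "INSERT INTO family VALUES (234, 'putative zinc finger proteins', NULL, NULL)"
    )
    conn.execute("INSERT INTO protein VALUES (1, 1, 234, 'MKTAYIAK')")
    conn.execute("INSERT INTO xref VALUES (1, 'GenBank', 'demo-id-1')")
    conn.commit()
    return conn


def proteins_in_family(conn, family_id):
    """Return (protein_id, species name, sequence) for one gene family."""
    rows = conn.execute(
        "SELECT p.protein_id, s.name, p.sequence "
        "FROM protein p JOIN species s ON s.species_id = p.species_id "
        "WHERE p.family_id = ? ORDER BY p.protein_id",
        (family_id,),
    )
    return rows.fetchall()


conn = build_demo_db()
print(proteins_in_family(conn, 234))
```

Keeping cross-references in their own table lets one protein link to several external resources without widening the protein table.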

Phylogenetic tree generated using Drawtree (from PHYLIP package) for selected proteins from family number 300 (putative nucleotide sugar epimerases)

Web interface

We have developed a user-friendly web interface for Phytome. Text search, sequence similarity search and other web applications let users find individual proteins and families and download data from Phytome.

Poster panels: Abstract; Overview; Web interface; Phytome data pipeline; Current functionality of Phytome; Data stored in Phytome; Future work; Dynamically generated phylogenetic tree; Dynamically generated multiple alignments; References.

Phytome data pipeline

Inputs:
• TIGR Gene Index
• GenBank IDs
• Component ESTs
• Gene Ontology

Analysis steps:
• Protein sequence prediction (ESTWise)
• Identify protein sequence matches (BLAST)
• Cluster proteins into families (TRIBE-MCL)
• Align proteins within families (CLUSTALW)
• Estimate phylogenies (PHYLIP)

Products stored in Phytome:
• Protein sequences
• Family clusters
• Multiple alignments
• Phylogenetic trees
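The family clustering step can be pictured with a much simpler stand-in. The sketch below groups proteins by single linkage over significant pairwise hits using a union-find structure. This is not the Markov clustering that TRIBE-MCL performs, and single linkage tends to merge families that a Markov approach would keep apart. The hit format, the E-value threshold and the function name `cluster_families` are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_families(hits, max_evalue=1e-10):
    """Group proteins connected by significant pairwise hits (single linkage).

    hits: iterable of (query_id, subject_id, evalue) tuples, for example
    parsed from tabular similarity search output (assumed format).
    """
    parent = {}

    def find(x):
        # Register unseen proteins as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two groups

    families = defaultdict(set)
    for protein in list(parent):
        families[find(protein)].add(protein)
    return sorted(sorted(members) for members in families.values())


# Made-up hits: a-b-c are linked, d-e are linked, c-d is too weak to merge.
demo_hits = [
    ("a", "b", 1e-50),
    ("b", "c", 1e-20),
    ("d", "e", 1e-30),
    ("c", "d", 0.5),
]
print(cluster_families(demo_hits))
```

Proteins that appear only in weak hits still come out as singleton groups, so no sequence silently drops out of the result.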

Multiple alignments generated using Jalview for selected proteins from family number 234 (putative zinc finger proteins)