generic model/ many/my organism database Oct/Nov 2007 Don Gilbert Genome Informatics Lab, Biology Dept., Indiana University [email protected] GMOD

Post on 18-Dec-2015


Generic Model/Many/My Organism Database

Oct/Nov 2007
Don Gilbert, Genome Informatics Lab, Biology Dept., Indiana University

[email protected]

GMOD

http://eugenes.org/gmod/docs/gmod-intro-07oct.pdf

Indiana GMOD Potpourri

Recent Updates for GMOD-CSHL-0711
• Genome Grid
• GMODTools update
• Gene Summary Pages in XML

Genome Grid

• Middleware to easily use TeraGrid (& other Grid) for genome analyses
• Give me your genomes to Gridalyze
• Collaborators wanted!
• Apply BioMart, Ergatis, LuceGene, Galaxy
• Science gateway to use TeraGrid for genome analyses
• Blast: proteome x non-redundant; organisms x genome
• gene finders, interproscan, others

gmod.org/Genome_grid


GMODTools update

• Update: config for new genome chado dbs (sea urchin, paramecium)
  • loaded via GMOD gff2chado
• New: GO gene-association output
• Please publish your Chado DB
  • gmod.org/Public_Chado_Databases
  • each project chado has variations
• Cleans database contents for public use
• Todo: add gene page xml, others?

gmod.org/GMODTools
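As an illustration of what a GO gene-association row looks like, here is a minimal Python sketch that formats one annotation as a 15-column, tab-delimited line in the style of the GO annotation file format (GAF 1.0). It is not the GMODTools code, and its actual output may differ. The `gaf_line` helper, the `IEA` evidence default, the placeholder reference, and the taxon id are assumptions made for the example; the gene id and GO term come from the wFleaBase gene summary used elsewhere in these slides.

```python
def gaf_line(db, gene_id, go_id, aspect, taxon_id, date, assigned_by,
             reference, evidence="IEA", object_type="gene"):
    """Format one annotation as a 15-column, tab-delimited GAF 1.0 style row."""
    if aspect not in ("C", "F", "P"):
        raise ValueError(f"unknown GO aspect: {aspect!r}")
    columns = [
        db,                   # 1  DB
        gene_id,              # 2  DB_Object_ID
        gene_id,              # 3  DB_Object_Symbol (gene id reused here)
        "",                   # 4  Qualifier (optional)
        go_id,                # 5  GO ID
        reference,            # 6  DB:Reference
        evidence,             # 7  Evidence code
        "",                   # 8  With/From (optional)
        aspect,               # 9  Aspect: C, F or P
        "",                   # 10 DB_Object_Name (optional)
        "",                   # 11 DB_Object_Synonym (optional)
        object_type,          # 12 DB_Object_Type
        f"taxon:{taxon_id}",  # 13 Taxon
        date,                 # 14 Date as YYYYMMDD
        assigned_by,          # 15 Assigned_By
    ]
    return "\t".join(columns)


row = gaf_line(
    db="wFleaBase",
    gene_id="NCBI_GNO_200214",
    go_id="GO:0016021",
    aspect="C",
    taxon_id=6669,  # assumed NCBI taxon id for Daphnia pulex; verify before use
    date="20070902",
    assigned_by="wFleaBase",
    reference="wFleaBase:placeholder_ref",  # hypothetical reference id
)
fields = row.split("\t")
print(len(fields), fields[4], fields[8])
```

The single-letter aspect (C, F, P) is the same prefix that the gene summary XML puts in front of each GO term name, so the two outputs can share one source of GO data.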


Gene Summary Pages

• Simple, readable XML summarizes gene info.
• In use at the Daphnia database (wFleaBase.org)
  • wfleabase.org/lucegene/lookup?id=NCBI_GNO_149114
• Created from Chado DB or overloaded GFF
• Software is simple Perl lib, XML DTD
  • eugenes.org/gmod/gene-report-examples/


Gene Page XML

<GeneSummary id="wFleaBase:NCBI_GNO_200214">
  <Type>Gene Summary</Type>
  <BASIC_INFORMATION>
    <Date>2007-Sep-02</Date>
    <GeneID>NCBI_GNO_200214</GeneID>
    <Species>Daphnia pulex</Species>
  </BASIC_INFORMATION>
  <GENE_ONTOLOGY>
    <terms>
      <goterm id="GO:0016021">C:integral to membrane</goterm>
      <goterm id="GO:0001584">F:rhodopsin-like receptor activity</goterm>
      <goterm id="GO:0007186">P:G-protein coupled receptor protein signalin...</goterm>
      <goterm id="GO:0007602">P:phototransduction</goterm>
    </terms>
  </GENE_ONTOLOGY>
  <SIMILAR_GENES>
    <Similarity>
      <Description>Rh3-PA</Description>
      <Species>Drosophila virilis</Species>
      <db_xref>UniProt:Q8I138</db_xref>
    </Similarity>
  </SIMILAR_GENES>
  <FUNCTION>
    <Expression type="biotic">Bacterial infection</Expression>
    <Protein_domains>
      <db_xref>Pfam:PF00001 7tm_1</db_xref>
    </Protein_domains>
  </FUNCTION>
  <REAGENTS>
    <Reagent type="EST">
      <db_xref>WFes0143594</db_xref>
    </Reagent>
  </REAGENTS>
</GeneSummary>
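Because the gene summary is plain XML, it can be read with any standard XML parser. The following is a minimal sketch in Python (the GMOD software itself is a Perl library) that pulls the gene id, species, GO terms, and cross-references out of an abbreviated copy of the record above. The `summarize` function and the dictionary layout are illustrative choices, not part of the GMOD gene-report software.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the gene summary record from the slide.
GENE_SUMMARY_XML = """\
<GeneSummary id="wFleaBase:NCBI_GNO_200214">
  <Type>Gene Summary</Type>
  <BASIC_INFORMATION>
    <GeneID>NCBI_GNO_200214</GeneID>
    <Species>Daphnia pulex</Species>
  </BASIC_INFORMATION>
  <GENE_ONTOLOGY>
    <terms>
      <goterm id="GO:0016021">C:integral to membrane</goterm>
      <goterm id="GO:0007602">P:phototransduction</goterm>
    </terms>
  </GENE_ONTOLOGY>
  <SIMILAR_GENES>
    <Similarity>
      <Description>Rh3-PA</Description>
      <db_xref>UniProt:Q8I138</db_xref>
    </Similarity>
  </SIMILAR_GENES>
</GeneSummary>
"""


def summarize(xml_text):
    """Return id, species, GO terms and db_xrefs from one GeneSummary record."""
    root = ET.fromstring(xml_text)
    go_terms = []
    for term in root.iter("goterm"):
        # Term text looks like "C:integral to membrane": aspect letter, then name.
        aspect, _, name = term.text.partition(":")
        go_terms.append((term.get("id"), aspect, name))
    return {
        "id": root.get("id"),
        "species": root.findtext("BASIC_INFORMATION/Species"),
        "go_terms": go_terms,
        "db_xrefs": [xref.text for xref in root.iter("db_xref")],
    }


summary = summarize(GENE_SUMMARY_XML)
print(summary["id"], summary["species"])
for go_id, aspect, name in summary["go_terms"]:
    print(go_id, aspect, name)
```

Walking the tree with `iter` rather than fixed paths keeps the sketch tolerant of sections being absent, which seems likely for genes with little annotation.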


.. on to Introduction to GMOD ..


GMOD Introduction

• Generic Model Organism Database
  • Built by and for many contributing projects
• Loosely coupled tool kit
  • Work as separate parts and together
• Complex and simple
  • No more complex than necessary; complexity is part of this territory.


Your project needs?

• New Genome?
  • Draft assembly in parts; many computed annotations; little literature
• Known Genome?
  • Large literature base; rich and complex biology knowledge
• Lab integration?
  • Support and integrate with focused lab research project


Getting Started w/ GMOD

• gmod.org/Getting Started
• Documentation is now rich and improving
• Installation options:
  • distribution tar-ball
  • Virtual Machine-Ware for demo
  • YUM Unix packages


GMOD Components

• Chado – database schema and middleware
• GBrowse – Web-based genome annotation viewing
• Apollo – Desktop-based genome annotation editing
• CMap – Web-based comparative map viewing
• BioMart – Genome data mining from Ensembl/GMOD


Chado Database How-To

• Chado - Getting Started
  • gmod.org/Chado_Manual: modules, conventions, design principles
• Worked examples @ gmod.org
  • Load_RefSeq_Into_Chado
  • Load_BLAST_Into_Chado
  • Sample_Chado_SQL
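To give a feel for the kind of query the Sample_Chado_SQL page covers, here is a small self-contained sketch. Chado normally runs on PostgreSQL; to keep the example runnable anywhere it uses Python's built-in `sqlite3` with a heavily reduced subset of four Chado tables (`organism`, `cvterm`, `feature`, `featureloc`). The real tables have more columns and constraints, and the scaffold name and coordinates below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced stand-ins for Chado tables; column names follow Chado,
# but most columns and all constraints beyond keys are omitted.
conn.executescript("""
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(organism_id),
    uniquename  TEXT,
    type_id     INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    fmin INTEGER, fmax INTEGER, strand INTEGER
);
INSERT INTO organism VALUES (1, 'Daphnia', 'pulex');
INSERT INTO cvterm VALUES (1, 'gene');
INSERT INTO cvterm VALUES (2, 'supercontig');
INSERT INTO feature VALUES (10, 1, 'scaffold_1', 2);       -- made-up scaffold
INSERT INTO feature VALUES (11, 1, 'NCBI_GNO_200214', 1);
INSERT INTO featureloc VALUES (1, 11, 10, 999, 2500, 1);   -- made-up location
""")

# Features are typed through cvterm and located on a source feature
# through featureloc, so a "genes with locations" query joins all three.
GENES_WITH_LOCATION = """
SELECT f.uniquename, src.uniquename, fl.fmin, fl.fmax, fl.strand
FROM feature f
JOIN cvterm t      ON t.cvterm_id = f.type_id
JOIN organism o    ON o.organism_id = f.organism_id
JOIN featureloc fl ON fl.feature_id = f.feature_id
JOIN feature src   ON src.feature_id = fl.srcfeature_id
WHERE t.name = ? AND o.genus = ? AND o.species = ?
ORDER BY fl.fmin
"""

rows = conn.execute(GENES_WITH_LOCATION, ("gene", "Daphnia", "pulex")).fetchall()
for row in rows:
    print(row)
```

The same SELECT should carry over to a PostgreSQL Chado instance with `%s` placeholders in place of `?`, though a production query would usually also constrain the controlled vocabulary (`cv`) that the type term comes from.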


Chado Design

• Modularity: inherent in Chado schema; core module, biology groupings, with common structure.

• Ontologies: standard biology vocabularies are a core of Chado design.

• Associated software: Perl and Java middleware, stand-alone programs with Chado adaptors.


Chado Design [2]

• Complexity and Detail: inherent in genome data; Chado embraces it with room to grow, plus long-term stability.

• Data Integration: key component of Chado; public and lab data sets can be combined.

• Support: shared responsibility among the GMOD community.


Chado Schema: Core

• CV: Controlled vocabularies and ontologies
• Sequence: Biological sequences and objects which can be localized on them
• Companalysis: Adjunct to sequence module for in-silico analysis
• Map: Adjunct to sequence module for non-sequence localization
• Organism: Taxonomy / species information
• Pub: Publication / Biblio. / Reference information
• General: General information / database cross-references


Chado Schema: More

• Expression: Transcript and protein expression events
• Mage: for microarray data
• Genetics: Genetic/phenotypic interactions in genotypic/environmental context
• Phenotype: for phenotypic data
• Library: for descriptions of molecular libraries
• Phylogeny: for organisms and phylogenetic trees
• Stock: for specimens and biological collections
• Contact: for people, groups, and organizations


Chado Middleware

• GFF to Chado data loader, with BioPerl extensions (GenBank2GFF -> Chado, …)
• GMODTools - Output Bulk genome data
• XORT - Chado XML input and output
• Modware - OO-Perl Chado access package (in/out)
• Java middleware (Hibernate; others)
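To make the GFF-to-Chado step concrete, here is a minimal Python sketch of the core mapping a loader has to perform: split a nine-column GFF3 line and convert its 1-based, inclusive start/end into the 0-based interbase `fmin`/`fmax` that Chado's `featureloc` table uses. This is not the GMOD loader itself, which is written in Perl and does far more (parent/child relationships, cross-references, ontology lookups). The function name, the output dictionary, and the sample line's source, coordinates, and `Name` value are made up for illustration.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict of Chado-style location fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols

    # Column 9 is a ';'-separated list of key=value pairs, URL-escaped.
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)

    return {
        "srcfeature": seqid,
        "type": ftype,
        "fmin": int(start) - 1,  # GFF3 start is 1-based; interbase fmin is 0-based
        "fmax": int(end),        # inclusive 1-based end equals the interbase fmax
        "strand": {"+": 1, "-": -1}.get(strand, 0),
        "attributes": attributes,
    }


sample = ("scaffold_1\texample_source\tgene\t1000\t2500\t.\t+\t.\t"
          "ID=NCBI_GNO_200214;Name=example%20gene")
loc = parse_gff3_line(sample)
print(loc["srcfeature"], loc["fmin"], loc["fmax"], loc["strand"])
print(loc["attributes"])
```

The off-by-one on `fmin` is the detail most worth checking when comparing GFF input against rows in a Chado database.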


GMOD Components [2]

• Sybil – Web-based synteny viewing at gene & chromosome level
• Turnkey – “Skinnable” Chado-based web site
• Pathway Tools – metabolic pathways
• PubFetch – Literature management
• Textpresso – Automatic paper classification
• LuceGene – Genome object/text/web search system


GMOD Components [3]

• Wikipedia Community Annotation (in development; EcoliWiki ++)

• Comparative visualization - SynBrowse & SynView

• Genome grid - TeraGrid methods for genome computations (in dev.)


WikiGenomes (ecoliwiki.net)


GMOD Components [4]

Database Frameworks:
• VMWare: virtual machine package with basic GMOD components for demo
• YUM distribution package
• ARGOS: replication framework for genome databases


Putting GMOD together

• Core: PostgreSQL database; Chado Schema; Sequence & OBO Ontologies
• System: Apache web server; Unix; BioPerl; …
• Load data: GFF to Chado
• View: GBrowse (Chado; MySQL; ..)
• Edit/Update: Apollo, Wiki (coming), bulk-file updates
• Output: BulkFiles; BioMart


Example new MOD


Recap: Your project needs?

• New Genome? Known? Lab integration?
  • Assess your customer needs
  • Full database/toolset is overkill for some
• Loosely coupled tools; complex and simple
  • Pick the parts you need
  • Learn tools with examples first


Chado-centric Genome

• Genome Annotations
  • Proteome annotations, EST/cDNA, gene predictions, RNA, transposon, promoter, etc.
  • Database cross-refs: UniProt, Gene Ontology, KEGG, KOG, etc.
• Web-Database
  • GBrowse maps, Blast server with Chado output, Gene detail reports, BioMart data mining; Wikipedia community editing


Contributing to GMOD

• Current components
  • Need adopters to share effort
  • Re-use rather than re-invent
  • Describe: GMOD.org Wiki needs more examples
• New components
  • Discuss with other projects: common need?
  • Shared specifications, use cases
  • GMOD recommended practices


Active GMOD Mailing Lists

• https://lists.sourceforge.net/lists/listinfo/
  • gmod-announce
  • gmod-schema: All Chado schema issues
  • gmod-gbrowse: GBrowse mailing list
  • gmod-devel: General development
• Related: Ontologies (SO, OBO); BioPerl; Apollo; BioMart
