First GUS Workshop
July 6-8, 2005
Penn Center for Bioinformatics
Philadelphia, PA
Workshop Goals
Work through issues
  – Installing GUS
  – Loading data into GUS
  – Analyzing and viewing data in GUS
Coordinate future development
  – Changes to schema and application framework
  – New plug-ins
  – New application adapters
A Brief History of GUS
Genomics Unified Schema
  – V1.0 in 2000
  – Previously had separate databases for:
    • Genome annotation
    • EST assemblies (DoTS)
    • Microarrays and SAGE (RAD)
    • Transcription element search software (TESS)
  – Strengthen each effort by providing deep annotation
    • e.g., cDNAs on microarray in RAD get annotation from assemblies in DoTS
  – Learn and store relationships between genes, RNAs, and proteins
    • Strong typing: meaningful relationships
[Diagram labels: RAD; EST clustering and assembly; DoTS; genomic alignment and comparative sequence analysis; identify shared TF binding sites; TESS; BioMaterial annotation; SRes]
GUS versus Chado
GUS represents biology in the database tables
  – Forces applications to load and retrieve data consistently
Chado represents biology in the applications
  – Allows flexibility in what can be stored but applications may not be consistent
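The contrast can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names are simplified stand-ins invented for this illustration; they are not the actual GUS or Chado schemas. In the first style each biological type has its own table, so the database itself rejects a row that does not fit. In the second, one generic table accepts anything, and consistency depends on the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Style 1 (biology in the tables): one table per biological type.
# Hypothetical, simplified names; not the real GUS schema.
conn.executescript("""
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE rna_feature (
    rna_feature_id  INTEGER PRIMARY KEY,
    gene_feature_id INTEGER NOT NULL REFERENCES gene_feature(gene_feature_id),
    name            TEXT NOT NULL
);
""")

# Style 2 (biology in the applications): one generic table plus a type label.
# Also a hypothetical simplification, not the real Chado schema.
conn.executescript("""
CREATE TABLE generic_feature (
    feature_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,
    parent_id  INTEGER REFERENCES generic_feature(feature_id),
    name       TEXT NOT NULL
);
""")


def try_insert(sql):
    """Return True if the database accepts the row, False if it rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An RNA with no parent gene: the typed schema refuses it (NOT NULL column).
typed_ok = try_insert(
    "INSERT INTO rna_feature (gene_feature_id, name) VALUES (NULL, 'rna1')"
)

# The generic schema stores the same orphan RNA without complaint.
generic_ok = try_insert(
    "INSERT INTO generic_feature (type, parent_id, name) VALUES ('mRNA', NULL, 'rna1')"
)
```

Here `typed_ok` ends up false and `generic_ok` true: the typed tables enforce the gene-to-RNA relationship for every loader, while the generic table leaves that rule to each application.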
GUS Project Goals
Provide:
  – A platform for broad genomics data integration
  – An infrastructure system for functional genomics
Support:
  – Websites with advanced query capabilities
  – Research driven queries and mining
GUS 3.5 Schemas

| Schema | Domain | Features |
|---|---|---|
| DoTS | Sequence and annotation | EST clusters, Gene models |
| RAD | Gene expression | MIAME |
| Prot | Protein expression | Mass spec, mzdata |
| Study | Experiments | FuGE |
| TESS | Gene Regulation | TFBS organization |
| SRes | Shared resources | Ontologies |
| Core | Administration | Documentation, Data Provenance |
DoTS: Central dogma and relating biological sequences
NA Sequence
GeneFeature
RNAFeature
ProteinFeature
AA Sequence
Load GenBank, NRDB, sequencing center files, dbEST entries
DoTS: Central dogma and relating biological sequences
Gene RNA Protein
NA Sequence AA Sequence
GeneFeature
RNAFeature
ProteinFeature
Concepts that are independent of any individual sequence because sequences may be incomplete, a variant, or not well annotated.
DoTS: Central dogma and relating biological sequences
Gene RNA Protein
NA Sequence AA Sequence
genome
Multiple sequences (experimental variety)
Gene 1 Gene 2
RNA
Multiple genes
Concepts may be related to multiple sequences due to biology, experiments, or computational predictions.
DoTS: Central dogma and relating biological sequences
Gene RNA Protein
NA Sequence AA Sequence
GeneInstance
RNAInstance
ProteinInstance
GeneFeature
RNAFeature
ProteinFeature
Instances reflect our understanding of sequence associations.
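One way to picture the relationship between a concept (Gene), its features on individual sequences (GeneFeature), and the instances that tie them together is the minimal Python sketch below. All class and field names are illustrative assumptions, not the actual GUS tables or columns; RNA and Protein would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class NASequence:
    source_id: str        # hypothetical identifier for a nucleic acid sequence


@dataclass
class GeneFeature:
    sequence: NASequence  # a feature is located on one specific sequence
    start: int
    end: int


@dataclass
class Gene:
    name: str             # the concept, independent of any single sequence


@dataclass
class GeneInstance:
    gene: Gene            # which concept ...
    feature: GeneFeature  # ... is represented by which feature


def features_for(gene, instances):
    """Collect every feature that has been associated with the given gene."""
    return [inst.feature for inst in instances if inst.gene is gene]


# One gene concept linked to features on two different sequences,
# e.g. a genomic contig and an EST assembly (made-up identifiers).
genome = NASequence("contig_1")
assembly = NASequence("assembly_7")
gene = Gene("geneA")
instances = [
    GeneInstance(gene, GeneFeature(genome, 100, 900)),
    GeneInstance(gene, GeneFeature(assembly, 1, 750)),
]
sources = [f.sequence.source_id for f in features_for(gene, instances)]
```

Keeping the association in a separate instance object means a sequence can be incomplete or re-annotated without changing the Gene concept itself; only the instances are added or removed.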
RAD: Loading/Annotation
[Flowchart: each step is listed with the tool that performs it]
Load array info
  – GUS::Supported::LoadArrayDesign
Create new study (web)
  – RAD::StudyAnnotator::Study Form
Create assays, acquisitions and quantifications
  – RAD::StudyAnnotator::Module I (all software)
  – Or (some software) GUS::Community::Plugin::InsertMAS5Assay2Quantification or GUS::Community::Plugin::InsertGenePixAssay2Quantification
Annotate experimental design and biomaterials (web)
  – RAD::StudyAnnotator::Module II
  – RAD::StudyAnnotator::Module III
Load quantification data
  – GUS::Supported::Plugin::LoadArrayResults or GUS::Community::Plugin::LoadBatchArrayResults
Load processed data or analysis results
  – GUS::Supported::Plugin::InsertRadAnalysis
End
Prot and Study: Generalization of RAD to other technologies
RAPAD prototype made a copy of RAD and dropped/inserted tables for 2-D gels and mass spec.
  – Jones et al. Bioinformatics. 2004
In GUS 3.5, Study contains descriptions of samples (BioMaterials), sample protocols, and experimental design.
  – Technology-specific protocols are in RAD and Prot.
In GUS 3.5, Prot is now based on the standard mzdata output of mass spectrometers.
  – To be added soon: peptide identification from programs like Sequest and MASCOT (currently held in DoTS)
TESS: TF to binding site relationships in the context of computational models
Functional Annotation of the Genome
[Diagram labels, by domain]
Sequence & Features
Central Dogma (DoTS)
Regulation (TESS)
Expression (RAD): Image Analysis, Statistical Processing; MIAME
Interaction
Proteomics (Prot): Image Analysis, Statistical Processing; MIAPE
Experimental Design and Samples (Study)
New schemas for additional domains
Future Schemas
Population genetics
  – Relate polymorphisms, genotypes, phenotypes
  – Currently in DoTS
Comparative genomics
  – Syntenies, phylogenies
  – Currently in DoTS
Metabolomics
  – Small molecules
  – Use Study and adapt Prot
In situs / Immunohistochemistry
  – Use Study and adapt RAD
GUS Components
Schema
Application Framework
  – Object/Relational Layer
  – Plugin API
  – Pipeline API
Plug-ins
Web Development Kit (WDK)
GUS Application Framework
Motivation: Consistent and reusable access and manipulation of data
Object Relational: 1:1 Mapping between tables and language objects
Provides
  – Relationship Management
  – Cascading Operations
  – Cache Management
  – Basic Access Control
Automation of Data Provenance and Evidence
With APIs, foundation for advanced tools and applications.
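As a rough illustration of these ideas (one class per table, cascading operations, and automatic provenance stamping), here is a minimal Python sketch built on sqlite3. It is not the GUS object layer or its API: the `Row` class, the `submit` method, the `invocation_id` column, and the two tables are all hypothetical names chosen for this example.

```python
import sqlite3


class Row:
    """One object per table row; each subclass names its table and columns."""
    table = None
    columns = ()

    def __init__(self, **values):
        self.values = values
        self.children = []   # list of (child row, foreign-key column name)
        self.id = None

    def add_child(self, child, fk_column):
        """Relationship management: remember which rows depend on this one."""
        self.children.append((child, fk_column))

    def submit(self, conn, invocation_id):
        """Insert this row, stamp provenance, then cascade to the children."""
        cols = list(self.columns) + ["invocation_id"]
        vals = [self.values.get(c) for c in self.columns] + [invocation_id]
        marks = ", ".join("?" for _ in cols)
        cur = conn.execute(
            f"INSERT INTO {self.table} ({', '.join(cols)}) VALUES ({marks})",
            vals,
        )
        self.id = cur.lastrowid
        for child, fk_column in self.children:
            child.values[fk_column] = self.id   # fill in the parent key
            child.submit(conn, invocation_id)   # cascading operation


class Gene(Row):
    table = "gene"
    columns = ("name",)


class GeneSynonym(Row):
    table = "gene_synonym"
    columns = ("gene_id", "synonym")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY, name TEXT, invocation_id INTEGER
);
CREATE TABLE gene_synonym (
    gene_synonym_id INTEGER PRIMARY KEY, gene_id INTEGER,
    synonym TEXT, invocation_id INTEGER
);
""")

gene = Gene(name="geneA")
gene.add_child(GeneSynonym(synonym="geneA-alias"), "gene_id")
gene.submit(conn, invocation_id=42)   # 42 stands in for "the run that loaded this"

rows = conn.execute(
    "SELECT g.name, s.synonym, s.invocation_id "
    "FROM gene g JOIN gene_synonym s ON s.gene_id = g.gene_id"
).fetchall()
```

A single `submit` on the parent writes both rows and tags each with the same invocation identifier, which is the kind of bookkeeping a plugin author would otherwise have to repeat by hand.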
Web Development Kit (WDK)
Database Independent
Facilitates development of data mining oriented websites:
  – Multiple parameterized canned queries
  – Sophisticated records
  – Graphical views
  – Boolean query facility
  – Query history
  – Session management, process pooling, flow control
Model, View, Controller (MVC) Design
  – Separates application logic (Model) from website layout (View) and application flow (Controller)
  – Model: XML-based queries and records
  – View: JSP
  – Controller: Struts
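To make "parameterized canned queries", "Boolean query facility", and "query history" concrete, the following is a small Python sketch of the general idea. It does not reproduce the WDK, which is Java-based with an XML model; the query names, the `gene` table, and the helper functions are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT, length INTEGER
);
INSERT INTO gene (name, chromosome, length) VALUES
    ('g1', 'chr1', 500), ('g2', 'chr1', 2500), ('g3', 'chr2', 3000);
""")

# Canned queries: fixed SQL whose named parameters are filled in by the user.
CANNED_QUERIES = {
    "genes_on_chromosome": "SELECT gene_id FROM gene WHERE chromosome = :chromosome",
    "genes_longer_than": "SELECT gene_id FROM gene WHERE length > :min_length",
}

history = []   # query history: every result set the user has produced so far


def run_query(name, **params):
    """Run a canned query and record its result set in the history."""
    result = {row[0] for row in conn.execute(CANNED_QUERIES[name], params)}
    history.append((name, params, result))
    return result


def combine(left, op, right):
    """Boolean query facility: combine two earlier result sets."""
    operations = {
        "AND": set.intersection,
        "OR": set.union,
        "NOT": set.difference,   # in left but not in right
    }
    return operations[op](left, right)


on_chr1 = run_query("genes_on_chromosome", chromosome="chr1")
long_genes = run_query("genes_longer_than", min_length=1000)

both = combine(on_chr1, "AND", long_genes)
either = combine(on_chr1, "OR", long_genes)
short_on_chr1 = combine(on_chr1, "NOT", long_genes)
```

Because each canned query returns a set of record identifiers, results from unrelated queries can be combined without writing new SQL, and the history list gives the user something to combine.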
GUS Version Caveat
GUS 3.0 ~ 12/02
GUS 3.1 ~ 12/03
GUS 3.2 ~ 02/04
  – Concrete Schema Versions
  – Application Code in Flux
GUS 3.5 - 6/05
  – First concrete release with distributable
Proposal: Separate versioning for Schema and Application Framework
GUS 3.5
Improved Distribution
  – Installer, DBAdmin Tools
  – Bootstrap Data -- Algorithm Parameters, Core.TableInfo
  – Plugin Quality -- “New” API, Tested
  – Documentation -- Install, User’s, and Developer’s Guides
  – Requisite jars Included -- Oracle, PostgreSQL
Extended Support
  – PostgreSQL Compatible
  – Java Object Model -- Consistently Compiles
Schema Improvements
  – Proteomics Support
  – Standard Study Support
  – Schema Cleanup
    • Requested schema fixes primarily to DoTS
    • Removal of deprecated tables -- Workflow
GUS 3.? -> 3.5 Migration
Not Trivial
  – Many potential starting points
  – Not all data has a migration path
Upgrade Possibilities
  – In Place Upgrade
  – Data load and transform
  – Start New
Possible Routes
  – GUS DBAdmin Tools
  – Third party (OEM) Tools
  – Everyone for themselves
GUS 3.5.1
Small Schema Changes
  – TESS, Attribute Changes
Improved Developer’s and User’s Guides
Additional Supported Plug-ins
DBAdmin Code Cleanup
Upgrade Scripts
Expected early August
GUS 4.0 and beyond
Object Layer Improvements
  – Class::DBI -- Perl O/R Layer
  – Hibernate -- Java O/R Layer
Improved Subclassing
  – Multiple Layers
  – Eliminate Performance Issues
Refactor DoTS
Redistribute tables between RAD, Prot, and Study
Additional Biological Domains
GUS Project Resources
Website -- http://www.gusdb.org
  – News, Documentation, Distributable, GUS-based Projects
GUS Project Resources
Mailing List -- http://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
  – ~ 90 Subscribers
  – 1700 Messages over 3 years
GUS Wiki -- http://www.gusdb.org/wiki
  – User Notes and Documentation
    • Central Dogma Schema Design
    • Subclassing System
    • Data Provenance
    • Development Tracking: 3.5 Roadmap, 4.0 Schema Ideas
    • WDK Documentation
GUS Project Resources
Subversion Source Control System
  – Anonymous Read Access for “Bleeding Edge” releases
  – Web-based Code Review -- https://www.cbil.upenn.edu/svnweb/
  – “Commits” Mailing List
Schema Browser -- http://www.gusdb.org/cgi-bin/schemaBrowser
  – Online Schema and Relationships Review
GUS Issue Tracker -- https://www.cbil.upenn.edu/tracker/
  – Bugzilla Based
GUS Project Coordination - Areas of Focus
Administration
  – Installer, Data Bootstrapping, dba Utilities
Schema
  – Data model, Subclassing Techniques, Data Provenance
Framework
  – Object/Relational Technologies, Plugin & Pipeline APIs
Plug-in
  – Data loading mechanisms
GUS Project Coordination - Areas of Focus
Documentation
  – Installation, User’s, and Developer’s Guides
  – Wiki
Web Development Kit
  – Well established working group
Tool adapters
  – GBrowse, Apollo, etc. Integration
Later: Development Priorities Discussion
  – Where should we focus our efforts?