First GUS Workshop July 6-8, 2005 Penn Center for Bioinformatics Philadelphia, PA

Upload: glenys

Post on 18-Jan-2016

Page 1: First GUS Workshop July 6-8, 2005

First GUS Workshop

July 6-8, 2005

Penn Center for Bioinformatics

Philadelphia, PA

Page 2: First GUS Workshop July 6-8, 2005

Workshop Goals

Work through issues
– Installing GUS
– Loading data into GUS
– Analyzing and viewing data in GUS

Coordinate future development
– Changes to schema and application framework
– New plug-ins
– New application adapters

Page 3: First GUS Workshop July 6-8, 2005

A Brief History of GUS

Genomics Unified Schema
– V1.0 in 2000
– Previously had separate databases for:
• Genome annotation
• EST assemblies (DoTS)
• Microarrays and SAGE (RAD)
• Transcription element search software (TESS)
– Strengthen each effort by providing deep annotation
• e.g., cDNAs on microarray in RAD get annotation from assemblies in DoTS
– Learn and store relationships between genes, RNAs, and proteins
• Strong typing: meaningful relationships

Page 4: First GUS Workshop July 6-8, 2005

[Diagram relating RAD, DoTS, TESS, and SRes. Labels: EST clustering and assembly; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites; BioMaterial annotation]

Page 5: First GUS Workshop July 6-8, 2005

GUS versus Chado

GUS represents biology in the database tables
– Forces applications to load and retrieve data consistently

Chado represents biology in the applications
– Allows flexibility in what can be stored but applications may not be consistent
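The trade-off on this slide can be illustrated with a small sketch. It reproduces neither the GUS nor the Chado schema: `TypedGeneFeature`, `GenericFeature`, and all field and property names are hypothetical, chosen only to contrast "meaning in the table definition" with "meaning in the application".

```python
from dataclasses import dataclass, field


@dataclass
class TypedGeneFeature:
    """Typed style: the column set itself defines what a gene feature is."""
    na_sequence_id: int
    start: int
    end: int
    is_reversed: bool


@dataclass
class GenericFeature:
    """Generic style: a type label plus free-form properties."""
    type_name: str
    properties: dict = field(default_factory=dict)


typed = TypedGeneFeature(na_sequence_id=1, start=100, end=900, is_reversed=False)

# Two hypothetical applications store the same location under different keys.
# Nothing in the generic structure prevents this.
generic_a = GenericFeature("gene", {"start": 100, "end": 900})
generic_b = GenericFeature("gene", {"begin": 100, "stop": 900})


def typed_rejects_unknown_column() -> bool:
    """Return True if the typed structure refuses a column it does not define."""
    try:
        TypedGeneFeature(na_sequence_id=1, begin=100, stop=900, is_reversed=False)
    except TypeError:
        return True
    return False
```

In this sketch the typed structure fails fast on an unexpected column, while the generic one accepts both spellings and leaves consistency to the applications.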

Page 6: First GUS Workshop July 6-8, 2005

GUS Project Goals

Provide:
– A platform for broad genomics data integration
– An infrastructure system for functional genomics

Support:
– Websites with advanced query capabilities
– Research driven queries and mining

Page 7: First GUS Workshop July 6-8, 2005

GUS 3.5 Schemas

Schema   Domain                    Features
DoTS     Sequence and annotation   EST clusters, Gene models
RAD      Gene expression           MIAME
Prot     Protein expression        Mass spec, mzdata
Study    Experiments               FuGE
TESS     Gene Regulation           TFBS organization
SRes     Shared resources          Ontologies
Core     Administration            Documentation, Data Provenance

Page 8: First GUS Workshop July 6-8, 2005

DoTS: Central dogma and relating biological sequences

[Diagram labels: NA Sequence; GeneFeature; RNAFeature; ProteinFeature; AA Sequence]

Load GenBank, NRDB, sequencing center files, dbEST entries

Page 9: First GUS Workshop July 6-8, 2005

DoTS: Central dogma and relating biological sequences

[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; GeneFeature, RNAFeature, ProteinFeature]

Concepts that are independent of any individual sequence because sequences may be incomplete, a variant, or not well annotated.

Page 10: First GUS Workshop July 6-8, 2005

DoTS: Central dogma and relating biological sequences

[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; genome; Multiple sequences (experimental variety); Gene 1, Gene 2; RNA; Multiple genes]

Concepts may be related to multiple sequences due to biology, experiments, or computational predictions.

Page 11: First GUS Workshop July 6-8, 2005

DoTS: Central dogma and relating biological sequences

[Diagram labels: Gene, RNA, Protein; NA Sequence, AA Sequence; GeneInstance, RNAInstance, ProteinInstance; GeneFeature, RNAFeature, ProteinFeature]

Instances reflect our understanding of sequence associations.
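The concept / instance / feature / sequence layering from these four slides can be sketched as plain data structures. This is only an illustration: the class names echo the slide labels, but the fields, the `sequences_for` helper, and the accession strings are hypothetical and are not taken from the DoTS schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NASequence:
    source_id: str  # hypothetical identifier, e.g. an accession


@dataclass(frozen=True)
class GeneFeature:
    """An annotation located on one particular sequence."""
    sequence: NASequence
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    """A concept that is independent of any individual sequence."""
    name: str


@dataclass(frozen=True)
class GeneInstance:
    """Records the assertion that a feature is an occurrence of a gene."""
    gene: Gene
    feature: GeneFeature


def sequences_for(gene: Gene, instances: List[GeneInstance]) -> List[str]:
    """List the sequences on which a gene concept has been observed."""
    return sorted({i.feature.sequence.source_id for i in instances if i.gene == gene})


est = NASequence("EST_0001")
genomic = NASequence("CONTIG_42")
gene = Gene("example_gene")

# One gene concept linked to features on two different sequences.
instances = [
    GeneInstance(gene, GeneFeature(est, 1, 450)),
    GeneInstance(gene, GeneFeature(genomic, 10200, 13900)),
]

print(sequences_for(gene, instances))  # ['CONTIG_42', 'EST_0001']
```

Keeping the link in a separate instance record means an association can be added or revised without touching either the concept or the sequence-level feature.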

Page 12: First GUS Workshop July 6-8, 2005

RAD: Loading/Annotation

[Flowchart. Steps and tools are listed as they appear on the slide.]

Steps:
– Load Array Info
– Create new study (web)
– Annotate experimental design and biomaterials (web)
– Create assays, acquisitions and quantifications
– Load quantification data
– Load processed data or analysis results
– End

Tools:
– GUS::Supported::LoadArrayDesign
– RAD::StudyAnnotator::Study Form
– RAD::StudyAnnotator::Module II, RAD::StudyAnnotator::Module III
– RAD::StudyAnnotator::Module I (all software) or (some software) GUS::Community::Plugin::InsertMAS5Assay2Quantification or GUS::Community::Plugin::InsertGenePixAssay2Quantification
– GUS::Supported::Plugin::LoadArrayResults or GUS::Community::Plugin::LoadBatchArrayResults
– GUS::Supported::Plugin::InsertRadAnalysis

Page 13: First GUS Workshop July 6-8, 2005

Prot and Study: Generalization of RAD to other technologies

RAPAD prototype made a copy of RAD and dropped/inserted tables for 2-D gels and mass spec.
– Jones et al. Bioinformatics. 2004

In GUS 3.5, Study contains descriptions of samples (BioMaterials), sample protocols, and experimental design.
– Technology-specific protocols are in RAD, Prot.

In GUS 3.5, Prot is now based on standard mzdata output of mass spectrometers
– To add soon, Peptide identification from programs like Sequest and MASCOT (held in DoTS currently)

Page 14: First GUS Workshop July 6-8, 2005

TESS: TF to binding site relationships in the context of computational models


Page 15: First GUS Workshop July 6-8, 2005

Functional Annotation of the Genome

[Diagram labels:]
– Sequence & Features
– Central Dogma (DoTS)
– Regulation (TESS)
– Expression (RAD): Image Analysis, Statistical Processing
– Interaction
– Proteomics (Prot): Image Analysis, Statistical Processing
– MIAME, MIAPE
– Experimental Design and Samples (Study)
– New schemas for additional domains

Page 16: First GUS Workshop July 6-8, 2005

Future Schemas

Population genetics
– Relate polymorphisms, genotypes, phenotypes
– Currently in DoTS

Comparative genomics
– Syntenies, phylogenies
– Currently in DoTS

Metabolomics
– Small molecules
– Use Study and adapt Prot

In situs / Immunohistochemistry
– Use Study and adapt RAD

Page 17: First GUS Workshop July 6-8, 2005

GUS Components

Schema

Application Framework
– Object/Relational Layer
– Plugin API
– Pipeline API

Plug-ins

Web Development Kit (WDK)

Page 18: First GUS Workshop July 6-8, 2005

GUS Application Framework

Motivation: Consistent and reusable access and manipulation of data

Object Relational: 1:1 Mapping between tables and language objects

Provides
– Relationship Management
– Cascading Operations
– Cache Management
– Basic Access Control

Automation of Data Provenance and Evidence

With APIs, foundation for advanced tools and applications.
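The idea of a 1:1 table-to-object mapping with cascading operations can be sketched as follows. This is not the GUS object layer or its API. The `Row` base class, the `submit` method, the table and column names, and the use of an in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3


class Row:
    """Maps one object to one row in the table named by `table` (hypothetical)."""
    table = ""

    def __init__(self, **columns):
        self.columns = columns
        self.children = []  # (child object, foreign-key column) pairs
        self.row_id = None

    def add_child(self, child, fk_column):
        """Relationship management: remember which rows depend on this one."""
        self.children.append((child, fk_column))

    def submit(self, conn):
        """Insert this row, then cascade to children with the parent key filled in."""
        names = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        cur = conn.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
            tuple(self.columns.values()),
        )
        self.row_id = cur.lastrowid
        for child, fk_column in self.children:
            child.columns[fk_column] = self.row_id
            child.submit(conn)


class Gene(Row):
    table = "gene"


class GeneFeature(Row):
    table = "gene_feature"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE gene_feature ("
    "gene_feature_id INTEGER PRIMARY KEY, gene_id INTEGER, start_pos INTEGER)"
)

gene = Gene(name="example")
gene.add_child(GeneFeature(start_pos=100), "gene_id")
gene.add_child(GeneFeature(start_pos=500), "gene_id")
gene.submit(conn)  # one call writes the parent and both children

rows = conn.execute(
    "SELECT gene_id, start_pos FROM gene_feature ORDER BY start_pos"
).fetchall()
```

A single `submit` on the parent is enough because the base class, not each plug-in, knows how to propagate keys to dependent rows. A shared layer like this would also be a natural place to record provenance automatically.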

Page 19: First GUS Workshop July 6-8, 2005

Web Development Kit (WDK)

Database Independent

Facilitates development of data mining oriented websites:
– Multiple parameterized canned queries
– Sophisticated records
– Graphical views
– Boolean query facility
– Query history
– Session management, process pooling, flow control

Model, View, Controller (MVC) Design
– Separates application logic (Model) from website layout (View) and application flow (Controller)
– Model: XML-based queries and records
– View: JSP
– Controller: Struts
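A rough sketch of how parameterized canned queries, a Boolean query facility, and a query history fit together is given below. It does not use the WDK, which per the slide is built on XML, JSP, and Struts. The `GENES` data, the query functions, and the `combine` and `run` helpers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical records standing in for database rows.
GENES: Dict[str, dict] = {
    "g1": {"chromosome": "1", "length": 1200},
    "g2": {"chromosome": "1", "length": 300},
    "g3": {"chromosome": "2", "length": 2500},
}


def genes_on_chromosome(chromosome: str) -> Set[str]:
    """Canned query with one parameter."""
    return {g for g, rec in GENES.items() if rec["chromosome"] == chromosome}


def genes_longer_than(min_length: int) -> Set[str]:
    """Canned query with one parameter."""
    return {g for g, rec in GENES.items() if rec["length"] > min_length}


def combine(left: Set[str], operator: str, right: Set[str]) -> Set[str]:
    """Boolean query facility: combine two earlier results."""
    if operator == "AND":
        return left & right
    if operator == "OR":
        return left | right
    if operator == "MINUS":
        return left - right
    raise ValueError(f"unknown operator: {operator}")


history: List[Tuple[str, Set[str]]] = []


def run(label: str, result: Set[str]) -> Set[str]:
    """Query history: record each result so later queries can reuse it."""
    history.append((label, result))
    return result


chr1 = run("chromosome = 1", genes_on_chromosome("1"))
long_genes = run("length > 1000", genes_longer_than(1000))
both = run("1 AND 2", combine(chr1, "AND", long_genes))
```

Treating every result as a set of record identifiers is what lets any two entries in the history be combined, whatever queries produced them.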

Page 20: First GUS Workshop July 6-8, 2005

GUS Version Caveat

GUS 3.0 ~ 12/02
GUS 3.1 ~ 12/03
GUS 3.2 ~ 02/04
– Concrete Schema Versions
– Application Code in Flux

GUS 3.5 - 6/05
– First concrete release with distributable

Proposal: Separate versioning for Schema and Application Framework

Page 21: First GUS Workshop July 6-8, 2005

GUS 3.5

Improved Distribution
– Installer, DBAdmin Tools
– Bootstrap Data -- Algorithm Parameters, Core.TableInfo
– Plugin Quality -- “New” API, Tested
– Documentation -- Install, User’s, and Developer’s Guides
– Requisite jars Included -- Oracle, PostgreSQL

Extended Support
– PostgreSQL Compatible
– Java Object Model -- Consistently Compiles

Schema Improvements
– Proteomics Support
– Standard Study Support
– Schema Cleanup
• Requested schema fixes primarily to DoTS
• Removal of deprecated tables -- Workflow

Page 22: First GUS Workshop July 6-8, 2005

GUS 3.? -> 3.5 Migration

Not Trivial
– Many potential starting points
– Not all data has a migration path

Upgrade Possibilities
– In Place Upgrade
– Data load and transform
– Start New

Possible Routes
– GUS DBAdmin Tools
– Third party (OEM) Tools
– Everyone for themselves

Page 23: First GUS Workshop July 6-8, 2005

GUS 3.5.1

Small Schema Changes
– TESS, Attribute Changes

Improved Developer’s and User’s Guides

Additional Supported Plug-ins

DBAdmin Code Cleanup

Upgrade Scripts

Expected early August

Page 24: First GUS Workshop July 6-8, 2005

GUS 4.0 and beyond

Object Layer Improvements
– Class::DBI -- Perl O/R Layer
– Hibernate -- Java O/R Layer

Improved Subclassing
– Multiple Layers
– Eliminate Performance Issues

Refactor DoTS

Redistribute tables between RAD, Prot, and Study

Additional Biological Domains

Page 25: First GUS Workshop July 6-8, 2005

GUS Project Resources

Website -- http://www.gusdb.org
– News, Documentation, Distributable, GUS-based Projects

Page 26: First GUS Workshop July 6-8, 2005

GUS Project Resources

Mailing List -- http://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
– ~ 90 Subscribers
– 1700 Messages over 3 years

GUS Wiki -- http://www.gusdb.org/wiki
– User Notes and Documentation
• Central Dogma Schema Design
• Subclassing System
• Data Provenance
• Development Tracking: 3.5 Roadmap, 4.0 Schema Ideas
• WDK Documentation

Page 27: First GUS Workshop July 6-8, 2005

GUS Project Resources

Subversion Source Control System
– Anonymous Read Access for “Bleeding Edge” releases
– Web-based Code Review -- https://www.cbil.upenn.edu/svnweb/
– “Commits” Mailing List

Schema Browser -- http://www.gusdb.org/cgi-bin/schemaBrowser
– Online Schema and Relationships Review

GUS Issue Tracker -- https://www.cbil.upenn.edu/tracker/
– Bugzilla Based

Page 28: First GUS Workshop July 6-8, 2005

GUS Project Coordination - Areas of Focus

Administration
– Installer, Data Bootstrapping, dba Utilities

Schema
– Data model, Subclassing Techniques, Data Provenance

Framework
– Object/Relational Technologies, Plugin & Pipeline APIs

Plug-in
– Data loading mechanisms

Page 29: First GUS Workshop July 6-8, 2005

GUS Project Coordination - Areas of Focus

Documentation
– Installation, User’s, and Developer’s Guides
– Wiki

Web Development Kit
– Well established working group

Tool adapters
– GBrowse, Apollo, etc. Integration

Later: Development Priorities Discussion
– Where should we focus our efforts?