GUS 3.0: Web Sites and Tools
June 20, 2002
Jonathan Crabtree
Outline
Current web interfaces
  examples: allgenes.org, PlasmoDB.org
  Java Servlet, CGI-based
  reusable Java and Perl code, install scripts
The future?
  PHP and JSP
  "GUSWWW" schema redesign
GUS - Multiple Views & Projects
AllGenes.org, PlasmoDB.org, EPConDB
Core, SRes, TESS, RAD, DoTS
Oracle RDBMS
Perl Object Layer for Data Loading
Java Servlets + Perl CGI
Other sites, other projects
allgenes.org query:
"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have
been localized to mouse chromosome 5?"
Select the allgenes.org boolean query page
Click on the "AND" button
Choose the RH map and GO function queries
Select mouse chromosome 5 and "transcription factor"
There are 22 mouse RNAs (assemblies) that meet these criteria:
This query result set now appears on the query "history" page:
Now use the BLAST page to identify RNAs similar to my cDNA
The results of the BLAST search appear in the query history
Intersect ("AND") the BLAST search with the previous query:
And we have our answer (the third row on the query history page):
Predicted GO function(s) (some manually reviewed)
Predicted protein, CAP4 assembly, EST expression profile, UCSC BLAT
Other transcripts from the same gene
External links
Mapping information
Protein/motif hits
Gene trap insertions, etc.
PlasmoDB: Combining Expression and Sequence Data
"List all genes whose proteins are predicted to contain a signal peptide and for which there is
evidence that they are expressed in Plasmodium falciparum's late schizont stage."
Web Interface Components
GUS/www/allgenes/htdocs/
  GUS/www/allgenes/htdocs/index.html.in
  ...
GUS/www/allgenes/cgi-bin/
  GUS/www/allgenes/cgi-bin/rnaProtSimPng.pl.in
  ...
GUS/java/cbil/gus/servlet/
  GUS/java/cbil/gus/servlet/SiteServlet.java
  ...
GUS/www/install/
  GUS/www/install/allgenes-config.in
  GUS/www/install/installServlet.pl
GUS/perl/servlet/allgenes/
  GUS/perl/servlet/allgenes/rnaProtSim.pl.in
  ...
rnaProtSimPng.pl.in

#!@PERL@
# -------------------------------------------------------------------
# rnaProtSimPng.pl
#
# $Revision: 1.3 $ $Date: 2001/03/22 14:44:57 $ $Author: crabtree $
# -------------------------------------------------------------------

use strict;
require 'cgi_lib.perl';
require '@CGI_DIR@/rnaSimilarityPng.pm';

# Input using cgi_lib.perl
#
my %rq = &get_request();
my $naSeqId = $rq{'id'} || 118619;
$naSeqId =~ s/[^\d]//g;

my $maxHits = $rq{'max_hits'};
$maxHits =~ s/[^\d]//g;

# Generate image using rnaSimilarityPng.pm
#
$| = 1;
my $mapName = "$naSeqId-prot";
my $imgData = &getImage($mapName, $naSeqId, 'ExternalAASequence');
print "Content-type: image/png\n\n$imgData";

cbil.gus.servlet.SiteServlet
extends javax.servlet.http.HttpServlet and is the only actual servlet in our Java code
reads a configuration file and instantiates the set of JavaBeans defined therein:
  instances of PageGeneratorI - content generators
  SqlQuery - parameterized SQL queries
  "Param" and "Formatter" classes
implements logging, dispatches requests
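
The configuration-driven design described above can be illustrated with a minimal sketch. It assumes, based only on the excerpts of allgenes-config.in in this talk, that each entry has the form beanName.Property=value and that the "class" property names the bean's type. The class BeanConfigSketch and its method are hypothetical and are not part of the GUS code; the sketch only groups the entries per bean and does not instantiate anything.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of GUS: groups "bean.Property=value" entries by bean name.
class BeanConfigSketch {

    // Returns bean name -> (property name -> value),
    // e.g. "rnaSeqQ" -> {"Params" -> "rnaID", "class" -> "SqlQuery"}.
    static Map<String, Map<String, String>> groupByBean(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, Map<String, String>> beans = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot <= 0 || dot == key.length() - 1) {
                continue; // skip keys that do not look like bean.Property
            }
            String bean = key.substring(0, dot);
            String property = key.substring(dot + 1);
            beans.computeIfAbsent(bean, b -> new TreeMap<>())
                 .put(property, props.getProperty(key));
        }
        return beans;
    }

    public static void main(String[] args) {
        String config =
            "gusOraSql.class=cbil.gus.servlet.db.oracle.SQL\n"
          + "rnaSeqQ.class=SqlQuery\n"
          + "rnaSeqQ.Params=rnaID\n";
        for (Map.Entry<String, Map<String, String>> e : groupByBean(config).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A real loader would go on to create one object per bean from its "class" value; how SiteServlet does that is not shown in the slides.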
allgenes-config.in

# Oracle-specific routines
#
gusOraSql.class=cbil.gus.servlet.db.oracle.SQL

# Set of logins to GUS or GUSdev
#
gusLogin.class=cbil.gus.servlet.db.ConnectionPool
gusLogin.Login=@[email protected]=@[email protected]=@[email protected]=6
gusLogin.MaxQueryTime=120
gusLogin.CheckInterval=30
gusLogin.JDBCDrivers=oracle.jdbc.driver.OracleDriver
gusLogin.Sql=gusOraSql
gusLogin.PrintStatusMessages=true
...

# Retrieve an RNA's sequence from the DB
#
rnaSeqQ.class=SqlQuery
rnaSeqQ.DisplayName=RNA sequence
rnaSeqQ.Name=rnaSeqQ
rnaSeqQ.Abbrev=rnaSeq
rnaSeqQ.SQL=select nas.sequence \
 from dots.NASequenceImp nas, dots.ProjectLink pl \
 where nas.na_sequence_id = $$0$$ \
 and nas.na_sequence_id = pl.id \
 and pl.project_id = 813 \
 and pl.table_id in (56, 89)
rnaSeqQ.HtmlBrief=RNA sequence for RNA DT.<!--ST0-->
rnaSeqQ.Params=rnaID
rnaSeqQ.ResultFormatter=rnaSeqF
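
The $$0$$ marker in the SQL above is a numbered parameter placeholder. A minimal sketch of how such placeholders could be filled is given below; it assumes the values have already been validated by the corresponding "Param" objects. The class SqlTemplateSketch is hypothetical, and the servlet's actual substitution code is not shown in this talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GUS: fills $$n$$ placeholders with the n-th value.
class SqlTemplateSketch {

    // Matches $$0$$, $$1$$, ... and captures the index.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\$(\\d+)\\$\\$");

    // Assumes every value was validated beforehand (e.g. digits only for an ID).
    static String fill(String template, List<String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            if (index >= values.size()) {
                throw new IllegalArgumentException("No value for parameter " + index);
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(values.get(index)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select nas.sequence from dots.NASequenceImp nas"
                   + " where nas.na_sequence_id = $$0$$";
        System.out.println(fill(sql, Arrays.asList("118619")));
    }
}
```

Plain text substitution is only safe when every value is checked first; JDBC bind variables are the usual alternative when the placeholder stands for a single value.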

# RH map location (DOTS only)
#
rhLocnID.DisplayName=Chromosomal location based on RH mapping
rhLocnID.Name=rhmap_locn_id
rhLocnID.Abbrev=rhLocn
rhLocnID.SQL=select distinct epcr.na_sequence_id \
 from dots.EPCR epcr, dots.RHMapMarker rmm, dots.RHMarker rm, dots.ProjectLink pl \
 where rmm.chromosome = '$$0$$' and rmm.centirays >= $$1$$ and rmm.centirays <= $$2$$ \
 and rm.rh_marker_id = rmm.rh_marker_id \
 and rm.taxon_id $$3$$ \
 and epcr.map_table_id = 366 \
 and rmm.rh_map_marker_id = epcr.map_id \
 and epcr.na_sequence_id = pl.id \
 and pl.project_id = @PROJECT_ID@ \
 and pl.table_id = 56
rhLocnID.HtmlBrief=<!--ST3--> RNAs radiation hybrid mapped to \
chromosome <!--ST0--> between <!--ST1--> and <!--ST2--> cR
rhLocnID.HtmlLong=This query returns DoTS predicted transcripts that can be \
linked to a specific chromosomal location by the radiation hybrid map data. A DoTS \
predicted transcript consists of an ...
rhLocnID.Params=humanOrMouseChromP,centirayStartP,centirayEndP,taxonIdP
rhLocnID.ResultFormatter=dotsIdListF1

humanOrMouseChromP.class=EnumParam
humanOrMouseChromP.Prompt=Select a chromosome:
humanOrMouseChromP.Description=Human or mouse chromosome
humanOrMouseChromP.Values=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
humanOrMouseChromP.Help=Please select a human or mouse chromosome from \
the list provided; note that chromosomes 'Y', '20', '21', and '22' are \
only valid for humans.

centirayStartP.class=DoubleParam
centirayStartP.Prompt=Start position in centirays:
centirayStartP.Description=Start position in centirays
centirayStartP.Min=0.0
centirayStartP.Max=10290
centirayStartP.Initial=0.0
centirayStartP.Help=Enter a "start" position in centirays. The centiray \
 is the unit of distance used in radiation hybrid mapping \
 assays and the form should indicate the range of values \
that are valid for this particular parameter.
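
The Min and Max settings on centirayStartP suggest the kind of automated input checking a parameter class performs. The following is a minimal sketch of such a range check, assuming a parameter simply rejects anything that is not a number within [Min, Max]. DoubleParamSketch is a hypothetical stand-in and is not the DoubleParam class used by the servlet.

```java
// Hypothetical stand-in for a range-checked numeric parameter; not the GUS DoubleParam class.
class DoubleParamSketch {
    private final double min;
    private final double max;

    DoubleParamSketch(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // Parses raw form input; throws IllegalArgumentException if it is not a number in [min, max].
    double parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("Missing value");
        }
        final double value;
        try {
            value = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a number: " + raw);
        }
        // Double.parseDouble accepts "NaN", so reject it explicitly
        if (Double.isNaN(value) || value < min || value > max) {
            throw new IllegalArgumentException(
                "Value must be between " + min + " and " + max + ": " + raw);
        }
        return value;
    }

    public static void main(String[] args) {
        // Limits taken from the centirayStartP entry above
        DoubleParamSketch centirayStart = new DoubleParamSketch(0.0, 10290.0);
        System.out.println(centirayStart.parse("125.5"));
    }
}
```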
"GUSwww" Cache Tables

SQL> describe queries;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 QUERY_ID                                  NOT NULL NUMBER(12)
 SERVLET_NAME                              NOT NULL VARCHAR2(30)
 QUERY_NAME                                NOT NULL VARCHAR2(100)
 PARAM0                                             VARCHAR2(100)
 PARAM1                                             VARCHAR2(100)
 . . .
 PARAM74                                            VARCHAR2(100)
 PARAM75                                            VARCHAR2(100)
 RESULT_TABLE                              NOT NULL VARCHAR2(30)
 START_TIME                                NOT NULL DATE
 END_TIME                                           DATE

SQL> describe cache435;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SPOT_FAMILY_RESULT_ID                     NOT NULL NUMBER(10)
 I                                                  NUMBER

SQL> describe cache30687;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NA_SEQUENCE_ID                                     NUMBER(12)
 I                                                  NUMBER(12)

installServlet.pl

[crabtree@zeus install]$ ./installServlet.pl \
 --port=9000 \
 --cgiDir=/world/www.allgenes/cgi-bin/ \
 --htdocsDir=/world/www.allgenes/htdocs \
 --cgiURL=http://www.allgenes.org/cgi-bin \
 --htdocsURL=http://www.allgenes.org \
 --installDir=/world/www.allgenes/servlet \
 --servletName=allgenes-zeus \
 --servletFilePrefix=allgenes \
 --servletConfig=allgenes-zeus \
 --production \
 --servletURL=http://www.allgenes.org/gc/servlet

- install htdocs and cgi-bin files
    perform substitutions defined by 'allgenes-zeus' (e.g. ORA_LOGIN, ORA_PASSWORD, PROJECT_ID)
- compile Java code, create .jar file and install
- install servlet configuration file
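
The substitution step can be sketched as follows, assuming the ".in" templates mark each site-specific value as @NAME@ (as in #!@PERL@ and @CGI_DIR@ in rnaProtSimPng.pl.in). installServlet.pl itself is a Perl script whose source is not shown here; the Java class TokenSubstitutionSketch and the value /usr/bin/perl are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GUS: replaces @NAME@ tokens in a template.
class TokenSubstitutionSketch {

    // Matches tokens such as @PERL@, @CGI_DIR@, @PROJECT_ID@
    private static final Pattern TOKEN = Pattern.compile("@([A-Z][A-Z0-9_]*)@");

    static String substitute(String text, Map<String, String> values) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Tokens without a configured value are left untouched
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("PERL", "/usr/bin/perl");                   // illustrative value
        values.put("CGI_DIR", "/world/www.allgenes/cgi-bin");  // from --cgiDir above
        String template = "#!@PERL@\nrequire '@CGI_DIR@/rnaSimilarityPng.pm';";
        System.out.println(substitute(template, values));
    }
}
```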
Features of Current [Servlet] Implementation
Automatic generation of HTML FORMs
Automated input checking
Integrated help features
INPUT elements populated from the database
Query history facility
Boolean queries (AND, OR, SUBTRACT)
Declarative configuration file
Base system is relatively independent of GUS
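
The Boolean queries listed above combine two earlier result sets from the query history. A minimal sketch of the three operations is shown below, treating each result as an in-memory set of IDs. This is an assumption made for illustration: the slides do not show how the servlet combines results, and the class BooleanQuerySketch and the IDs used are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of GUS: Boolean combination of two query result sets.
class BooleanQuerySketch {

    // AND: IDs present in both results
    static Set<Long> and(Set<Long> a, Set<Long> b) {
        Set<Long> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: IDs present in either result
    static Set<Long> or(Set<Long> a, Set<Long> b) {
        Set<Long> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // SUBTRACT: IDs in the first result that are not in the second
    static Set<Long> subtract(Set<Long> a, Set<Long> b) {
        Set<Long> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for two rows of the query history
        Set<Long> mappedAndAnnotated = new LinkedHashSet<>(Arrays.asList(101L, 102L, 103L));
        Set<Long> blastHits = new LinkedHashSet<>(Arrays.asList(102L, 103L, 104L));
        System.out.println("AND: " + and(mappedAndAnnotated, blastHits));
        System.out.println("OR: " + or(mappedAndAnnotated, blastHits));
        System.out.println("SUBTRACT: " + subtract(mappedAndAnnotated, blastHits));
    }
}
```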
Limitations of Current Implementation
Relatively steep learning curve
Monolithic solution
No support for modifying configuration at runtime
  All objects instantiated when config. file read
Limited ability to customize presentation layer (i.e., HTML) without programming in Java
Technical problems with Servlets/Tomcat
  Must restart all servlets as a group
  Not currently using Serializable sessions
Dynamic Web Content
HTML fragments embedded in a program:
  CGI programs (e.g. Perl - interpreted)
  Java Servlets (compiled)
Program fragments embedded in HTML:
  PHP (interpreted)
  JSP (compiled; once, as needed)
Another axis: persistent vs. not (CGI/FastCGI)
Program Fragments in HTML
Advantages:
  faster development cycle; can edit in place
  easier to see/validate structure of HTML pages
  HTML has no functions, Java and PHP do
Disadvantages:
  must take care to manage complexity of application
Recommendations:
  move towards adopting this approach
  move all persistent state into the database
PHP: Hypertext Preprocessor
http://www.php.net
Scripting language; can be embedded in HTML
http://www.php.net/usage.php (Netcraft survey)
JSP - Java Server Pages
Based on and can interact with Java Servlets
Essentially Java embedded in HTML
XML-based tags, scriptlets, and JavaBean calls
Standard tag libraries available
Pages typically compiled on demand
Multiple implementations? (vs. single for PHP)
Next steps
Agree on desired user interface functionality
  saving queries for PlasmoDB
  persistent preferences for genome browser
Design parts of the schema to support it
Migrate old code/write new code
  Easier to migrate existing code with JSP