
Page 1: GUS 3.0: Web Sites and Tools June 20, 2002 Jonathan Crabtree crabtree@pcbi.upenn.edu

GUS 3.0: Web Sites and Tools

June 20, 2002

Jonathan Crabtree

crabtree@pcbi.upenn.edu


Outline

Current web interfaces
- examples: allgenes.org, PlasmoDB.org
- Java Servlet, CGI-based
- reusable Java and Perl code, install scripts

The future?
- PHP and JSP
- "GUSWWW" schema redesign


GUS - Multiple Views & Projects

AllGenes.org, PlasmoDB.org, EPConDB

Core, SRes, TESS, RAD, DoTS

Oracle RDBMS

Perl Object Layer for Data Loading

Java Servlets + Perl CGI

Other sites, other projects


allgenes.org query:

"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"


Select the allgenes.org boolean query page

Click on the "AND" button


Choose the RH map and GO function queries

Select mouse chromosome 5 and "transcription factor"


There are 22 mouse RNAs (assemblies) that meet these criteria:

This query result set now appears on the query "history" page:


Now use the BLAST page to identify RNAs similar to my cDNA

The results of the BLAST search appear in the query history


Intersect ("AND") the BLAST search with the previous query:

And we have our answer (the third row on the query history page):
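The intersection in this walk-through amounts to a set operation on the identifiers returned by two earlier queries. The following is a minimal Java sketch of that idea, not the allgenes.org code: the class name `ResultSetOps`, its methods, and the sample IDs are all hypothetical, and the real system may well combine results inside the database rather than in memory.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of GUS): combines sets of result IDs
// the way the query history's AND / OR / SUBTRACT operations are described.
final class ResultSetOps {

    // AND: keep only IDs present in both result sets.
    static Set<Long> and(Set<Long> a, Set<Long> b) {
        Set<Long> out = new LinkedHashSet<>(a);
        out.retainAll(b);
        return out;
    }

    // OR: IDs present in either result set.
    static Set<Long> or(Set<Long> a, Set<Long> b) {
        Set<Long> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    // SUBTRACT: IDs in the first result set that are not in the second.
    static Set<Long> subtract(Set<Long> a, Set<Long> b) {
        Set<Long> out = new LinkedHashSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for the RH map + GO function result
        // and the BLAST result from the history page.
        Set<Long> rhAndGo = new LinkedHashSet<>(Arrays.asList(101L, 102L, 103L));
        Set<Long> blastHits = new LinkedHashSet<>(Arrays.asList(103L, 104L));
        System.out.println(and(rhAndGo, blastHits)); // prints [103]
    }
}
```

Each operation copies its first argument, so the earlier rows of the history stay available for further combination.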


Predicted GO function(s) (some manually reviewed)

Predicted protein, CAP4 assembly, EST expression profile, UCSC BLAT

Other transcripts from the same gene

External links

Mapping information

Protein/motif hits

Gene trap insertions, etc.


PlasmoDB: Combining Expression and Sequence Data

"List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage."


Web Interface Components

GUS/www/allgenes/htdocs/
    GUS/www/allgenes/htdocs/index.html.in
    ...

GUS/www/allgenes/cgi-bin/
    GUS/www/allgenes/cgi-bin/rnaProtSimPng.pl.in
    ...

GUS/java/cbil/gus/servlet/
    GUS/java/cbil/gus/servlet/SiteServlet.java
    ...

GUS/www/install/
    GUS/www/install/allgenes-config.in
    GUS/www/install/installServlet.pl

GUS/perl/servlet/allgenes/
    GUS/perl/servlet/allgenes/rnaProtSim.pl.in
    ...


rnaProtSimPng.pl.in

#!@PERL@
# -------------------------------------------------------------------
# rnaProtSimPng.pl
#
# $Revision: 1.3 $ $Date: 2001/03/22 14:44:57 $ $Author: crabtree $
# -------------------------------------------------------------------

use strict;
require 'cgi_lib.perl';
require '@CGI_DIR@/rnaSimilarityPng.pm';

# Input using cgi_lib.perl
#
my %rq = &get_request();
my $naSeqId = $rq{'id'} || 118619;
$naSeqId =~ s/[^\d]//g;

my $maxHits = $rq{'max_hits'};
$maxHits =~ s/[^\d]//g;

# Generate image using rnaSimilarityPng.pm
#
$| = 1;
my $mapName = "$naSeqId-prot";
my $imgData = &getImage($mapName, $naSeqId, 'ExternalAASequence');
print "Content-type: image/png\n\n$imgData";


cbil.gus.servlet.SiteServlet

- extends javax.servlet.http.HttpServlet and is the only actual servlet in our Java code
- reads a configuration file and instantiates the set of JavaBeans defined therein:
    - instances of PageGeneratorI (content generators)
    - SqlQuery (parameterized SQL queries)
    - "Param" and "Formatter" classes
- implements logging, dispatches requests
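As an illustration of the configuration-driven design, the sketch below shows one plausible first step: reading `beanName.Property=value` entries and grouping them by bean name. It is an assumption about how such a loader could work, not the SiteServlet source; the class `BeanConfigSketch` and its method are hypothetical, and only `java.util.Properties` from the standard library is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch (not the SiteServlet code): group configuration
// entries of the form "beanName.Property=value" by bean name.
final class BeanConfigSketch {

    static Map<String, Map<String, String>> groupByBean(String configText) {
        Properties props = new Properties();
        try {
            // Properties.load skips '#' comment lines and joins lines
            // that end in a backslash.
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Map<String, String>> beans = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot < 0) {
                continue; // not a "bean.property" entry
            }
            String bean = key.substring(0, dot);
            String property = key.substring(dot + 1);
            beans.computeIfAbsent(bean, k -> new TreeMap<>())
                 .put(property, props.getProperty(key));
        }
        return beans;
    }

    public static void main(String[] args) {
        String config =
            "# Set of logins to GUS or GUSdev\n" +
            "gusLogin.class=cbil.gus.servlet.db.ConnectionPool\n" +
            "gusLogin.MaxQueryTime=120\n" +
            "rnaSeqQ.class=SqlQuery\n" +
            "rnaSeqQ.Params=rnaID\n";
        groupByBean(config).forEach((bean, properties) ->
            System.out.println(bean + " -> " + properties));
    }
}
```

A loader built this way would then presumably create each object from its `class` entry and apply the remaining properties through setters; that step is omitted here because the slides do not describe it.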


allgenes-config.in

# Oracle-specific routines
#
gusOraSql.class=cbil.gus.servlet.db.oracle.SQL

# Set of logins to GUS or GUSdev
#
gusLogin.class=cbil.gus.servlet.db.ConnectionPool
gusLogin.Login=@[email protected]=@[email protected]=@[email protected]=6
gusLogin.MaxQueryTime=120
gusLogin.CheckInterval=30
gusLogin.JDBCDrivers=oracle.jdbc.driver.OracleDriver
gusLogin.Sql=gusOraSql
gusLogin.PrintStatusMessages=true
...


# Retrieve an RNA's sequence from the DB
#
rnaSeqQ.class=SqlQuery
rnaSeqQ.DisplayName=RNA sequence
rnaSeqQ.Name=rnaSeqQ
rnaSeqQ.Abbrev=rnaSeq
rnaSeqQ.SQL=select nas.sequence \
  from dots.NASequenceImp nas, dots.ProjectLink pl \
  where nas.na_sequence_id = $$0$$ \
  and nas.na_sequence_id = pl.id \
  and pl.project_id = 813 \
  and pl.table_id in (56, 89)
rnaSeqQ.HtmlBrief=RNA sequence for RNA DT.<!--ST0-->
rnaSeqQ.Params=rnaID
rnaSeqQ.ResultFormatter=rnaSeqF
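The `$$0$$` marker in the SQL above is a positional placeholder that is filled from the query's `Params` list. A minimal Java sketch of such a substitution follows; `SqlTemplateSketch` and `bind` are hypothetical names, and the actual SqlQuery class may handle placeholders differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the SqlQuery class): fill $$n$$ placeholders
// in a SQL template with positional parameter values.
final class SqlTemplateSketch {

    // Matches $$0$$, $$1$$, ... and captures the parameter index.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\$(\\d+)\\$\\$");

    static String bind(String template, String... params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            if (index >= params.length) {
                throw new IllegalArgumentException("No value for parameter " + index);
            }
            // quoteReplacement keeps '$' and '\' in a value from being
            // interpreted as regex group references.
            m.appendReplacement(out, Matcher.quoteReplacement(params[index]));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select nas.sequence from dots.NASequenceImp nas "
                   + "where nas.na_sequence_id = $$0$$";
        System.out.println(bind(sql, "118619"));
    }
}
```

Because this is plain text substitution, every value has to be validated before it reaches `bind`, otherwise the query is open to SQL injection. Where a placeholder stands for a single value, JDBC's `PreparedStatement` bind variables are the safer choice.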


# RH map location (DOTS only)
#
rhLocnID.DisplayName=Chromosomal location based on RH mapping
rhLocnID.Name=rhmap_locn_id
rhLocnID.Abbrev=rhLocn
rhLocnID.SQL=select distinct epcr.na_sequence_id \
  from dots.EPCR epcr, dots.RHMapMarker rmm, dots.RHMarker rm, dots.ProjectLink pl \
  where rmm.chromosome = '$$0$$' and rmm.centirays >= $$1$$ and rmm.centirays <= $$2$$ \
  and rm.rh_marker_id = rmm.rh_marker_id \
  and rm.taxon_id $$3$$ \
  and epcr.map_table_id = 366 \
  and rmm.rh_map_marker_id = epcr.map_id \
  and epcr.na_sequence_id = pl.id \
  and pl.project_id = @PROJECT_ID@ \
  and pl.table_id = 56
rhLocnID.HtmlBrief=<!--ST3--> RNAs radiation hybrid mapped to \
chromosome <!--ST0--> between <!--ST1--> and <!--ST2--> cR
rhLocnID.HtmlLong=This query returns DoTS predicted transcripts that can be \
linked to a specific chromosomal location by the radiation hybrid map data. A DoTS \
predicted transcript consists of an ...
rhLocnID.Params=humanOrMouseChromP,centirayStartP,centirayEndP,taxonIdP
rhLocnID.ResultFormatter=dotsIdListF1


humanOrMouseChromP.class=EnumParam
humanOrMouseChromP.Prompt=Select a chromosome:
humanOrMouseChromP.Description=Human or mouse chromosome
humanOrMouseChromP.Values=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
humanOrMouseChromP.Help=Please select a human or mouse chromosome from \
the list provided; note that chromosomes 'Y', '20', '21', and '22' are \
only valid for humans.

centirayStartP.class=DoubleParam
centirayStartP.Prompt=Start position in centirays:
centirayStartP.Description=Start position in centirays
centirayStartP.Min=0.0
centirayStartP.Max=10290
centirayStartP.Initial=0.0
centirayStartP.Help=Enter a "start" position in centirays. The centiray \
is the unit of distance used in radiation hybrid mapping \
assays and the form should indicate the range of values \
that are valid for this particular parameter.
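The `Min` and `Max` entries above are what make automated input checking possible. The following Java sketch shows how a range-checked numeric parameter could validate a raw form value; `DoubleParamSketch` and `validate` are hypothetical names chosen for illustration, not the API of the real DoubleParam class.

```java
// Hypothetical sketch (not the real DoubleParam class): a numeric
// parameter that checks raw form input against a configured range.
final class DoubleParamSketch {
    private final String prompt;
    private final double min;
    private final double max;

    DoubleParamSketch(String prompt, double min, double max) {
        this.prompt = prompt;
        this.min = min;
        this.max = max;
    }

    // Returns null when the value is acceptable, otherwise an error message.
    String validate(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return prompt + " a value is required";
        }
        double value;
        try {
            value = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return prompt + " '" + raw + "' is not a number";
        }
        // parseDouble accepts "NaN", so reject it explicitly.
        if (Double.isNaN(value) || value < min || value > max) {
            return prompt + " value must be between " + min + " and " + max;
        }
        return null;
    }

    public static void main(String[] args) {
        // Range taken from centirayStartP.Min and centirayStartP.Max.
        DoubleParamSketch start =
            new DoubleParamSketch("Start position in centirays:", 0.0, 10290);
        System.out.println(start.validate("125.5"));
        System.out.println(start.validate("-3"));
    }
}
```

Returning a message instead of throwing lets a form page collect the errors for all parameters and show them together.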


"GUSwww" Cache Tables

SQL> describe queries;
 Name                   Null?    Type
 ---------------------- -------- -------------
 QUERY_ID               NOT NULL NUMBER(12)
 SERVLET_NAME           NOT NULL VARCHAR2(30)
 QUERY_NAME             NOT NULL VARCHAR2(100)
 PARAM0                          VARCHAR2(100)
 PARAM1                          VARCHAR2(100)
 . . .
 PARAM74                         VARCHAR2(100)
 PARAM75                         VARCHAR2(100)
 RESULT_TABLE           NOT NULL VARCHAR2(30)
 START_TIME             NOT NULL DATE
 END_TIME                        DATE

SQL> describe cache435;
 Name                   Null?    Type
 ---------------------- -------- -------------
 SPOT_FAMILY_RESULT_ID  NOT NULL NUMBER(10)
 I                               NUMBER

SQL> describe cache30687;
 Name                   Null?    Type
 ---------------------- -------- -------------
 NA_SEQUENCE_ID                  NUMBER(12)
 I                               NUMBER(12)


installServlet.pl

[crabtree@zeus install]$ ./installServlet.pl \
    --port=9000 \
    --cgiDir=/world/www.allgenes/cgi-bin/ \
    --htdocsDir=/world/www.allgenes/htdocs \
    --cgiURL=http://www.allgenes.org/cgi-bin \
    --htdocsURL=http://www.allgenes.org \
    --installDir=/world/www.allgenes/servlet \
    --servletName=allgenes-zeus \
    --servletFilePrefix=allgenes \
    --servletConfig=allgenes-zeus \
    --production \
    --servletURL=http://www.allgenes.org/gc/servlet

- install htdocs and cgi-bin files
    - perform substitutions defined by 'allgenes-zeus' (e.g. ORA_LOGIN, ORA_PASSWORD, PROJECT_ID)
- compile Java code, create .jar file and install
- install servlet configuration file
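The substitution step replaces `@NAME@` tokens in the `.in` files (for example `@PERL@`, `@CGI_DIR@`, `@PROJECT_ID@`) with site-specific values. installServlet.pl is a Perl script and its source is not shown here, so the Java sketch below only illustrates the idea. The class `TemplateSubstSketch` and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not installServlet.pl): replace @NAME@ tokens
// in a template with values from a site-specific settings map.
final class TemplateSubstSketch {

    // Upper-case token names such as @PERL@ or @PROJECT_ID@.
    private static final Pattern TOKEN = Pattern.compile("@([A-Z][A-Z0-9_]*)@");

    static String substitute(String text, Map<String, String> values) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Leave unknown tokens untouched so a missing setting is easy to spot.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("PERL", "/usr/bin/perl"); // assumed value, for illustration only
        settings.put("PROJECT_ID", "813");     // assumed value, for illustration only
        System.out.println(substitute("#!@PERL@", settings));
        System.out.println(substitute("and pl.project_id = @PROJECT_ID@", settings));
    }
}
```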


Features of Current [Servlet] Implementation

- Automatic generation of HTML FORMs
- Automated input checking
- Integrated help features
- INPUT elements populated from the database
- Query history facility
- Boolean queries (AND, OR, SUBTRACT)
- Declarative configuration file
- Base system is relatively independent of GUS
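To make the first feature concrete, here is a rough Java sketch of generating a form element from an enumerated parameter definition such as humanOrMouseChromP. The class `EnumParamFormSketch`, its methods, and the markup it emits are assumptions for illustration. The slides do not show the HTML the servlet actually produces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the servlet's code): render a <select> element
// from an enumerated parameter's name, prompt, and allowed values.
final class EnumParamFormSketch {

    // Minimal HTML escaping for text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String renderSelect(String name, String prompt, List<String> values) {
        StringBuilder html = new StringBuilder();
        html.append(escape(prompt))
            .append(" <select name=\"").append(escape(name)).append("\">\n");
        for (String value : values) {
            html.append("  <option value=\"").append(escape(value)).append("\">")
                .append(escape(value)).append("</option>\n");
        }
        html.append("</select>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // A shortened value list; the configuration file lists 1-22, X, Y.
        System.out.print(renderSelect("humanOrMouseChromP", "Select a chromosome:",
                Arrays.asList("1", "2", "X", "Y")));
    }
}
```

Driving the form from the same definition that drives validation keeps the two from drifting apart, which is presumably the point of the declarative configuration file.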


Limitations of Current Implementation

- Relatively steep learning curve
- Monolithic solution
- No support for modifying configuration at runtime
    - All objects instantiated when config. file read
- Limited ability to customize presentation layer (i.e., HTML) without programming in Java
- Technical problems with Servlets/Tomcat
    - Must restart all servlets as a group
    - Not currently using Serializable sessions


Dynamic Web Content

- HTML fragments embedded in a program:
    - CGI programs (e.g. Perl - interpreted)
    - Java Servlets (compiled)
- Program fragments embedded in HTML:
    - PHP (interpreted)
    - JSP (compiled; once, as needed)
- Another axis: persistent vs. not (CGI/FastCGI)


Program Fragments in HTML

- Advantages:
    - faster development cycle; can edit in place
    - easier to see/validate structure of HTML pages
    - HTML has no functions, Java and PHP do
- Disadvantages:
    - must take care to manage complexity of application
- Recommendations:
    - move towards adopting this approach
    - move all persistent state into the database


PHP: Hypertext Preprocessor

- http://www.php.net
- Scripting language; can be embedded in HTML
- http://www.php.net/usage.php (Netcraft survey)


JSP - JavaServer Pages

- Based on and can interact with Java Servlets
- Essentially Java embedded in HTML
- XML-based tags, scriptlets, and JavaBean calls
- Standard tag libraries available
- Pages typically compiled on demand
- Multiple implementations? (vs. single for PHP)


Next steps

- Agree on desired user interface functionality
    - saving queries for PlasmoDB
    - persistent preferences for genome browser
- Design parts of the schema to support it
- Migrate old code/write new code
    - Easier to migrate existing code with JSP