Oracle Life Sciences User Group Meeting – Reston, VA 2004
Having a BLAST Data Mining in Oracle 10g:
Implementing A Bioinformatics Target Database
John Burke, Ph.D., UCB Research, Inc.
Preview
UCB Discovery Research
Designing the Target Database
Building the Target Database
Looking Forward
UCB Discovery Research
UCB Pharma
Discovery Research
Structure, Chemistry, Biology
[Figure: chemical structure drawing]
UCB Pharma Discovery Research
Discovery Research Sites
Lille
Cambridge
Braine-l’Alleud
?
UCB Pharma Discovery Research
Bioinformatics, Proteomics, and Genomics
Protein db
LS-Graph
Mascot
Protein Prospector
Biotools
Swiss-Prot
GenBank
...
MALDI-TOF
Q-TOF
SIMS + ProteinMine
SAN
Custom on Oracle 10g
GeneXpress
Spotfire
Sequencher and Omiga
GCG and SeqWeb
Human genome browser (UCSC)
UniGene
TIGR
Proteome PSD
GeneXpress
Proteinscape
Designing the Target Database
General Requirements
Purpose: to store and manage target discovery research information efficiently and effectively
Scope: corporate, global, multi-project, multi-user
Content: gene and protein targets and ancillary information
Functionality: BLAST search, Web access, links to other databases and applications
Designing the Target Database
Text Data (searchable fields)
Gene name
EST selected
Source of identification
cDNA IMAGE clone
UniGene Hs.#
Transcript size
Full-length cDNA clone name
Reading Frame
cDNA clone sequences
ORF nucleotide number
ORF aa Predicted Size
Protein homology
Protein Sequence
Note
Protein Function
Mouse KO
Key Literature
Image Data
Northern Hybridization image
Western Hybridization
MC ENorthern
Tissue ENorthern
QPCR
3-D Structure
Small molecule hits (link to compound DB?)
Clone alignment
Building the Target Database
Data Model
Entities and attributes (# key, * mandatory, o optional):
THERAPEUTIC_AREA: * AREA_NAME, * FOCUS_GROUP, * PROJECTID
PROJECT: # PROJECTID, * PROJECT_MANAGER, * PROJECT_NAME, * REVIEW_DATE, * STATUS
EXPERIMENT: # EXPID, * RESEARCHER, o CHIP, o COMMENT, o NOTEBOOB_REF, o SPECIES
LITERATURE: # LITID, o DATE_PUBLISHED, o JOURNAL, o LIT_AUTHOR, o LIT_TITLE, o URL
CELL: # CELLID, * CELL_NAME, o SPECIES
COMPOUND: # COMPOUNDID, * COMMENT, * COMPOUND_NAME, * UCB_NUMBER
SOURCE: # SOURCEID, o CELLID, o EXPID, o LITID, o MOUSE_KO
GENEALIAS: # GENE_NAME, # GENE_SYMBOL
IMAGE: # IMAGEID, * COMMENT, * GENE_SYMBOL, * PICTURE, * TYPE
CDNA: # CDNAID, * FULL_LENGTH_CDNA, * READING_FRAME, * SEQUENCE, o ORF_NT_NUMBER, o ORF_PREDICTED_SIZE
PROTEIN: # PROTEINID, * GENE_SYMBOL, * PROTEIN_FUNCTION, * PROTEIN_HOMOLOGY, * SEQUENCE, * SOURCEID, o COMPONENT, o COMPOUNDID
GENE: # GID, * EST_SELECTED, * GENE_SYMBOL, * SOURCEID, * UNIGENE_HS_NO, o ASSAYID, o PROTEINID
Relationship labels in the diagram: involved with, is a, binds with, is from, contains, has, represents, expressed by, expresses
Designing the Target Database
Typical Queries
• Find all targets similar to this protein with size x in gate y or therapeutic area z.
• Find all targets with a specified (or unknown) function.
• Find all targets scheduled to be reviewed on a specified date.
• Find all projects and targets managed by a given person.
• Find all targets from Affy study x, a literature search, cell line y, or species z.
Designing the Target Database
Critical Factors in Choosing Oracle 10g
Oracle already a UCB standard
Confidence in Oracle product and support
Smaller resource requirement
Shorter development time
Inclusion of BLAST in database
• No need to build interface between DB and BLAST
• No need to move data from DB to BLAST
• Ability to execute other queries combined with BLAST
Designing the Target Database
Core NCBI BLAST Subroutines
Subroutine   Description
blastp       Compares an amino acid query sequence against a protein sequence database.
blastn       Compares a nucleotide query sequence against a nucleotide sequence database.
blastx       Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn      Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx      Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
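The choice among the five subroutines depends only on the type of the query and the type of the database. The sketch below shows that decision as a small lookup. The class name, the enum, and the `sixFrame` flag are hypothetical names chosen for illustration; they are not part of NCBI BLAST or of Oracle 10g.

```java
public class BlastProgramChooser {
    /** Kind of sequence held by the query or by the database being searched. */
    enum SeqType { NUCLEOTIDE, PROTEIN }

    /**
     * Returns the BLAST subroutine name for a query/database pairing.
     * sixFrame is a hypothetical flag: it only matters when both sides are
     * nucleotide, where it selects tblastx instead of plain blastn.
     */
    static String choose(SeqType query, SeqType database, boolean sixFrame) {
        if (query == SeqType.PROTEIN && database == SeqType.PROTEIN) {
            return "blastp";   // amino acid query vs protein database
        }
        if (query == SeqType.NUCLEOTIDE && database == SeqType.PROTEIN) {
            return "blastx";   // translated nucleotide query vs protein database
        }
        if (query == SeqType.PROTEIN && database == SeqType.NUCLEOTIDE) {
            return "tblastn";  // protein query vs translated nucleotide database
        }
        // Both sides are nucleotide sequences.
        return sixFrame ? "tblastx" : "blastn";
    }

    public static void main(String[] args) {
        // A protein query searched against a nucleotide database.
        System.out.println(choose(SeqType.PROTEIN, SeqType.NUCLEOTIDE, false));
    }
}
```

Running `main` prints `tblastn`, the row of the table that pairs a protein query with a nucleotide database.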
Designing the Target Database
System Architecture
Application Server 10g
Web Client
OS: Windows XP
Platform: HP Workstation
Web Client
Web Client
Oracle Database 10g
OS: Solaris 8
Platform: Sun Enterprise 250
Building the Target Database
Oracle System Components Installed
10g Database
• Data Mining Option
10g JDeveloper
10gAS Infrastructure
• Infrastructure database
• OracleAS Identity Management components
• OracleAS Metadata Repository
10gAS Middle Tier
• J2EE and Web Cache
• Portal
Building the Target Database
N-tiered Application Architecture
Client Tier
• Web Browser
Application Server Tier
• JSP Pages
• Jakarta Struts Framework
• BC4J
• Java Beans
• Portal
EIS Tier
• Oracle 10g Database
• BLAST Data Mining
Building the Target Database
JSP Model 2 Architecture – MVC Pattern
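Web Container (Application Server): Servlet (Controller), JSP (View), Java Beans (Model)
1. Web Browser -> Servlet (Controller): User Action
2. Servlet (Controller) -> Java Beans (Model): Instantiates
3. Java Beans (Model) <-> Oracle 10g Database (Database Server): Data
4. Servlet (Controller) -> JSP (View): Redirect
5. JSP (View) -> Web Browser: Response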
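The division of work in the Model 2 pattern can be sketched in plain Java without a servlet container. This is only an illustration of the three roles, not the application's code: `Model2Sketch`, `TargetBean`, `FAKE_DB`, and the gene symbols are all hypothetical, and an in-memory map stands in for the Oracle database.

```java
import java.util.List;
import java.util.Map;

public class Model2Sketch {
    /** Model: a plain bean carrying the data the view will render. */
    static class TargetBean {
        private final List<String> geneSymbols;

        TargetBean(List<String> geneSymbols) {
            this.geneSymbols = geneSymbols;
        }

        List<String> getGeneSymbols() {
            return geneSymbols;
        }
    }

    /** Stand-in for the database tier (hypothetical data, keyed by project status). */
    static final Map<String, List<String>> FAKE_DB = Map.of(
            "active", List.of("GENE_A", "GENE_B"),
            "closed", List.of("GENE_C"));

    /** View: turns the bean into markup and does nothing else. */
    static String renderView(TargetBean bean) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String symbol : bean.getGeneSymbols()) {
            html.append("<li>").append(symbol).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    /** Controller: reads the user action, instantiates the model, hands it to the view. */
    static String handleRequest(String status) {
        TargetBean bean = new TargetBean(FAKE_DB.getOrDefault(status, List.of()));
        return renderView(bean);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("active"));
    }
}
```

Running `main` prints `<ul><li>GENE_A</li><li>GENE_B</li></ul>`. The view never touches the data source and the controller never builds markup, which is the separation the pattern is meant to enforce.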
Building the Target Database
Page Flow
Building the Target Database
Classes - Jakarta Struts Framework
Building the Target Database
An Issue with SQL in Java
Nested IN-Clause Statement failed in Java.
    OraclePreparedStatement pstmt = (OraclePreparedStatement) conn.prepareStatement(
        "Select genesymbol from proteins where proteinid " +
        " IN(Select proteinid from projects_proteins where project_projectid " +
        " IN(Select projectid from projects where status LIKE :1))");
Identical SELECT statement worked in SQL*Plus.
Equivalent statement implemented as Stored Procedure.
Building the Target Database
An Issue with SQL in Java
Equivalent Statement as Stored Procedure
    SELECT genesymbol
    FROM proteins, projects, projects_proteins, therapeutic_areas
    WHERE PROTEINS.PROTEINID = PROJECTS_PROTEINS.PROTEIN_PROTEINID
      AND PROJECTS_PROTEINS.PROJECT_PROJECTID = PROJECTS.PROJECTID
      AND PROJECTS.PROJECTID = THERAPEUTIC_AREAS.PROJECTID
      AND (PROJECTS.status = query OR query IS NULL)
      AND (THERAPEUTIC_AREAS.AREA_NAME = areaName OR areaName IS NULL);
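The two conditions of the form `(column = param OR param IS NULL)` make each parameter an optional filter: passing NULL switches that filter off. The sketch below reproduces only that logic in memory so it can be run without a database. `OptionalFilterSketch`, `Row`, and the sample rows are hypothetical; the rows stand for the result of the joins in the statement above.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionalFilterSketch {
    /** One joined row: a gene symbol with its project status and therapeutic area. */
    static final class Row {
        final String geneSymbol;
        final String status;
        final String areaName;

        Row(String geneSymbol, String status, String areaName) {
            this.geneSymbol = geneSymbol;
            this.status = status;
            this.areaName = areaName;
        }
    }

    /** Mirrors "(column = param OR param IS NULL)": a null parameter matches every row. */
    static boolean matches(String column, String param) {
        return param == null || param.equals(column);
    }

    /** Applies both optional filters and returns the matching gene symbols in row order. */
    static List<String> geneSymbols(List<Row> rows, String query, String areaName) {
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            if (matches(row.status, query) && matches(row.areaName, areaName)) {
                result.add(row.geneSymbol);
            }
        }
        return result;
    }

    /** Hypothetical sample data for illustration only. */
    static List<Row> sampleRows() {
        return List.of(
                new Row("GENE_A", "active", "CNS"),
                new Row("GENE_B", "closed", "CNS"),
                new Row("GENE_C", "active", "Immunology"));
    }

    public static void main(String[] args) {
        System.out.println(geneSymbols(sampleRows(), "active", null)); // status filter only
        System.out.println(geneSymbols(sampleRows(), null, "CNS"));    // area filter only
        System.out.println(geneSymbols(sampleRows(), null, null));     // no filter
    }
}
```

With the sample rows, `main` prints `[GENE_A, GENE_C]`, then `[GENE_A, GENE_B]`, then `[GENE_A, GENE_B, GENE_C]`. One procedure therefore covers the status-only, area-only, combined, and unfiltered searches without building SQL text dynamically.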
Building the Target Database
JSP interacts with Database via Stored Procedures
Use of stored procedures:
Centralizes SQL, facilitating reuse
Allows the DBA to tune SQL statements
Leverages Oracle’s dependency tracking mechanism
Provides greater security since JSP user unable to directly modify base tables
Provides precompiled code
Offers better performance
• Stored procedures load once into the shared pool and remain there unless they become paged out.
Building the Target Database
BLASTN Stored Procedure
    PROCEDURE "BLASTNTARGETS" IS
      --DECLARE
      T_SEQ_ID blastn.T_SEQ_ID%TYPE;
      SCORE blastn.SCORE%TYPE;
      EXPECT blastn.EXPECT%TYPE;
      -- Using the default parameters in Blast
      CURSOR blastn_cursor is
        select * from TABLE(BLASTN_MATCH(
          (select seq_data from targets),
          CURSOR(select genesymbol, clonesequence from cdnas, genes
                 where genes.cdnaid = cdnas.cdnaid))) t
        where t.score > 25;
    BEGIN
      --OPEN blastn_cursor;
      OPEN blastn_cursor;
      --delete the rows in the blastn table
      DELETE FROM BLASTN;
      LOOP
        FETCH blastn_cursor INTO T_SEQ_ID, SCORE, EXPECT;
        EXIT WHEN blastn_cursor%NOTFOUND;
        INSERT INTO BLASTN VALUES(T_SEQ_ID, SCORE, EXPECT);
      END LOOP;
      CLOSE blastn_cursor;
    END blastntargets;
Building the Target Database
An Issue with 10g AS
Attempts to deploy application to AS gave server error.
Identifying proper expertise and mode of resolution proved difficult.
Teamwork ultimately solved problem.
• Oracle Life Sciences
• Oracle Customer Service
• OLSUG membership
• Oracle Consulting Practice
Solution SIMPLE, but of course NOT OBVIOUS
Building the Target Database
An Issue with 10g AS
Building the Target Database
Request Page
Building the Target Database
BLAST Query Page
Building the Target Database
BLAST Query Page
Building the Target Database
BLAST Result Page
Building the Target Database
Request Page
Building the Target Database
Query Page
Building the Target Database
Query Page
Building the Target Database
Query Result Page
Looking Forward
Short Term
Additional Features and Improvements
• Data input page
• Sexy new name for Target Database
• Integrated BLAST and query
• Report pages
• Integration with other systems
Long Term
Bioinformatics Portal
Integrated Knowledge Base
UCB Team
MIS, Research
Prasoon Kejriwal, Cambridge
David Wei, Cambridge
Bob Johnson, Cambridge
Didier Generet, Braine
Didier Chalon, Braine
Karl Nocka, Cambridge
Bob Coopersmith, Cambridge
Zhidong Zhang, Cambridge
Rich Fisher, Cambridge
Pierre Chatelain, Braine