The Design and Application of a Generic Query Toolkit
Presented by: Lichun (Jack) Zhu
Course: 60-520, Winter 2006
Instructor: Dr. A. K. Aggarwal
University of Windsor
February 24, 2006

Agenda
  Introduction
  Contemporary Business Intelligence Solutions
  The Design of Generic Query Toolkit
  Applications
  Ongoing Work & Future Scope
  Summary
  Demo
  Q & A

Introduction
  Traditional way of developing information systems
    Waterfall model
    Problems: hard-coded programs, little flexibility when the specification changes
  Prototyping development method
    "Screw & rise" (spiral) progression; allows better communication; locates problems as early as possible; needs RAD tool support
  My project: Generic Query Toolkit
    Extended SQL language (GQL), query interface generator

Related Subject - Contemporary Business Intelligence Solutions

What is Business Intelligence?
  BI is the process of turning data into information, and then into knowledge.
  It uses all available means, including data warehousing, data mining, and decision support techniques, to collect, organize, and process enterprise data.
  The goal of BI is to support analysis and decision-making and to improve the competitiveness of the enterprise.

Contemporary Business Intelligence Solutions

Common BI tools
  Business Objects
  Brio
  Cognos
  etc.

Common features of BI software
  Customizable report and query interface automation
  OLAP / Data Mining Analysis
  Data Integration
  Broadcast / Push Information

Contemporary Business Intelligence Solutions

Problems of commercial BI software
  Highly complicated systems with a steep learning curve
  Too expensive for small projects

Open Source BI software
  Pentaho Business Intelligence Project, which integrates the Mondrian OLAP server, JPivot, Weka Data Mining, etc.

The Design of Generic Query Toolkit

  Architecture
  GQL Language Specification
  Components
  How do all these components work together?

Architecture

[Figure: architecture of the GQL Toolkit. Labels recovered from the diagram:]
  Clients: Web Browser and Client App (HTTP, SOAP/WSDL); Mobile Device through a WAP Gateway (WML/XHTML)
  Application Server - Tomcat: GQL Viewer (JSP, Servlet, Struts…; TODO: report tools, JPivot OLAP, Weka data mining) and GQL Server (Web Services based on Axis)
  GQL App at server side: GQL Daemon (TODO: scheduler, workflow) and GQL Parser (TODO: language extension)
  Metadata Repository (p_query, p_queryq ...), accessed through Hibernate O-R mapping: (1) get script, (2) put into queue, (3) get task info; and (1) get waiting tasks, (2) set task status
  Database/Datamart, accessed through JDBC
  File System Cache Directory (in compressed XML format): query results are written into the cache directory, and cached query results are read for display

GQL Language Specification

The GQL language is an extension of SQL. Field attributes and query criteria attributes replace the select-expressions and condition-expressions of SQL statements.

Display attributes

  Field_Attribute ::= { Field_Name; Field_Description; Field_Type; Display_Attribute [; [Aggregate_Attribute]; [Key_Attribute]] }

GQL Language Specification

Query criteria attributes

  Condition_Attribute ::= < Condition_Expression; Condition_Description; Condition_Type [; [Domain]; [Required_Attribute]; [Default_Attribute]; [Hint]] >
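
The two attribute forms above are semicolon-separated records inside braces or angle brackets, so splitting them is straightforward. The following is a minimal Python sketch of that idea, not the toolkit's parser (which the slides say is built with JFlex and CUP). The class names, function names, and the choice to accept any number of trailing optional parts are assumptions made for illustration; the deck's own example uses a five-part field attribute, so the sketch accepts four to six parts.

```python
from dataclasses import dataclass


@dataclass
class FieldAttribute:
    # Hypothetical container mirroring the Field_Attribute grammar.
    name: str
    description: str
    type: str
    display: str
    aggregate: str = ""
    key: str = ""


@dataclass
class ConditionAttribute:
    # Hypothetical container mirroring the Condition_Attribute grammar.
    expression: str
    description: str
    type: str
    domain: str = ""
    required: str = ""
    default: str = ""
    hint: str = ""


def _split(text, opener, closer, min_parts, max_parts):
    """Strip the delimiters and split the body on semicolons."""
    text = text.strip()
    if not (text.startswith(opener) and text.endswith(closer)):
        raise ValueError(f"expected {opener}...{closer}: {text!r}")
    parts = [p.strip() for p in text[1:-1].split(";")]
    if not min_parts <= len(parts) <= max_parts:
        raise ValueError(f"expected {min_parts}-{max_parts} parts, got {len(parts)}")
    return parts


def parse_field(text):
    return FieldAttribute(*_split(text, "{", "}", 4, 6))


def parse_condition(text):
    return ConditionAttribute(*_split(text, "<", ">", 3, 7))


credit = parse_field("{sum(income) incom;Credit;MONEY;SHOW;SUM}")
item = parse_field("{id;Item;INTEGER;SHOW;;GROUP}")
mark = parse_condition("<mark;Type;STRING;#1>")
```

A real grammar would also have to cope with semicolons or angle brackets inside expressions, which this sketch deliberately ignores.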

GQL Language Specification

For example:

  select {id;Item;INTEGER;SHOW;;GROUP},
         {mark;Type;STRING;SHOW;;GROUP},
         {catelog;Category;STRING;SHOW;;GROUP},
         {cdate;Date;DATE;SHOW;;GROUP},
         {sum(income) incom;Credit;MONEY;SHOW;SUM},
         {sum(outcome) outcom;Debit;MONEY;SHOW;SUM},
         {sum((income-outcome)) pure;Net;MONEY;SHOW;SUM}
  from t_dace
  where <id;Item;INTEGER;#select id,name from t_item where id between 500 and 999 order by id>
    and <note;Description;STRING>
    and <mark;Type;STRING;#1>
    and <catelog;Category;STRING;#3>
    and <cdate;Date;DATE>
    and <income*exrate;Credit;MONEY>
    and <outcome*exrate;Debit;MONEY>
  group by #1, #2, #3, #4
  order by #1, #2, #3, #4;

GQL Language Specification

Generated User Interface:

Architecture of GQL Toolkit

Metadata Repository
  Query directory – p_query
  Task queue – p_queryq

p_query
  PK   seq
       id, explain, refqry, perms, kind, script, refnum, template

p_queryq
  PK   uid
  FK1  seq
       id, stime, etime, condflds, datapath, status, tellno, errmsg, refnum, server, datasize

Components of GQL Toolkit

GQL Parser (using JFlex, CUP)
  Parse: generate internal objects that represent field and criteria attributes
  XML interface – get, bind
  Sample XML Schema

  <?xml version="1.0" encoding="UTF-8"?>
  <SQLGenerator xmlns="Parameters" seq="1" title="Revenue/Expense Analysis" ...>
    <Fields>
      <ID0 fstr="id" fdesc="Item" ftype="INTEGER" fprecision="0" fdatefmt="" fflag="SHOW" fagg="" fkey="GROUP" fappend="" factive="ENABLE" />
      <ID1 fstr="mark" fdesc="Type" ftype="STRING" fprecision="0" fdatefmt="" fflag="SHOW" fagg="" fkey="GROUP" fappend="" factive="ENABLE" />
      ...
      <ID6 fstr="sum((income-outcome)) pure" fdesc="Pure" ftype="MONEY" fprecision="2" fdatefmt="" fflag="SHOW" fagg="SUM" fkey="" fappend="" factive="ENABLE" />
    </Fields>
    <Condis>
      <ID0 fstr="id" fdesc="Item" ftype="INTEGER" fprecision="0" fdatefmt="" facq="" fdefault="" fcomment="" fop="=" fexpflag="0">
        <fvalue>"501|Cash","502|Saving","503|Checking",...</fvalue>
        <fexp>501,512</fexp>
      </ID0>
      ...
      <ID6 fstr="outcome*exrate" fdesc="Debit" ftype="MONEY" fprecision="2" fdatefmt="" facq="" fdefault="" fcomment="" fop="=" fexpflag="0">
        <fvalue />
        <fexp />
      </ID6>
    </Condis>
  </SQLGenerator>
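
To illustrate how a client might consume such a schema, here is a small Python sketch that reads the field and condition entries with the standard library's ElementTree. It is not part of the toolkit. The trimmed SAMPLE document (with the elided parts and several attributes left out) and the function name read_schema are assumptions for illustration; the element and attribute names are taken from the sample above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the sample schema (assumption: the
# elided entries and the omitted attributes are not needed for the demo).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<SQLGenerator xmlns="Parameters" seq="1" title="Revenue/Expense Analysis">
  <Fields>
    <ID0 fstr="id" fdesc="Item" ftype="INTEGER" fflag="SHOW" fagg="" fkey="GROUP" factive="ENABLE" />
    <ID1 fstr="mark" fdesc="Type" ftype="STRING" fflag="SHOW" fagg="" fkey="GROUP" factive="ENABLE" />
  </Fields>
  <Condis>
    <ID0 fstr="id" fdesc="Item" ftype="INTEGER" fop="=" fexpflag="0">
      <fvalue>"501|Cash","502|Saving"</fvalue>
      <fexp>501,512</fexp>
    </ID0>
  </Condis>
</SQLGenerator>"""

# The default namespace "Parameters" prefixes every element name.
NS = "{Parameters}"


def read_schema(xml_text):
    """Return (title, fields, conditions) as plain Python data."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = [dict(el.attrib) for el in root.find(NS + "Fields")]
    conditions = []
    for el in root.find(NS + "Condis"):
        cond = dict(el.attrib)
        cond["fvalue"] = el.findtext(NS + "fvalue", default="")
        cond["fexp"] = el.findtext(NS + "fexp", default="")
        conditions.append(cond)
    return root.get("title"), fields, conditions


title, fields, conditions = read_schema(SAMPLE)
```

A query form generator could then loop over conditions and render one input control per entry, using fdesc as the label and fvalue as the list of choices.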

Components of GQL Toolkit

GQL Parser
  Execute: generate target SQL statements (using an expression-reduce algorithm)

Generated Sample SQL

  select mark, catelog, sum(income) incom, sum(outcome) outcom, sum((income-outcome)) pure
  from t_dace
  where id between 501 and 512 and mark = 'P' and cdate >= '01-01-2006'
  group by mark, catelog
  order by mark, catelog
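
The slides do not spell out the expression-reduce algorithm. One plausible reading, sketched below in Python, is that fields the user disabled and criteria the user left empty are dropped before the SQL text is assembled. The function build_sql, the dictionary keys, and the way grouping columns are derived from the GROUP key are all assumptions for illustration, not the toolkit's implementation.

```python
def build_sql(table, fields, conditions):
    """Assemble a SQL string from active fields and filled-in criteria.

    fields:     list of dicts with "expr", "key" and "active" (hypothetical shape)
    conditions: list of (expression, predicate) pairs; predicate is None
                when the user left that criterion empty
    """
    active = [f for f in fields if f["active"]]
    select_list = ", ".join(f["expr"] for f in active)
    group_cols = [f["expr"] for f in active if f["key"] == "GROUP"]
    # "Reduce" step (assumed): criteria without user input simply disappear.
    predicates = [f"{expr} {pred}" for expr, pred in conditions if pred]

    sql = f"select {select_list} from {table}"
    if predicates:
        sql += " where " + " and ".join(predicates)
    if group_cols:
        sql += " group by " + ", ".join(group_cols)
        sql += " order by " + ", ".join(group_cols)
    return sql


FIELDS = [
    {"expr": "id", "key": "GROUP", "active": False},
    {"expr": "mark", "key": "GROUP", "active": True},
    {"expr": "catelog", "key": "GROUP", "active": True},
    {"expr": "cdate", "key": "GROUP", "active": False},
    {"expr": "sum(income) incom", "key": "", "active": True},
    {"expr": "sum(outcome) outcom", "key": "", "active": True},
    {"expr": "sum((income-outcome)) pure", "key": "", "active": True},
]
CONDITIONS = [
    ("id", "between 501 and 512"),
    ("note", None),
    ("mark", "= 'P'"),
    ("catelog", None),
    ("cdate", ">= '01-01-2006'"),
    ("income*exrate", None),
    ("outcome*exrate", None),
]

sql = build_sql("t_dace", FIELDS, CONDITIONS)
print(sql)
```

The sketch concatenates literal values only to mirror the generated sample; code that accepts real user input would normally pass values as bound parameters instead.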

Components of GQL Toolkit

GQL Daemon (using Hibernate, SAX, JDOM)
  Runs in the background, multi-threaded

  Procedure run()
  Begin
    Set the status of the task to “Running”;
    Try
      Get the script from the corresponding p_query persistence object;
      Create a new instance of the GQL Parser class and call its Parse method to parse the script;
      Get the XML schema stored in the condfld attribute of the p_queryq persistence object;
      Call GQLParser.XMLBindFieldsAndConditions to bind the XML schema;
      Call GQLParser.Execute to get a list of SQL statements;
      Submit these SQL statements to the database server one by one;
      Export the query results and save them into the cache directory as a compressed XML document;
      Set the status of the task to “Success”;
    Exception
      Set the status of the task to “Error” and record the accompanying error message;
    End;
  End.
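
The task life cycle in the procedure above (Running, then Success or Error, with the result saved as compressed XML) can be sketched in a few lines of Python. This is an illustration only: the real daemon is Java code using Hibernate, while here the Task class, the "Waiting" status name, the execute callback (a stand-in for parse, bind, execute, and submit), and the .xml.gz file naming are all assumptions.

```python
import gzip
import os
import threading


class Task:
    """Hypothetical in-memory stand-in for a p_queryq row."""

    def __init__(self, uid, script):
        self.uid = uid
        self.script = script
        self.status = "Waiting"
        self.errmsg = ""
        self.datapath = ""


def run_task(task, execute, cache_dir):
    """Run one task and record its outcome, mirroring Procedure run()."""
    task.status = "Running"
    try:
        # execute() stands in for parse + bind + SQL generation + submission
        # and is assumed to return the result set as an XML string.
        xml_result = execute(task.script)
        path = os.path.join(cache_dir, f"{task.uid}.xml.gz")
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write(xml_result)
        task.datapath = path
        task.status = "Success"
    except Exception as exc:
        task.status = "Error"
        task.errmsg = str(exc)


def run_waiting(tasks, execute, cache_dir):
    """Start one thread per waiting task and wait for all of them."""
    threads = [
        threading.Thread(target=run_task, args=(t, execute, cache_dir))
        for t in tasks
        if t.status == "Waiting"
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

Catching every exception inside run_task keeps a failed query from taking down the worker thread silently; the error text ends up on the task, where a task monitor can show it.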

Components of GQL Toolkit

GQL Daemon
  Result data format, compatible with Delphi ClientDataSet

  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <DATAPACKET Version="2.0">
    <METADATA>
      <FIELDS>
        <FIELD attrname="date_" fieldtype="date" WIDTH="23"/>
        <FIELD attrname="account_no" fieldtype="string" WIDTH="9"/>
        <FIELD attrname="trans_num" fieldtype="r8"/>
        <FIELD attrname="trans_amt" fieldtype="r8" SUBTYPE="Money"/>
      </FIELDS>
      <PARAMS LCID="1033"/>
    </METADATA>
    <ROWDATA>
      <ROW date_="20040128" account_no="11000" trans_num="2" trans_amt="240.34" />
      <ROW date_="20040129" account_no="11004" trans_num="1" trans_amt="436.40" />
      <ROW date_="20040130" account_no="11000" trans_num="2" trans_amt="1240.75" />
      …
    </ROWDATA>
  </DATAPACKET>
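
Because the result format carries its own field metadata, a reader can convert rows without knowing the query in advance. The Python sketch below shows one way to do that with the standard library. The function name read_datapacket and the decision to treat fieldtype "r8" as a floating-point number are assumptions; the sample document is the one from the slide with the elided rows left out.

```python
import xml.etree.ElementTree as ET

PACKET = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DATAPACKET Version="2.0">
  <METADATA>
    <FIELDS>
      <FIELD attrname="date_" fieldtype="date" WIDTH="23"/>
      <FIELD attrname="account_no" fieldtype="string" WIDTH="9"/>
      <FIELD attrname="trans_num" fieldtype="r8"/>
      <FIELD attrname="trans_amt" fieldtype="r8" SUBTYPE="Money"/>
    </FIELDS>
    <PARAMS LCID="1033"/>
  </METADATA>
  <ROWDATA>
    <ROW date_="20040128" account_no="11000" trans_num="2" trans_amt="240.34" />
    <ROW date_="20040129" account_no="11004" trans_num="1" trans_amt="436.40" />
  </ROWDATA>
</DATAPACKET>"""


def read_datapacket(xml_text):
    """Return (fields, rows): field (name, type) pairs and one dict per ROW."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    fields = [(f.get("attrname"), f.get("fieldtype")) for f in root.iter("FIELD")]
    rows = []
    for row in root.iter("ROW"):
        record = {}
        for name, ftype in fields:
            raw = row.get(name)
            # Assumption: "r8" columns hold floating-point numbers; every
            # other type is kept as the raw string from the attribute.
            record[name] = float(raw) if ftype == "r8" and raw is not None else raw
        rows.append(record)
    return fields, rows


fields, rows = read_datapacket(PACKET)
```

For monetary columns a production reader would more likely use decimal.Decimal than float; float keeps the sketch short.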

Components of GQL Toolkit

GQL Server
  Uses Hibernate and Apache Axis; supports SOAP/WSDL
  Provides an intermediate Access Service and a GQL Service
  The Access Service includes user login, change password, and system logging services
  The GQL Service communicates with the presentation layer, cooperates with the GQL Daemon, and manages the query queue table

Components of GQL Toolkit

GQL Viewer
  Currently uses JSP, Servlet, Struts, and various other tag libraries, with XSLT to present the data
  Communicates with the GQL Server, constructs the user interface, feeds back user input, monitors the task queue, and displays query results
  Connects with legacy client applications

Architecture of GQL Toolkit

How do all these components work together?

1. Select a query from directory …

[Figure: sequence diagram. Participants: Query Interface, GQL Viewer, GQL Server, Metadata (queries and task queue). Steps shown:]
  Select a query from the directory
  Call getXMLSchema()
  Return XML Schema
  Build Query Interface

Architecture of GQL Toolkit

How do all these components work together?

2. Input criteria then submit the query …

[Figure: sequence diagram. Participants: GQL Viewer (Data Display, Task Monitor), GQL Server, Metadata (queries and task queue), File System Cache Directory (XML Data). Steps shown:]
  Input criteria and submit
  Call CheckCachedQuery() using the XML Schema bound with the input data
  If a matching query is found and the owner chooses to view the data, return the data from the cache and display the cached matching data
  Otherwise, put the query into the queue and display the task monitor
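
The branch in step 2 (reuse a cached result or enqueue a new task) can be sketched as follows in Python. How CheckCachedQuery() actually decides that two queries match is not described in the slides; hashing the query number together with the bound criteria is an assumption, as are the function names, the dictionary layout, and the "Waiting" status name.

```python
import hashlib


def cache_key(seq, condflds_xml):
    """Fingerprint of a query plus its bound criteria (assumed matching rule)."""
    return hashlib.sha256(f"{seq}\n{condflds_xml}".encode("utf-8")).hexdigest()


def submit_query(queue, seq, condflds_xml, reuse_cached=True):
    """Return ("cached", task) for a reusable finished task, else enqueue.

    queue is a plain list standing in for the p_queryq table.
    reuse_cached models the owner choosing to view the cached data.
    """
    key = cache_key(seq, condflds_xml)
    if reuse_cached:
        for task in queue:
            if task["key"] == key and task["status"] == "Success":
                return "cached", task
    task = {
        "key": key,
        "seq": seq,
        "condflds": condflds_xml,
        "status": "Waiting",
        "datapath": None,
    }
    queue.append(task)
    return "queued", task


queue = []
first = submit_query(queue, 1, "<Condis>mark = 'P'</Condis>")
```

Once a daemon marks that task as "Success", submitting the same query and criteria again returns the existing task instead of adding a new one, unless the caller passes reuse_cached=False.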

Architecture of GQL Toolkit

How do all these components work together?

3. The GQL Daemon detects and runs the query …

[Figure: sequence diagram. Participants: GQL Viewer (Task Monitor, Data Display), GQL Server, GQL Daemon, Database, Metadata (queries and task queue), File System Cache Directory (XML Data). Steps shown:]
  The GQL Daemon detects the waiting task and creates a thread to run it. The resulting data is exported to the cache directory.
  View the data. Other actions: delete data, make footnote.
  Call ExtractCondflds() & ExtractData() to get the XML Schema & Data. Other actions: Cleardata(), MarkQuery().
  Display data

Applications
  The Management Information & Report System for DCC Project – Jiangsu Branch, China Construction Bank, 2003
  The Long Credit Card Management Information System (CMIS) of China Construction Bank, 2002
  Long Card Data Analysis System – Shanghai Branch, China Construction Bank, 2001

Ongoing work and future scope
  GQL language extension
  Report template support and multi-format data export support
  OLAP support
  Data mining support
  WAP support
  Scheduler and workflow support
  GQL Visualized Designer

Summary and Conclusion

Goal
  Build a testbed for research on new data warehousing techniques and for testing new data mining algorithms
  Provide valuable solutions for future commercial use in the Business Intelligence area

Demonstration

Q & A

Thank you