ogsa-dqp steven lynden university of manchester. data access & integration with ogsa-dai: ggf 17...
DESCRIPTION
Data access & integration with OGSA-DAI: GGF 17 3 OGSA-DQP high-level overview OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers. Usability: use it as an OGSA- DAI data service. DQP is capable of planning, scheduling and executing in parallel the distributed queries Calls to analysis (Web) services can be declared within queries and invoked by DQP. DBMS data OGSA-DQP QueryResults OGSA-DAI DBMS dataTRANSCRIPT
![Page 1: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/1.jpg)
OGSA-DQP
Steven LyndenUniversity of Manchester
![Page 2: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/2.jpg)
Data access & integration with OGSA-DAI: GGF 172
Introduction
OGSA-DQP is a service based distributed query processor It evaluates queries over distributed data sources wrapped by
OGSA-DAI It is built using OGSA-DAI extensibility points People involved:
• University of Manchester• Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou,
Norman Paton
• University of Newcastle• Jim Smith, Arijit Mukherjee, Paul Watson
• OGSA-DAI Prototype release 3.0 available from the OGSA-DAI website
![Page 3: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/3.jpg)
Data access & integration with OGSA-DAI: GGF 173
OGSA-DQP high-level overview
OGSA-DQP uses a middleware approach.
It can be seen as a mediator over OGSA-DAI wrappers.
Usability: use it as an OGSA-DAI data service.
DQP is capable of planning, scheduling and executing in parallel the distributed queries
Calls to analysis (Web) services can be declared within queries and invoked by DQP.
DBMS
data
OGSA-DQP
Query Results
OGSA-DAI
OGSA-DAI
DBMS
data
![Page 4: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/4.jpg)
Data access & integration with OGSA-DAI: GGF 174
OGSA-DQP architecture
OGSA-DAIdata service
perform
EvaluatorQE
EvaluatorQE
EvaluatorQE
The “OGSA-DQP service”, Grid Distributed Query Service (GDQS) AKA “Coordinator”
AKA Grid Query Evaluation Service (GQES)
DQP activities installed
![Page 5: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/5.jpg)
Data access & integration with OGSA-DAI: GGF 175
OGSA-DQP architecture DQP evaluator services:
• Are plain Web services• Implement the QueryEvaluation port type:
•evaluate – the input is a query plan partition which is subsequently executed
•receiveData – allows the evaluator to receive data from other evaluators
OGSA-DAI extensions:• DQP resource – a resource which encapsulates a distributed
query infrastructure: DQP evaluator services, OGSA-DAI data services etc. Implemented as a data resource accessor.
• OQL query statement activity – enables the submission of a query in Object Query Language (OQL)
• DQP factory activity – enables the creation and configuration of DQP resources.
![Page 6: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/6.jpg)
Data access & integration with OGSA-DAI: GGF 176
Example query Given two DBMSs and one analysis tool (i.e., a Web service):
• goTerm : a table in a GO Gene Ontology database running as a remote mySQL DB, exposed by an OGSA-DAI data service
• protein : a table in a protein sequence DB, exposed by an OGSA-DAI data service
• Blast (sequence alignment scoring Web service); We want to obtain alignment scores for a sequence against proteins of a certain
kind The user submits a single query referencing data stored at multiple sites. The author of the query need not be aware of how/where data is stored. Queries are written in Object Query Language (OQL):
select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ and p.proteinId=t.proteinId
![Page 7: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/7.jpg)
Data access & integration with OGSA-DAI: GGF 177
Client interaction with OGSA-DQP
Two kinds of client/server interactions:
1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource
2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. Differs from the typical OGSA-DAI data service resources e.g. relational data service resource
![Page 8: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/8.jpg)
Data access & integration with OGSA-DAI: GGF 178
DQP configuration
OGSA-DAIdata serviceperform
<perform><DQPFactory>Evaluator URLsOGSA-DAI data service resourcesWeb service URLs</DQPFactory> </perform> OGSA-DAI
data serviceGetRP
DQP factory activity
OGSA-DAIdata service
GetRP
creates
DQP DSR • Global schema of imported DBs & analysis services• Set of evaluators that can be used• Physical DB metadata (used to optimise queries)
Result: resource ID of created DSR
![Page 9: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/9.jpg)
Data access & integration with OGSA-DAI: GGF 179
DQP query evaluation
OGSA-DAIdata serviceperform
<perform><OQLQueryStatement><expression>OQL query</expression></OQLQueryStatement> </perform>
OGSA-DAIdata service
perform
OQLQueryStatement
DQP DSR
EvaluatorQE
transport
OGSA-DAIdata service
perform
Analysisservice
. . .EvaluatorQE
EvaluatorQE
Result: WebRowSet XML Stream
![Page 10: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/10.jpg)
Data access & integration with OGSA-DAI: GGF 1710
OQL Query Statement activity detail
logicaloptimiser
physicaloptimiser
partitioner scheduler evaluators
parsersingle-node
optimiser
multi-nodeoptimiser
query
query results
![Page 11: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/11.jpg)
Data access & integration with OGSA-DAI: GGF 1711
Logical optimisation
Consider the query:select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ andp.proteinId = t.proteinId
Plan is expressed as a logical algebra
Multiple equivalent plans are generated
scantermId=GO:0005942(goTerm)
scan(protein)
reduce reduce
join(proteinId)
op_call(Blast)
reduce
![Page 12: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/12.jpg)
Data access & integration with OGSA-DAI: GGF 1712
Physical optimisation
Plan is expressed as a physical algebra
Plan is chosen by cost-ranking of equivalent plans
table_scantermId=GO:0005942(goTerm)
table_scan(protein)
reduce reduce
hash_join(proteinId)
op_call(Blast)
reduce
![Page 13: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/13.jpg)
Data access & integration with OGSA-DAI: GGF 1713
Query partitioning
Plan is transformed into a parallel algebra (physical operators + data exchange)
Exchange operators are placed where data exchange must take place
table_scantermId=GO:0005942(goTerm)
table_scan(protein)
reduce reduce
hash_join(proteinId)
op_call(Blast)
reduce
exchange exchange
exchange
![Page 14: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/14.jpg)
Data access & integration with OGSA-DAI: GGF 1714
Query scheduling
Allocate operators to evaluator nodes
table_scantermId=GO:0005942(goTerm)
table_scan(protein)
reduce reduce
hash_join(proteinId)
op_call(Blast)
reduce
exchange exchange
exchange
4,5
2
1 3select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ andp.proteinId = t.proteinId
partitionedparallelism
pipelinedparallelism
![Page 15: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…](https://reader036.vdocuments.site/reader036/viewer/2022090107/5a4d1c027f8b9ab0599f04d4/html5/thumbnails/15.jpg)
Data access & integration with OGSA-DAI: GGF 1715
Conclusion
OGSA-DQP is a service based distributed query processor that is:
• Exposed as a service• Implemented as an orchestration of services
Benefits:• Queries are executed in parallel• OGSA-DAI OGSA-DQP can take advantage of the host of
delivery options provided by OGSA-DAI• Web services can be invoked during query execution, merging
data access with data analysis