ogsa-dqp steven lynden university of manchester. data access & integration with ogsa-dai: ggf 17...

15
OGSA-DQP Steven Lynden University of Manchester

Upload: melinda-flowers

Post on 20-Jan-2018

212 views

Category:

Documents


0 download

DESCRIPTION

Data access & integration with OGSA-DAI: GGF 17 3 OGSA-DQP high-level overview OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers. Usability: use it as an OGSA- DAI data service. DQP is capable of planning, scheduling and executing in parallel the distributed queries Calls to analysis (Web) services can be declared within queries and invoked by DQP. DBMS data OGSA-DQP QueryResults OGSA-DAI DBMS data

TRANSCRIPT

Page 1: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

OGSA-DQP

Steven LyndenUniversity of Manchester

Page 2: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 172

Introduction

OGSA-DQP is a service based distributed query processor It evaluates queries over distributed data sources wrapped by

OGSA-DAI It is built using OGSA-DAI extensibility points People involved:

• University of Manchester• Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou,

Norman Paton

• University of Newcastle• Jim Smith, Arijit Mukherjee, Paul Watson

• OGSA-DAI Prototype release 3.0 available from the OGSA-DAI website

Page 3: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 173

OGSA-DQP high-level overview

OGSA-DQP uses a middleware approach.

It can be seen as a mediator over OGSA-DAI wrappers.

Usability: use it as an OGSA-DAI data service.

DQP is capable of planning, scheduling and executing in parallel the distributed queries

Calls to analysis (Web) services can be declared within queries and invoked by DQP.

DBMS

data

OGSA-DQP

Query Results

OGSA-DAI

OGSA-DAI

DBMS

data

Page 4: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 174

OGSA-DQP architecture

OGSA-DAIdata service

perform

EvaluatorQE

EvaluatorQE

EvaluatorQE

The “OGSA-DQP service”, Grid Distributed Query Service (GDQS) AKA “Coordinator”

AKA Grid Query Evaluation Service (GQES)

DQP activities installed

Page 5: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 175

OGSA-DQP architecture DQP evaluator services:

• Are plain Web services• Implement the QueryEvaluation port type:

•evaluate – the input is a query plan partition which is subsequently executed

•receiveData – allows the evaluator to receive data from other evaluators

OGSA-DAI extensions:• DQP resource – a resource which encapsulates a distributed

query infrastructure: DQP evaluator services, OGSA-DAI data services etc. Implemented as a data resource accessor.

• OQL query statement activity – enables the submission of a query in Object Query Language (OQL)

• DQP factory activity – enables the creation and configuration of DQP resources.

Page 6: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 176

Example query Given two DBMSs and one analysis tool (i.e., a Web service):

• goTerm : a table in a GO Gene Ontology database running as a remote mySQL DB, exposed by an OGSA-DAI data service

• protein : a table in a protein sequence DB, exposed by an OGSA-DAI data service

• Blast (sequence alignment scoring Web service); We want to obtain alignment scores for a sequence against proteins of a certain

kind The user submits a single query referencing data stored at multiple sites. The author of the query need not be aware of how/where data is stored. Queries are written in Object Query Language (OQL):

select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ and p.proteinId=t.proteinId

Page 7: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 177

Client interaction with OGSA-DQP

Two kinds of client/server interactions:

1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource

2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)

The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. Differs from the typical OGSA-DAI data service resources e.g. relational data service resource

Page 8: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 178

DQP configuration

OGSA-DAIdata serviceperform

<perform><DQPFactory>Evaluator URLsOGSA-DAI data service resourcesWeb service URLs</DQPFactory> </perform> OGSA-DAI

data serviceGetRP

DQP factory activity

OGSA-DAIdata service

GetRP

creates

DQP DSR • Global schema of imported DBs & analysis services• Set of evaluators that can be used• Physical DB metadata (used to optimise queries)

Result: resource ID of created DSR

Page 9: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 179

DQP query evaluation

OGSA-DAIdata serviceperform

<perform><OQLQueryStatement><expression>OQL query</expression></OQLQueryStatement> </perform>

OGSA-DAIdata service

perform

OQLQueryStatement

DQP DSR

EvaluatorQE

transport

OGSA-DAIdata service

perform

Analysisservice

. . .EvaluatorQE

EvaluatorQE

Result: WebRowSet XML Stream

Page 10: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1710

OQL Query Statement activity detail

logicaloptimiser

physicaloptimiser

partitioner scheduler evaluators

parsersingle-node

optimiser

multi-nodeoptimiser

query

query results

Page 11: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1711

Logical optimisation

Consider the query:select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ andp.proteinId = t.proteinId

Plan is expressed as a logical algebra

Multiple equivalent plans are generated

scantermId=GO:0005942(goTerm)

scan(protein)

reduce reduce

join(proteinId)

op_call(Blast)

reduce

Page 12: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1712

Physical optimisation

Plan is expressed as a physical algebra

Plan is chosen by cost-ranking of equivalent plans

table_scantermId=GO:0005942(goTerm)

table_scan(protein)

reduce reduce

hash_join(proteinId)

op_call(Blast)

reduce

Page 13: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1713

Query partitioning

Plan is transformed into a parallel algebra (physical operators + data exchange)

Exchange operators are placed where data exchange must take place

table_scantermId=GO:0005942(goTerm)

table_scan(protein)

reduce reduce

hash_join(proteinId)

op_call(Blast)

reduce

exchange exchange

exchange

Page 14: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1714

Query scheduling

Allocate operators to evaluator nodes

table_scantermId=GO:0005942(goTerm)

table_scan(protein)

reduce reduce

hash_join(proteinId)

op_call(Blast)

reduce

exchange exchange

exchange

4,5

2

1 3select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ andp.proteinId = t.proteinId

partitionedparallelism

pipelinedparallelism

Page 15: OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction…

Data access & integration with OGSA-DAI: GGF 1715

Conclusion

OGSA-DQP is a service based distributed query processor that is:

• Exposed as a service• Implemented as an orchestration of services

Benefits:• Queries are executed in parallel• OGSA-DAI OGSA-DQP can take advantage of the host of

delivery options provided by OGSA-DAI• Web services can be invoked during query execution, merging

data access with data analysis