OGSA-DAI DQP: A Developer's View

Bartosz Dobrzelecki, Applications Consultant, EPCC, [email protected], +44 131 650 5137

Upload: bartosz-dobrzelecki

Post on 16-Apr-2017



User’s View

DQPConfiguration.xml

<DQPConfiguration xmlns="http://ogsadai.org.uk/dqp/namespaces/2008/12">
  <dataResources>
    <resource url="http://localhost:8080/dai/services"
              dsos="DataSourceService"
              drerID="DataRequestExecutionResource"
              resourceID="MySQLResource"
              isLocal="true"/>
    <resource url="http://localhost:8090/dai/services"
              dsos="DataSourceService"
              drerID="DataRequestExecutionResource"
              resourceID="Resource2"/>
    <resource url="http://localhost:8095/dai/services"
              dsos="DataSourceService"
              drerID="DataRequestExecutionResource"
              resourceID="MySQLResource"
              alias="MySQL"/>
  </dataResources>
  <evaluationResources>
    <resource url="http://localhost:8085/dai/services"
              drerID="DataRequestExecutionResource"/>
  </evaluationResources>
</DQPConfiguration>

MySQLResource_employee, MySQL_employee

Query processing steps

[Diagram: SQL query expression -> SQL Parser -> Abstract Syntax Tree -> LQP Builder -> Logical Query Plan (LQP) -> Optimiser -> Optimised LQP -> Partitioner -> Partitioned LQP -> Workflow Builder -> OGSA-DAI Requests and Sub-workflows -> Execute -> Results]

Query execution

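[Diagram: an SQL query arrives as an OGSA-DAI request at the DQP Coordinator (OGSA-DAI-DQP), which returns the result. The coordinator sends OGSA-DAI requests (sub-workflows) to OGSA-DAI Data Node 1 (DB1), Data Node 2 (DB2) and Data Node 3 (DB3), and data flows between the nodes and the coordinator.]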

Producing Abstract Syntax Tree (AST)

• First step: parse SQL and generate AST.

• We use ANTLR 3 to generate code from grammars.

• Two grammars:
– SQL to AST
– AST to SQL (tree grammar)

• The tree grammar is used in our OGSA-DAI Views product, which implements read-only SQL Views by rewriting the AST.

• In DQP the tree grammar is used to generate string representations for column definitions, conditions, etc.

AST is a contract

• We do not expect AST to be changed.

• However, we do provide a mechanism for exposing new operators to the language surface.

SELECT A.aname AS name
FROM aircraft A, certified C
WHERE A.aid = C.aid

Relation valued functions

SELECT A.aname AS name
FROM outerUnion((SELECT * FROM aircraft A),
                (SELECT * FROM certified C), 'ALL') A

Logical Query Plan

• Second step: translate AST to a logical query plan.

SELECT aname AS name
FROM aircraft
WHERE aid = 10

• Operator anatomy.

[Diagram: an Operator has an OperatorID, a parent, children, a Heading (list of Attributes) and operator-specific internals. Each Attribute is (name, source, type).]

Operators

• Behaviour defined in the Operator interface
– Validation – checks if operator gets all the input data it needs, detects missing attributes, ambiguities, deals with correlation, performs type checking.
– Update – updates operator internals after it was (re)connected.

• Operator, Heading, Attribute objects can be annotated with arbitrary annotations (key: String -> value: Object)
– Sample uses:
  – Attribute is sorted, correlated, temporary
  – Which physical algorithm for join operator
  – Estimated cardinality
– There will be a set of default annotations
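The annotation mechanism can be pictured as a String-to-Object map attached to each plan object. The sketch below is an illustration only: the class Annotatable, its method names and the annotation keys ("sorted", "implementation", "estimatedCardinality") are assumptions made for this example, not the OGSA-DAI DQP API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: anything in the plan that can carry annotations.
class Annotatable {
    private final Map<String, Object> annotations = new HashMap<>();

    void addAnnotation(String key, Object value) {
        annotations.put(key, value);
    }

    Object getAnnotation(String key) {
        return annotations.get(key);
    }

    boolean hasAnnotation(String key) {
        return annotations.containsKey(key);
    }
}

// Stand-in for an Attribute: (name, source, type) as on the operator anatomy slide.
class Attribute extends Annotatable {
    final String name;
    final String source;
    final String type;

    Attribute(String name, String source, String type) {
        this.name = name;
        this.source = source;
        this.type = type;
    }
}

// Stand-in for a join operator; real operators carry far more state.
class JoinOperator extends Annotatable {
}

class AnnotationDemo {
    public static void main(String[] args) {
        Attribute aid = new Attribute("aid", "aircraft", "INTEGER");
        aid.addAnnotation("sorted", Boolean.TRUE);

        JoinOperator join = new JoinOperator();
        join.addAnnotation("implementation", "HASH_JOIN"); // physical algorithm hint
        join.addAnnotation("estimatedCardinality", 1500L); // made-up estimate

        System.out.println(aid.name + " sorted: " + aid.getAnnotation("sorted"));
        System.out.println("join uses: " + join.getAnnotation("implementation"));
    }
}
```

Keeping values as Object matches the "key: String -> value: Object" description, at the price of a cast wherever an annotation is read.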

Operator family

• Unary:
– SELECT
– PROJECT
– RENAME
– DUPLICATE ELIMINATION
– SORT
– GROUP BY
– SCALAR GROUP BY
– ONE ROW ONLY
– TABLE SCAN
– EXCHANGE

• Binary:
– INNER JOIN
– PRODUCT
– UNION
– INTERSECTION
– DIFFERENCE
– FULL OUTER JOIN
– [LEFT][RIGHT] OUTER JOIN
– [ANTI] SEMI JOIN
– APPLY

• [UNARY][BINARY][SCAN] REL_FUNCTION

Data Dictionary

• Data Dictionary provides information about federated data resources, available evaluators (DRERs), logical and physical table schemas.

• It is populated when the resource is initialised.

• Most of the entries can be annotated
– you can plug in your own code to be executed on initialisation
– you may want to annotate attributes with histograms.

• TABLE_SCAN operator builds its Heading using data from Data Dictionary (on update).

• After assembling, LQP is validated.

Optimisation

• After successful validation LQP is optimised by a chain of optimisers.

• This chain is defined as part of the Compiler configuration.

• Optimisers need to implement a single method:

Operator optimise(Operator lqpRoot,
                  DataDictionary dataDictionary,
                  CompilerConfiguration compilerConfiguration)
    throws LQPException;
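A chain of optimisers with this method shape can be sketched as a loop that feeds the root returned by one optimiser into the next. Everything below other than the optimise signature is an assumption for illustration: the stand-in Operator, DataDictionary, CompilerConfiguration and LQPException classes are deliberately minimal, and the NOOP operator and RemoveRootNoop optimiser are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins; the real OGSA-DAI DQP types are richer than these.
class Operator {
    final String name;
    final Operator child; // null for a leaf

    Operator(String name, Operator child) {
        this.name = name;
        this.child = child;
    }
}

class DataDictionary { }

class CompilerConfiguration { }

class LQPException extends Exception {
    LQPException(String message) {
        super(message);
    }
}

// Same method shape as shown on the slide.
interface Optimiser {
    Operator optimise(Operator lqpRoot,
                      DataDictionary dataDictionary,
                      CompilerConfiguration compilerConfiguration)
        throws LQPException;
}

// Hypothetical optimiser: drops a redundant NOOP operator sitting at the root.
class RemoveRootNoop implements Optimiser {
    @Override
    public Operator optimise(Operator lqpRoot,
                             DataDictionary dataDictionary,
                             CompilerConfiguration compilerConfiguration)
        throws LQPException {
        if (lqpRoot == null) {
            throw new LQPException("empty plan");
        }
        boolean redundant = "NOOP".equals(lqpRoot.name) && lqpRoot.child != null;
        return redundant ? lqpRoot.child : lqpRoot;
    }
}

// Runs optimisers in order; each one receives the plan produced by the previous one.
class OptimiserChain {
    private final List<Optimiser> chain = new ArrayList<>();

    OptimiserChain add(Optimiser optimiser) {
        chain.add(optimiser);
        return this;
    }

    Operator run(Operator root, DataDictionary dd, CompilerConfiguration cc)
        throws LQPException {
        Operator current = root;
        for (Optimiser optimiser : chain) {
            current = optimiser.optimise(current, dd, cc);
        }
        return current;
    }
}

class OptimiserChainDemo {
    public static void main(String[] args) throws LQPException {
        Operator plan = new Operator("NOOP", new Operator("TABLE_SCAN", null));
        Operator optimised = new OptimiserChain()
            .add(new RemoveRootNoop())
            .run(plan, new DataDictionary(), new CompilerConfiguration());
        System.out.println(optimised.name);
    }
}
```

Because each optimiser returns a root, an optimiser is free to replace the whole tree rather than mutate it in place.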

Default optimisers

• Query normalisation + heuristics
– Remove redundant operators
– Select Push Down + implicit join detection
– Rename Pull Up
– Project Pull Up

• Join ordering

• Partitioning – finding best places for EXCHANGE operators

• TABLE_SCAN implosion – pushing as much processing as we can to the RDBMS

Normalisation

SELECT Temp.name, Temp.AvgSalary
FROM (
  SELECT A.aid, A.aname AS name, AVG(E.salary) AS AvgSalary
  FROM aircraft A, certified C, employees E
  WHERE A.aid = C.aid AND C.eid = E.eid AND A.cruisingrange > 1000
  GROUP BY A.aid, A.aname
) AS Temp

AST to LQP translator is not trying to be smart - it takes it easy.

LQP is then normalised by a chain of optimisers.

Join Ordering

• Not there yet.

• Will be based on the same cost model as in OGSA-DQP.

• We will also reuse the same algorithm that produces left-deep trees.

• More sophisticated models and algorithms (considering bushy trees, semi joins, etc.) will be implemented later on.

• You can always implement your own and replace the default.

Partitioning optimiser

• Pluggable optimiser decides how to split LQP into partitions by inserting the EXCHANGE operator.

• Default optimiser will put most load on the “local” evaluator (DRER) – otherwise it will choose randomly.

TABLE_SCAN Implosion

• Not there yet.

• We will always try to push as much processing as we can to the RDBMS.

• TABLE_SCAN “eats” as much of a tree as it can and builds up an equivalent SQL query.

SELECT * FROM (
  SELECT * FROM aircraft
  WHERE aircraft.cruisingrange > 1000
) aircraft
JOIN (
  SELECT * FROM certified
) certified ON aircraft.aid = certified.aid

SQL support level of a relational resource

• TABLE_SCAN implosion needs to know what level of SQL is supported by the underlying resource:
– fully featured RDBMS
– simple SQL interface for CSV files supporting only simple filtering of records
– a web service wrapper

• Relational resources will expose a resource property – a serialised object implementing the SQLSupportLevel interface, similar to that defined by JDBC:

java.sql.DatabaseMetaData
public boolean supportsColumnAliasing()
public boolean supportsCorrelatedSubqueries()
public boolean supportsSubqueriesInComparisons()
public boolean supportsSubqueriesInExists()
...
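One way to picture how such a property could guide implosion is sketched below. The SQLSupportLevel interface here is a guess at its shape, limited to the four methods listed above (which do exist on java.sql.DatabaseMetaData). The two implementations and the canPushExistsSubquery check are hypothetical and only illustrate the idea of consulting the support level before pushing a fragment down.

```java
import java.io.Serializable;

// Hypothetical shape of the interface; method names borrowed from java.sql.DatabaseMetaData.
interface SQLSupportLevel extends Serializable {
    boolean supportsColumnAliasing();
    boolean supportsCorrelatedSubqueries();
    boolean supportsSubqueriesInComparisons();
    boolean supportsSubqueriesInExists();
}

// Assumed profile of a fully featured RDBMS.
class FullRdbmsSupport implements SQLSupportLevel {
    public boolean supportsColumnAliasing() { return true; }
    public boolean supportsCorrelatedSubqueries() { return true; }
    public boolean supportsSubqueriesInComparisons() { return true; }
    public boolean supportsSubqueriesInExists() { return true; }
}

// Assumed profile of a simple SQL interface over CSV files: filtering only.
class CsvFilterOnlySupport implements SQLSupportLevel {
    public boolean supportsColumnAliasing() { return false; }
    public boolean supportsCorrelatedSubqueries() { return false; }
    public boolean supportsSubqueriesInComparisons() { return false; }
    public boolean supportsSubqueriesInExists() { return false; }
}

class ImplosionCheck {
    // Hypothetical rule: an EXISTS subquery may be pushed to the resource only
    // if EXISTS subqueries are supported and, when the subquery is correlated,
    // correlated subqueries are supported as well.
    static boolean canPushExistsSubquery(SQLSupportLevel level, boolean correlated) {
        if (!level.supportsSubqueriesInExists()) {
            return false;
        }
        return !correlated || level.supportsCorrelatedSubqueries();
    }

    public static void main(String[] args) {
        System.out.println(canPushExistsSubquery(new FullRdbmsSupport(), true));
        System.out.println(canPushExistsSubquery(new CsvFilterOnlySupport(), false));
    }
}
```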

Executing the plan

• Build phase
– Each LQP Operator has an associated Activity Pipeline Builder class which takes in an Operator and returns an Activity Output.
– Most operators can be mapped directly to a single Activity.
– Some operators may have different implementations (for example the join operator); the builder chooses the default one or is guided by an Annotation.
– Operator -> Builder class mapping is configurable.

• Setup phase
– For each EXCHANGE a Data Source Resource is created.

• Execution phase
– All workflows (partitions) are submitted.
– Coordinator always executes a sub-workflow (with at least the EXCHANGE_CONSUMER operator).
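The build-phase rule "use the default builder unless an annotation says otherwise" can be sketched as a small registry. This is an assumed design, not the actual OGSA-DAI DQP code: PlanOperator, BuilderRegistry, the "implementation" annotation key and the String returned in place of an Activity Output are all hypothetical. The operator and builder names are taken from the CompilerConfiguration.xml example.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an LQP operator with annotations.
class PlanOperator {
    final String type;
    final Map<String, Object> annotations = new HashMap<>();

    PlanOperator(String type) {
        this.type = type;
    }
}

// Stand-in builder: returns a description instead of a real Activity Output.
interface ActivityPipelineBuilder {
    String build(PlanOperator operator);
}

class BuilderRegistry {
    private final Map<String, ActivityPipelineBuilder> defaults = new HashMap<>();
    private final Map<String, ActivityPipelineBuilder> alternatives = new HashMap<>();

    void setDefault(String operatorType, ActivityPipelineBuilder builder) {
        defaults.put(operatorType, builder);
    }

    void addAlternative(String operatorType, String name, ActivityPipelineBuilder builder) {
        alternatives.put(operatorType + "/" + name, builder);
    }

    // Prefer the builder named by the "implementation" annotation (a hypothetical
    // key); fall back to the configured default for the operator type.
    ActivityPipelineBuilder select(PlanOperator operator) {
        Object hint = operator.annotations.get("implementation");
        if (hint != null) {
            ActivityPipelineBuilder alternative = alternatives.get(operator.type + "/" + hint);
            if (alternative != null) {
                return alternative;
            }
        }
        ActivityPipelineBuilder fallback = defaults.get(operator.type);
        if (fallback == null) {
            throw new IllegalStateException("No builder configured for " + operator.type);
        }
        return fallback;
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        BuilderRegistry registry = new BuilderRegistry();
        registry.setDefault("INNER_THETA_JOIN", op -> "ProductSelect pipeline");
        registry.addAlternative("INNER_THETA_JOIN", "HASH_JOIN", op -> "HashJoin pipeline");

        PlanOperator join = new PlanOperator("INNER_THETA_JOIN");
        System.out.println(registry.select(join).build(join));

        join.annotations.put("implementation", "HASH_JOIN");
        System.out.println(registry.select(join).build(join));
    }
}
```

An unknown hint silently falls back to the default here; a stricter registry could reject it instead.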

Extensibility points

• New Operator can be introduced by mapping a relation valued function to an Operator, and the Operator to an Activity Pipeline Builder.

• New Operator can be included in the default query normalisation by providing strategies for SELECT push down, RENAME/PROJECT pull up.

• Optimisation chain is configurable – it is easy to plug in new LQP transformations.

• Alternative physical operator implementations can be introduced by replacing default Activity Pipeline Builders – annotations can be used to choose between several implementations.

• Scalar, aggregate and relation valued User Defined Functions will be supported.

Introducing a new operator

SELECT A.aname AS name
FROM outerUnion((SELECT * FROM aircraft A),
                (SELECT * FROM certified C), 'ALL') A

• LQP Builder will check if there is a mapping from outerUnion -> Operator and use the Operator object in LQP.

• If there is no mapping – look for a relation valued function outerUnion in the Function Repository and connect the generic RELVAL_FUNCTION operator.
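The two-step lookup can be sketched as follows. This is a simplified illustration under stated assumptions: OperatorResolver and its methods are hypothetical, operators are represented by their names only, and throwing an exception when neither lookup succeeds is a choice made here, since the slide does not say what happens in that case.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class OperatorResolver {
    // Name used here for the generic relation valued function operator.
    static final String GENERIC_OPERATOR = "RELVAL_FUNCTION";

    // function name -> dedicated operator name (as configured by relationFunction entries)
    private final Map<String, String> functionToOperator = new HashMap<>();
    // names of relation valued functions known to the Function Repository
    private final Set<String> functionRepository = new HashSet<>();

    void mapFunction(String functionName, String operatorName) {
        functionToOperator.put(functionName, operatorName);
    }

    void registerFunction(String functionName) {
        functionRepository.add(functionName);
    }

    // Step 1: a configured function -> Operator mapping wins.
    // Step 2: otherwise fall back to the generic operator if the function is known.
    String resolve(String functionName) {
        String operator = functionToOperator.get(functionName);
        if (operator != null) {
            return operator;
        }
        if (functionRepository.contains(functionName)) {
            return GENERIC_OPERATOR;
        }
        // Assumption: an unknown function is reported as an error.
        throw new IllegalArgumentException(
            "Unknown relation valued function: " + functionName);
    }

    public static void main(String[] args) {
        OperatorResolver resolver = new OperatorResolver();
        resolver.mapFunction("outerUnion", "OUTER_UNION");
        resolver.registerFunction("sample");

        System.out.println(resolver.resolve("outerUnion"));
        System.out.println(resolver.resolve("sample"));
    }
}
```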

CompilerConfiguration.xml

<LQPCompilerConfiguration xmlns="http://ogsadai.org.uk/dqp/namespaces/2008/12">
  <builders operator="GROUP_BY"
            default="uk.org.ogsadai.dqp.execute.workflow.GroupBy"/>
  <builders operator="INNER_THETA_JOIN"
            default="uk.org.ogsadai.dqp.execute.workflow.ProductSelect">
    <builder name="HASH_JOIN"
             class="uk.org.ogsadai.dqp.execute.workflow.HashJoin"/>
  </builders>
  <relationFunction name="outerUnion" operator="OUTER_UNION"/>
  <operator name="OUTER_UNION"
            class="uk.org.ogsadai.dqp.lqp.operators.extra.OuterUnionOperator"/>
  <builders operator="OUTER_UNION"
            default="uk.org.ogsadai.dqp.execute.workflow.OuterUnion"/>
  <optimisationChain>
    <optimiser class="uk.org.ogsadai.dqp.lqp.optimiser.QueryNormaliser"/>
    <optimiser class="uk.org.ogsadai.dqp.lqp.optimiser.SelectPushDown"/>
  </optimisationChain>
</LQPCompilerConfiguration>

User Defined Functions

• Three types
– Scalar: SELECT editDistance(a.name, 'John') FROM a
– Aggregate: SELECT * FROM a HAVING a.age < median(a.age)
– Relation valued
  – Unary: SELECT * FROM sample(a, 0.75)
  – Binary: SELECT * FROM fuse((SELECT * FROM a), (SELECT * FROM b))
  – Scan (tuple producing): SELECT * FROM randomInt(0, 10, 1000)

• Implementations of sub-interfaces of the Function interface.

• Function Repository is part of the Data Dictionary.
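As an illustration of the scalar case, a UDF could be written as an implementation of a sub-interface of Function. The interfaces below are assumptions (the slides name only a Function interface, not its methods), and computing editDistance as Levenshtein distance is likewise an assumption about what that example function does.

```java
// Hypothetical base interface; only its name comes from the slides.
interface Function {
    String getName();
}

// Hypothetical sub-interface for scalar functions: one value in, one value out per row.
interface ScalarFunction extends Function {
    Object evaluate(Object... arguments);
}

// Assumed implementation of editDistance as Levenshtein distance between two strings.
class EditDistance implements ScalarFunction {
    @Override
    public String getName() {
        return "editDistance";
    }

    @Override
    public Object evaluate(Object... arguments) {
        String a = (String) arguments[0];
        String b = (String) arguments[1];

        // previous[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b.
        int[] previous = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                    Math.min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + substitution);
            }
            previous = current;
        }
        return previous[b.length()];
    }
}

class UdfDemo {
    public static void main(String[] args) {
        ScalarFunction function = new EditDistance();
        System.out.println(function.getName() + " = " + function.evaluate("Jon", "John"));
    }
}
```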

Discovering Evaluator Capabilities

• We assume that every evaluation resource has the same set of activities and UDFs.

• Checking if activities are supported is quite easy
– Get list of supported activities from each evaluation resource (DRER)
– Ask Activity Pipeline Builder for a list of required activities

• Checking for UDF availability is more tricky
– Introduce UDF Resource + “GetUDFSchemas” activity
– Match by name and parameter list, types, return type
– Relation valued functions are problematic – they need to validate themselves inside LQP and provide headings – this is dynamic – function schema as a script?
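Both checks reduce to simple comparisons once the lists are available. The sketch below assumes activities are identified by name and that a UDF schema consists of a name, an ordered list of parameter types and a return type. CapabilityCheck, UdfSchema and the activity names used in main are hypothetical, and the dynamic case of relation valued functions is not covered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical static description of a UDF, as a "GetUDFSchemas" activity might return it.
class UdfSchema {
    final String name;
    final List<String> parameterTypes;
    final String returnType;

    UdfSchema(String name, List<String> parameterTypes, String returnType) {
        this.name = name;
        this.parameterTypes = parameterTypes;
        this.returnType = returnType;
    }

    // Match by name, parameter list (order and types) and return type.
    boolean matches(UdfSchema other) {
        return name.equals(other.name)
            && parameterTypes.equals(other.parameterTypes)
            && returnType.equals(other.returnType);
    }
}

class CapabilityCheck {
    // Activities required by the builders but not offered by the evaluator.
    static Set<String> missingActivities(Set<String> required, Set<String> supported) {
        Set<String> missing = new TreeSet<>(required);
        missing.removeAll(supported);
        return missing;
    }

    static boolean udfAvailable(UdfSchema wanted, List<UdfSchema> published) {
        for (UdfSchema schema : published) {
            if (wanted.matches(schema)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Activity names below are made up for the example.
        Set<String> required = new TreeSet<>(Arrays.asList("TupleSelect", "TupleHashJoin"));
        Set<String> supported = new TreeSet<>(Arrays.asList("TupleSelect", "TupleSort"));
        System.out.println("Missing activities: " + missingActivities(required, supported));

        UdfSchema wanted = new UdfSchema(
            "editDistance", Arrays.asList("STRING", "STRING"), "INTEGER");
        List<UdfSchema> published = Arrays.asList(
            new UdfSchema("editDistance", Arrays.asList("STRING", "STRING"), "INTEGER"));
        System.out.println("UDF available: " + udfAvailable(wanted, published));
    }
}
```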