Download - Drill / SQL / Optiq
![Page 1: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/1.jpg)
Drill / SQL / Optiq
Julian Hyde
Apache Drill User Group2013-03-13
![Page 2: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/2.jpg)
![Page 3: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/3.jpg)
SQL
![Page 4: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/4.jpg)
SQL: Pros & cons
Fact: SQL is older than Macaulay Culkin
Less interesting but more relevant: Can be written by (lots of) humans Can be written by machines Requires query optimization Allows query optimization Based on “flat” relations and basic relational
operations Can be extended
![Page 5: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/5.jpg)
Quick intro to Optiq
![Page 6: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/6.jpg)
Introducing Optiq
Framework
Derived from LucidDB
Minimal query mediator: No storage No runtime No metadata Query planning engine Core operators & rewrite rules Optional SQL parser/validator
![Page 7: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/7.jpg)
MySQL
Splunk
Expression treeSELECT p.“product_name”, COUNT(*) AS cFROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id”WHERE s.“action” = 'purchase'GROUP BY p.”product_name”ORDER BY c DESC
join
Key: product_id
group
Key: product_nameAgg: count
filter
Condition:action =
'purchase'
sort
Key: c DESC
scan
scan
Table: splunk
Table: products
![Page 8: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/8.jpg)
Splunk
Expression tree(optimized)
SELECT p.“product_name”, COUNT(*) AS cFROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id”WHERE s.“action” = 'purchase'GROUP BY p.”product_name”ORDER BY c DESC
join
Key: product_id
group
Key: product_nameAgg: count
filter
Condition:action =
'purchase'
sort
Key: c DESC
scan
Table: splunk
MySQL
scanTable: products
![Page 9: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/9.jpg)
Metadata SPI
interface Table RelDataType getRowType()
interface TableFunction List<Parameter> getParameters() Table apply(List arguments) e.g. ViewTableFunction
interface Schema Map<String, List<TableFunction>>
getTableFunctions()
![Page 10: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/10.jpg)
Operators and rules
Rule: interface RelOptRule Operator: interface RelNode Core operators: TableAccess, Project,
Filter, Join, Aggregate, Order, Union, Intersect, Minus, Values
Some rules: MergeFilterRule, PushAggregateThroughUnionRule, RemoveCorrelationForScalarProjectRule + 100 more
![Page 11: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/11.jpg)
Planning algorithm
Start with a logical plan and a set of rewrite rules
Form a list of rewrite rules that are applicable (based on pattern-matching)
Fire the rule that is likely to do the most good
Rule generates an expression that is equivalent (and hopefully cheaper)
Queue up new rule matches Repeat until cheap enough
![Page 12: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/12.jpg)
Concepts
Cost Equivalence sets Calling convention Logical vs Physical Traits Implementation
![Page 13: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/13.jpg)
Outside the kernel
SQL parser/validator
JDBC driver SQL function
library (validation + code-generation)
Lingual (Cascading adapter)
Splunk adapter Drill adapter Clone (compact
in-memory tables)
JSON-based catalog
Views
JDBC server
SQL parser /validatorQuery
planner
3rd partydata
3rd partydata
JDBC client
3rd
partyops
3rd
partyops
Optional
Pluggable
Core
MetadataSPI
Pluggablerules
![Page 14: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/14.jpg)
Optiq roadmap
Building blocks for analytic DB: In-memory tables in a distributed cache Materialized views Partitioned tables
Faster planning Easier rule development ODBC driver Adapters for XXX, YYY
![Page 15: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/15.jpg)
Applying Optiq to Drill
1. Enhance SQL2. Query translation
![Page 16: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/16.jpg)
Drill vs Traditional SQL
SQL: Flat data Schema up front
Drill: Nested data (list & map) No schema
We'd like to write: SELECT name, toppings[2] FROM donuts
WHERE ppu > 0.6 Solution: ARRAY, MAP, ANY types
![Page 17: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/17.jpg)
ARRAY & MAP SQL types
ARRAY is like java.util.List MAP is like java.util.LinkedHashMap
Examples: VALUES ARRAY ['a', 'b', 'c'] VALUES MAP ['Washington', 1, 'Obama', 44] SELECT name, address[1], address[2], state
FROM Employee SELECT * FROM Donuts WHERE
CAST(donuts['ppu'] AS DOUBLE) > 0.6
![Page 18: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/18.jpg)
ANY SQL type
ANY means “type to be determined at runtime” Validator narrows down possible type based
on operators used Similar to converting Java's type system into
JavaScript's. (Not easy.)
![Page 19: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/19.jpg)
Sugaring the donut
Query: SELECT c['ppu'], c['toppings'][1] FROM
Donuts
Additional syntactic sugar: c.x means c['x']
So: CREATE TABLE Donuts(c ANY) SELECT c.ppu, c.toppings[1] FROM
Donuts
Better: CREATE TABLE Donuts(_MAP
MAP(VARCHAR TO ANY)) SELECT ppu, toppings[1] FROM Donuts
![Page 20: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/20.jpg)
UNNEST
Employees nested inside departments: CREATE TYPE employee (empno INT, name
VARCHAR(30)); CREATE TABLE dept (deptno INT, name
VARCHAR(30), employees EMPLOYEE ARRAY);
Unnest: SELECT d.deptno, d.name, e.empno, e.name
FROM department AS d CROSS JOIN UNNEST(d.employees) AS e
SQL standard provides other operations on collections:
COLLECT, FUSION, MEMBER OF
More: http://farrago.sourceforge.net/design/CollectionTypes.html
![Page 21: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/21.jpg)
Applying Optiq to Drill
1. Enhance SQL2. Query translation
![Page 22: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/22.jpg)
Query translation
SQL: select d['name'] as name, d['xx'] as xx
from ( select _MAP['donuts'] as d from donuts)where cast(d['ppu'] as double) > 0.6
Drill: { head: { … },
storage: { … }, query: [ { op: “sequence”, do: [ { op: “scan”, … selection: { path: "/donuts.json", … }}, { op: “transform”, transforms: [ { ref: "d", expr: "donuts"} ] }, { op: “filter”, expr: “d.ppu > 0.6” }, { op: “transform”, transforms: [ { ref: “name”, expr: “d.name”, … } ] } ] }
![Page 23: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/23.jpg)
Planner log
Original rel:
AbstractConverter(subset=[rel#14:Subset#3.ARRAY], convention=[ARRAY])
ProjectRel(subset=[rel#10:Subset#3.NONE], NAME=[ITEM($0, 'name')], XX=[ITEM($0, 'xx')])
FilterRel(subset=[rel#8:Subset#2.NONE], condition=[>(CAST(ITEM($0, 'ppu')):DOUBLE NOT NULL, 0.6)])
ProjectRel(subset=[rel#6:Subset#1.NONE], D=[ITEM($0, 'donuts')])
DrillScan(subset=[rel#4:Subset#0.DRILL], table=[[DONUTS, DONUTS]])
…
Cheapest plan:
EnumerableDrillRel
DrillProjectRel(NAME=[ITEM($0, 'name')], XX=[ITEM($0, 'xx')])
DrillProjectRel(D=[ITEM($0, 'donuts')])
DrillFilterRel(condition=[>(CAST(ITEM(ITEM($0, 'donuts'), 'ppu')):DOUBLE NOT NULL, 0.6)])
DrillScan(table=[[DONUTS, DONUTS]])
![Page 24: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/24.jpg)
Next
Translate join, aggregate, sort, set ops Operator overloading with ANY Mondrian on Drill
![Page 25: Drill / SQL / Optiq](https://reader035.vdocuments.site/reader035/viewer/2022081519/558b233dd8b42aa5478b461d/html5/thumbnails/25.jpg)
Thank you!
https://github.com/julianhyde/share/tree/master/slideshttps://github.com/julianhyde/incubator-drillhttps://github.com/julianhyde/optiqhttp://incubator.apache.org/drill
@julianhyde