SQL for Everything at CWT2014
DESCRIPTION
Presto presentation at Cloudera World Tokyo 2014
TRANSCRIPT
![Page 1: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/1.jpg)
Masahiro Nakagawa
Nov 6, 2014
Cloudera World Tokyo
SQL for Everything
Presto: Distributed SQL Query Engine
![Page 2: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/2.jpg)
Who are you?
> Masahiro Nakagawa
  > github/twitter: @repeatedly
  > Ingress: Blue
> Treasure Data, Inc.
  > Senior Software Engineer
  > Fluentd / td-agent developer
> I love OSS :)
  > D language - Phobos committer
  > Fluentd - Main maintainer
  > MessagePack / RPC - D and Python (only RPC)
  > The organizer of Presto Source Code Reading
  > etc…
![Page 3: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/3.jpg)
SQL on Hadoop?
![Page 4: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/4.jpg)
SQL Players on Hadoop
Batch (latency: minutes - hours)
> Hive
> Spark SQL
Short Batch / Low latency (latency: seconds - minutes)
> Presto
> Impala
> Drill
> HAWQ
> Actian
> etc…
Stream (latency: immediate)
> Norikra
> StreamSQL
(On the slide, a highlight color marks commercial products.)
![Page 5: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/5.jpg)
SQL Players on Hadoop
Batch
> Hive
> Spark SQL
Short Batch / Low latency
> Presto
> Impala
> Drill
> HAWQ
> Actian
> etc…
Stream
> Norikra
> StreamSQL
Slide annotations: "Red Ocean" and "Blue Ocean?" (the latter next to the Stream engines).
(On the slide, a highlight color marks commercial products.)
![Page 7: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/7.jpg)
Presto overview
> Open sourced by Facebook
  > https://github.com/facebook/presto
    • GitHub is the primary repository
  > written in Java
  > latest version is 0.81 (at the time of this talk)
> Built-in useful features
  > Connectors
  > Machine Learning
  > Window function
  > Approximate query
  > etc…
![Page 8: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/8.jpg)
What’s Presto?
A distributed SQL query engine for interactive data analysis against GBs to PBs of data.
![Page 9: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/9.jpg)
What problems does it solve?
> We couldn’t visualize data in HDFS directly using dashboards or BI tools
  > because Hive is too slow (not interactive)
  > or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.)
  > Interactive DBs cost more and are less scalable
> Some data are not stored in HDFS
  > We need to copy the data into HDFS to analyze it
![Page 13: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/13.jpg)
Diagram (without Presto): HDFS, Hive (daily/hourly batch), PostgreSQL, etc., Dashboard (interactive query)
Diagram (with Presto): HDFS, Hive (daily/hourly batch), Presto, Dashboard (interactive query)
![Page 14: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/14.jpg)
Data analysis platform
Diagram: Presto sits between the data stores and the dashboards, providing SQL on any data sets.
> HDFS, with Hive for daily/hourly batch
> Cassandra, MySQL, commercial DBs
> Dashboard (interactive query)
> Commercial BI tools
  ✓ IBM Cognos
  ✓ Tableau
  ✓ ...
![Page 15: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/15.jpg)
Presto’s deployment
> Facebook
  > Multiple geographical regions
  > scaled to 1,000 nodes
  > actively used by 1,000+ employees
  > processing 1PB/day
> Netflix, Dropbox, Treasure Data, Airbnb, Qubole, LINE, GREE, Scaleout, etc.
> Presto as a Service
  > Treasure Data, Qubole
![Page 16: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/16.jpg)
PostgreSQL gateway for Presto
> A PostgreSQL protocol gateway based on PostgreSQL’s stable ODBC / JDBC drivers
> Developed by Sadayuki Furuhashi
https://github.com/treasure-data/prestogres
![Page 17: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/17.jpg)
Distributed architecture
![Page 18: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/18.jpg)
Diagram components: Client, Coordinator, Connector Plugin, Worker (×3), Storage / Metadata, Discovery Service
![Page 19: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/19.jpg)
What are Connectors?
> Access to storage and metadata
  > provide table schema to coordinators
  > provide table rows to workers
> Connectors are pluggable to Presto
  > written in Java
> Implementations:
  > Hive (CDH, HDP, Community), Cassandra, MySQL, JDBC, Kafka, etc…
  > Or your own connector
    • Treasure Data has its own connector
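The split of responsibilities described on this slide (schema to the coordinator, rows to the workers) can be sketched in Python. This is only an illustration: `Connector`, `get_table_schema`, `scan_rows` and `InMemoryConnector` are hypothetical names, not Presto's actual Java connector interfaces, and the sample rows are made up.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple


class Connector(ABC):
    """Hypothetical connector interface (NOT Presto's real Java SPI)."""

    @abstractmethod
    def get_table_schema(self, table: str) -> List[Tuple[str, str]]:
        """Metadata side: (column name, type) pairs, used by the coordinator."""

    @abstractmethod
    def scan_rows(self, table: str, columns: List[str]) -> Iterator[tuple]:
        """Data side: rows restricted to the requested columns, read by workers."""


class InMemoryConnector(Connector):
    """Toy data source backed by Python lists, for illustration only."""

    def __init__(self) -> None:
        self._schemas = {"impressions": [("name", "varchar"), ("time", "bigint")]}
        self._rows = {"impressions": [("a", 1), ("b", 2), ("a", 3)]}

    def get_table_schema(self, table: str) -> List[Tuple[str, str]]:
        return self._schemas[table]

    def scan_rows(self, table: str, columns: List[str]) -> Iterator[tuple]:
        # Map requested column names to positions, then project each row.
        names = [name for name, _type in self._schemas[table]]
        positions = [names.index(column) for column in columns]
        for row in self._rows[table]:
            yield tuple(row[i] for i in positions)


connector = InMemoryConnector()
schema = connector.get_table_schema("impressions")                # coordinator side
names_only = list(connector.scan_rows("impressions", ["name"]))  # worker side
```

Keeping the metadata call separate from the row scan mirrors the slide: planning only needs the schema, while the scan can be run by each worker on its own share of the data.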
![Page 20: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/20.jpg)
Multiple connectors in a query
Diagram components: Client, Coordinator, Worker (×3), Discovery Service (find servers in a cluster), Hive Connector (HDFS / Metastore), Cassandra Connector (Cassandra), other connectors (other data sources...)
![Page 21: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/21.jpg)
Distributed architecture
> 3 types of servers:
  > Coordinator, worker, discovery service
> Get data/metadata through connector plugins.
  > Presto is NOT a database
  > Presto provides SQL to existing data stores
> Client protocol is HTTP + JSON
  > Language bindings: Ruby, Python, PHP, Java (JDBC), R, Node.JS...
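Because the client protocol is plain HTTP + JSON, a client can be very small. The sketch below only builds the first request and parses a response body; it does not contact a server. The coordinator URL, user, catalog and schema are assumptions for illustration, and the `/v1/statement` path, `X-Presto-*` headers and `nextUri` field reflect my understanding of Presto's REST protocol, so check them against the version you run.

```python
import json
import urllib.request


def build_statement_request(coordinator_url, sql, user, catalog, schema):
    """Build (but do not send) the initial POST that submits a query."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),          # the SQL text is the request body
        headers={
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
        method="POST",
    )


def next_uri(response_body):
    """Return the URI to poll next from a JSON response, or None when absent."""
    return json.loads(response_body).get("nextUri")


# Assumed values: a local coordinator and a Hive catalog named "hive".
request = build_statement_request(
    "http://localhost:8080",
    "SELECT name, count(*) AS c FROM impressions GROUP BY name",
    user="demo",
    catalog="hive",
    schema="default",
)
```

A real client would send this request with `urllib.request.urlopen(request)` and then keep issuing GET requests to each `nextUri` until the response no longer contains one, collecting result rows along the way. The language bindings listed above wrap this loop.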
![Page 22: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/22.jpg)
Presto’s execution model
> Presto is NOT MapReduce
  > Uses its own execution engine
> Presto’s query plan is based on DAG
  > more like Apache Tez / Spark or traditional MPP databases
  > Impala and Drill use a similar model
![Page 23: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/23.jpg)
Query Planner
SQL:
SELECT name, count(*) AS c FROM impressions GROUP BY name
Table schema:
impressions (name varchar, time bigint)
Logical query plan:
Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c)
Distributed query plan:
Table scan → Partial aggr → Sink → Exchange → Final aggr → Sink → Exchange → Output
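The partial/final aggregation split in the distributed plan can be mimicked in a few lines of Python. This is a single-process sketch, not Presto code: the function names, the two-way split of the table and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def partial_aggregate(rows):
    """Partial aggr: count names within one worker's share of the table scan."""
    return Counter(name for name, _time in rows)


def exchange(partials, num_partitions):
    """Exchange: route each (name, partial count) to a partition picked by hash(name)."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for name, count in partial.items():
            partitions[hash(name) % num_partitions][name].append(count)
    return partitions


def final_aggregate(partition):
    """Final aggr: sum the partial counts that arrived for each name."""
    return {name: sum(counts) for name, counts in partition.items()}


# Hypothetical rows of impressions(name, time), split across two workers.
worker_splits = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

partials = [partial_aggregate(rows) for rows in worker_splits]

# Output: merge the disjoint results of every final-aggregation partition.
result = {}
for partition in exchange(partials, num_partitions=2):
    result.update(final_aggregate(partition))
```

Because every occurrence of a given name is routed to the same partition, each final aggregation sees all partial counts for its names, and `result` equals what `SELECT name, count(*) AS c FROM impressions GROUP BY name` would return for these rows.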
![Page 24: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/24.jpg)
Query Planner - Stages
Stage-2: Table scan → Partial aggr → Sink
Stage-1: Exchange → Final aggr → Sink
Stage-0: Exchange → Output
Slide annotations: pipelined aggregation; inter-worker data transfer (at each Sink → Exchange boundary)
![Page 25: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/25.jpg)
Execution Planner
Input: distributed query plan + node list (✓ 2 workers)
Worker 1: Table scan → Partial aggr → Sink; Exchange → Final aggr → Sink
Worker 2: Table scan → Partial aggr → Sink; Exchange → Final aggr → Sink
Output stage: Exchange → Output
![Page 26: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/26.jpg)
MapReduce vs. Presto
MapReduce (map → disk → reduce → disk → ...)
> Write data to disk
> Wait between stages
Presto (task → task → ...)
> All stages are pipelined
  ✓ No wait time
  ✓ No fault-tolerance
> memory-to-memory data transfer
  ✓ No disk IO
  ✓ Data chunk must fit in memory
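The difference between waiting between stages and pipelining can be shown with a toy example. This is a single-process sketch under loose assumptions: a Python list stands in for the intermediate data MapReduce writes to disk, a generator stands in for Presto's memory-to-memory transfer, and the doubling "map" step is arbitrary.

```python
def staged(rows, log):
    """MapReduce-style: the map stage finishes completely before reduce starts."""
    mapped = []
    for row in rows:
        log.append(f"map {row}")
        mapped.append(row * 2)
    # A real MapReduce job would write `mapped` to disk at this point.
    total = 0
    for value in mapped:
        log.append(f"reduce {value}")
        total += value
    return total


def pipelined(rows, log):
    """Presto-style: each value is consumed downstream as soon as it is produced."""
    def upstream():
        for row in rows:
            log.append(f"map {row}")
            yield row * 2   # handed over in memory, nothing is materialized

    total = 0
    for value in upstream():
        log.append(f"reduce {value}")
        total += value
    return total


staged_log, pipelined_log = [], []
staged_total = staged([1, 2, 3], staged_log)
pipelined_total = pipelined([1, 2, 3], pipelined_log)
```

Both functions return the same total, but `staged_log` shows all map steps before any reduce step, while `pipelined_log` interleaves them. The trade-off matches the slide: the pipelined version keeps no intermediate copy, so there is nothing to restart from if a step fails.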
![Page 27: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/27.jpg)
Demo
![Page 28: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/28.jpg)
Presto Meetup
The first half of 2015
![Page 29: SQL for Everything at CWT2014](https://reader033.vdocuments.site/reader033/viewer/2022052908/55957eac1a28ab06038b4780/html5/thumbnails/29.jpg)
Check: treasuredata.com
Cloud service for the entire data pipeline, including Presto