Presto at Facebook - Presto Meetup @ Boston (10/6/2015)

Presto @ Facebook

Martin Traverso and Dain Sundstrom

Presto @ Facebook

• Ad-hoc/interactive queries for warehouse

• Batch processing for warehouse

• Analytics for user-facing products

• Analytics over various specialized stores

Analytics for Warehouse

Architecture

[Diagram: UI, CLI, Dashboards, and other tools connect through a Gateway to two Presto clusters, each running on a Warehouse Cluster]

Deployment

[Diagram: warehouse nodes each run an HDFS Datanode and MapReduce; Presto is colocated on a subset of these nodes]

Stats

• 1000s of internal daily active users

• Millions of queries each month

• Scan PBs of data every day

• Process trillions of rows every day

• 10s of concurrent queries

Features

• Pipelined partition/split enumeration

• Streaming

• Admission control

• Resource management

• System reliability

Batch workloads

Batch Requirements

• INSERT OVERWRITE

• More data types

• UDFs

• Physical properties (partitioning, etc)

Analytics for User-facing Products

Requirements

• Hundreds of ms to seconds latency, low variability

• Availability

• Update semantics

• 10 - 15 way joins

Architecture

[Diagram: Loaders write into sharded MySQL instances; each Presto Worker reads from a MySQL shard; clients query through Presto]

Stats

• > 99.99% query success rate

• 100% system availability

• 25 - 200 concurrent queries

• 1 - 20 queries per second

• <100ms - 5s latency

Presto Raptor

Requirements

• Large data sets

• Seconds to minutes latency

• Predictable performance

• 5-15 minute load latency

• Reliable data loads (no duplicates, no missing data)

• 10s of concurrent queries

Basic Architecture

[Diagram: clients query a Coordinator backed by MySQL metadata; Workers store data on local flash]

But isn’t that exactly what Hive does?

Additional Features

• Full-featured and atomic DDL

• Table statistics

• Tiered storage

• Atomic data loads

• Physical organization

Table Statistics

• Table is divided into shards

• Each shard is stored in a separate replication unit (i.e., file)

• Typically 1 to 10 million rows

• Node assignment and stats stored in MySQL

Table Schema in MySQL

Tables

  id  name
  1   orders
  2   line_items
  3   parts

table1 shards

  uuid  nodes  c1_min  c1_max  c2_min   c2_max   c3_min  c3_max
  43a5  A      30      90      cat      dog      2014    2014
  6701  C      34      45      apple    banana   2005    2015
  9c0f  A,D    25      26      cheese   cracker  1982    1994
  df31  B      23      71      tiger    zebra    1999    2006
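The per-shard min/max values are what make coordinator-side pruning cheap. A minimal Java sketch of the idea, assuming a hypothetical ShardStats type and a numeric column c1 (illustrative names, not Raptor's actual classes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ShardPruning
{
    // Hypothetical per-shard statistics, as they might be read from the shard table in MySQL
    record ShardStats(String uuid, List<String> nodes, long c1Min, long c1Max) {}

    // Keep only shards whose [c1_min, c1_max] range can overlap the predicate "c1 BETWEEN lo AND hi"
    static List<ShardStats> pruneByRange(List<ShardStats> shards, long lo, long hi)
    {
        return shards.stream()
                .filter(s -> s.c1Max() >= lo && s.c1Min() <= hi)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<ShardStats> shards = List.of(
                new ShardStats("43a5", List.of("A"), 30, 90),
                new ShardStats("6701", List.of("C"), 34, 45),
                new ShardStats("9c0f", List.of("A", "D"), 25, 26),
                new ShardStats("df31", List.of("B"), 23, 71));

        // For "WHERE c1 BETWEEN 80 AND 100" only shard 43a5 survives pruning
        System.out.println(pruneByRange(shards, 80, 100));
    }
}
```

For a predicate such as c1 BETWEEN 80 AND 100, only shard 43a5 from the table above needs to be scanned.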

Tiered Storage

[Diagram: the basic architecture plus a durable Backup tier behind the Workers' local flash]

Tiered Storage

• One copy in local, expensive, flash

• Backup copy in cheap durable backup tier

• Currently Gluster internally, but can be anything durable

• Only assumes GET and PUT methods with client-assigned IDs
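Because the backup tier only needs GET and PUT keyed by a client-assigned ID, the storage abstraction can stay very small. A hedged sketch of what such an interface could look like (illustrative names, not Raptor's actual SPI):

```java
import java.io.File;
import java.util.UUID;

// Minimal backup-store contract: the caller assigns the shard UUID, and the
// store only has to persist and return bytes under that ID. Any durable blob
// store (Gluster internally, but an object store would also work) can implement it.
public interface BackupStore
{
    // PUT: upload the local shard file under a client-assigned UUID
    void backupShard(UUID uuid, File source);

    // GET: download the shard back to local flash, e.g. after a node loses its copy
    void restoreShard(UUID uuid, File target);

    // Convenience check used during recovery and garbage collection (an assumption, not required by the contract)
    boolean shardExists(UUID uuid);
}
```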

Atomic Data Loads

• Import data periodically from streaming event system

• Internally a Scribe-based system similar to Kafka or Kinesis

• Provides continuation tokens

• Loads performed using SQL

Atomic Data Loads

INSERT INTO target
SELECT * FROM source_stream
WHERE token BETWEEN ${last_token} AND ${next_token}

Loader Process

1. Record new job with “now” token in MySQL

2. Execute INSERT from last committed token to “now” token with external batch id

3. Wait for INSERT to commit (check external batch status)

4. Record job complete

5. Repeat
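A rough Java sketch of that loop, with hypothetical JobStore, StreamSource, and QueryRunner interfaces standing in for the MySQL job table, the Scribe-based stream, and the Presto client:

```java
import java.util.Optional;

public class LoaderProcess
{
    // Hypothetical collaborators standing in for the MySQL job table,
    // the Scribe-based stream, and a Presto client.
    interface JobStore
    {
        long lastCommittedToken();
        String recordNewJob(long nowToken);       // returns an external batch id
        Optional<String> status(String batchId);  // empty until the batch is committed or failed
        void recordComplete(String batchId);
    }

    interface StreamSource { long currentToken(); }

    interface QueryRunner { void execute(String sql, String externalBatchId); }

    static void runOnce(JobStore jobs, StreamSource stream, QueryRunner presto)
            throws InterruptedException
    {
        // 1. Record a new job with the "now" token in MySQL
        long lastToken = jobs.lastCommittedToken();
        long nowToken = stream.currentToken();
        String batchId = jobs.recordNewJob(nowToken);

        // 2. Execute the INSERT for the token range, tagged with the external batch id
        presto.execute(
                "INSERT INTO target SELECT * FROM source_stream " +
                "WHERE token BETWEEN " + lastToken + " AND " + nowToken,
                batchId);

        // 3. Wait for the INSERT to commit by polling the external batch status
        while (jobs.status(batchId).isEmpty()) {
            Thread.sleep(1000);
        }

        // 4. Record the job complete; the caller repeats (step 5)
        jobs.recordComplete(batchId);
    }
}
```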

Failure Recovery

• Loader crash
  • Check status of jobs using external batch id
• INSERT hang
  • Cancel query and rollback job (verify status to avoid race; see the sketch below)
• Duplicate loader processes
  • Process guarantees only one job can complete
• Monitor for lack of progress (also catches the case where no loader is running)
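The race on a hung INSERT is that the query could still commit between detection and rollback. A minimal sketch of one safe ordering of the steps (hypothetical PrestoClient and JobStore interfaces, not the production code):

```java
public class LoaderRecovery
{
    enum BatchStatus { COMMITTED, FAILED, UNKNOWN }

    // Hypothetical stand-ins for the Presto client and the MySQL job table
    interface PrestoClient { void cancelQuery(String externalBatchId); }

    interface JobStore
    {
        BatchStatus externalBatchStatus(String batchId);
        void recordComplete(String batchId);
        void rollbackJob(String batchId);
    }

    // Recover a job whose INSERT appears hung: cancel first so the query can
    // no longer commit, then verify the external batch status before rolling
    // back. An INSERT that actually committed is recorded as complete instead
    // of being rolled back (the "verify status to avoid race" step).
    static void recoverStuckJob(PrestoClient presto, JobStore jobs, String batchId)
    {
        presto.cancelQuery(batchId);

        if (jobs.externalBatchStatus(batchId) == BatchStatus.COMMITTED) {
            jobs.recordComplete(batchId);
        }
        else {
            jobs.rollbackJob(batchId);
        }
    }
}
```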

Physical Organization

• Temporal organization
  • Ensure files don’t cross temporal boundaries (see the sketch after this list)
  • Common filter clause
  • Eases retention policies
• Sorted files
  • Can reduce file sections processed (local stats)
  • Can reduce shards processed
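One simple way to keep files from crossing temporal boundaries is to bucket rows by day before writing them, so each file covers exactly one day. A hypothetical Java sketch (the Row type and UTC day granularity are assumptions, not the actual table layout):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TemporalBucketing
{
    // Hypothetical row shape; the real temporal column and granularity are table properties
    record Row(Instant eventTime, String payload) {}

    // Group incoming rows by UTC day so that each output file covers exactly one day:
    // time-range filters can then skip whole files, and retention is just dropping
    // the files for expired days.
    static Map<LocalDate, List<Row>> bucketByDay(List<Row> rows)
    {
        return rows.stream().collect(Collectors.groupingBy(
                row -> LocalDate.ofInstant(row.eventTime(), ZoneOffset.UTC)));
    }
}
```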

[Diagram: Unorganized Data, plotted by sort columns and time]

[Diagram: Organized Data, plotted by sort columns and time]

Background Organization

• Compaction
• Balance data
• Eager data recovery (from backup)
• Garbage collection
  • Junk created by compaction, delete, balance, and recovery

Future Use Cases

• Hot data cache for Hadoop data
  • 0-N local copies of the “backup” tier
• Query results cache
• Raw (not rolled-up) data store for sharded MySQL customers
• Materialized view store
