Presto at Wayfair - Starburst Data
Presto at Wayfair
Vinay Narayana
https://www.linkedin.com/in/vinaynarayana/
@nvinay26
1. Problem Statement
2. Why Presto?
3. Presto at Wayfair
   ● Deployment
   ● Adoption
   ● Performance
   ● Monitoring
4. What’s Next
Problem Statement
Remedies
1. Optimize Hive queries
2. Set up queues to prioritize batch jobs
3. Throttle each user to 2 ad hoc Hive queries
4. Move jobs from Hive to Spark
5. Conduct SME training sessions for both Hive and Spark
Why Presto?
● It’s VERY fast!
● ANSI SQL support
● Presto can run separately from the HDFS storage cluster, making it great for interactive queries
● A single SQL query can access, combine, and analyze data from multiple data sources (unlike Impala)
● Presto is easier to understand and use than Spark
Presto at Wayfair
[Architecture diagram: Presto Ad Hoc Cluster, with Clients, the Presto Coordinator, Presto Workers, and the Hive Metastore]
Presto at Wayfair
Presto ad hoc cluster (read-only)
● Version: 0.217
● 301 VMs (8*64): 1 coordinator, 300 workers
● Total available memory: ~20 TB
● Total available CPU: 2,400 vcores
● Presto CLI
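The capacity figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes "8*64" means 8 vCPUs and 64 GB of RAM per VM (the slide does not spell this out). Under that assumption the 300 workers alone give exactly 2,400 vCPUs and 19.2 TB (decimal) of memory, which matches the quoted 2,400 vcores and rounds to the ~20 TB figure; counting the coordinator adds one more VM.

```python
from typing import Tuple

# ASSUMPTION: "8*64" on the slide means 8 vCPUs and 64 GB of RAM per VM.
VCPUS_PER_VM = 8
MEM_GB_PER_VM = 64
WORKERS = 300
COORDINATORS = 1


def capacity(vms: int, vcpus: int = VCPUS_PER_VM, mem_gb: int = MEM_GB_PER_VM) -> Tuple[int, float]:
    """Return (total vCPUs, total memory in decimal TB) for `vms` identical VMs."""
    return vms * vcpus, vms * mem_gb / 1000


workers_only = capacity(WORKERS)
whole_cluster = capacity(WORKERS + COORDINATORS)
print("workers only:", workers_only)        # workers only: (2400, 19.2)
print("with coordinator:", whole_cluster)   # with coordinator: (2408, 19.264)
```

Because the 2,400 figure matches the worker count exactly, the totals on the slide most likely describe worker capacity only.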
Adoption – before & after (Presto Ad Hoc Cluster)
● Hive queries per month prior to Presto: 80K
● Presto’s performance won almost half (40%) of Hive activity in just two months.
Adoption – after (Presto Ad Hoc Cluster)
Presto user growth over the year
[Chart: Presto Queries per Month, annotated "6x Growth"]
Query Throttling (Presto Ad Hoc Cluster)
● SELECT only
● 2 running queries per user
● 2 queued queries per user
● Increased the time limit from 5 to 10 minutes
Average execution time dropped from 51 seconds to 20 seconds.
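Per-user limits like these are the kind of thing Presto's file-based resource groups express. The Python sketch below generates such a configuration under stated assumptions, not as Wayfair's actual setup: the group names (`adhoc`, `${USER}`), memory percentages, and root-level limits are placeholders; "SELECT only" is approximated with a `queryType` selector, relying on the resource group manager rejecting queries that match no selector (Presto's `read-only` system access control is another possible route); and the 10-minute limit is assumed to be the `query.max-execution-time` property.

```python
import json

# Hypothetical sketch mirroring the throttling slide. Group names, memory
# percentages, and root-level limits are placeholders, not real settings.
MAX_RUNNING_PER_USER = 2
MAX_QUEUED_PER_USER = 2
TIME_LIMIT = "10m"  # raised from 5m per the slide


def resource_groups(max_running: int, max_queued: int) -> dict:
    """Build a file-based resource-group config with one sub-group per user."""
    per_user = {
        "name": "${USER}",  # template: one group is created per user name
        "softMemoryLimit": "10%",  # placeholder
        "hardConcurrencyLimit": max_running,  # running queries per user
        "maxQueued": max_queued,  # queued queries per user
    }
    root = {
        "name": "adhoc",
        "softMemoryLimit": "100%",  # placeholder
        "hardConcurrencyLimit": 100,  # placeholder cluster-wide cap
        "maxQueued": 1000,  # placeholder
        "subGroups": [per_user],
    }
    # Only SELECT statements match this selector; a query that matches
    # no selector is rejected instead of being queued.
    selectors = [{"queryType": "SELECT", "group": "adhoc.${USER}"}]
    return {"rootGroups": [root], "selectors": selectors}


def to_properties(entries: dict) -> str:
    """Render entries as key=value lines, as in Presto's .properties files."""
    return "".join(f"{key}={value}\n" for key, value in entries.items())


if __name__ == "__main__":
    # Contents for etc/resource-groups.json
    config = resource_groups(MAX_RUNNING_PER_USER, MAX_QUEUED_PER_USER)
    print(json.dumps(config, indent=2))
    # Contents for etc/resource-groups.properties
    print(to_properties({
        "resource-groups.configuration-manager": "file",
        "resource-groups.config-file": "etc/resource-groups.json",
    }))
    # Contents for etc/config.properties
    # ASSUMPTION: the slide's time limit is query.max-execution-time.
    print(to_properties({"query.max-execution-time": TIME_LIMIT}))
```

Keeping the limits on a templated `${USER}` sub-group means new users are throttled automatically, without editing the file each time someone is onboarded.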
Presto Read/Write Cluster Beta
● In beta with 50 nodes
● Limited number of users, using a default namespace
● Faster writes/inserts than Hive
● Resource grouping is enabled via queues
● 10-minute limit on query execution time
POC: Starburst Presto Distribution
Monitoring Presto
Skynet (internal)
What’s Next
● Continue migrating jobs to Presto
● Presto to Tableau connector
● Presto in Google Cloud
● Rationalize BigQuery vs. Presto in GCP
Questions?
Adoption – surprises (Presto Ad Hoc Cluster)
● Overall, very few surprises!
● There is no official Presto connector for Vertica (very popular at Wayfair), and Vertica’s internal libraries are closed... so we wrote our own connector.
● Performance and unification have become so popular that developers are now asking to point their applications at Presto as a data interface layer (evaluating…)