Presto Summit NYC 2019 - Starburst Data
Presto Summit NYC 2019
December 11, 2019
Slack handles: @cheolsoo; @abhonsule
slack-corp.com
Mission
Make people’s working lives simpler, more pleasant and more productive.
Slack
215B +270M 700B 250B
Logs Daily Messages Daily Records Messages Table
Data Engineering at Slack
Custodian of all data generated within Slack, the product. We provide the infrastructure and tooling necessary for stakeholders to reliably access product data for user-facing features and for product and business insights.
● Databooks: tool used by Analysts, Data scientists, Marketing, Sales, Finance
● AB Testing framework: Slack’s AB testing / Experiments framework
● BI portal: BI tool used by Corp / Biztech
● Presto
● Airflow: DAGs running on ETL scheduling system
● Analytics.ts: Slack’s internal analytics portal, used by Product Managers, Engineers, Analysts, Data scientists, Sales, Marketing, Finance
● Sqooper: batch ingestion system
Presto at Slack
clog queries: Query client logs
Presto at Slack
Past: Presto on EMR, single cluster
Present: Starburst on EC2, multiple clusters
Future: Federated clusters
Query success rate
Query count
Multiple clusters
● Static load balancing
● Per cluster config properties
● Per cluster capacity planning
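Static load balancing can be pictured as a fixed routing table: each workload is pinned to one cluster, and capacity is planned per cluster. The sketch below is illustrative only; the workload names, the `STATIC_ROUTES` table, and the URLs are assumptions, not Slack's actual configuration.

```python
# Hypothetical static routing table: every workload is pinned to one cluster.
# Cluster names and URLs are placeholders, not real endpoints.
STATIC_ROUTES = {
    "adhoc": "http://presto-adhoc.example.internal:8080",
    "etl": "http://presto-etl.example.internal:8080",
}
DEFAULT_WORKLOAD = "adhoc"


def route(workload: str) -> str:
    """Return the coordinator URL that a workload is statically assigned to.

    Unknown workloads fall back to the default cluster.
    """
    return STATIC_ROUTES.get(workload, STATIC_ROUTES[DEFAULT_WORKLOAD])
```

Because the mapping never changes at runtime, a busy or unhealthy cluster keeps receiving its assigned traffic; that is the trade-off of the static approach.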
Shadow clusters
● Read-only shadow cluster in parallel
● Useful for testing config changes or version upgrades
Terraform module
● Provision a cluster with 25 lines of code
● ASG, optionally with spot instances
● Dedicated HMS per cluster
Resource groups
● Per cluster resource groups config
● Per group scheduling policies config
● Fair (ad-hoc) vs weighted_fair (etl)
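As a rough illustration of the fair vs weighted_fair split, the sketch below builds a file-based resource groups config as a Python dict and serializes it to JSON. The property names (`rootGroups`, `schedulingPolicy`, `schedulingWeight`, `selectors`, and so on) follow Presto's resource groups configuration. The group names, limits, weights, and `source` patterns are assumptions made up for the example, not values from the talk.

```python
import json


def build_resource_groups() -> dict:
    """Build an illustrative resource groups config (all values are assumptions)."""
    return {
        "rootGroups": [
            {
                # Ad-hoc queries: "fair" starts queued queries first-in-first-out.
                "name": "adhoc",
                "softMemoryLimit": "50%",
                "hardConcurrencyLimit": 20,
                "maxQueued": 200,
                "schedulingPolicy": "fair",
            },
            {
                # ETL queries: "weighted_fair" picks sub-groups by schedulingWeight
                # and by how many queries each is already running.
                "name": "etl",
                "softMemoryLimit": "80%",
                "hardConcurrencyLimit": 50,
                "maxQueued": 1000,
                "schedulingPolicy": "weighted_fair",
                "subGroups": [
                    {
                        "name": "critical",
                        "softMemoryLimit": "100%",
                        "hardConcurrencyLimit": 30,
                        "maxQueued": 500,
                        "schedulingWeight": 3,
                    },
                    {
                        "name": "backfill",
                        "softMemoryLimit": "100%",
                        "hardConcurrencyLimit": 20,
                        "maxQueued": 500,
                        "schedulingWeight": 1,
                    },
                ],
            },
        ],
        "selectors": [
            # Hypothetical source names; selectors are evaluated in order.
            {"source": "airflow-critical.*", "group": "etl.critical"},
            {"source": "airflow.*", "group": "etl.backfill"},
            # Catch-all: everything else lands in the ad-hoc group.
            {"group": "adhoc"},
        ],
    }


def render_resource_groups() -> str:
    """Serialize the config to the JSON text a cluster would load from a file."""
    return json.dumps(build_resource_groups(), indent=2)
```

Keeping one such file per cluster matches the "per cluster" bullet above: an ad-hoc cluster and an ETL cluster can each carry different limits and policies.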
JMX exporter
JVM
-javaagent:/usr/local/jmx_exporter/jmx_exporter.jar=7071:/usr/local/jmx_exporter/exporter.yml
Prometheus
self.consul_job(
    'presto',
    datacenters=[env + '-us-east-1-dw1'],
    services=['presto']
)
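With the agent attached, the JVM serves metrics in the Prometheus text format on the configured port (7071 here), and Prometheus scrapes them. The sketch below shows what a consumer of that text does: a minimal parser for sample lines. It assumes samples carry no trailing timestamp, and the metric names in `SAMPLE` are illustrative, not taken from the talk.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes each sample line is "<series> <value>" with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last run of whitespace so label values may contain spaces.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Illustrative payload; the second metric name is hypothetical.
SAMPLE = """\
# HELP jvm_threads_current Current thread count
# TYPE jvm_threads_current gauge
jvm_threads_current 42.0
example_running_queries{cluster="adhoc"} 7.0
"""
```

In practice Prometheus does this parsing itself; the sketch is only meant to make the scraped format concrete.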
Grafana dashboard
Autoscaling
Graceful decommission
curl -XPUT localhost:8889/v1/info/state -d '"SHUTTING_DOWN"' -H "Content-type: application/json"
Chef role
"auto_scaling_group": {
  "prepare_for_termination_cmd": "<cmd>"
}
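The graceful decommission call can also be sketched in Python with only the standard library. The endpoint (`/v1/info/state`), port 8889, and the `SHUTTING_DOWN` state come from the slide; the function names and the timeout are assumptions. The body is sent as a JSON string, quotes included.

```python
import json
import urllib.request


def build_shutdown_request(host: str = "localhost", port: int = 8889) -> urllib.request.Request:
    """Build the PUT request that asks a Presto worker to shut down gracefully."""
    # json.dumps adds the surrounding quotes, so the body is a valid JSON string.
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/info/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def request_graceful_shutdown(host: str = "localhost", port: int = 8889,
                              timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (needs a reachable worker)."""
    request = build_shutdown_request(host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A termination hook such as the `prepare_for_termination_cmd` shown above could invoke a script like this before the instance is removed from the ASG, so the worker finishes its running tasks first.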
Federated clusters
● Dynamic load balancing
● High availability
● Minimize the impact of rogue queries
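One way to picture dynamic load balancing with high availability is a router that skips unhealthy clusters and sends each new query to the least loaded one. The sketch below is a guess at the general idea, not the design described in the talk; `ClusterStatus`, its fields, and the selection rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClusterStatus:
    """Hypothetical snapshot of one cluster as seen by a router."""
    name: str
    healthy: bool
    running_queries: int
    queued_queries: int


def pick_cluster(clusters: Iterable[ClusterStatus]) -> Optional[ClusterStatus]:
    """Pick the healthy cluster with the fewest queued, then fewest running, queries.

    Returns None when no cluster is healthy.
    """
    candidates = [c for c in clusters if c.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c.queued_queries, c.running_queries))
```

Under this rule a cluster saturated by a rogue query builds up a queue and stops attracting new queries, which limits the impact on everyone else.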
Q&A
Slack handles: @cheolsoo; @abhonsule
slack-corp.com