![Page 1: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/1.jpg)
Teaching Apache Spark Applications to Manage Their Workers Elastically
Erik Erlandson
Trevor McKay
Red Hat, Inc.
![Page 2: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/2.jpg)
Introduction
Erik Erlandson
Senior SW Engineer; Red Hat, Inc.
Emerging Technologies Group
Internal Data Science
Insightful Applications

Trevor McKay
Principal SW Engineer; Red Hat, Inc.
Emerging Technologies Group
Oshinko Development
Insightful Applications
![Page 3: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/3.jpg)
Outline
• Trevor
– container orchestration
– containerizing spark
• Erik
– spark dynamic allocation
– metrics
– elastic worker daemon
• Demo
![Page 4: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/4.jpg)
Containerizing Spark
• Container 101
– What is a container?
– Docker, Kubernetes and OpenShift
• Why Containerize Spark?
• Oshinko
– features
– components
– cluster creation example
![Page 5: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/5.jpg)
What is a container?
• A process running in a namespace on a container host
– separate process table, file system, and routing table
– base operating system elements
– application-specific code
• Resources can be limited through cgroups
![Page 6: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/6.jpg)
Docker and Kubernetes
• “Docker is the world's leading software containerization platform” www.docker.com
– Open-source tool to build, store, and run containers
– Images can be stored in shared registries
• “Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts” kubernetes.io
![Page 7: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/7.jpg)
OpenShift Origin
• Built around a core of Docker and Kubernetes
• Adds application lifecycle management functionality and DevOps tooling. www.openshift.org/
– multi-tenancy
– Source-to-Image (S2I)
• Runs on your laptop with “oc cluster up”
![Page 8: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/8.jpg)
Why Containerize Spark?
• Repeatable clusters with no mandatory config
• Normal users can create a cluster
– No special privileges, just an account on a management platform
![Page 9: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/9.jpg)
Why Containerize Spark?
• Containers allow a cluster-per-app model
– Quick to spin up and spin down
– Isolation == multiple clusters on the same host
– Data can still be shared through common endpoints
– Do I need to share a large dedicated cluster?
![Page 10: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/10.jpg)
Why Containerize Spark?
• Ephemeral clusters conserve resources
• Kubernetes makes horizontal scale out simple
– Elastic Worker daemon builds on this foundation
– Elasticity further conserves resources
![Page 11: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/11.jpg)
Deeper on Spark + Containers
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performance
● William Benton (Red Hat)
● Thursday, February 9
● 11:40 AM – 12:10 PM
● Ballroom A
![Page 12: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/12.jpg)
Oshinko: Simplifying further
• Containers simplify deployment but still lots to do ...
– Create the master and worker containers
– Handle spark configuration
– Wire the cluster together
– Allow access to http endpoints
– Tear it all down when you’re done
• Oshinko treats clusters as abstractions and does this work for us
![Page 13: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/13.jpg)
Oshinko Features
• CLI, web UI, and REST interfaces
• Cluster creation with sane defaults (name only)
• Scale and delete with simple commands
• Advanced configuration
– Enable features like Elastic Workers with a flag
– Specify images and spark configuration files
– Cluster configurations persisted in Kubernetes
• Source-to-Image integration (pyspark, java, scala)
![Page 14: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/14.jpg)
Oshinko Components
[Diagram. Labels: Oshinko web UI; Oshinko REST with Oshinko Core (pod); Oshinko CLI with Oshinko Core; OpenShift console; OpenShift and Kubernetes API servers; s2i image pod with launch script and user code, Oshinko CLI, and Oshinko Core.]
![Page 15: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/15.jpg)
Creating a Cluster
CLI from a shell …
$ oshinko-cli create mycluster --storedconfig=clusterconfig \
    --insecure-skip-tls-verify=true --token=$TOKEN
Using REST from Python …
import requests
clustername = "mycluster"
r = requests.post("http://oshinko-rest/clusters",
                  json={"name": clustername,
                        "config": {"name": "clusterconfig"}})
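The REST call above can also be sketched with only the Python standard library, keeping payload construction separate from sending so the request can be inspected before anything goes over the network. The URL and payload shape are taken from the slide; the helper names `build_cluster_payload` and `build_create_request` are hypothetical, and nothing is assumed about the response format.

```python
import json
import urllib.request

# URL taken from the slide; it only resolves where an oshinko-rest
# service of that name is reachable.
OSHINKO_CLUSTERS_URL = "http://oshinko-rest/clusters"


def build_cluster_payload(cluster_name, config_name="clusterconfig"):
    """Return the JSON body shown on the slide as a Python dict."""
    return {"name": cluster_name, "config": {"name": config_name}}


def build_create_request(cluster_name, config_name="clusterconfig",
                         url=OSHINKO_CLUSTERS_URL):
    """Build (but do not send) a POST request that creates a cluster."""
    body = json.dumps(build_cluster_payload(cluster_name, config_name))
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_request("mycluster")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Building the request without sending it makes the sketch usable outside a cluster, for example to check the payload in a unit test.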
![Page 16: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/16.jpg)
What is a cluster config?
$ oc export configmap clusterconfig
apiVersion: v1
data:
  metrics.enable: "true"
  scorpionstare.enable: "true"
  sparkimage: docker.io/manyangled/var-spark-worker:latest
  sparkmasterconfig: masterconfig
  sparkworkerconfig: workerconfig
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: clusterconfig
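Every value in the ConfigMap's data section is a string, including the `"true"` flags. As a minimal sketch of how a consumer might interpret them, the function below separates boolean feature flags from other settings. The keys and values are copied from the slide; `parse_cluster_config` and the rule that keys ending in `.enable` are flags are assumptions of this sketch, not a description of how Oshinko reads the config.

```python
# Data section of the ConfigMap from the slide, as a plain dict.
CLUSTER_CONFIG_DATA = {
    "metrics.enable": "true",
    "scorpionstare.enable": "true",
    "sparkimage": "docker.io/manyangled/var-spark-worker:latest",
    "sparkmasterconfig": "masterconfig",
    "sparkworkerconfig": "workerconfig",
}


def parse_cluster_config(data):
    """Split ConfigMap data into boolean feature flags and other settings.

    Keys ending in ".enable" are treated as flags (an assumption made for
    this sketch); everything else is passed through unchanged.
    """
    suffix = ".enable"
    flags, settings = {}, {}
    for key, value in data.items():
        if key.endswith(suffix):
            flags[key[:-len(suffix)]] = value.strip().lower() == "true"
        else:
            settings[key] = value
    return flags, settings


if __name__ == "__main__":
    flags, settings = parse_cluster_config(CLUSTER_CONFIG_DATA)
    print(flags)
    print(settings)
```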
![Page 17: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/17.jpg)
Source for demo’s oshinko-rest
• Metrics implementation is being reviewed
– using carbon and graphite today
– investigating jolokia metrics
• Metrics and elastic workers currently supported at https://github.com/tmckayus/oshinko-rest/tree/metrics
• Both features will be merged to oshinko master soon
![Page 18: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/18.jpg)
Dynamic Allocation
[Flowchart. Boxes: Count Backlog Jobs; decision "> E ?" with Yes and No branches; Request min(2x current, backlog); Shut Down Idle Executors; Report Target as Metric; Wait Interval.]
Setting shown: spark.dynamicAllocation.enabled
![Page 19: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/19.jpg)
Dynamic Allocation
[Same flowchart as on the first Dynamic Allocation slide.]
Setting shown: spark.dynamicAllocation.maxExecutors
![Page 20: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/20.jpg)
Dynamic Allocation
[Same flowchart as on the first Dynamic Allocation slide.]
Settings shown: spark.dynamicAllocation.schedulerBacklogTimeout and *.sink.graphite.host
![Page 21: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/21.jpg)
Dynamic Allocation
[Same flowchart as on the first Dynamic Allocation slide.]
Setting shown: spark.dynamicAllocation.executorIdleTimeout
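The request step in the Dynamic Allocation flowchart can be sketched as a small simulation. This follows one reading of the slide: when the backlog exceeds the current executor count, request min(2x current, backlog), capped at the configured maximum. The function names are hypothetical and this is not Spark's actual ExecutorAllocationManager code; idle-executor shutdown and the wait interval are left out.

```python
def next_target(current, backlog, max_executors):
    """One pass through the flowchart: return the new executor target.

    If the backlog exceeds the current executor count, request
    min(2 x current, backlog), capped at max_executors. Otherwise keep the
    current target; idle executors are released separately by timeout.
    """
    if backlog > current:
        # max(current, 1) lets the doubling start from zero executors
        # (an assumption of this sketch, not something the slide states).
        requested = min(2 * max(current, 1), backlog)
        return min(requested, max_executors)
    return current


def ramp_up(initial, backlog, max_executors):
    """Return the sequence of targets until the target stops changing."""
    targets = [initial]
    while True:
        nxt = next_target(targets[-1], backlog, max_executors)
        if nxt == targets[-1]:
            return targets
        targets.append(nxt)


if __name__ == "__main__":
    print(ramp_up(initial=1, backlog=10, max_executors=100))
    print(ramp_up(initial=1, backlog=100, max_executors=6))
```

Doubling gives a fast ramp for large backlogs while the min() keeps the request from overshooting the work that is actually waiting.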
![Page 22: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/22.jpg)
Executor Scaling
spark.dynamicAllocation.initialExecutors
>= spark.dynamicAllocation.minExecutors
<= spark.dynamicAllocation.maxExecutors
<= backlog jobs (<= RDD partitions)
![Page 23: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/23.jpg)
Shuffle Service
• Caches shuffle results independent of Executor
• Saves results if Executor is shut down
• Required for running Dynamic Allocation
• spark.shuffle.service.enabled = true
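The settings from the Executor Scaling and Shuffle Service slides can be collected in one place. The sketch below builds them as a dictionary of Spark properties, checks the ordering minExecutors <= initialExecutors <= maxExecutors, and renders them as `--conf` arguments for spark-submit. The property names come from the slides; the helper names `dynamic_allocation_conf` and `to_submit_args` are hypothetical.

```python
def dynamic_allocation_conf(min_executors, initial_executors, max_executors):
    """Return Spark properties for dynamic allocation as a dict of strings.

    Raises ValueError unless min <= initial <= max, the ordering shown on
    the Executor Scaling slide.
    """
    if not (0 <= min_executors <= initial_executors <= max_executors):
        raise ValueError(
            "expected 0 <= minExecutors <= initialExecutors <= maxExecutors"
        )
    return {
        "spark.dynamicAllocation.enabled": "true",
        # The shuffle service is required for running dynamic allocation.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Render the properties as --conf arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_allocation_conf(1, 2, 8))))
```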
![Page 24: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/24.jpg)
Dynamic Allocation Metrics
Published by the ExecutorAllocationManager
numberExecutorsToAdd: Additional executors requested
numberExecutorsPendingToRemove: Executors being shut down
numberAllExecutors: Executors in any state
numberTargetExecutors: Total requested (current+additional)
numberMaxNeededExecutors: Maximum that could be loaded
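Since the demo publishes these metrics through carbon and graphite, a consumer needs to pick the executor target out of the metric stream. The sketch below parses Graphite's plaintext format (`<path> <value> <timestamp>`) and returns the most recent `numberTargetExecutors` value. The full metric paths in the sample are an assumption about how Spark names them; the code matches only on the gauge name at the end of the path, and the function names are hypothetical.

```python
def parse_graphite_line(line):
    """Parse one Graphite plaintext line: '<path> <value> <timestamp>'."""
    path, value, timestamp = line.split()
    return path, float(value), int(timestamp)


def latest_target_executors(lines, gauge="numberTargetExecutors"):
    """Return the most recent value of the named gauge, or None if absent."""
    latest = None  # (timestamp, value) of the newest matching sample
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between samples
        path, value, timestamp = parse_graphite_line(line)
        if path.endswith("." + gauge):
            if latest is None or timestamp >= latest[0]:
                latest = (timestamp, value)
    return None if latest is None else int(latest[1])


if __name__ == "__main__":
    # Sample paths are illustrative only.
    sample = [
        "app-1.driver.ExecutorAllocationManager.executors.numberTargetExecutors 2 100",
        "app-1.driver.ExecutorAllocationManager.executors.numberAllExecutors 2 100",
        "app-1.driver.ExecutorAllocationManager.executors.numberTargetExecutors 4 110",
    ]
    print(latest_target_executors(sample))
```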
![Page 25: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/25.jpg)
Elastic Worker Daemon
[Architecture diagram. Components: driver, metric service, elastic daemon, oshinko API server, openshift API server, spark master, spark workers. Groupings: Spark Master Pod, Spark Worker Pods. Connections labeled REST. Step shown: Executor Request.]
![Page 26: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/26.jpg)
Elastic Worker Daemon
[Same architecture diagram as on the first Elastic Worker Daemon slide. Step shown: numberTargetExecutors.]
![Page 27: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/27.jpg)
Elastic Worker Daemon
[Same architecture diagram as on the first Elastic Worker Daemon slide. Step shown: numberTargetExecutors.]
![Page 28: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/28.jpg)
Elastic Worker Daemon
[Same architecture diagram as on the first Elastic Worker Daemon slide.]
![Page 29: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/29.jpg)
Elastic Worker Daemon
[Same architecture diagram as on the first Elastic Worker Daemon slide. Step shown: Pod Replication.]
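The Elastic Worker Daemon slides suggest a reconcile loop: read numberTargetExecutors, compare it with the current number of workers, and ask for a scale change when they differ. The sketch below is one possible shape for that loop and is not the daemon's actual code. All names, the one-executor-per-worker default, and the worker bounds are assumptions; the metric source and the cluster API are passed in as callables so nothing about their real interfaces is implied.

```python
import math


def desired_workers(target_executors, executors_per_worker=1,
                    min_workers=1, max_workers=10):
    """Translate an executor target into a worker-pod count within bounds."""
    needed = math.ceil(target_executors / executors_per_worker)
    return max(min_workers, min(needed, max_workers))


def reconcile(read_target, read_workers, scale_to):
    """Run one daemon iteration; return the new count, or None if unchanged.

    read_target, read_workers and scale_to are injected callables so the
    metric source and the cluster API stay outside this sketch.
    """
    wanted = desired_workers(read_target())
    if wanted != read_workers():
        scale_to(wanted)
        return wanted
    return None


if __name__ == "__main__":
    requested = []
    print(reconcile(lambda: 4, lambda: 2, requested.append))
    print(requested)
```

Keeping the sizing rule in a pure function makes it easy to test separately from whatever service calls the daemon ends up making.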
![Page 30: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/30.jpg)
Demo
![Page 31: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/31.jpg)
Radanalytics.io
New community landing page at http://radanalytics.io/
![Page 32: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/32.jpg)
Where to Find Oshinko
Oshinko and related bits:
http://github.com/radanalyticsio/
Docker images:
https://hub.docker.com/u/radanalyticsio/
Images and notebook for today’s demo:
https://hub.docker.com/u/tmckay/
https://hub.docker.com/u/manyangled/
https://github.com/erikerlandson/var-notebook/pulls
![Page 33: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay](https://reader033.vdocuments.site/reader033/viewer/2022052405/58ce7ec91a28ab210a8b5103/html5/thumbnails/33.jpg)
Related Effort: Spark on K8s
• Native scheduler backend for Kubernetes
• https://github.com/apache-spark-on-k8s/spark
• Developer Community Collaboration