TRANSCRIPT
Native Support of Prometheus Monitoring in Apache Spark 3
Dongjoon Hyun, DB Tsai
SPARK+AI SUMMIT 2020
Who am I: Dongjoon Hyun
• Apache Spark PMC and Committer
• Apache ORC PMC and Committer
• Apache REEF PMC and Committer
https://github.com/dongjoon-hyun
https://www.linkedin.com/in/dongjoon
@dongjoonhyun
Who am I: DB Tsai
• Apache Spark PMC and Committer
• Apache SystemML PMC and Committer
• Apache Yunikorn Committer
• Apache Bahir Committer
https://github.com/dbtsai
https://www.linkedin.com/in/dbtsai
@dbtsai
Monitoring Apache Spark: Three popular methods
Web UI (Live and History Server)
• Jobs, Stages, Tasks, SQL queries
• Executors, Storage
Logs
• Event logs and Spark process logs
• Listeners (SparkListener, StreamingQueryListener, SparkStatusTracker, …)
Metrics
• Various numeric values
Metrics are useful to handle gray failures: early warning instead of a post-mortem process
Monitoring and alerting Spark jobs' gray failures
• Memory leak or misconfiguration
• Performance degradation
• Growing streaming job's intermediate state
Prometheus: an open-source systems monitoring and alerting toolkit
Provides
• a multi-dimensional data model
• operational simplicity
• scalable data collection
• a powerful query language
A good option for Apache Spark Metrics
Components
• Prometheus Server
• Prometheus Web UI
• Alert Manager
• Pushgateway
https://en.wikipedia.org/wiki/Prometheus_(software)
Spark 2 with Prometheus (1/3): Using the JmxSink and JMXExporter combination
• Enable Spark's built-in JmxSink in Spark's conf/metrics.properties
• Deploy Prometheus' JMXExporter library and its config file
• Expose JMXExporter port, 9404, to Prometheus
• Add `-javaagent` option to the target (master/worker/executor/driver/…)
-javaagent:./jmx_prometheus_javaagent-0.12.0.jar=9404:config.yaml
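For reference, a minimal `config.yaml` for the JMXExporter javaagent could look like the following. This is an illustrative sketch, not taken from the talk; the single pass-through rule simply exposes every MBean, which is the simplest (if verbose) starting point:

```yaml
# Minimal, illustrative jmx_exporter config.yaml:
# lowercase the metric names and pass every JMX MBean through unchanged.
lowercaseOutputName: true
rules:
  - pattern: ".*"
```

In practice you would add more specific `pattern` rules to rename and filter the Spark MBeans you care about.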
Spark 2 with Prometheus (2/3): Using the GraphiteSink and GraphiteExporter combination
• Set up a Graphite server
• Enable Spark's built-in Graphite sink with several configurations
• Enable Prometheus' GraphiteExporter at Graphite
Spark 2 with Prometheus (3/3): Custom sink (or 3rd-party sink) + Pushgateway server
• Set up a Pushgateway server
• Develop a custom sink (or use 3rd-party libraries) with a Prometheus dependency
• Deploy the sink libraries and their configuration file to the cluster
Pros and Cons
Pros
• Already used in production
• A general approach
Cons
• Difficult to set up in new environments
• Some custom libraries may have a dependency on Spark versions
Goal in Apache Spark 3: Easy usage
Be independent from the existing metrics pipeline
• Use new endpoints and disable them by default
• Avoid introducing new dependencies
Reuse the existing resources
• Use the officially documented ports of Master/Worker/Driver
• Take advantage of Prometheus service discovery in K8s as much as possible
What's new in Spark 3 Metrics
DropWizard Metrics 4 for JDK11 (SPARK-29674 / SPARK-29557)

Timeline: Spark 1.6 through 2.4 (2016–2019) shipped with DropWizard Metrics 3.x (3.1.2 / 3.1.5); Spark 3.0 (2020) upgrades to DropWizard Metrics 4.x (4.1.1).

The exposed metric format changes with the upgrade:
DropWizard Metrics 3.x (Spark 1/2):
metrics_master_workers_Value 0.0
DropWizard Metrics 4.x (Spark 3):
metrics_master_workers_Value{type="gauges",} 0.0
metrics_master_workers_Number{type="gauges",} 0.0
ExecutorMetricsSource: a new metric source
Collects executor memory metrics at the driver and exposes them via ExecutorMetricsSource and the REST API (SPARK-23429, SPARK-27189, SPARK-27324, SPARK-24958)
• JVMHeapMemory / JVMOffHeapMemory
• OnHeapExecutionMemory / OffHeapExecutionMemory
• OnHeapStorageMemory / OffHeapStorageMemory
• OnHeapUnifiedMemory / OffHeapUnifiedMemory
• DirectPoolMemory / MappedPoolMemory
• MinorGCCount / MinorGCTime
• MajorGCCount / MajorGCTime
• ProcessTreeJVMVMemory
• ProcessTreeJVMRSSMemory
• ProcessTreePythonVMemory
• ProcessTreePythonRSSMemory
• ProcessTreeOtherVMemory
• ProcessTreeOtherRSSMemory
JVM Process Tree
Support Prometheus more natively (1/2): Prometheus-format endpoints
PrometheusServlet: a friend of MetricsServlet
• A new metric sink supporting the Prometheus format (SPARK-29032)
• Unified way of configuration via conf/metrics.properties
• No additional system requirements (services / libraries / ports)
PrometheusResource: a single endpoint for all executor memory metrics
• A new metric endpoint exporting all executor metrics at the driver (SPARK-29064 / SPARK-29400)
• The most efficient way to discover and collect metrics, because the driver already has all the information
• Enabled by `spark.ui.prometheus.enabled` (default: false)
Support Prometheus more natively (2/2): spark_info and service discovery
Add the spark_info metric (SPARK-31743)
• A standard Prometheus way to expose version and revision
• Enables monitoring Spark jobs per version
Support driver service annotations in K8s (SPARK-31696)
• Used by Prometheus service discovery
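In the Prometheus text exposition format, this metric looks roughly like the following (label values are illustrative placeholders):

```
spark_info{version="3.0.0", revision="..."} 1.0
```

Because version and revision are labels, a PromQL query can group running jobs by Spark version.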
Under the hood
PrometheusServlet (SPARK-29032: Add PrometheusServlet to monitor Master/Worker/Driver)
• Makes Master/Worker/Driver expose metrics in Prometheus format at the existing ports
• Follows the output style of the "Spark JmxSink + Prometheus JMXExporter + javaagent" approach
Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since initial release)
Driver 4040 /metrics/prometheus/ /metrics/json/
Worker 8081 /metrics/prometheus/ /metrics/json/
Master 8080 /metrics/master/prometheus/ /metrics/master/json/
Master 8080 /metrics/applications/prometheus/ /metrics/applications/json/
Spark Driver Endpoint Example
PrometheusServlet Configuration: Use conf/metrics.properties like the other sinks
Copy conf/metrics.properties.template to conf/metrics.properties
Uncomment the following in conf/metrics.properties:
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
PrometheusResource (SPARK-29064: Add PrometheusResource to export executor metrics)
• A new endpoint with information similar to the JSON endpoint
• The driver exposes all executor memory metrics in Prometheus format
Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since 1.4)
Driver 4040 /metrics/executors/prometheus/ /api/v1/applications/{id}/executors/
PrometheusResource Configuration: Use spark.ui.prometheus.enabled
Run spark-shell with configuration
Run `curl` with the new endpoint
$ bin/spark-shell \
    -c spark.ui.prometheus.enabled=true \
    -c spark.executor.processTreeMetrics.enabled=true

$ curl http://localhost:4040/metrics/executors/prometheus/ | grep executor | head -n1
metrics_executor_rddBlocks{application_id="...", application_name="...", executor_id="..."} 0
Monitoring in K8s cluster
Key Monitoring Scenarios on K8s clusters
• Monitoring batch job memory behavior => A risk of being killed?
• Monitoring dynamic allocation behavior => Unexpected slowness?
• Monitoring streaming job behavior => Latency?
Monitoring batch job memory behavior (1/2): Use Prometheus Service Discovery
Configuration Value
spark.ui.prometheus.enabled true
spark.kubernetes.driver.annotation.prometheus.io/scrape true
spark.kubernetes.driver.annotation.prometheus.io/path /metrics/executors/prometheus/
spark.kubernetes.driver.annotation.prometheus.io/port 4040
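On the Prometheus side, these pod annotations are picked up by the common annotation-based relabeling pattern. A typical `prometheus.yml` scrape job might look like the following; this is an illustrative sketch of the widely used community configuration, not something prescribed by Spark:

```yaml
scrape_configs:
  - job_name: "spark-driver-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the prometheus.io/port annotation as the scrape port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

With this in place, any Spark driver pod carrying the annotations from the table above is discovered and scraped automatically.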
Monitoring batch job memory behavior (2/2)

spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
  -c spark.driver.memory=2g \
  -c spark.executor.instances=30 \
  -c spark.ui.prometheus.enabled=true \
  -c spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
  -c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
  -c spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
  -c spark.kubernetes.container.image=spark:3.0.0 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 200000
OOMKilled at Driver
Monitoring dynamic allocation behavior: Set spark.dynamicAllocation.*

spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
  -c spark.dynamicAllocation.enabled=true \
  -c spark.dynamicAllocation.executorIdleTimeout=5 \
  -c spark.dynamicAllocation.shuffleTracking.enabled=true \
  -c spark.dynamicAllocation.maxExecutors=50 \
  -c spark.ui.prometheus.enabled=true \
  … (the same) …
  https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
`dynamic-pi.py` computes Pi, sleeps one minute, and computes Pi again.
Select a single Spark app
rate(metrics_executor_totalTasks_total{...}[1m])
Driver service annotation: Inform Prometheus of both metrics endpoints

spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
  -c spark.ui.prometheus.enabled=true \
  -c spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
  -c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/ \
  -c spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
  -c spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true \
  -c spark.kubernetes.driver.service.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
  -c spark.kubernetes.driver.service.annotation.prometheus.io/port=4040 \
  …
[Charts: executor allocation ratio with spark.dynamicAllocation.maxExecutors=30 vs. 300]
Monitoring streaming job behavior (1/2): Set spark.sql.streaming.metricsEnabled=true (default: false)
Metrics
• latency
• inputRate-total
• processingRate-total
• states-rowsTotal
• states-usedBytes
• eventTime-watermark
Prefix of streaming query metric names
• metrics_[namespace]_spark_streaming_[queryName]
Monitoring streaming job behavior (2/2): All metrics are important for alerting
latency > micro-batch interval
• Spark can endure some situations, but the job needs to be redesigned to prevent future outages
states-rowsTotal grows indefinitely
• These jobs will eventually die due to OOM
Related issues:
• SPARK-27340 Alias on TimeWindow expression cause watermark metadata lost (Fixed at 3.0)
• SPARK-30553 Fix structured-streaming java example error
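The conditions above map naturally onto Prometheus alerting rules. The sketch below is illustrative only: the metric name assumes the namespace and query name are both set to `spark` (as suggested in the tips later), and the thresholds are placeholders to adapt to your workload:

```yaml
groups:
  - name: spark-streaming
    rules:
      # Hypothetical metric name; adjust to your namespace/queryName.
      - alert: StreamingStateGrowingIndefinitely
        expr: deriv(metrics_spark_spark_streaming_spark_states_rowsTotal[30m]) > 0
        for: 6h
        annotations:
          summary: "states-rowsTotal has kept growing for 6 hours; OOM risk"
```

A similar rule comparing the latency metric against the micro-batch interval covers the first condition.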
Prometheus Federation and Alert: Separation of concerns

[Diagram: each user namespace (namespace1, namespace2, …) runs its own Prometheus stack (Prometheus Server, Prometheus Web UI, Alert Manager, Pushgateway) collecting its jobs' metrics for batch and streaming monitoring; a cluster-wide Prometheus (Admin) federates a subset of metrics (spark_info + ...) from the per-namespace servers]
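The admin-side pull in this layout can be expressed as an ordinary `/federate` scrape job on the cluster-wide Prometheus. The following is a sketch; the `match[]` selectors and target addresses are illustrative placeholders, not values from the talk:

```yaml
scrape_configs:
  - job_name: "federate-spark"
    honor_labels: true
    metrics_path: /federate
    params:
      "match[]":
        - '{__name__="spark_info"}'           # version/revision per job
        - '{__name__=~"metrics_executor_.*"}'  # executor memory metrics
    static_configs:
      - targets:
          # Placeholder per-namespace Prometheus servers
          - prometheus.namespace1.svc:9090
          - prometheus.namespace2.svc:9090
```

Keeping the federated selector narrow preserves the separation of concerns: users alert on their own jobs, while the admin sees only the cluster-wide subset.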
Limitations and Tips: New endpoints are still experimental
New endpoints expose only Spark metrics starting with `metrics_` or `spark_info`
• The `javaagent` method can expose more metrics like `jvm_info`
PrometheusServlet does not follow the Prometheus naming convention
• Instead, it is designed to follow the Spark 2 naming convention for consistency within Spark
The number of metrics grows if we don't set the following:
writeStream.queryName("spark")
spark.metrics.namespace=spark
Summary
Spark 3 provides a better integration with Prometheus monitoring
• Especially in K8s environments, metric collection becomes much easier than in Spark 2
New Prometheus-style endpoints are independent, additional options
• Users can migrate to the new endpoints or use them alongside the existing methods
Thank you!