{"id":41728321,"url":"https://github.com/cerndb/spark-dashboard","last_synced_at":"2026-05-06T09:05:18.983Z","repository":{"id":72812255,"uuid":"212368829","full_name":"cerndb/spark-dashboard","owner":"cerndb","description":"Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers and Kubernetes.","archived":false,"fork":false,"pushed_at":"2026-05-05T19:19:12.000Z","size":2054,"stargazers_count":134,"open_issues_count":1,"forks_count":20,"subscribers_count":9,"default_branch":"master","last_synced_at":"2026-05-05T21:23:33.168Z","etag":null,"topics":["docker","grafana","grafana-dashboard","helm","kubernetes","monitoring","spark"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cerndb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-10-02T14:55:25.000Z","updated_at":"2026-05-05T19:19:16.000Z","dependencies_parsed_at":"2026-01-06T01:00:53.218Z","dependency_job_id":null,"html_url":"https://github.com/cerndb/spark-dashboard","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/cerndb/spark-dashboard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerndb%2Fspark-dashboard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerndb%2Fspark-dashboard/tags","releases_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/cerndb%2Fspark-dashboard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerndb%2Fspark-dashboard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cerndb","download_url":"https://codeload.github.com/cerndb/spark-dashboard/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerndb%2Fspark-dashboard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32686264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","grafana","grafana-dashboard","helm","kubernetes","monitoring","spark"],"created_at":"2026-01-24T23:18:51.330Z","updated_at":"2026-05-06T09:05:18.975Z","avatar_url":"https://github.com/cerndb.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spark Dashboard\n\n**Real-time monitoring and performance troubleshooting for Apache Spark**\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14718682.svg)](https://doi.org/10.5281/zenodo.14718682)\n[![Docker 
Pulls](https://img.shields.io/docker/pulls/lucacanali/spark-dashboard)](https://hub.docker.com/r/lucacanali/spark-dashboard)\n\n---\n\n## Key Features\n\n- **Real-time monitoring**  \n  Visualize Spark and system metrics in Grafana, including CPU, memory, active tasks, and I/O, to quickly spot trends and anomalies.\n\n- **Easy deployment**  \n  Run locally with a container or deploy on Kubernetes with Helm.\n\n- **Broad compatibility**  \n  Supports Apache Spark 3.x and 4.x across Hadoop, Kubernetes, and Spark Standalone environments.\n\n---\n\n## Contents\n\n- [Architecture](#architecture)\n- [Deploying Spark Dashboard v2](#deployment-options)\n  - [Run Spark Dashboard v2 in a container](#run-spark-dashboard-v2-in-a-container)\n  - [Run Spark Dashboard v2 on Kubernetes with Helm](#run-spark-dashboard-v2-on-kubernetes-with-helm)\n  - [Extended Spark Dashboard](#extended-spark-dashboard)\n- [Notes on Spark Connect](#notes-on-spark-connect)\n- [Examples and getting started](#examples-and-getting-started)\n  - [Start small with Spark local mode](#start-small-with-spark-local-mode)\n  - [Run TPC-DS on a Spark cluster](#run-tpc-ds-on-a-spark-cluster)\n- [Legacy implementation (v1)](#legacy-implementation-v1)\n\n---\n\n## Resources\n\n- [Watch the Spark Dashboard demo and tutorial](https://www.youtube.com/watch?v=sLjAyDwpg80)\n- [Notes on Spark Dashboard](https://github.com/LucaCanali/Miscellaneous/tree/master/Spark_Dashboard)\n- [Building an Apache Spark Performance Lab](https://db-blog.web.cern.ch/node/195)\n- [Blog post on Spark Dashboard](https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark)\n- [Talk at Data + AI Summit 2021](https://databricks.com/session_na21/monitor-apache-spark-3-on-kubernetes-using-metrics-and-plugins), [slides](http://canali.web.cern.ch/docs/Monitor_Spark3_on_Kubernetes_DataAI2021_LucaCanali.pdf)\n- [sparkMeasure](https://github.com/LucaCanali/sparkMeasure), a tool for troubleshooting Apache Spark workloads\n- 
[TPCDS_PySpark](https://github.com/LucaCanali/Miscellaneous/tree/master/Performance_Testing/TPCDS_PySpark), a TPC-DS workload generator for Apache Spark\n\n**Main author and contact:** Luca.Canali@cern.ch\n\n---\n\n## Architecture\n\n![Spark metrics dashboard architecture](https://raw.githubusercontent.com/LucaCanali/Miscellaneous/master/Spark_Dashboard/images/Spark_MetricsSystem_Grafana_Dashboard_V2.0.png \"Spark metrics dashboard architecture\")\n\nSpark Dashboard provides an end-to-end monitoring pipeline for Apache Spark using open-source components. It is designed to deliver real-time visibility into Spark cluster health and performance, from metric generation to visualization.\n\n- **Apache Spark metrics**  \n  Apache Spark generates detailed performance metrics through its [metrics system](https://spark.apache.org/docs/latest/monitoring.html#metrics). Both the driver and the executors emit metrics such as runtime, CPU usage, garbage collection time, memory consumption, shuffle activity, and I/O statistics, which Spark Dashboard receives in Graphite format through Spark's Graphite sink.\n\n- **Telegraf**  \n  Telegraf acts as the collection agent. It ingests Spark metrics, enriches them with labels and tags, and forwards them to the storage backend.\n\n- **VictoriaMetrics**  \n  VictoriaMetrics stores the collected metrics efficiently as time-series data, making it well suited for both real-time monitoring and historical analysis.\n\n- **Grafana**  \n  Grafana provides the visualization layer. 
It queries VictoriaMetrics using PromQL or MetricsQL and displays interactive dashboards for observing trends and identifying bottlenecks.\n\nTogether, these components provide a scalable monitoring solution for Apache Spark.\n\n---\n\n## Deployment options\n\nThis repository provides two main deployment options for Spark Dashboard v2:\n\n- [Run Spark Dashboard v2 in a container with Docker or Podman](#run-spark-dashboard-v2-in-a-container)\n- [Run Spark Dashboard v2 on a Kubernetes cluster with Helm](#run-spark-dashboard-v2-on-kubernetes-with-helm)\n\n---\n\n## Run Spark Dashboard v2 in a container\n\nFollow these steps to deploy Spark Dashboard v2 with Docker or Podman.\n\n### 1. Start the container\n\nThe container image includes Telegraf for metrics collection, VictoriaMetrics for metrics storage, and Grafana for visualization.\n\n**Using Docker**\n\n```bash\ndocker run -p 3000:3000 -p 2003:2003 -d lucacanali/spark-dashboard\n```\n\n**Using Podman**\n\n```bash\npodman run -p 3000:3000 -p 2003:2003 -d lucacanali/spark-dashboard\n```\n\n### 2. Configure Apache Spark\n\nTo make Spark Dashboard receive metrics from your Spark application, configure Spark to send metrics to Telegraf.\n\nYou can do this in one of two ways (use one approach, not both):\n\n#### Option A: Configure `metrics.properties`\n\nCreate or edit the file `metrics.properties` in `$SPARK_CONF_DIR` and add:\n\n```properties\n# Configure Graphite sink for Spark metrics\n*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink\n*.sink.graphite.host=localhost\n*.sink.graphite.port=2003\n*.sink.graphite.period=10\n*.sink.graphite.unit=seconds\n*.sink.graphite.prefix=lucatest\n\n# Enable JVM metrics collection\n*.source.jvm.class=org.apache.spark.metrics.source.JvmSource\n```\n\nAfter saving `metrics.properties`, start Spark normally. 
Spark will load that file at startup and send metrics to Spark Dashboard automatically.\n\nOptionally, enable additional Spark metric sources, either in `spark-defaults.conf` or on the spark-submit command line (shown here as command-line options):\n\n```bash\n--conf spark.metrics.staticSources.enabled=true\n--conf spark.metrics.appStatusSource.enabled=true\n```\n\n#### Option B: Configure Spark on the command line\n\nInstead of editing `metrics.properties`, you can pass the configuration directly when starting Spark:\n\n```bash\n# We use Telegraf to collect metrics sent by Spark to the Graphite sink\nTELEGRAF_ENDPOINT=$(hostname)\n\nbin/spark-shell \\\n  --conf \"spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.host=${TELEGRAF_ENDPOINT}\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.port=2003\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.period=10\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.unit=seconds\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.prefix=mytest\" \\\n  --conf \"spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource\" \\\n  --conf \"spark.metrics.staticSources.enabled=true\" \\\n  --conf \"spark.metrics.appStatusSource.enabled=true\"\n```\n\n### 3. 
Visualize metrics in Grafana\n\nOnce the container is running and Spark is configured to export metrics:\n\n- Open Grafana at `http://localhost:3000`\n- Default credentials:\n  - **User:** `admin`\n  - **Password:** `admin`\n\nTo set a custom Grafana admin password when running with Docker, pass `GRAFANA_ADMIN_PASSWORD`:\n\n```bash\ndocker run -p 3000:3000 -p 2003:2003 \\\n  -e GRAFANA_ADMIN_PASSWORD='change-me' \\\n  -d lucacanali/spark-dashboard:v02\n```\n\n`GF_SECURITY_ADMIN_PASSWORD` is also supported for compatibility with Grafana's native environment variable.\n\nThe bundled dashboard, **Spark_Perf_Dashboard_v04_promQL**, displays key Spark metrics such as runtime, CPU, I/O, shuffle activity, and task counts, along with detailed time-series graphs.\n\n\u003e Ensure that a Spark application is running and configured to send metrics; otherwise, no data will appear in Grafana.\n\nFor test workloads, you can use [TPCDS_PySpark](https://github.com/LucaCanali/Miscellaneous/tree/master/Performance_Testing/TPCDS_PySpark).\n\n### Optional: enable HTTPS for Grafana\n\nFor testing, `openssl` can be used to generate a self-signed certificate. Mount the certificate and key into the container, then enable HTTPS:\n\n```bash\nmkdir -p certs\nopenssl req -x509 -newkey rsa:4096 -nodes -days 365 \\\n  -keyout certs/tls.key \\\n  -out certs/tls.crt \\\n  -subj \"/CN=localhost\" \\\n  -addext \"subjectAltName=DNS:localhost,IP:127.0.0.1\"\n\ndocker run -p 3000:3000 -p 2003:2003 \\\n  -v ./certs:/etc/grafana/certs:ro \\\n  -e GRAFANA_HTTPS_ENABLED=true \\\n  -d lucacanali/spark-dashboard:v02\n```\n\nGrafana will be available at:\n\n```text\nhttps://localhost:3000\n```\n\nBy default, the container expects `tls.crt` and `tls.key` in `/etc/grafana/certs`. 
You can override the paths with `GRAFANA_CERT_FILE` and `GRAFANA_CERT_KEY`, and set `GRAFANA_ROOT_URL` when Grafana is served through a DNS name.\n\n---\n\n## Persisting VictoriaMetrics data across restarts\n\nBy default, VictoriaMetrics data is not preserved across container restarts. To keep historical metrics, mount a persistent volume.\n\nExample using a local directory:\n\n```bash\nmkdir metrics_data\n\ndocker run --network=host \\\n  -v ./metrics_data:/victoria-metrics-data \\\n  -d lucacanali/spark-dashboard:v02\n```\n\n---\n\n## Run Spark Dashboard v2 on Kubernetes with Helm\n\nThe `charts_v2/` directory contains the Helm chart for Spark Dashboard v2.\n\nIf your cluster does not provide a default storage class:\n\n```bash\nhelm install spark-dashboard ./charts_v2 --set persistence.enabled=false\n```\n\nIf your cluster provides a suitable storage class and you want persistence:\n\n```bash\nhelm install spark-dashboard ./charts_v2 --set persistence.storageClass=\u003cyour-storage-class\u003e\n```\n\nCheck deployment status:\n\n```bash\nkubectl get pods -l app.kubernetes.io/name=spark-dashboard-v2\nkubectl get svc spark-dashboard-v2\n```\n\nTo expose the dashboard externally using a `LoadBalancer` service:\n\n```bash\nhelm install spark-dashboard ./charts_v2 \\\n  --set persistence.enabled=false \\\n  --set service.type=LoadBalancer\n```\n\nThis exposes:\n- Grafana on port `3000`\n- Telegraf on port `2003`\n\nVictoriaMetrics port `8428` is not exposed on the load balancer by default.\n\nTo expose VictoriaMetrics as well:\n\n```bash\nhelm install spark-dashboard ./charts_v2 \\\n  --set persistence.enabled=false \\\n  --set service.type=LoadBalancer \\\n  --set service.victoriametrics.exposeOnLoadBalancer=true\n```\n\nIf Spark runs inside the cluster, use the service DNS name as the Graphite endpoint:\n\n```text\nspark-dashboard-v2:2003\n```\n\nIf Spark runs outside the cluster, wait for an external IP:\n\n```bash\nkubectl get svc spark-dashboard-v2 
-w\n```\n\nThen use:\n\n```text\n\u003cexternal-ip\u003e:2003\n```\n\nGrafana will be available at:\n\n```text\nhttp://\u003cexternal-ip\u003e:3000\n```\n\nTo set the Grafana admin password:\n\n```bash\nhelm upgrade --install spark-dashboard ./charts_v2 \\\n  --set grafana.adminPassword='change-me'\n```\n\nTo use HTTPS for Grafana, first create a certificate. For testing, `openssl` can be used to generate a self-signed certificate:\n\n```bash\nopenssl req -x509 -newkey rsa:4096 -nodes -days 365 \\\n  -keyout tls.key \\\n  -out tls.crt \\\n  -subj \"/CN=dashboard.example.com\" \\\n  -addext \"subjectAltName=DNS:dashboard.example.com\"\n```\n\nThen create a Kubernetes TLS secret and enable the chart option:\n\n```bash\nkubectl create secret tls spark-dashboard-grafana-tls \\\n  --cert=./tls.crt \\\n  --key=./tls.key\n\nhelm upgrade --install spark-dashboard ./charts_v2 \\\n  --set persistence.enabled=false \\\n  --set grafana.https.enabled=true \\\n  --set grafana.https.secretName=spark-dashboard-grafana-tls\n```\n\nGrafana will then be available at:\n\n```text\nhttps://\u003cexternal-ip\u003e:3000\n```\n\nFor testing, you can also use port-forwarding:\n\n```bash\nkubectl port-forward svc/spark-dashboard-v2 3000:3000 2003:2003\n```\n\nThen open:\n\n```text\nhttp://localhost:3000\n```\n\nIf Spark runs on the same machine as the port-forward, use `localhost:2003` as the Telegraf/Graphite sink endpoint.\n\n### Helm troubleshooting\n\nIf pods remain in `Pending`, check for storage issues:\n\n```bash\nkubectl get pvc\nkubectl describe pod -l app.kubernetes.io/name=spark-dashboard-v2\nkubectl get storageclass\n```\n\nIf needed, reinstall without persistence:\n\n```bash\nhelm uninstall spark-dashboard\nhelm install spark-dashboard ./charts_v2 --set persistence.enabled=false\n```\n\nIf the service exists but external access fails, verify the in-cluster path first:\n\n```bash\nkubectl get endpoints spark-dashboard-v2\nkubectl run netcheck --rm -it --image=busybox:1.36 
--restart=Never -- sh\n```\n\nFrom the debug shell:\n\n```sh\nnc -vz spark-dashboard-v2 3000\nnc -vz spark-dashboard-v2 2003\n```\n\nIf those checks succeed, the chart is working and the remaining issue is external networking, firewall rules, or service exposure.\n\nIf `EXTERNAL-IP` stays pending for a `LoadBalancer` service, your cluster likely does not have load balancer integration configured. In that case, use `NodePort`, deploy a solution such as MetalLB, or use the external exposure mechanism supported by your Kubernetes environment.\n\n---\n\n## Extended Spark Dashboard\n\nThe Extended Spark Dashboard adds OS- and storage-level observability on top of standard Spark metrics. It uses [Spark Plugins](https://github.com/cerndb/SparkPlugins) to collect additional metrics and stores them in the same VictoriaMetrics backend.\n\n### Additional dashboard features\n\nThe extended dashboard adds three groups of graphs:\n\n- **CGroup metrics**  \n  Useful for Spark running on Kubernetes.\n\n- **Cloud storage metrics**  \n  Covers storage backends such as S3A, GCS, WASB, and similar systems.\n\n- **Advanced HDFS statistics**  \n  Provides deeper visibility into HDFS activity and performance.\n\n### Configuration\n\nAdd the following to your Spark configuration, in addition to the Graphite sink settings described above:\n\n```bash\n--conf spark.jars.packages=ch.cern.sparkmeasure:spark-plugins_2.13:0.4\n--conf spark.plugins=ch.cern.HDFSMetrics,ch.cern.CgroupMetrics,ch.cern.CloudFSMetrics\n--conf spark.cernSparkPlugin.cloudFsName=s3a\n```\n\n`CloudFSMetrics` uses `spark.cernSparkPlugin.cloudFsName` to select the cloud filesystem to monitor. Set it to the scheme you use, for example `s3a` or `gs`.\n\n### Using the extended dashboard\n\nIn Grafana, select:\n\n- `Spark_Perf_Dashboard_v04_PromQL_with_SparkPlugins`\n\nThis dashboard includes additional graphs for OS and storage metrics.\n\n---\n\n## Notes on Spark Connect\n\n[Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) allows a lightweight Spark client to connect remotely to a Spark cluster.\n\nWhen using Spark Connect, **configure Spark metrics on the Spark Connect server**, not on the client: the server runs the Spark application that generates the metrics.\n\n1. Start the Spark Dashboard container.\n2. 
Edit the `metrics.properties` file in the Spark Connect `conf` directory as described above.\n3. Start Spark Connect:\n\n```bash\nsbin/start-connect-server.sh\n```\n\nMetrics from Spark Connect will then be sent to Spark Dashboard and visualized in Grafana.\n\n---\n\n## Examples and getting started\n\nSee example graphs here:\n- [Spark Dashboard example graphs](https://github.com/LucaCanali/Miscellaneous/tree/master/Spark_Dashboard#example-graphs)\n\n### Start small with Spark local mode\n\nYou can use [TPCDS_PySpark](https://github.com/LucaCanali/Miscellaneous/tree/master/Performance_Testing/TPCDS_PySpark) to generate a TPC-DS workload and test the dashboard.\n\nYou can run this locally or in the cloud, for example with GitHub Codespaces:\n\n- [Open in GitHub Codespaces](https://codespaces.new/cerndb/spark-dashboard)\n\n```bash\n# Install dependencies\npip install pyspark\npip install sparkmeasure\npip install tpcds_pyspark\n\n# Download test data\nwget https://sparkdltrigger.web.cern.ch/sparkdltrigger/TPCDS/tpcds_10.zip\nunzip -q tpcds_10.zip\n\n# 1. Run a minimal test\ntpcds_pyspark_run.py -d tpcds_10 -n 1 -r 1 --queries q1,q2\n\n# 2. Start the dashboard\ndocker run -p 2003:2003 -p 3000:3000 -d lucacanali/spark-dashboard\n\n# 3. 
Run the workload and send metrics to the dashboard\nTPCDS_PYSPARK=$(which tpcds_pyspark_run.py)\n\nspark-submit --master \"local[*]\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.host=localhost\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.port=2003\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.period=10\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.unit=seconds\" \\\n  --conf \"spark.metrics.conf.*.sink.graphite.prefix=lucatest\" \\\n  --conf \"spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource\" \\\n  --conf \"spark.metrics.staticSources.enabled=true\" \\\n  --conf \"spark.metrics.appStatusSource.enabled=true\" \\\n  --conf spark.driver.memory=4g \\\n  --conf spark.log.level=error \\\n  --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 \\\n  $TPCDS_PYSPARK -d tpcds_10\n```\n\nThen:\n- open `http://localhost:3000`\n- log in with `admin` / `admin`\n- optionally open the Spark UI at `http://localhost:4040`\n\n\u003e The dashboard is more informative when Spark runs on cluster resources rather than only in local mode.\n\n### Run TPC-DS on a Spark cluster\n\nThe following examples do not include the Graphite sink settings. Configure them separately, for example in `metrics.properties` as described in Option A, with the sink host set to a Spark Dashboard endpoint reachable from the driver and executors.\n\nExample on a YARN cluster:\n\n```bash\nTPCDS_PYSPARK=$(which tpcds_pyspark_run.py)\n\nspark-submit --master yarn \\\n  --conf spark.log.level=error \\\n  --conf spark.executor.cores=8 \\\n  --conf spark.executor.memory=64g \\\n  --conf spark.driver.memory=16g \\\n  --conf spark.driver.extraClassPath=tpcds_pyspark/spark-measure_2.13-0.27.jar \\\n  --conf spark.dynamicAllocation.enabled=false \\\n  --conf spark.executor.instances=32 \\\n  --conf spark.sql.shuffle.partitions=512 \\\n  $TPCDS_PYSPARK -d hdfs://\u003cPATH\u003e/tpcds_10000_parquet_1.13.1\n```\n\nExample on Kubernetes with S3 storage and Spark plugins:\n\n```bash\nTPCDS_PYSPARK=$(which tpcds_pyspark_run.py)\n\nspark-submit --master k8s://https://xxx.xxx.xxx.xxx:6443 \\\n  --conf 
spark.kubernetes.container.image=apache/spark \\\n  --conf spark.kubernetes.namespace=xxx \\\n  --conf spark.eventLog.enabled=false \\\n  --conf spark.task.maxDirectResultSize=2000000000 \\\n  --conf spark.shuffle.service.enabled=false \\\n  --conf spark.executor.cores=8 \\\n  --conf spark.executor.memory=32g \\\n  --conf spark.driver.memory=4g \\\n  --packages org.apache.hadoop:hadoop-aws:3.4.3,ch.cern.sparkmeasure:spark-measure_2.13:0.27,ch.cern.sparkmeasure:spark-plugins_2.13:0.4 \\\n  --conf spark.plugins=ch.cern.HDFSMetrics,ch.cern.CgroupMetrics,ch.cern.CloudFSMetrics \\\n  --conf spark.cernSparkPlugin.cloudFsName=s3a \\\n  --conf spark.dynamicAllocation.enabled=false \\\n  --conf spark.executor.instances=4 \\\n  --conf spark.hadoop.fs.s3a.secret.key=$SECRET_KEY \\\n  --conf spark.hadoop.fs.s3a.access.key=$ACCESS_KEY \\\n  --conf spark.hadoop.fs.s3a.endpoint=\"https://s3.cern.ch\" \\\n  --conf spark.hadoop.fs.s3a.impl=\"org.apache.hadoop.fs.s3a.S3AFileSystem\" \\\n  --conf spark.executor.metrics.fileSystemSchemes=\"file,hdfs,s3a\" \\\n  --conf spark.hadoop.fs.s3a.fast.upload=true \\\n  --conf spark.hadoop.fs.s3a.path.style.access=true \\\n  --conf spark.hadoop.fs.s3a.list.version=1 \\\n  $TPCDS_PYSPARK -d s3a://luca/tpcds_100\n```\n\n---\n\n## Legacy implementation (v1)\n\nSpark Dashboard v1 is the original implementation and uses InfluxDB as the time-series backend.\n\nArchitecture reference:\n- [spark-dashboard v1 architecture](https://raw.githubusercontent.com/LucaCanali/Miscellaneous/master/Spark_Dashboard/images/Spark_metrics_dashboard_arch.PNG)\n\nLegacy assets are stored under:\n- `legacy/dockerfiles_v1/`\n- `legacy/charts_v1/`\n\nSee also:\n- [legacy/README.md](legacy/README.md)\n\n### Run Spark Dashboard v1 in a container\n\n```bash\ndocker run -p 3000:3000 -p 2003:2003 -d lucacanali/spark-dashboard:v01\n```\n\n- Port `2003` is the Graphite ingestion endpoint\n- Port `3000` is Grafana\n\nMore options, including persistence across restarts:\n- 
[legacy/dockerfiles_v1](legacy/dockerfiles_v1)\n\n### Run Spark Dashboard v1 on Kubernetes with Helm\n\n```bash\nhelm install spark-dashboard https://github.com/cerndb/spark-dashboard/raw/master/charts/spark-dashboard-0.3.0.tgz\n```\n\nMore details:\n- [legacy/charts_v1](legacy/charts_v1)\n- [legacy/charts_v1/README.md](legacy/charts_v1/README.md)\n\n### Graph annotations\n\nOptionally, you can add annotations for query, job, and stage start and end times in the v1 dashboard.\n\n```bash\nINFLUXDB_HTTP_ENDPOINT=\"http://$(hostname):8086\"\n\n\u003cspark-submit config\u003e\n--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 \\\n--conf spark.sparkmeasure.influxdbURL=$INFLUXDB_HTTP_ENDPOINT \\\n--conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink\n```\n\n### Notes\n\n- More details and alternative configurations: [Spark Dashboard notes](https://github.com/LucaCanali/Miscellaneous/tree/master/Spark_Dashboard)\n- The dashboard can be used with Spark on Kubernetes, YARN, Standalone, or local mode\n\n### Docker / Podman\n\n- In v2, Telegraf listens on port `2003` for Graphite ingestion, and VictoriaMetrics serves HTTP on port `8428`\n- In v1, InfluxDB uses port `2003` for Graphite ingestion and port `8086` for HTTP access when using `--network=host`\n- Ensure these endpoints are reachable from the Spark driver and executors\n\n### Helm\n\nFind the InfluxDB service IP with:\n\n```bash\nkubectl get service spark-dashboard-influx\n```\n\nExample service DNS name:\n\n```text\nspark-dashboard-influx.default.svc.cluster.local\n```\n\n### Custom dashboards\n\n- The project includes example dashboards, but only a subset of available metrics is visualized by default\n- For the full list of Spark metrics, see the [Spark metrics documentation](https://github.com/apache/spark/blob/master/docs/monitoring.md#metrics)\n- To add new dashboards, place them in the appropriate `grafana_dashboards` folder and rebuild the container image or repackage the Helm chart\n- With Helm, updating the chart 
is enough to load dashboards through ConfigMaps\n- Automatic persistence of manual Grafana edits is not currently supported\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerndb%2Fspark-dashboard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcerndb%2Fspark-dashboard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerndb%2Fspark-dashboard/lists"}