Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Spark Monitoring With Prometheus And Grafana Using Docker
https://github.com/nikoshet/monitoring-spark-on-docker
- Host: GitHub
- URL: https://github.com/nikoshet/monitoring-spark-on-docker
- Owner: nikoshet
- License: mit
- Created: 2020-09-15T07:26:20.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-09T10:06:25.000Z (about 4 years ago)
- Last Synced: 2024-04-19T19:08:01.333Z (9 months ago)
- Topics: docker, docker-compose, grafana, hadoop, hdfs, monitoring, node-exporter, prometheus, spark
- Language: Shell
- Homepage:
- Size: 257 KB
- Stars: 6
- Watchers: 2
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Monitoring Apache Spark and HDFS on Docker with Prometheus and Grafana
## Goal
The goal of this project is to:
- Create a Docker Container that runs Spark on top of HDFS
- Use Prometheus to collect metrics from Spark applications and node-exporter (a minimal scrape-config sketch follows this list)
- Use Grafana to display the collected metrics
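These components follow the standard Prometheus pull model: Prometheus scrapes the Spark processes and node-exporter, and Grafana queries Prometheus. As a rough sketch only (the job names, ports, and metrics path below are assumptions for illustration, not taken from this repository's actual prometheus.yml):
```
# Hypothetical sketch: job names, targets, ports, and metrics_path are
# assumptions for illustration, not copied from this repository.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'spark'
    metrics_path: '/metrics/prometheus'   # e.g. Spark's PrometheusServlet sink
    static_configs:
      - targets: ['spark-master:8080']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
EOF
```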
## Configuration
- Hadoop configurations for core-site.xml and hadoop-env.sh are set [here](https://github.com/nikoshet/monitoring-spark-on-docker/blob/0b363ce7f0586ea9041e270e1a4fb7abfb6e52b5/Spark/install.sh#L27).
- Spark configurations for spark-env.sh and spark-defaults.conf are set [here](https://github.com/nikoshet/monitoring-spark-on-docker/blob/0b363ce7f0586ea9041e270e1a4fb7abfb6e52b5/Spark/install.sh#L53) (an illustrative shell sketch of this pattern follows this list).
- Environment variables for Spark/Hadoop versions and library paths are set [here](https://github.com/nikoshet/monitoring-spark-on-docker/blob/0b363ce7f0586ea9041e270e1a4fb7abfb6e52b5/Spark/Dockerfile#L6).
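Config files of this kind are typically populated from the install script by appending lines; the following sketch mirrors that pattern with illustrative values (the paths and property values are assumptions, not copied from install.sh):
```
# Illustrative sketch: paths and property values are assumptions,
# not copied from this repository's install.sh.

# hadoop-env.sh: make JAVA_HOME visible to the Hadoop daemons
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
  >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"

# spark-defaults.conf: point Spark at the cluster and at HDFS for event logs
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.master            spark://spark-master:7077
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://namenode:9000/spark-logs
EOF
```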
## Notes
- The Spark version running is 3.0.1, and the HDFS version is 3.2.0.
- For all available metrics for Spark monitoring, see [here](https://spark.apache.org/docs/2.2.0/monitoring.html#metrics).
- The containerized environment consists of a Master, a Worker, a DataNode, a NameNode, and a SecondaryNameNode.
- To track metrics across runs of the same Spark application, set spark.metrics.namespace (for example to the appName); otherwise it defaults to spark.app.id, which changes on every invocation of the app (see the spark-submit sketch after this list).
- The main Python application is app.py, an example that computes the number pi. Adapt it for your own application or use of HDFS.
- The Spark/Hadoop image is also available on Docker Hub [here](https://hub.docker.com/repository/docker/nikoshet/spark-hadoop/general) so that it can be referenced in the docker-compose.yaml file, as seen [here](https://github.com/nikoshet/monitoring-spark-on-docker/blob/820dee01d771e8cf6ec3a7b27ede8aa0eeef2214/docker-compose.yaml#L54).
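To make the namespace point concrete, here is a minimal spark-submit sketch (the master URL and the namespace value are assumptions for illustration; app.py is the example application from this repository):
```
# Illustrative sketch: the master URL and namespace value are assumptions.
# Fixing spark.metrics.namespace keeps metric names stable across runs
# instead of embedding the per-run spark.app.id.
spark-submit \
  --master spark://spark-master:7077 \
  --conf spark.metrics.namespace=pi_example \
  app.py
```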
## Usage
Assuming that Docker is installed, build and run the containers with:
```
docker-compose build && docker-compose up
```
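Once the containers are up, the stack can be sanity-checked from the host (assuming the default port mappings of 9090 for Prometheus and 3000 for Grafana; the actual mappings are defined in docker-compose.yaml):
```
# Assumes default host port mappings; check docker-compose.yaml for the real ones.
curl -s http://localhost:9090/-/healthy     # Prometheus health endpoint
curl -s http://localhost:3000/api/health    # Grafana health endpoint
```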
## Screenshots
- Example dashboard for Spark metrics
- All available services from service discovery in Prometheus
## Troubleshooting
Please file issues if you run into any problems.