https://github.com/spektom/spark-flamegraph
Easy CPU Profiling for Apache Spark applications
- Host: GitHub
- URL: https://github.com/spektom/spark-flamegraph
- Owner: spektom
- License: apache-2.0
- Created: 2017-09-20T08:32:11.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-08-20T18:44:21.000Z (almost 5 years ago)
- Last Synced: 2025-04-03T11:51:19.336Z (about 2 months ago)
- Topics: apache-spark, cpu-profiling, flamegraph, spark
- Language: Shell
- Homepage:
- Size: 999 KB
- Stars: 45
- Watchers: 3
- Forks: 12
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
spark-flamegraph
================
Easy CPU Profiling for [Apache Spark](https://spark.apache.org/) applications.
The script `spark-submit-flamegraph` is a wrapper around the standard `spark-submit` that generates a [Flame Graph](http://www.brendangregg.com/flamegraphs.html) of your application's CPU usage.
## Supported Systems
* Amazon EMR
* Most Linux distributions
* Mac (with [Homebrew](https://brew.sh/) installed)

## Prerequisites
The script works out of the box on [Amazon EMR](https://aws.amazon.com/emr/).
On other systems, the following utilities must be present:

* perl
* python2.7 (or set the `PYTHON` environment variable to your Python executable)
* pip (or set the `PIP` environment variable to your pip utility)

## Running
```bash
wget -O /usr/local/bin/spark-submit-flamegraph \
  https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph
chmod +x /usr/local/bin/spark-submit-flamegraph
```

Use `spark-submit-flamegraph` as a drop-in replacement for the `spark-submit` command.
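Before the first run, you can check that the prerequisites listed above are available. This is a minimal sketch, not part of the script itself; `PYTHON` and `PIP` fall back to the README's defaults:

```shell
# Verify that each required utility is on PATH (illustrative check;
# python2.7 and pip are the defaults stated in the README).
for cmd in perl "${PYTHON:-python2.7}" "${PIP:-pip}"; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```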
## Configuration
To configure use the following environment variables:
| Environment Variable | Description | Default value |
| -------------------- | ------------ | ------------- |
| `SPARK_CMD` | Spark command to run | spark-submit |
| `PYTHON` | Path to the Python executable | python2.7 |
| `PIP` | Path to the pip utility | pip |

For example, to profile a Spark shell session, set the `SPARK_CMD` environment variable:
```bash
SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
```

## Details
The script performs the following steps to make profiling Spark applications as easy as possible:

* Downloads InfluxDB and starts it on a random port.
* Starts the Spark application using the original `spark-submit` command, with the StatsD profiler JAR on its classpath and configuration that tells the profiler to report statistics to the InfluxDB instance.
* After the Spark application finishes, queries all reported metrics from the InfluxDB instance.
* Runs a script that generates the .svg file.
* Stops the InfluxDB instance.
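The first step, starting InfluxDB on a random free port, can be sketched in bash. This is an illustration of the general approach, not the script's actual code; `pick_free_port` is a hypothetical helper:

```shell
# Pick a random TCP port in the ephemeral range and confirm nothing
# is listening on it (sketch only; the real script may differ).
pick_free_port() {
  while :; do
    local port=$(( (RANDOM % 16384) + 49152 ))   # IANA ephemeral range
    # Opening /dev/tcp fails when nothing listens on the port,
    # which means the port is free for InfluxDB to bind.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
}

port=$(pick_free_port)
echo "InfluxDB would listen on port $port"
```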