spark-flamegraph
================

![Build Status](https://github.com/spektom/spark-flamegraph/workflows/CI/badge.svg)

Easy CPU Profiling for [Apache Spark](https://spark.apache.org/) applications.

The `spark-submit-flamegraph` script is a wrapper around the standard `spark-submit` command that generates a [Flame Graph](http://www.brendangregg.com/flamegraphs.html) for your application.

## Supported Systems

* Amazon EMR
* Most Linux distributions
* Mac (with [Homebrew](https://brew.sh/) installed)

## Prerequisites

The script works out of the box on [Amazon EMR](https://aws.amazon.com/emr/).
On other systems, the following utilities must be present (see the quick check after this list):

* perl
* python2.7 (or set the `PYTHON` environment variable to your Python executable)
* pip (or set the `PIP` environment variable to your pip utility)
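
A quick way to sanity-check that everything is available, using the same defaults described in the Configuration section below:

```bash
# Verify the required utilities are on PATH (respects PYTHON/PIP overrides).
for cmd in perl "${PYTHON:-python2.7}" "${PIP:-pip}"; do
  command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done
```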

## Running

```bash
wget -O /usr/local/bin/spark-submit-flamegraph \
  https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph

chmod +x /usr/local/bin/spark-submit-flamegraph
```

Use `spark-submit-flamegraph` as a replacement for the `spark-submit` command.
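
For example, a typical invocation looks just like a regular `spark-submit` call (the class and jar names below are placeholders):

```bash
# Placeholder application class and jar; substitute your own Spark job.
spark-submit-flamegraph \
  --class com.example.MyApp \
  --master yarn \
  my-app.jar arg1 arg2
```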

## Configuration

To configure the script, use the following environment variables:

| Environment Variable | Description | Default value |
| -------------------- | ----------- | ------------- |
| `SPARK_CMD` | Spark command to run | `spark-submit` |
| `PYTHON` | Path to the Python executable | `python2.7` |
| `PIP` | Path to the pip utility | `pip` |

For example, to profile a Spark shell session, set the `SPARK_CMD` environment variable:

```bash
SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
```
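
The same pattern applies to the other variables when the defaults don't match your system (the paths and jar name below are illustrative):

```bash
# Illustrative paths; point these at your actual Python and pip binaries.
PYTHON=/usr/bin/python2.7 PIP=/usr/bin/pip2.7 \
  /usr/local/bin/spark-submit-flamegraph my-app.jar
```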

## Details

To make profiling Spark applications as easy as possible, the script performs the following steps (a simplified sketch follows the list):

* Downloads InfluxDB and starts it on a random port.
* Starts the Spark application using the original `spark-submit` command, with the StatsD profiler jar on its classpath and a configuration that tells the profiler to report statistics back to the InfluxDB instance.
* After the Spark application finishes, queries all the reported metrics from the InfluxDB instance.
* Runs a script that generates the .svg flame graph file.
* Stops the InfluxDB instance.
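
A heavily simplified sketch of that flow, with hypothetical helper names standing in for the script's actual internals:

```bash
# Illustrative pseudocode only; helper names (find_free_port, start_influxdb,
# etc.) and the profiler jar name are hypothetical, not the script's real code.
port=$(find_free_port)                     # random port for InfluxDB
start_influxdb "$port"                     # downloaded on first run

# Run the original spark-submit with the StatsD profiler agent attached,
# pointing it at the local InfluxDB instance.
spark-submit \
  --jars statsd-jvm-profiler.jar \
  --conf "spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler.jar=server=localhost,port=${port}" \
  "$@"

dump_metrics "$port" > stacks.txt          # query the reported stack samples
flamegraph.pl stacks.txt > flamegraph.svg  # render the .svg flame graph
stop_influxdb                              # clean up
```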