Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vemonet/setup-spark
:octocat:✨ Setup Apache Spark in GitHub Action workflows
- Host: GitHub
- URL: https://github.com/vemonet/setup-spark
- Owner: vemonet
- License: mit
- Created: 2020-10-27T14:39:25.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-02-28T08:11:51.000Z (11 months ago)
- Last Synced: 2024-04-29T17:21:24.623Z (9 months ago)
- Topics: apache-spark, github-actions, setup, spark
- Language: TypeScript
- Homepage: https://github.com/marketplace/actions/setup-apache-spark
- Size: 885 KB
- Stars: 18
- Watchers: 7
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# ✨ setup-spark
[![Test setup-spark action](https://github.com/vemonet/setup-spark/actions/workflows/test.yml/badge.svg)](https://github.com/vemonet/setup-spark/actions/workflows/test.yml)
This action sets up Apache Spark in your environment for use in GitHub Actions by:
- installing and adding `spark-submit` and `spark-shell` to the `PATH`
- setting required environment variables such as `SPARK_HOME` and `PYSPARK_PYTHON` in the workflow

This enables testing applications with a local Spark context in GitHub Actions.
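Later steps in the same job can rely on the binaries and variables the action exports. A minimal sketch of a verification step (placed after the action has run; the exact paths depend on the runner):

```yaml
- name: Check that Spark is available
  run: |
    echo "SPARK_HOME=$SPARK_HOME"
    which spark-submit spark-shell
```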
## 🪄 Usage
You will need to set up **Python** and **Java** in the job before setting up **Spark**.
Basic workflow:
```yaml
steps:
  - uses: actions/setup-python@v5
    with:
      python-version: '3.10'
  - uses: actions/setup-java@v4
    with:
      java-version: '21'
      distribution: temurin
  - uses: vemonet/setup-spark@v1
    with:
      spark-version: '3.5.3'
      hadoop-version: '3'
  - run: spark-submit --version
```

See the [action.yml](action.yml) file for a complete rundown of the available parameters.
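Once the steps above have run, `spark-submit` is on the `PATH`, so a later step can run a PySpark script against a local Spark context. A minimal sketch, assuming a hypothetical `smoke_test.py` script in the repository:

```yaml
- name: Run a PySpark script with a local Spark context
  # smoke_test.py is a hypothetical script in your repository;
  # 'local[2]' runs Spark locally with two worker threads
  run: spark-submit --master 'local[2]' smoke_test.py
```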
You can also define various options, such as providing a specific URL to download the Spark `.tgz`, or using a specific Scala version:
```yaml
- uses: vemonet/setup-spark@v1
with:
spark-version: '3.5.3'
hadoop-version: '3'
scala-version: '2.13'
spark-url: 'https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz'
xms: '1024M'
xmx: '2048M'
log-level: 'debug'
install-folder: '/home/runner/work'
```

## 🏷️ Available versions
Check for the latest Spark versions at https://spark.apache.org/downloads.html
The Hadoop version stays quite stable.
The `setup-spark` action is tested against various versions of Spark and Hadoop in [`.github/workflows/test.yml`](https://github.com/vemonet/setup-spark/blob/main/.github/workflows/test.yml).
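A version-matrix test of this kind can be sketched with a GitHub Actions `strategy.matrix`. This is an illustrative assumption, not a copy of the actual `test.yml` (the version pairs tested there may differ):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Illustrative version pairs; see test.yml for the versions actually tested
        spark-version: ['3.4.4', '3.5.3']
        hadoop-version: ['3']
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: temurin
      - uses: vemonet/setup-spark@v1
        with:
          spark-version: ${{ matrix.spark-version }}
          hadoop-version: ${{ matrix.hadoop-version }}
      - run: spark-submit --version
```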
## 📝 Contributions
Contributions are welcome! Feel free to test other Spark versions and submit [issues](/issues) or [pull requests](https://github.com/vemonet/setup-spark/blob/main/CONTRIBUTING.md).
See the [contributor's guide](https://github.com/vemonet/setup-spark/blob/main/CONTRIBUTING.md) for more details.