https://github.com/tuancamtbtx/python-spark-example
Spark template to submit to cluster
- Host: GitHub
- URL: https://github.com/tuancamtbtx/python-spark-example
- Owner: tuancamtbtx
- Created: 2019-08-09T18:00:41.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-12-20T03:35:05.000Z (about 5 years ago)
- Last Synced: 2024-11-09T02:38:15.059Z (3 months ago)
- Topics: python, spark
- Language: Python
- Homepage:
- Size: 1.86 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## requirements
Spark with PySpark. The commands below also use `pip` and the `jar` tool (shipped with a JDK).

## spark-submit with pyspark simple template
A simple usage example of PySpark with spark-submit, including:
- passing arguments
- creating the Spark context and SQL context
- loading your project source code (the `src` directory)
- loading pip modules (from a simple requirements file)

## preparing libraries (source and pip modules)
From a terminal, run:
```
pip install -r ./requirements.txt -t ./pip_modules && jar -cvf pip_modules.jar -C ./pip_modules .
```
```
jar -cvf src.jar -C ./src .
```

## running spark-submit
From a terminal, run:
```
spark-submit main.py --some_arg=some_value --pip_modules=pip_modules.jar --src=src.jar
```

## running pyspark interactive shell with pip modules and source code
- run: `pyspark`
- from within the pyspark interactive shell, run the following:
```
sc.addPyFile("src.jar")
sc.addPyFile("pip_modules.jar")
```
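
## example main.py (sketch)
The repository's actual `main.py` is not reproduced here. As an illustration of how the pieces above could fit together for the spark-submit case, the following is a minimal sketch: it parses the arguments from the spark-submit command, creates the Spark context and SQL context, and registers the two archives with `sc.addPyFile`. The function names `parse_args`, `create_contexts` and `main`, and the app name, are assumptions made for this example and may not match the project's code. On recent Spark versions `SparkSession` is the usual entry point; `SQLContext` is used here only because the template refers to a SQL context.

```python
import argparse


def parse_args(argv=None):
    """Parse the application arguments that follow main.py on the command line."""
    parser = argparse.ArgumentParser(description="spark-submit template (sketch)")
    parser.add_argument("--some_arg", default=None, help="example application argument")
    parser.add_argument("--pip_modules", default=None, help="archive of pip-installed modules")
    parser.add_argument("--src", default=None, help="archive of the project's src directory")
    return parser.parse_args(argv)


def create_contexts(app_name, archives):
    """Create the Spark context and SQL context, then register the archives."""
    # Imported lazily so that argument parsing works without Spark installed.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(conf=SparkConf().setAppName(app_name))
    for archive in archives:
        if archive:
            # Makes the archive's modules importable on the driver and executors.
            sc.addPyFile(archive)
    sql_context = SQLContext(sc)
    return sc, sql_context


def main(argv=None):
    args = parse_args(argv)
    # "python-spark-example" is a hypothetical app name chosen for this sketch.
    sc, sql_context = create_contexts(
        "python-spark-example", [args.src, args.pip_modules]
    )
    try:
        # Modules from src and pip_modules can be imported from this point on.
        print("some_arg =", args.some_arg)
    finally:
        sc.stop()
```

To use this as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Keeping the `pyspark` imports inside `create_contexts` is a design choice for this sketch: it lets the argument handling be checked on a machine without Spark.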