https://github.com/beaglefoot/pyspark-venv
- Host: GitHub
- URL: https://github.com/beaglefoot/pyspark-venv
- Owner: Beaglefoot
- Created: 2022-07-09T20:00:37.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-07-09T20:00:52.000Z (almost 3 years ago)
- Last Synced: 2025-02-03T23:54:51.359Z (3 months ago)
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# pyspark-venv
This is an experiment to test whether Spark jobs can be run with customized virtual environments.
## How to install
Installing the top-level dependencies is optional; they are only needed for proper IDE support:
```
poetry install
```
Install the customized virtual environment:
```
cd myvenv
poetry install
```
This environment is mounted as a volume on all Spark nodes.
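One way Spark can be pointed at an interpreter inside a mounted environment is the `PYSPARK_PYTHON` environment variable, which Spark reads to choose the Python executable for PySpark. Whether this repository relies on it is not stated here, so the following is only a sketch: the path `/myvenv/.venv/bin/python` and the helper `build_submit_invocation` are hypothetical.

```python
import os

# Hypothetical location of the virtual environment's interpreter inside each
# container; the real path depends on how the volume is mounted.
VENV_PYTHON = "/myvenv/.venv/bin/python"


def build_submit_invocation(script, venv_python=VENV_PYTHON, base_env=None):
    """Return (argv, env) for a spark-submit call that uses venv_python."""
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # PYSPARK_PYTHON tells Spark which Python executable to use for PySpark.
    env["PYSPARK_PYTHON"] = venv_python
    argv = ["spark-submit", script]
    return argv, env


argv, env = build_submit_invocation("/main.py")
print(" ".join(argv), "with PYSPARK_PYTHON =", env["PYSPARK_PYTHON"])
```

Inside the master container, the pair could then be passed to `subprocess.run(argv, env=env, check=True)`. The same variable could also be set once in the container configuration instead of per invocation.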
## How to run
Start all containers:
```
docker-compose up
```
Open a shell in the Spark master node container:
```
docker-compose exec spark bash
```
Inside the container:
```
spark-submit /main.py
```
The output:
```
--- venv ----
+----------+---------+----------+
|first_name|last_name|instrument|
+----------+---------+----------+
|      John| Coltrane|       Sax|
|       Joe|     Pass|    Guitar|
|     Louis|Armstrong|   Trumpet|
+----------+---------+----------+
```
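The `--- venv ----` marker in the output is not explained here. Assuming it signals that the job ran under the customized environment, a check along the following lines could confirm which interpreter is executing the code. This is a minimal sketch using only the standard library, not the contents of `main.py`; the function names are illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment created by venv (or a recent virtualenv),
    # sys.prefix points at the environment while sys.base_prefix points
    # at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Summarize which interpreter is executing this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv(),
    }


print(describe_interpreter())
```

Calling `describe_interpreter()` from within a function executed on the workers, rather than only on the driver, would show whether the executors use the same environment.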