https://github.com/eocode/docker-spark-big-data
Exercises in Spark with Docker and Data Languages
big-data data-science docker java python scala spark
- Host: GitHub
- URL: https://github.com/eocode/docker-spark-big-data
- Owner: eocode
- Created: 2020-09-26T16:14:01.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-09T21:12:53.000Z (about 5 years ago)
- Last Synced: 2023-03-08T06:53:20.521Z (almost 3 years ago)
- Topics: big-data, data-science, docker, java, python, scala, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 4.39 MB
- Stars: 21
- Watchers: 2
- Forks: 20
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Spark Projects with Docker
Project built using: https://github.com/big-data-europe/docker-spark
Supported versions:
* Spark 3.0.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
* Spark 2.4.5 for Hadoop 2.7+ with OpenJDK 8
## How to start
```bash
docker-compose up
```
Once the containers are up, the web UIs are available at:
* Master: http://localhost:8080
* Worker 1: http://localhost:8081
* Worker 2: http://localhost:8082
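To confirm that the master and both workers came up after `docker-compose up`, the three web UIs listed above can be probed from the host. The following is a minimal sketch, assuming Python 3 is available on the host; the labels `master`, `worker-1` and `worker-2` and the helper names `is_up` and `check_cluster` are illustrative and not part of the repository.

```python
import urllib.error
import urllib.request

# Web UI addresses as listed in this README; the dictionary keys are
# illustrative labels, not names taken from the compose file.
UI_URLS = {
    "master": "http://localhost:8080",
    "worker-1": "http://localhost:8081",
    "worker-2": "http://localhost:8082",
}


def is_up(url, timeout=2.0):
    """Return True if an HTTP GET on `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar failures.
        return False


def check_cluster(urls=UI_URLS, probe=is_up):
    """Probe every UI and return a {label: reachable} mapping."""
    return {name: probe(url) for name, url in urls.items()}
```

Calling `check_cluster()` returns a dictionary such as `{"master": True, ...}`; a `False` entry usually means the corresponding container has not finished starting or its port is not published.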
## Open a shell in the worker 1 container
```sh
docker exec -it spark-worker-1 bash
```
## Python examples
Run the PySpark shell or submit a script from inside the worker container:
```sh
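# Paths below are relative and assume the shell starts in /
# Start the interactive PySpark shell
./spark/bin/pyspark
# Submit a script, passing the CSV file as its argument
cd home/python/example
../../../spark/bin/spark-submit example.py data.csv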
```
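The contents of `example.py` are not shown here. As a rough idea of what a script submitted this way could look like, the sketch below reads the CSV path from the command line and prints the schema and the first rows. It is a hypothetical stand-in, not the repository's actual file: the application name `csv-example`, the helper `csv_path_from_argv`, and the assumption that the CSV has a header row are all illustrative.

```python
import sys


def csv_path_from_argv(argv):
    """Return the CSV path if exactly one argument was passed, else None."""
    return argv[1] if len(argv) == 2 else None


def main(csv_path):
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    # "csv-example" is an arbitrary application name chosen for this sketch.
    spark = SparkSession.builder.appName("csv-example").getOrCreate()
    try:
        # Assumption: the file has a header row; column types are inferred.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        df.printSchema()
        df.show(5)
        print(f"rows: {df.count()}")
    finally:
        spark.stop()


if __name__ == "__main__":
    path = csv_path_from_argv(sys.argv)
    if path is None:
        print("usage: spark-submit example.py <file.csv>", file=sys.stderr)
    else:
        main(path)
```

Stopping the session in a `finally` block releases the application's resources even if reading the file fails.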
**Spark monitor:**
http://localhost:4040
http://localhost:4041
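While an application is running, its monitoring UI also serves Spark's REST API under `/api/v1`. The sketch below lists the applications known to the UI on port 4040. It assumes Python 3 on the host and that the port is reachable as shown above; the function names `list_applications` and `summarize` are illustrative.

```python
import json
import urllib.request


def list_applications(base_url="http://localhost:4040", timeout=2.0):
    """Fetch the application list from the Spark monitoring REST API."""
    url = f"{base_url}/api/v1/applications"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(apps):
    """Reduce the API response to (id, name) pairs."""
    return [(app["id"], app["name"]) for app in apps]
```

For example, `summarize(list_applications())` returns one `(id, name)` pair per application. The call raises `urllib.error.URLError` when no application is currently serving the UI on that port.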
To use Jupyter Notebook inside the container, install it with `apk add gcc` followed by `pip3 install notebook`.