https://github.com/michelderu/jupyter-spark-cassandra

Complete environment that allows you to use Jupyter with PySpark in combination with Cassandra and Spark.

Host: GitHub
URL: https://github.com/michelderu/jupyter-spark-cassandra
Owner: michelderu
Created: 2021-03-09T19:05:15.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-03-09T19:21:08.000Z (over 4 years ago)
Last Synced: 2025-01-20T08:49:25.150Z (5 months ago)
Language: Jupyter Notebook
Size: 71.3 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Jupyter notebook with Spark master, 2 workers and a Cassandra database
This repo contains a working environment that lets you use PySpark from Jupyter together with a Spark cluster and a Cassandra database.

## Build a custom version of bitnami-spark
The Python and Spark versions must match between the Spark and Jupyter containers. PySpark refuses to run when the Python minor version on the workers differs from the one in the driver (here, the Jupyter kernel).
- `jupyter/pyspark-notebook:29edefbcb06a` is a Jupyter container with Python 3.8.8 and Spark 3.0.2.
- `bitnami-spark` is modified to include Python 3.8.8 (instead of 3.6); it already includes Spark 3.0.2.

First, build the custom `bitnami-spark` image:
```sh
cd ./bitnami-docker-spark-custom/3/debian-10
docker build -t custom-bitnami-spark .
```
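Once the image is built, you can check that its Python minor version matches the Jupyter image's before starting the stack. This is a hypothetical sketch: the helper names `python_minor` and `same_python_minor` are not part of this repo, and the commented usage lines assume `python` is on the `PATH` in both images. Only the `major.minor` part is compared, since that is what PySpark checks.

```sh
# Hypothetical helper (not part of this repo): print the major.minor part of a
# version string such as "Python 3.8.8" -> "3.8".
python_minor() {
  echo "$1" | awk '{print $NF}' | cut -d. -f1,2
}

# Succeed (exit status 0) only if both version strings share major.minor,
# which is what PySpark requires of the driver and worker interpreters.
same_python_minor() {
  [ "$(python_minor "$1")" = "$(python_minor "$2")" ]
}

# Example usage (assumes `python` is on the PATH inside both images):
# spark_py=$(docker run --rm --entrypoint python custom-bitnami-spark --version)
# jupyter_py=$(docker run --rm jupyter/pyspark-notebook:29edefbcb06a python --version)
# same_python_minor "$spark_py" "$jupyter_py" && echo "Python versions match"
```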

## Start the environment
```sh
docker-compose up
```
Wait until Cassandra, the Spark master, both Spark workers and Jupyter have started, then open a notebook.
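If you prefer to script the wait instead of watching the logs, a small retry loop is enough. This is a hypothetical POSIX `sh` sketch, not part of the repo. The service name `cassandra` and the published Spark master UI port `8080` in the commented usage lines are assumptions about `docker-compose.yml`, so adjust them to match that file.

```sh
# Hypothetical helper (not part of this repo): run a command repeatedly until
# it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS command [args...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0            # the command succeeded
    fi
    i=$((i + 1))
    # Sleep only if another attempt is coming.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1                # every attempt failed
}

# Example usage (service name and port are assumptions):
# wait_for 30 10 docker-compose exec -T cassandra cqlsh -e 'DESCRIBE KEYSPACES'
# wait_for 30 5 curl -sf http://localhost:8080
```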

## Vermont notebook
The Vermont notebook and its data are based on: https://levelup.gitconnected.com/using-docker-and-pyspark-134cd4cab867

Dataset: https://data.vermont.gov/Finance/Vermont-Vendor-Payments/786x-sbp3

Place the CSV file in `/jupyter/data`.
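Socrata portals such as data.vermont.gov usually offer a CSV export under `/api/views/<dataset-id>/rows.csv`. The sketch below assumes that pattern applies to this dataset and that `/jupyter/data` corresponds to `./jupyter/data` in the repository checkout. The helper `socrata_csv_url` and the output filename are hypothetical; rename the file to whatever the notebook actually reads.

```sh
# Hypothetical helper (not part of this repo): build the CSV export URL for a
# Socrata dataset, assuming the portal uses the /api/views/<id>/rows.csv pattern.
# Usage: socrata_csv_url HOST DATASET_ID
socrata_csv_url() {
  printf 'https://%s/api/views/%s/rows.csv?accessType=DOWNLOAD\n' "$1" "$2"
}

# Example usage (target directory and filename are assumptions):
# mkdir -p ./jupyter/data
# curl -fL "$(socrata_csv_url data.vermont.gov 786x-sbp3)" \
#   -o ./jupyter/data/vermont_vendor_payments.csv
```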
