Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tabular-io/docker-spark-iceberg
https://github.com/tabular-io/docker-spark-iceberg
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/tabular-io/docker-spark-iceberg
- Owner: tabular-io
- License: apache-2.0
- Created: 2021-10-14T19:03:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-16T18:16:37.000Z (2 months ago)
- Last Synced: 2024-04-16T23:12:15.852Z (2 months ago)
- Language: Jupyter Notebook
- Size: 17.7 MB
- Stars: 168
- Watchers: 14
- Forks: 84
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- spark-joy - spark + iceberg quickstart - [blog](https://tabular.io/blog/docker-spark-and-iceberg/)) (Random Stuff That Doesn't Fit Anywhere / Useful big datasets)
README
# Spark + Iceberg Quickstart Image
This is a docker compose environment to quickly get up and running with a Spark environment and a local REST
catalog, and MinIO as a storage backend.**note**: If you don't have docker installed, you can head over to the [Get Docker](https://docs.docker.com/get-docker/)
page for installation instructions.## Usage
Start up the notebook server by running the following.
```
docker-compose up
```The notebook server will then be available at http://localhost:8888
While the notebook server is running, you can use any of the following commands if you prefer to use spark-shell, spark-sql, or pyspark.
```
docker exec -it spark-iceberg spark-shell
```
```
docker exec -it spark-iceberg spark-sql
```
```
docker exec -it spark-iceberg pyspark
```To stop everything, just run `docker-compose down`.
## Troubleshooting & Maintenance
### Refreshing Docker Image
The prebuilt spark image is uploaded to Dockerhub. Out of convenience, the image tag defaults to `latest`.
If you have an older version of the image, you might need to remove it to upgrade.
```bash
docker image rm tabulario/spark-iceberg && docker-compose pull
```### Building the Docker Image locally
If you want to make changes to the local files, and test them out, you can build the image locally and use that instead:
```bash
docker image rm tabulario/spark-iceberg && docker-compose build
```### Use `Dockerfile` In This Repo
To directly use the Dockerfile in this repo (as opposed to pulling the pre-build `tabulario/spark-iceberg` image), you can use `docker-compose build`.
### Deploying Changes
To deploy changes to the hosted docker image `tabulario/spark-iceberg`, run the following. (Requires access to the tabulario docker hub account)
```sh
cd spark
docker buildx build -t tabulario/spark-iceberg --platform=linux/amd64,linux/arm64 . --push
```---
For more information on getting started with using Iceberg, checkout
the [Quickstart](https://iceberg.apache.org/spark-quickstart/) guide in the official docs.The repository for the docker image is [located on dockerhub](https://hub.docker.com/r/tabulario/spark-iceberg).