https://github.com/projectnessie/nessie-demos
Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.
- Host: GitHub
- URL: https://github.com/projectnessie/nessie-demos
- Owner: projectnessie
- License: apache-2.0
- Created: 2021-05-03T16:29:35.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-02T19:52:08.000Z (about 2 months ago)
- Last Synced: 2024-11-02T20:26:46.273Z (about 2 months ago)
- Topics: binder, iceberg, jupyter-notebooks, nessie, spark
- Language: Jupyter Notebook
- Homepage: https://projectnessie.org/
- Size: 813 KB
- Stars: 28
- Watchers: 10
- Forks: 21
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# Nessie Binder Demos
These demos run on Binder and can be found at:
* [Spark and Iceberg](https://mybinder.org/v2/gh/projectnessie/nessie-demos/main?labpath=notebooks%2Fnessie-iceberg-demo-nba.ipynb)
* [Flink and Iceberg](https://mybinder.org/v2/gh/projectnessie/nessie-demos/main?labpath=notebooks%2Fnessie-iceberg-flink-demo-nba.ipynb)
* [Hive and Iceberg](https://mybinder.org/v2/gh/projectnessie/nessie-demos/main?labpath=notebooks%2Fnessie-iceberg-hive-demo-nba.ipynb)

They are automatically rebuilt every time we push to `main`. They are unit tested with the `testbook` library to ensure they keep producing the correct results as the underlying libraries continue to evolve.

## Upgrade instructions

Because of the split between Binder and the unit tests, it wasn't entirely trivial to create a single place to update all versions.
Some versions have to be updated in multiple places:

### Nessie
The Nessie version is set for Binder in `docker/binder/requirements_base.txt`. The demos currently use Nessie 0.74.x.
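To see which version a requirements file currently pins before bumping it, a small helper like the one below can help. This is a minimal sketch and not a script from this repository: it assumes the file uses plain `name==version` pins, and the package name and file contents in the example are hypothetical.

```python
import re
from typing import Optional

# Matches "name==version" at the start of a line, ignoring trailing comments/markers.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def _normalize(name: str) -> str:
    """PEP 503 style normalization: runs of '-', '_' and '.' collapse to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return the version pinned with '==' for `package`, or None if there is no such pin."""
    wanted = _normalize(package)
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match and _normalize(match.group(1)) == wanted:
            return match.group(2)
    return None


# Hypothetical contents, not the real requirements_base.txt.
example = """
# comment lines and non-== specifiers are ignored
example-nessie-client==0.74.0
jupyterlab>=4
"""
print(pinned_version(example, "example_nessie_client"))  # 0.74.0
```

Name normalization means `example_nessie_client` and `example-nessie-client` are treated as the same package, as pip does.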
### Iceberg
The demos currently use Iceberg `1.4.2`, which is specified in the Iceberg notebooks as well as in `docker/utils/__init__.py`.
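Since the Iceberg version is repeated in several files, a quick consistency check can catch a missed update. The sketch below is hypothetical: it assumes each file spells the version in a form that one regular expression can capture, and the file names and contents in the example are made up for illustration.

```python
import re
from typing import Dict, Set


def versions_by_file(files: Dict[str, str], pattern: str) -> Dict[str, Set[str]]:
    """Map each file name to the set of versions captured by the first group of `pattern`."""
    regex = re.compile(pattern)
    return {name: set(regex.findall(text)) for name, text in files.items()}


def is_consistent(found: Dict[str, Set[str]]) -> bool:
    """True when every file mentions exactly one version and all files agree on it."""
    every_version = set().union(*found.values())
    return len(every_version) == 1 and all(len(v) == 1 for v in found.values())


# Hypothetical file contents; the real files may spell the version differently.
files = {
    "docker/utils/__init__.py": 'iceberg_version = "1.4.2"',
    "notebook_a": 'iceberg_version = "1.4.2"',
    "notebook_b": 'iceberg_version = "1.4.1"',  # a missed update
}
found = versions_by_file(files, r'iceberg_version = "([^"]+)"')
print(found["notebook_b"], is_consistent(found))  # {'1.4.1'} False
```

A file where the pattern matches nothing yields an empty set and also counts as inconsistent, so a changed spelling is flagged rather than silently skipped.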
### Spark
The Spark version only has to be updated in `docker/binder/requirements.txt`. Iceberg currently supports Spark 3.2, 3.3, 3.4, and 3.5; the demos use Spark 3.2.
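Iceberg publishes a separate Spark runtime artifact for each Spark minor version, so changing the Spark version generally means changing the Iceberg runtime jar as well. The sketch below builds that Maven coordinate and rejects Spark versions outside the list above. The helper name is hypothetical, and the Scala `2.12` default is an assumption about how the demos are built.

```python
# Spark minor versions that this README lists as supported by Iceberg.
SUPPORTED_SPARK = {"3.2", "3.3", "3.4", "3.5"}


def iceberg_spark_runtime(spark_version: str, iceberg_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of the Iceberg Spark runtime jar for a Spark release."""
    spark_minor = ".".join(spark_version.split(".")[:2])  # "3.2.4" -> "3.2"
    if spark_minor not in SUPPORTED_SPARK:
        raise ValueError(f"Spark {spark_minor} is not one of {sorted(SUPPORTED_SPARK)}")
    return f"org.apache.iceberg:iceberg-spark-runtime-{spark_minor}_{scala_version}:{iceberg_version}"


print(iceberg_spark_runtime("3.2", "1.4.2"))
# org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.2
```

A coordinate like this is what Spark's `spark.jars.packages` setting expects, if the runtime is resolved from Maven at session start.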
### Flink
The Flink version is set for Binder in `docker/binder/requirements_flink.txt`. The demos currently use Flink `1.17.1`.
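The Iceberg Flink runtime is also published per Flink minor version (for example `iceberg-flink-runtime-1.17`), and PyFlink jobs commonly load it as a downloaded jar. This sketch maps the versions above to a Maven Central download URL using the standard Maven repository layout. It is an illustration only: the README does not say how the demos obtain this jar, and the helper names are hypothetical.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group: str, artifact: str, version: str) -> str:
    """Standard Maven layout: dots in the group id become path separators."""
    return f"{MAVEN_CENTRAL}/{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.jar"


def iceberg_flink_runtime_url(flink_version: str, iceberg_version: str) -> str:
    """URL of the Iceberg runtime jar built for the given Flink minor version."""
    flink_minor = ".".join(flink_version.split(".")[:2])  # "1.17.1" -> "1.17"
    return maven_jar_url("org.apache.iceberg", f"iceberg-flink-runtime-{flink_minor}", iceberg_version)


print(iceberg_flink_runtime_url("1.17.1", "1.4.2"))
# https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.17/1.4.2/iceberg-flink-runtime-1.17-1.4.2.jar
```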
### Hadoop
Hadoop libraries are used by Flink and are currently specified only in `docker/utils/__init__.py`. The demos use Hadoop `2.10.1` with both Flink and Hive.
### Hive
The demos currently use Hive `2.3.9`, which supports Hadoop `2.10.1`. The Hive version only needs to be updated
in `docker/utils/__init__.py`.

## Binder
[Binder](https://mybinder.org) is a customizable platform for Jupyter notebooks and
more (see their website). Binder generates a Dockerfile and image based on the settings in the
source GitHub repository (other sources are possible). Both Ubuntu packages and Python packages,
for example, can be pre-installed into the Docker image that Binder generates.

In the end, Binder lets a user start a notebook simply by clicking a link.
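The demo links at the top of this README follow mybinder.org's `v2/gh/<owner>/<repo>/<ref>` pattern, with the notebook path URL-encoded in the `labpath` query parameter so that it opens in JupyterLab. A hypothetical helper that builds such a link:

```python
from urllib.parse import quote


def binder_lab_url(owner: str, repo: str, ref: str, notebook_path: str) -> str:
    """Build a mybinder.org link that opens `notebook_path` in JupyterLab."""
    # safe="" makes quote() encode "/" as %2F, matching the links in this README.
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}?labpath={quote(notebook_path, safe='')}"


print(binder_lab_url("projectnessie", "nessie-demos", "main", "notebooks/nessie-iceberg-demo-nba.ipynb"))
# https://mybinder.org/v2/gh/projectnessie/nessie-demos/main?labpath=notebooks%2Fnessie-iceberg-demo-nba.ipynb
```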
## Development
For development, you will need to make sure to have the following installed:
- Python 3.10+
- pre-commit

After installing pre-commit, run `pre-commit install` to install the hooks locally, since this repo
runs several scripts in the pre-commit stage.

To run the notebook unit tests, run the following commands in the `notebooks` folder:
1. `python -m pip install -r requirements_dev.txt`
2. `tox`

Running the unit tests takes a while, because all the binaries (Hive, Flink, etc.) have to be downloaded
before the tests run.
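To make the download step concrete, the sketch below derives archive URLs from the versions listed in this README and skips files that are already present, which is why only the first test run pays the full download cost. It is an assumption-laden illustration: the README does not say where the tests download from, so the Apache archive layout, the Scala `2.12` Flink build, and the helper names here are all hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

ARCHIVE = "https://archive.apache.org/dist"


def binary_urls(hive: str, hadoop: str, flink: str, scala: str = "2.12") -> dict:
    """Apache archive URLs for the Hive, Hadoop and Flink binary distributions."""
    return {
        "hive": f"{ARCHIVE}/hive/hive-{hive}/apache-hive-{hive}-bin.tar.gz",
        "hadoop": f"{ARCHIVE}/hadoop/common/hadoop-{hadoop}/hadoop-{hadoop}.tar.gz",
        "flink": f"{ARCHIVE}/flink/flink-{flink}/flink-{flink}-bin-scala_{scala}.tgz",
    }


def download_if_missing(url: str, dest_dir: Path) -> Path:
    """Download `url` into `dest_dir` unless a file with the same name already exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urlretrieve(url, target)  # slow: these archives are large
    return target


urls = binary_urls(hive="2.3.9", hadoop="2.10.1", flink="1.17.1")
for name, url in urls.items():
    print(name, url)
```

Calling `download_if_missing(url, Path("downloads"))` for each URL would fetch the archives once and reuse them on later runs.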