https://github.com/baptvit/open-table-formats-labs
Open table format explorations and PoC
- Host: GitHub
- URL: https://github.com/baptvit/open-table-formats-labs
- Owner: baptvit
- License: apache-2.0
- Created: 2025-01-14T20:34:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-14T20:42:56.000Z (over 1 year ago)
- Last Synced: 2025-01-25T12:13:17.080Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 590 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README copy.md
- License: LICENSE
README
# Open Table Formats Labs

Experiments with Delta, Iceberg, Hive, and other big data tools.
## Tech Stack
- [PySpark](https://spark.apache.org/docs/latest/api/python/user_guide)
- [Delta.io](https://docs.delta.io/latest/quick-start.html)
- [Iceberg](https://iceberg.apache.org/spark-quickstart/)
- [Hudi](https://hudi.apache.org/docs/quick-start-guide/)
- [uv](https://docs.astral.sh/uv/concepts/projects/dependencies/)
- [Docker](https://docs.docker.com/get-docker/)
## Up and Running
### Developer Setup
**1.** Install Java 7, 11, 17, or 21, Apache Maven, Spark 3.5.x, and Hadoop. The Java version you need depends on the tool:
- Apache XTable uses Java 11
- Apache Ranger uses Java 7
- Apache Spark uses Java 17+
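Since each tool expects a different JDK, it can help to check which Java is active before starting a notebook. The following is a minimal sketch, not part of the repository: `REQUIRED_JAVA`, `parse_java_major`, `installed_java_major`, and `compatible_tools` are hypothetical helper names, and it assumes the XTable and Ranger versions from the list above are exact matches while the Spark version is a minimum.

```python
import re
import subprocess

# Java major versions per tool, copied from the list above.
# Assumption: Spark's value is a minimum ("17+"); the others are exact.
REQUIRED_JAVA = {"Apache XTable": 11, "Apache Ranger": 7, "Apache Spark": 17}
MINIMUM_ONLY = {"Apache Spark"}


def parse_java_major(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    Handles the legacy "1.x" scheme ("1.7.0_80" -> 7) and the
    modern scheme ("17.0.9" -> 17, "21" -> 21).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in `java -version` output")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Return the major version of the `java` binary on PATH."""
    # `java -version` prints to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)


def compatible_tools(java_major: int) -> list[str]:
    """List the tools from REQUIRED_JAVA that the given Java version satisfies."""
    tools = []
    for tool, required in REQUIRED_JAVA.items():
        if tool in MINIMUM_ONLY:
            ok = java_major >= required
        else:
            ok = java_major == required
        if ok:
            tools.append(tool)
    return tools
```

Calling `compatible_tools(installed_java_major())` then reports which of the three tools the current shell's JDK is suitable for, under the assumptions stated above.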
**2.** Install the dependencies declared in `pyproject.toml`:
```shell
uv sync
```
**3.** Activate the virtualenv created by `uv`:
```shell
source .venv/bin/activate
```
**4.** Spin up MinIO (object storage) and the Hive Metastore with Docker Compose:
```shell
docker compose up -d
```
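Once the containers are up, a PySpark session has to be pointed at them. The sketch below is one way this might look. It is not taken from the repository: `lab_spark_conf` and `build_session` are hypothetical helpers, and the endpoint, port, and credential defaults are assumptions (the stock MinIO and Hive Metastore defaults) that should be replaced with the values from the project's compose file.

```python
def lab_spark_conf(
    s3_endpoint: str = "http://localhost:9000",      # assumed MinIO API address
    metastore_uri: str = "thrift://localhost:9083",  # assumed Hive Metastore address
    access_key: str = "minioadmin",                  # assumed MinIO default credentials
    secret_key: str = "minioadmin",
) -> dict[str, str]:
    """Build Spark settings for an S3-compatible store and a Hive Metastore."""
    use_ssl = s3_endpoint.startswith("https://")
    return {
        # Use the external Hive Metastore as the session catalog.
        "spark.sql.catalogImplementation": "hive",
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        # S3A settings for talking to MinIO instead of AWS S3.
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed by path (host/bucket), not by subdomain.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


def build_session(app_name: str, conf: dict[str, str]):
    """Create a SparkSession with the given settings applied."""
    # Imported lazily so the config helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()
```

A notebook could then call `build_session("otf-labs", lab_spark_conf())`. The `s3a://` scheme only resolves if the `hadoop-aws` module and a matching AWS SDK are on Spark's classpath, and Delta, Iceberg, or Hudi each need their own packages and catalog settings on top of this.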