https://github.com/jayhan94/minilake
A modern mini lakehouse based on Spark and Delta, running in Docker.
- Host: GitHub
- URL: https://github.com/jayhan94/minilake
- Owner: jayhan94
- License: mit
- Created: 2025-01-16T02:09:14.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-16T02:41:57.000Z (9 months ago)
- Last Synced: 2025-01-16T03:30:04.992Z (9 months ago)
- Topics: analytics, datalake, deltalake, lakehouse, spark
- Language: Dockerfile
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MiniLake
A modern mini lakehouse based on Spark and Iceberg, running in Docker.

# Usage
Build and run
```bash
docker compose up --build
```

Attach to the Spark container and open the Spark SQL shell
```bash
docker exec -it spark-iceberg /opt/spark/bin/spark-sql
```

Create a table
```SQL
CREATE TABLE student (id INT, name STRING, age INT) USING ICEBERG LOCATION 's3://minilake/student';
```

Insert data
```SQL
INSERT INTO student VALUES (1, 'jay', 15), (2, 'dove', 15);
```

Run a query
```SQL
SELECT * FROM student;
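-- Illustrative follow-up queries (not part of the original walkthrough).
-- They assume the student table and the two rows inserted above.
-- Filter on a column and sort the result:
SELECT id, name FROM student WHERE age = 15 ORDER BY id;
-- Count students per age:
SELECT age, COUNT(*) AS student_count FROM student GROUP BY age;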
```

# TODO
1. A standalone catalog server.
2. Ingesting real-time data from Kafka.
3. Change data capture (CDC).