https://github.com/scrogson/duckpond-rs
Rust implementation of the DuckLake lakehouse format
datalake ducklake mysql postgres rust sqlite
- Host: GitHub
- URL: https://github.com/scrogson/duckpond-rs
- Owner: scrogson
- License: mit
- Created: 2025-06-24T03:24:36.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-06-25T04:33:34.000Z (3 months ago)
- Last Synced: 2025-06-25T05:28:23.317Z (3 months ago)
- Topics: datalake, ducklake, mysql, postgres, rust, sqlite
- Language: Rust
- Homepage:
- Size: 108 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DuckPond
A Rust implementation of the [DuckLake specification](https://ducklake.select/docs/stable/specification/introduction/) - a new lakehouse format that uses SQL databases for metadata management while storing data as Parquet files.
## Overview
DuckLake rethinks lakehouse architecture by:
- **Storing metadata in SQL databases** (PostgreSQL, SQLite, MySQL) instead of file-based systems
- **Keeping data in Parquet files** on object storage (S3, local filesystem, etc.)
- **Providing ACID transactions** across multiple tables
- **Supporting time travel** and schema evolution
- **Eliminating metadata file sprawl** that plagues other lakehouse formats
## Features
- DuckLake 0.2 specification implementation
- **Multi-database support**: PostgreSQL, MySQL, and SQLite
- Automatic database type detection
- ACID transactions with snapshot isolation
- Schema evolution and time travel
- Rust-native with tokio and sqlx
- Parquet file management
- S3/object storage integration
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
duckpond = "0.0.1"
```
Install the CLI:
```bash
cargo install duckpond
```
## Quick Start
### Prerequisites
- Rust 1.70+
- One of: PostgreSQL, MySQL, or SQLite
- `duckpond` CLI for migrations
### Setup
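Setup revolves around the `DATABASE_URL` environment variable, whose scheme (`postgres://`, `mysql://`, or `sqlite://`) identifies the metadata backend. As a rough illustration of the automatic database type detection listed under Features, the sketch below infers the backend from the URL scheme using only the standard library. `Backend` and `detect_backend` are hypothetical names, not DuckPond's API, and accepting `postgresql` as an alias is an assumption; the crate itself may detect the type differently.

```rust
use std::env;

/// Metadata backends named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Postgres,
    MySql,
    Sqlite,
}

/// Hypothetical helper (not part of DuckPond): infer the backend
/// from the scheme of a connection URL.
fn detect_backend(url: &str) -> Option<Backend> {
    // Everything before the first ':' is treated as the scheme.
    let (scheme, _rest) = url.split_once(':')?;
    match scheme {
        // Assumption: both spellings refer to PostgreSQL.
        "postgres" | "postgresql" => Some(Backend::Postgres),
        "mysql" => Some(Backend::MySql),
        "sqlite" => Some(Backend::Sqlite),
        _ => None,
    }
}

fn main() {
    // Fall back to the SQLite URL used in this README when the variable is unset.
    let url = env::var("DATABASE_URL")
        .unwrap_or_else(|_| "sqlite://duckpond.db".to_string());

    match detect_backend(&url) {
        Some(backend) => println!("detected backend: {:?}", backend),
        None => eprintln!("unsupported DATABASE_URL scheme"),
    }
}
```

Returning `Option` rather than panicking lets the caller decide how to report an unsupported URL.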
1. **Choose your database and set it up:**
#### PostgreSQL (Recommended for Production)
```bash
# Create a PostgreSQL database
createdb duckpond
# Set environment variables
export DATABASE_URL="postgres://username:password@localhost/duckpond"
export DUCKLAKE_DATA_PATH="./data"
```
#### MySQL
```bash
# Create a MySQL database
mysql -e "CREATE DATABASE duckpond;"
# Set environment variables
export DATABASE_URL="mysql://username:password@localhost/duckpond"
export DUCKLAKE_DATA_PATH="./data"
```
#### SQLite (Great for Development)
```bash
# SQLite will create the file automatically
export DATABASE_URL="sqlite://duckpond.db"
export DUCKLAKE_DATA_PATH="./data"
```
2. **Run migrations:**
```bash
# The migrations work with all supported database types
duckpond migrate run --database-url "$DATABASE_URL"
```
3. **Run the examples:**
```bash
cargo run --example comprehensive
cargo run --example read_data
```
## Database Migration
The migration file `crates/duckpond-cli/migrations/20250624030102_create_duckpond_tables.sql` is **cross-database compatible** and creates all 19 tables required by the DuckLake specification:
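The table names in the groups below can double as a checklist after running the migration. A minimal sketch, using only the standard library, of comparing the tables a database reports against that list. `EXPECTED_TABLES` and `missing_tables` are hypothetical names, not part of DuckPond, and obtaining the list of existing tables (for example by querying the database's catalog) is left out.

```rust
/// The 19 metadata tables named in this README, grouped as in the sections below.
const EXPECTED_TABLES: [&str; 19] = [
    // Core tables
    "ducklake_metadata",
    "ducklake_snapshot",
    "ducklake_snapshot_changes",
    "ducklake_schema",
    "ducklake_table",
    "ducklake_column",
    // Data management
    "ducklake_data_file",
    "ducklake_delete_file",
    "ducklake_files_scheduled_for_deletion",
    "ducklake_inlined_data_tables",
    // Statistics & performance
    "ducklake_table_stats",
    "ducklake_table_column_stats",
    "ducklake_file_column_statistics",
    // Partitioning
    "ducklake_partition_info",
    "ducklake_partition_column",
    "ducklake_file_partition_value",
    // Metadata & tagging
    "ducklake_tag",
    "ducklake_column_tag",
    "ducklake_view",
];

/// Hypothetical helper: returns every expected table absent from `existing`.
fn missing_tables(existing: &[&str]) -> Vec<&'static str> {
    EXPECTED_TABLES
        .iter()
        .copied()
        .filter(|table| !existing.contains(table))
        .collect()
}

fn main() {
    // Pretend the database only reports two tables so far.
    let existing = ["ducklake_metadata", "ducklake_snapshot"];
    let missing = missing_tables(&existing);
    println!("{} of {} tables missing", missing.len(), EXPECTED_TABLES.len());
    // prints "17 of 19 tables missing"
}
```

A linear scan is enough for a list this small; a `HashSet` would only matter for far larger catalogs.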
### Core Tables
- `ducklake_metadata` - Global instance metadata
- `ducklake_snapshot` - Snapshot tracking (commits)
- `ducklake_snapshot_changes` - Change logs
- `ducklake_schema` - Schema definitions
- `ducklake_table` - Table definitions
- `ducklake_column` - Column definitions
### Data Management
- `ducklake_data_file` - Parquet data files
- `ducklake_delete_file` - Delete marker files
- `ducklake_files_scheduled_for_deletion` - Cleanup tracking
- `ducklake_inlined_data_tables` - Small data inlining
### Statistics & Performance
- `ducklake_table_stats` - Table-level statistics
- `ducklake_table_column_stats` - Column statistics
- `ducklake_file_column_statistics` - File-level column stats
### Partitioning
- `ducklake_partition_info` - Partition schemes
- `ducklake_partition_column` - Partition column definitions
- `ducklake_file_partition_value` - File partition values
### Metadata & Tagging
- `ducklake_tag` - General purpose tags
- `ducklake_column_tag` - Column-specific tags
- `ducklake_view` - SQL view definitions
## Contributing
This project implements the [DuckLake specification](https://ducklake.select/docs/stable/specification/introduction/). Contributions are welcome!
## References
- [DuckLake Official Documentation](https://ducklake.select/)
- [DuckLake Blog Post](https://duckdb.org/2025/05/27/ducklake.html)
- [DuckLake Specification](https://ducklake.select/docs/stable/specification/introduction/)
- [DuckDB Extension](https://duckdb.org/docs/stable/core_extensions/ducklake.html)
## License
MIT License (following the DuckLake specification's license)