https://github.com/duckdb/ducklake
DuckLake is an integrated data lake and catalog format
- Host: GitHub
- URL: https://github.com/duckdb/ducklake
- Owner: duckdb
- License: mit
- Created: 2025-03-03T16:01:19.000Z
- Default Branch: main
- Last Pushed: 2025-07-04T14:42:25.000Z
- Last Synced: 2025-07-04T16:07:27.794Z
- Language: C++
- Homepage: https://ducklake.select
- Size: 1.04 MB
- Stars: 1,730
- Watchers: 24
- Forks: 65
- Open Issues: 50
Metadata Files:
- Readme: docs/README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-duckdb - `ducklake` - For DuckLake support. (Extensions / [Core Extensions](https://duckdb.org/docs/stable/core_extensions/overview))
# DuckDB DuckLake Extension
> DuckLake is currently at version 0.1 and is experimental. If you encounter any issues, please file them [here](https://github.com/duckdb/ducklake/issues).
DuckLake is an open Lakehouse format that is built on SQL and Parquet. DuckLake stores metadata in a [catalog database](https://ducklake.select/docs/stable/duckdb/usage/choosing_a_catalog_database), and stores data in Parquet files. The DuckLake extension allows DuckDB to directly read and write data from DuckLake.
See the [DuckLake website](https://ducklake.select) for more information.
## Installation
DuckLake can be installed using the `INSTALL` command:
```sql
INSTALL ducklake;
```
The latest development version can be installed from `core_nightly`:
```sql
FORCE INSTALL ducklake FROM core_nightly;
```
## Usage
DuckLake databases can be attached using the [`ATTACH`](https://duckdb.org/docs/stable/sql/statements/attach.html) syntax, after which tables can be created, modified and queried using standard SQL.
Below is a short usage example that stores the metadata in a DuckDB database file called `metadata.ducklake`, and the data in Parquet files in the `file_path` directory:
```sql
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'file_path/');
USE my_ducklake;
CREATE TABLE my_ducklake.my_table(id INTEGER, val VARCHAR);
INSERT INTO my_ducklake.my_table VALUES (1, 'Hello'), (2, 'World');
FROM my_ducklake.my_table;
┌───────┬─────────┐
│  id   │   val   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ Hello   │
│     2 │ World   │
└───────┴─────────┘
```
##### Updates
```sql
UPDATE my_ducklake.my_table SET val='DuckLake' WHERE id=2;
FROM my_ducklake.my_table;
┌───────┬──────────┐
│  id   │   val    │
│ int32 │ varchar  │
├───────┼──────────┤
│     1 │ Hello    │
│     2 │ DuckLake │
└───────┴──────────┘
```
##### Time Travel
```sql
FROM my_ducklake.my_table AT (VERSION => 2);
┌───────┬─────────┐
│  id   │   val   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ Hello   │
│     2 │ World   │
└───────┴─────────┘
```
##### Schema Evolution
```sql
ALTER TABLE my_ducklake.my_table ADD COLUMN new_column VARCHAR;
FROM my_ducklake.my_table;
┌───────┬──────────┬────────────┐
│  id   │   val    │ new_column │
│ int32 │ varchar  │  varchar   │
├───────┼──────────┼────────────┤
│     1 │ Hello    │ NULL       │
│     2 │ DuckLake │ NULL       │
└───────┴──────────┴────────────┘
```
##### Change Data Feed
```sql
FROM my_ducklake.table_changes('my_table', 2, 2);
┌─────────────┬───────┬─────────────┬───────┬─────────┐
│ snapshot_id │ rowid │ change_type │  id   │   val   │
│    int64    │ int64 │   varchar   │ int32 │ varchar │
├─────────────┼───────┼─────────────┼───────┼─────────┤
│           2 │     0 │ insert      │     1 │ Hello   │
│           2 │     1 │ insert      │     2 │ World   │
└─────────────┴───────┴─────────────┴───────┴─────────┘
```
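A downstream consumer could use the change feed to keep a copy of the table in sync. The following is a minimal, hypothetical Python sketch (`apply_changes` is not a DuckLake API) that replays rows shaped like the output above onto a dictionary keyed by `rowid`. It assumes, per the DuckLake data change feed documentation, that `change_type` can also be `delete`, `update_preimage` or `update_postimage`, and that an update keeps the row's `rowid`; check both against the DuckLake version you use. With the DuckDB Python client, such rows could be fetched with `con.execute("FROM my_ducklake.table_changes('my_table', 2, 2)").fetchall()`.

```python
from typing import Dict, Iterable, Tuple

# One row of table_changes() output, in the column order shown above:
# (snapshot_id, rowid, change_type, id, val)
ChangeRow = Tuple[int, int, str, int, str]
Replica = Dict[int, Tuple[int, str]]  # rowid -> (id, val)


def apply_changes(replica: Replica, changes: Iterable[ChangeRow]) -> Replica:
    """Hypothetical helper: replays change-feed rows onto a copy keyed by rowid."""
    result = dict(replica)  # leave the caller's replica untouched
    # Stable sort by snapshot_id: earlier snapshots are applied first, and the
    # original row order is kept within a snapshot.
    for snapshot_id, rowid, change_type, id_, val in sorted(changes, key=lambda r: r[0]):
        if change_type in ("insert", "update_postimage"):
            result[rowid] = (id_, val)  # new row, or new value of an updated row
        elif change_type == "delete":
            result.pop(rowid, None)
        elif change_type == "update_preimage":
            pass  # old value of an updated row; the postimage row carries the new one
        else:
            # Assumed set of change types; anything else is reported instead of ignored.
            raise ValueError(f"unexpected change_type {change_type!r} in snapshot {snapshot_id}")
    return result


if __name__ == "__main__":
    # The two rows from the README's table_changes('my_table', 2, 2) output.
    feed = [(2, 0, "insert", 1, "Hello"), (2, 1, "insert", 2, "World")]
    print(apply_changes({}, feed))  # {0: (1, 'Hello'), 1: (2, 'World')}
```

Keying the copy by `rowid` rather than `id` avoids depending on `id` being unique, since the example table declares no primary key.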
See the [Usage](https://ducklake.select/docs/stable/duckdb/introduction) guide for more information.
## Building & Loading the Extension
To build the extension, run:
```sh
git submodule init
git submodule update
# to build with multiple cores, use `make GEN=ninja release`
make pull
make
```
To try the extension, start the bundled `duckdb` shell:
```sh
./build/release/duckdb
```