https://github.com/tegmentum/ducklink



DuckLink

# DuckDB WebAssembly Components

This repository contains a pair of WebAssembly components that wrap the DuckDB C API (`libduckdb`) and expose it through the Wasm component model.

- `ducklink-core`: Implements the `duckdb:component/database` world and provides structured access to DuckDB connections and SQL execution.
- `ducklink-cli`: Implements the `wasi:cli/run` world and offers a WASI-native command line interface that mirrors the behaviour of the native DuckDB shell while delegating database access through the component interface.

Both components are intended to run in preview2-capable runtimes such as `wasmtime 16.0+`.

## Extension catalog

The repo also ships **111 component extensions** (254 SQL functions) — Rust
`wasm32-wasip2` components implementing the `duckdb:extension` WIT world, loadable
at runtime with `LOAD ` and verified by `tooling/smoke.py`. They span text
& NLP, encodings, crypto, aggregates (bloom/minhash/count-min sketches), and gated
network (dns/http). See **[CATALOG.md](CATALOG.md)** for the full index
(regenerate with `python3 tooling/gen-catalog.py`; verify integrity with
`python3 tooling/verify-catalog.py`).

## Repository layout

```
wit/
  core/                      Shared database interface definitions
  standalone/                WASI-oriented worlds (standalone DB + CLI)
  browser/                   Browser-oriented database world
crates/
  libduckdb-sys/             bindgen-based bindings to the DuckDB C API
  ducklink-core/             Component implementation of the DuckDB API
  ducklink-cli/              WASI CLI component built on top of the exported API
scripts/
  build-libduckdb-wasm.sh    Helper for cross-compiling DuckDB to wasm32-wasi
cmake/toolchains/
  wasi-sdk.cmake             Toolchain file for building DuckDB with wasi-sdk
```

## Prerequisites

1. **DuckDB source** at `DUCKDB_SOURCE_DIR` (e.g. `~/src/duckdb`). A shallow clone is sufficient:
   ```bash
   git clone --depth 1 https://github.com/duckdb/duckdb.git ~/src/duckdb
   ```
2. **wasi-sdk** (tested with 33.0; exception handling requires >= 33) with `WASI_SDK_PREFIX` pointing at the installation root. A predownloaded copy lives under `external/wasi-sdk-33.0-`; point the variable there if you do not have a global install.
3. **Rust tooling**:
   - `rustup target add wasm32-wasi`
   - `cargo install cargo-component`
4. **wit-bindgen tooling** (included automatically by `cargo-component`).

Network access is required only when fetching DuckDB or installing the toolchain.

## Building `libduckdb` for wasm32-wasi

The component links against a statically built `libduckdb` compiled for `wasm32-wasi`. Use the helper script to cross-compile the library:

```bash
export DUCKDB_SOURCE_DIR=~/src/duckdb
export WASI_SDK_PREFIX="$(pwd)/external/wasi-sdk-33.0-arm64-macos"
export WASI_TARGET_TRIPLE=wasm32-wasip2
export WASM_EXTENSIONS=json # defaults to json if unset; comma-separate to add more later
scripts/build-libduckdb-wasm.sh
```

The script places `libduckdb-wasi.a` under `artifacts/`. Afterwards set the following environment variables so the Rust build can locate the headers and the archive:

```bash
export DUCKDB_INCLUDE_DIR="$DUCKDB_SOURCE_DIR/src/include"
export DUCKDB_STATIC_LIB="$(pwd)/artifacts/libduckdb-wasi.a"
```
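The crate's real build script is not shown in this README. As a rough sketch of how a Rust build script *could* consume these two variables, the following turns them into standard Cargo `rustc-link-search` / `rustc-link-lib` directives plus an `-I` flag for bindgen. The `plan_link` function and `LinkPlan` struct are hypothetical, and `libduckdb-sys` may handle this differently.

```rust
use std::env;
use std::path::Path;

/// Cargo directives plus the clang flag a bindgen call would need (hypothetical).
#[derive(Debug)]
struct LinkPlan {
    directives: Vec<String>,
    clang_include_arg: String,
}

/// Derive link settings from DUCKDB_INCLUDE_DIR and DUCKDB_STATIC_LIB values.
fn plan_link(include_dir: &str, static_lib: &str) -> Result<LinkPlan, String> {
    let lib = Path::new(static_lib);
    // Only a static `lib*.a` archive makes sense for a wasm32 target.
    if lib.extension().and_then(|e| e.to_str()) != Some("a") {
        return Err(format!("{static_lib}: expected a `.a` archive"));
    }
    let dir = lib
        .parent()
        .ok_or_else(|| format!("{static_lib}: no parent directory"))?;
    // rustc re-adds the `lib` prefix and `.a` suffix, so strip them here.
    let name = lib
        .file_stem()
        .and_then(|s| s.to_str())
        .and_then(|s| s.strip_prefix("lib"))
        .ok_or_else(|| format!("{static_lib}: file name must start with `lib`"))?;
    Ok(LinkPlan {
        directives: vec![
            // Re-run the build script whenever either variable changes.
            "cargo:rerun-if-env-changed=DUCKDB_INCLUDE_DIR".to_string(),
            "cargo:rerun-if-env-changed=DUCKDB_STATIC_LIB".to_string(),
            format!("cargo:rustc-link-search=native={}", dir.display()),
            format!("cargo:rustc-link-lib=static={name}"),
        ],
        // bindgen would receive this so it can find `duckdb.h`.
        clang_include_arg: format!("-I{include_dir}"),
    })
}

fn main() {
    let include_dir = env::var("DUCKDB_INCLUDE_DIR").unwrap_or_default();
    let static_lib = env::var("DUCKDB_STATIC_LIB").unwrap_or_default();
    match plan_link(&include_dir, &static_lib) {
        Ok(plan) => {
            for line in &plan.directives {
                println!("{line}");
            }
            eprintln!("bindgen clang arg: {}", plan.clang_include_arg);
        }
        Err(err) => eprintln!("cannot plan link: {err}"),
    }
}
```

Because rustc adds the `lib` prefix and `.a` suffix back when searching, `libduckdb-wasi.a` is requested as `static=duckdb-wasi`.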

### Browser-oriented static library

For the browser component you will need a DuckDB archive compiled for the appropriate `wasm32-unknown-unknown` (or equivalent) target. Once built, point `DUCKDB_STATIC_LIB` at that archive and use the `make core-browser` target to produce `ducklink_core.wasm` with the `browser` feature enabled.

## Building the components

Compile both components using the make targets (they call `cargo component` under the hood):

```bash
make
```

Individual targets are also available:

```bash
make core
make ducklink-cli

# Build the browser-oriented core (requires a browser-compatible DuckDB static archive)
make core-browser BROWSER_TARGET=wasm32-unknown-unknown
```

The resulting component binaries are generated in `target/wasm32-wasi/release/`:

- `ducklink_core.wasm`
- `ducklink_cli.wasm`

## Developing component extensions

Extensions live under `extensions/-component`, register themselves imperatively in
`load()` against the `duckdb:extension` world, and are tracked by the tooling in
`tooling/` and `registry/` (mirroring the system used in `~/git/sqlite-wasm`). The full
roadmap is in [PLAN-duckdb-extensions.md](PLAN-duckdb-extensions.md).

```bash
# Scaffold a skeleton (consults tooling/compat-registry.json for crate status,
# registers the workspace member, and cargo-checks that it compiles):
make ext-scaffold NAME=myext CRATE=base32,bs58

# Edit extensions/myext-component/src/lib.rs + smoke.sql, then build + smoke:
make ext NAME=myext-component

# Seed assertions from current output, review, and re-run to assert:
python3 tooling/smoke.py --seed-expected myext
python3 tooling/smoke.py myext

make ext-smoke-all # smoke every extension
make ext-list-broken # crates flagged un-buildable on wasm32-wasip2
python3 tooling/t-status.py # tooling-improvement items from build experience
```

Extensions load through the **native host runner** (`ducklink`); the
wac-composed standalone CLI links a no-op loader stub and cannot instantiate
them. `isin` (hand-rolled) and `baseN` (crate-backed) are worked examples. See
[docs/component-extension-guide.md](docs/component-extension-guide.md) for the
capability surface and packaging details.
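The `isin` extension's source is not reproduced here. As a std-only sketch of the kind of hand-rolled scalar logic such an extension wraps, this validates an ISIN with the standard rule: a two-letter country prefix, nine alphanumerics, and a Luhn check digit computed after expanding letters to `A`=10 .. `Z`=35. The function name `is_valid_isin` is hypothetical, and the registration glue generated from the `duckdb:extension` WIT world is omitted.

```rust
/// Validate an ISIN: 2 uppercase letters, 9 alphanumerics, 1 check digit.
fn is_valid_isin(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 12
        || !b[..2].iter().all(u8::is_ascii_uppercase)
        || !b[2..11].iter().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        || !b[11].is_ascii_digit()
    {
        return false;
    }
    // Expand letters to two decimal digits (A=10 .. Z=35); digits stay as-is.
    let mut digits: Vec<u32> = Vec::with_capacity(24);
    for &c in b {
        let v = if c.is_ascii_digit() {
            u32::from(c - b'0')
        } else {
            u32::from(c - b'A') + 10
        };
        if v >= 10 {
            digits.push(v / 10);
            digits.push(v % 10);
        } else {
            digits.push(v);
        }
    }
    // Luhn: from the right, double every second digit; subtract 9 from results over 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let x = d * 2;
                if x > 9 { x - 9 } else { x }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

fn main() {
    for isin in ["US0378331005", "US0378331006", "AU0000XVGZA3"] {
        println!("{isin}: {}", is_valid_isin(isin));
    }
}
```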

## Using the components

### Component worlds

- `wit/core/duckdb-core.wit` defines the shared `duckdb:component/database` interface implemented by the core component.
- `wit/standalone/duckdb-standalone.wit` exports the database world for WASI runtimes, while `wit/standalone/duckdb-cli.wit` wires in the CLI experience on top of it.
- `wit/browser/duckdb-browser.wit` will back the browser-friendly component variant, sharing the same database surface but relying on host-provided storage and networking.

### Direct database access

Instantiate the database component with a runtime that supports the component model. For example, using `wasmtime`:

```bash
wasmtime run --dir . target/wasm32-wasi/release/ducklink_core.wasm
```

Pre-open directories that contain database files (e.g. `--dir .`) so the component can access them via WASI.

### CLI component

The CLI component imports the database world and exposes a `wasi:cli` entry point. To run it with wasmtime you can compose the CLI and core components using the [`wac`](https://github.com/bytecodealliance/wac) tool:

```bash
# Install the wac CLI once
cargo install wac-cli

# Compose the CLI + core component pair
wac plug target/wasm32-wasip2/release/ducklink_cli.wasm \
--plug target/wasm32-wasip2/release/ducklink_core.wasm \
-o artifacts/duckdb-cli.wasm

# Execute a query (grant directory access for any on-disk database file)
wasmtime run --dir . artifacts/duckdb-cli.wasm :memory: -c "select 42;"
```

For quick validation there is also a helper script that performs the `wac plug`
step and executes a simple query:

```bash
scripts/smoke-cli.sh
```

The script accepts optional environment variables (`SQL`, `DB_PATH`, `EXTRA_WASMTIME_FLAGS`, `EXTENSIONS`)
to tailor the smoke test.
For example, set `EXTENSIONS="sample_extension"` to pass `--load-extension sample_extension`
to the CLI before the query runs.

The CLI supports:

- Connecting to a database file or running purely in-memory (`ducklink_cli.wasm :memory:`)
- Executing a single command via `-c "SQL"`
- Preloading componentized extensions via `--load-extension ` (repeat for multiple extensions); this issues a `LOAD ` statement before user SQL runs
- Interactive REPL with `.help`, `.exit`, and `.quit`

Result sets are rendered in a text table that mirrors the native DuckDB shell.
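The flag handling listed above can be sketched as a small argument parser that queues one `LOAD` per `--load-extension` (in command-line order) ahead of the `-c` SQL. This is a hypothetical illustration rather than the CLI's code: `parse_args` and `Invocation` are invented names, and falling back to `:memory:` when no database is given is an assumption borrowed from the native DuckDB shell.

```rust
/// Hypothetical parse result for the CLI flags listed above.
#[derive(Debug, PartialEq)]
struct Invocation {
    database: String,
    statements: Vec<String>,
    interactive: bool,
}

fn parse_args(args: &[&str]) -> Result<Invocation, String> {
    let mut database: Option<String> = None;
    let mut extensions: Vec<String> = Vec::new();
    let mut command: Option<String> = None;
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            "-c" => {
                let sql = it.next().ok_or("-c requires an SQL argument")?;
                command = Some(sql.to_string());
            }
            "--load-extension" => {
                let name = it.next().ok_or("--load-extension requires a name")?;
                extensions.push(name.to_string());
            }
            other if database.is_none() => database = Some(other.to_string()),
            other => return Err(format!("unexpected argument: {other}")),
        }
    }
    // One LOAD per extension, in command-line order, before any user SQL.
    let mut statements: Vec<String> =
        extensions.iter().map(|e| format!("LOAD {e};")).collect();
    // Assumption: without -c the CLI drops into the interactive REPL.
    let interactive = command.is_none();
    statements.extend(command);
    Ok(Invocation {
        // Assumption: no path means an in-memory database, as in the native shell.
        database: database.unwrap_or_else(|| ":memory:".to_string()),
        statements,
        interactive,
    })
}

fn main() {
    let args = ["--load-extension", "sample_extension", ":memory:", "-c", "select 42;"];
    match parse_args(&args) {
        Ok(inv) => println!("{inv:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```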

### WIT packages

All WIT interfaces live under `wit/` at the repository root. That directory
vendors the WASI Preview 2 packages at version `0.2.6` (the latest preview
supported by Wasmtime `37.0.2`), along with the DuckDB-specific packages. The
crate-local copies under `crates/*/wit/` are generated from this canonical tree
via `scripts/sync-core-wit.sh` and `scripts/sync-cli-wit.sh`. Always edit the WIT
files in `wit/` first, then re-run the sync scripts to propagate changes before
building.

External extensions can depend on the definitions in `wit/duckdb-extension/`
to stay in sync with the host runtime without having to vendor their own copies
of the extension interfaces.

### Native host runner

The `ducklink-host` crate provides a reusable Wasmtime runner that composes the CLI
and core components along with the componentized extension loader. Build and execute it via:

```bash
cargo run -p ducklink-host --bin ducklink -- -- duckdb-cli :memory: -c "select 42 as answer;"
```

Additional directories can be exposed to the CLI with `--dir /host/path::/guest/path`, and
custom component artifacts can be supplied with `--core-component` / `--cli-component`. The
host automatically preopens the current working directory as `.` so relative database paths
continue to work.
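A sketch of the directory mapping, assuming the runner follows Wasmtime's own `--dir HOST[::GUEST]` convention (a bare host path is reused as the guest path) and preopens the working directory as `.` before any user mappings. `parse_dir` and `preopens` are hypothetical helpers, not the runner's API.

```rust
/// Split a `--dir HOST[::GUEST]` value; a bare HOST is reused as the guest path.
fn parse_dir(spec: &str) -> Result<(String, String), String> {
    let (host, guest) = spec.split_once("::").unwrap_or((spec, spec));
    if host.is_empty() || guest.is_empty() {
        return Err(format!("invalid --dir value: {spec:?}"));
    }
    Ok((host.to_string(), guest.to_string()))
}

/// Preopen list: the working directory as `.` first, then each user mapping.
fn preopens(cwd: &str, specs: &[&str]) -> Result<Vec<(String, String)>, String> {
    let mut out = vec![(cwd.to_string(), ".".to_string())];
    for spec in specs {
        out.push(parse_dir(spec)?);
    }
    Ok(out)
}

fn main() {
    match preopens("/home/me/project", &["/host/path::/guest/path", "/data"]) {
        Ok(list) => {
            for (host, guest) in list {
                println!("{host} -> {guest}");
            }
        }
        Err(e) => eprintln!("{e}"),
    }
}
```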

### Extension components

DuckDB's extension loader is being extended to resolve WebAssembly components from `artifacts/extensions/`. When an extension registers itself with the core component, its name is sanitized to the character set `[A-Za-z0-9_-]` and mapped to `.wasm` inside that directory. As the loader matures, dropping a compiled extension there will let `LOAD ` instantiate it through the preview2 runtime instead of the native shared-library path.
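The README states the allowed character set but not what happens to other characters. This sketch assumes they are replaced with `_` (dropping them would be an equally plausible choice) and rejects names that end up empty. `sanitize` and `component_path` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Map every character outside [A-Za-z0-9_-] to `_` (the replacement is an assumption).
fn sanitize(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .collect()
}

/// `<ext_dir>/<sanitized>.wasm`, or None for an empty name.
fn component_path(ext_dir: &Path, name: &str) -> Option<PathBuf> {
    let clean = sanitize(name);
    if clean.is_empty() {
        return None;
    }
    Some(ext_dir.join(format!("{clean}.wasm")))
}

fn main() {
    let dir = Path::new("artifacts/extensions");
    for name in ["sample_extension", "../evil ext", ""] {
        println!("{name:?} -> {:?}", component_path(dir, name));
    }
}
```

Restricting names to this set also means `/` and `.` never reach the path, so a name like `../x` cannot escape the extensions directory.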

This repository ships a minimal sample extension under `extensions/sample-extension-component/` that exercises the component interface. You can build and validate it end-to-end via:

```bash
make smoke-extension
```

The target runs the `ducklink-host` test `load_sample_extension_component`, which:

1. Builds the sample extension (if it is not already present).
2. Copies the resulting component to `artifacts/extensions/sample_extension.wasm`.
3. Instantiates it with Wasmtime using the preview2 bindings and asserts that `load()` returns the expected metadata.

## Testing

Currently the project does not ship a full integration test suite because executing the components requires a preview2 runtime plus a wasm32-wasi build of DuckDB. Manual smoke testing can be done after building:

```bash
wasmtime run --dir . artifacts/duckdb-cli.wasm in_memory_db.duckdb -c "select 42 as answer;"
```

There are also convenience targets:

```bash
make smoke-cli # :memory: query via scripts/smoke-cli.sh
make smoke-cli-disk # same but forces an on-disk temp database
make sample-extension # builds the sample component and copies it to artifacts/extensions/
make smoke-extension # runs Cargo test to build + load the sample extension component
```

To validate the preview2 filesystem adapter against real storage outside of `make`, set `ON_DISK_SMOKE=1` when running `scripts/smoke-cli.sh`; the helper will create a temporary on-disk database, grant Wasmtime access to that directory, and delete it after the query completes.

Continuous smoke coverage is defined in `.github/workflows/smoke-tests.yml`, which builds the components and executes both the in-memory and on-disk runs of `scripts/smoke-cli.sh` on every push and pull request. Hosted runs are currently blocked on GitHub Actions billing, so for now the workflow is run locally.

### Running CI locally with act

Until hosted GitHub Actions runs are available (this requires either a public
repository or resolved billing), the same workflow can run locally with
[nektos/act](https://github.com/nektos/act) in Docker:

```bash
brew install act # one-time (Docker must be running)
make ci-local # runs .github/workflows/smoke-tests.yml
scripts/ci-local.sh -l # list jobs without running
```

`.actrc` maps `ubuntu-latest` to `catthehacker/ubuntu:act-latest` and enables
`--reuse` so caches persist between runs. The wasi-sdk download in the workflow
is architecture-aware (`x86_64`/`arm64`), so it runs natively under act on Apple
silicon as well as on GitHub's x86_64 runners. The first run is slow (it pulls
the runner image, compiles the component tooling, and builds the patched DuckDB
archive); afterwards the cached archive makes runs fast.

## Database interface

Beyond `execute` / `open-stream`, the `database` interface exposes:

- **Prepared statements** — `prepare(conn, sql)` returns a reusable
`prepared-statement` resource; `execute(params)` binds positional parameters
(`$1`, `$2`, ...) and runs it, rebinding from scratch each call.
- **Configuration** — `open-with-config(path, options)` opens a database applying
`(name, value)` options (e.g. `access_mode`, `default_order`, `max_memory`).
- **Arrow** — `query-arrow(conn, sql)` returns the result as an Arrow IPC stream
(`list`), decodable by any Arrow implementation (apache-arrow in JS,
arrow-rs in Rust). Zero-copy is not possible across the component boundary, so
buffers are serialized once into IPC bytes.
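Because `execute(params)` rebinds from scratch, every call has to supply the full positional list. A minimal sketch of a client-side arity check, assuming the statement expects as many values as its highest `$N` index (this is not a claim about how DuckDB itself counts parameters). `positional_param_count` and `check_params` are hypothetical, and the scanner is deliberately naive: it skips single-quoted literals but not comments or dollar-quoted strings.

```rust
/// Highest `$N` index outside single-quoted literals (naive scanner).
fn positional_param_count(sql: &str) -> usize {
    let mut highest = 0usize;
    let mut in_literal = false;
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // An escaped quote `''` toggles twice, so the state stays correct.
            '\'' => in_literal = !in_literal,
            '$' if !in_literal => {
                let mut n = 0usize;
                let mut saw_digit = false;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n.saturating_mul(10).saturating_add(d as usize);
                    saw_digit = true;
                    chars.next();
                }
                if saw_digit {
                    highest = highest.max(n);
                }
            }
            _ => {}
        }
    }
    highest
}

/// Each execute call must carry the complete parameter list; nothing is kept between calls.
fn check_params(sql: &str, params: &[&str]) -> Result<(), String> {
    let expected = positional_param_count(sql);
    if params.len() == expected {
        Ok(())
    } else {
        Err(format!("expected {expected} parameter(s), got {}", params.len()))
    }
}

fn main() {
    let sql = "select $1 + $2, '$9 is literal'";
    println!("{}", positional_param_count(sql));
    println!("{:?}", check_params(sql, &["1"]));
}
```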

## Next steps

- Flesh out remaining CLI scripting parity with the native shell
- Resolve GitHub Actions billing so the smoke-tests workflow can run

## Acknowledgments

This project owes a clear debt to [Simon Willison](https://simonwillison.net/)
and [`sqlite-utils`](https://sqlite-utils.datasette.io/). The extension catalog,
the scaffold → smoke → feedback tooling loop, and much of the CLI ergonomics here
follow the patterns Simon established with `sqlite-utils` and the wider Datasette
ecosystem for making a database pleasant to extend and script from the command
line. Many of the component extensions also mirror utilities first popularized in
that ecosystem. Thank you.