https://github.com/timsehn/doltlite
A fork of SQLite that has Dolt storage and features
https://github.com/timsehn/doltlite
database-version-control dolt sqlite version-controlled-database
Last synced: 19 days ago
JSON representation
A fork of SQLite that has Dolt storage and features
- Host: GitHub
- URL: https://github.com/timsehn/doltlite
- Owner: timsehn
- License: other
- Created: 2026-03-13T22:43:20.000Z (about 2 months ago)
- Default Branch: master
- Last Pushed: 2026-03-30T14:14:16.000Z (about 1 month ago)
- Last Synced: 2026-04-03T01:12:35.449Z (28 days ago)
- Topics: database-version-control, dolt, sqlite, version-controlled-database
- Language: C
- Homepage:
- Size: 267 MB
- Stars: 39
- Watchers: 1
- Forks: 3
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# Doltlite
A SQLite fork that replaces the B-tree storage engine with a content-addressed
prolly tree, enabling Git-like version control on a SQL database. Everything
above SQLite's `btree.h` interface (VDBE, query planner, parser) is untouched.
Everything below it -- the pager and on-disk format -- is replaced with a
prolly tree engine backed by a single-file content-addressed chunk store.
## Building
### macOS / Linux
```
cd build
../configure
make
./doltlite :memory:
```
### Windows (MSYS2 / MINGW64)
```
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-zlib make tcl
mkdir -p build && cd build
../configure
make doltlite.exe
./doltlite.exe :memory:
```
To verify the engine:
```sql
SELECT doltlite_engine();
-- prolly
```
To build stock SQLite instead (for comparison):
```
make DOLTLITE_PROLLY=0 sqlite3
```
## Using as a C Library
Doltlite is designed as a drop-in replacement for SQLite. It uses the same
`sqlite3.h` header and `sqlite3_*` API, so existing C programs work without
code changes — just link against `libdoltlite` instead of `libsqlite3` to get
version control. The build produces `libdoltlite.a` (static) and
`libdoltlite.dylib`/`.so` (shared) with the full prolly tree engine and all
Dolt functions included.
```bash
cd build
../configure
make doltlite-lib # builds libdoltlite.a and libdoltlite.dylib/.so
```
Compile and link your program:
```bash
# Static link (recommended — single binary, no runtime deps)
gcc -o myapp myapp.c -I/path/to/build libdoltlite.a -lpthread -lz
# Dynamic link
gcc -o myapp myapp.c -I/path/to/build -L/path/to/build -ldoltlite -lpthread -lz
```
The API is the standard [SQLite C API](https://sqlite.org/cintro.html) —
`sqlite3_open`, `sqlite3_exec`, `sqlite3_prepare_v2`, etc. Dolt features are
called as SQL functions (`dolt_commit`, `dolt_branch`, `dolt_merge`, ...) and
virtual tables (`dolt_log`, `dolt_diff_`, `dolt_history_`, ...).
### Quickstart Examples
Complete working examples that demonstrate commits, branches, merges,
point-in-time queries, diffs, and tags. Each example does the same thing
in a different language.
**C** ([`examples/quickstart.c`](examples/quickstart.c)) — based on the
[SQLite quickstart](https://sqlite.org/quickstart.html):
```bash
cd build
gcc -o quickstart ../examples/quickstart.c -I. libdoltlite.a -lpthread -lz
./quickstart
```
**Python** ([`examples/quickstart.py`](examples/quickstart.py)) — uses the
standard `sqlite3` module, zero code changes:
```bash
cd build
LD_PRELOAD=./libdoltlite.so python3 ../examples/quickstart.py
```
**Go** ([`examples/go/main.go`](examples/go/main.go)) — uses
[mattn/go-sqlite3](https://github.com/mattn/go-sqlite3) with the `libsqlite3`
build tag:
```bash
cd examples/go
CGO_CFLAGS="-I../../build" CGO_LDFLAGS="../../build/libdoltlite.a -lz -lpthread" \
go build -tags libsqlite3 -o quickstart .
./quickstart
```
## Dolt Features
Version control operations are exposed as SQL functions and virtual tables.
### Staging and Committing
```sql
-- Stage specific tables or all changes
SELECT dolt_add('users');
SELECT dolt_add('-A');
-- Commit staged changes
SELECT dolt_commit('-m', 'Add users table');
-- Stage and commit in one step
SELECT dolt_commit('-A', '-m', 'Initial commit');
-- Shorthand (compound flags, like git commit -am)
SELECT dolt_commit('-am', 'Initial commit');
-- Commit with author
SELECT dolt_commit('-m', 'Fix data', '--author', 'Alice ');
```
### Configuration
```sql
-- Set committer name and email (per-session)
SELECT dolt_config('user.name', 'Tim Sehn');
SELECT dolt_config('user.email', 'tim@dolthub.com');
-- Read current config
SELECT dolt_config('user.name');
-- Tim Sehn
```
All commit-creating operations (`dolt_commit`, `dolt_merge`, `dolt_cherry_pick`,
`dolt_revert`) use these values. The `--author` flag on `dolt_commit` overrides
the session config for a single commit. Config is per-connection and not
persisted — set it at the start of each session.
### Status and History
```sql
-- Working/staged changes
SELECT * FROM dolt_status;
-- table_name | staged | status
-- users | 1 | modified
-- orders | 0 | new table
-- Commit history
SELECT * FROM dolt_log;
-- commit_hash | committer | email | date | message
```
### History (dolt_history_<table>)
Time-travel query showing every version of every row across all commits:
```sql
SELECT * FROM dolt_history_users;
-- rowid_val | value | commit_hash | committer | commit_date
-- How many times was row 42 changed?
SELECT count(*) FROM dolt_history_users WHERE rowid_val = 42;
-- What did the table look like at a specific commit?
SELECT * FROM dolt_history_users WHERE commit_hash = 'abc123...';
```
### Point-in-Time Queries (AS OF)
Read a table as it existed at any commit, branch, or tag.
Returns the real table columns (not generic blobs):
```sql
SELECT * FROM dolt_at_users('abc123...');
-- id | name | email (same columns as the actual table)
SELECT * FROM dolt_at_users('feature');
SELECT * FROM dolt_at_users('v1.0');
-- Compare current vs historical
SELECT count(*) FROM users; -- 100
SELECT count(*) FROM dolt_at_users('v1.0'); -- 42
```
### Diff
Row-level diff between any two commits, or working state vs HEAD:
```sql
SELECT * FROM dolt_diff('users');
SELECT * FROM dolt_diff('users', 'abc123...', 'def456...');
-- diff_type | rowid_val | from_value | to_value
```
### Schema Diff
Compare schemas between any two commits, branches, or tags:
```sql
SELECT * FROM dolt_schema_diff('v1.0', 'v2.0');
-- table_name | from_create_stmt | to_create_stmt | diff_type
-- Shows tables added, dropped, or modified (schema changed)
-- Also detects new indexes and views
```
### Audit Log (dolt_diff_<table>)
Full history of every change to every row, across all commits:
```sql
SELECT * FROM dolt_diff_users;
-- diff_type | rowid_val | from_value | to_value |
-- from_commit | to_commit | from_commit_date | to_commit_date
-- Every INSERT, UPDATE, DELETE that was ever committed is here
SELECT diff_type, rowid_val, to_commit FROM dolt_diff_users
WHERE rowid_val = 42;
```
One `dolt_diff_` virtual table is automatically created for each
user table. The table walks the full commit history and diffs each
consecutive pair of commits.
### Reset
```sql
SELECT dolt_reset('--soft'); -- unstage all, keep working changes
SELECT dolt_reset('--hard'); -- discard all uncommitted changes
```
### Branching (Per-Session)
Each connection tracks its own active branch. Branch state (active branch
name, HEAD commit, staged catalog hash) lives in the `Btree` struct
(per-connection). Each connection gets its own `BtShared` and chunk store.
```sql
-- Create a branch at current HEAD
SELECT dolt_branch('feature');
-- Switch to it (fails if uncommitted changes exist)
SELECT dolt_checkout('feature');
-- See current branch
SELECT active_branch();
-- List all branches
SELECT * FROM dolt_branches;
-- name | hash | is_current
-- Delete a branch
SELECT dolt_branch('-d', 'feature');
```
### Tags
Immutable named pointers to commits:
```sql
SELECT dolt_tag('v1.0'); -- tag HEAD
SELECT dolt_tag('v1.0', 'abc123...'); -- tag specific commit
SELECT dolt_tag('-d', 'v1.0'); -- delete tag
SELECT * FROM dolt_tags; -- list tags
```
### Merge
Three-way merge of another branch into the current branch. Merges at the
**row level** — non-conflicting changes to different rows of the same table
are auto-merged. Conflicts (same row modified on both branches) are detected
and stored for resolution.
```sql
SELECT dolt_merge('feature');
-- Returns commit hash (clean merge), or "Merge completed with N conflict(s)"
```
### Conflicts
View and resolve merge conflicts:
```sql
-- View which tables have conflicts (summary)
SELECT * FROM dolt_conflicts;
-- table_name | num_conflicts
-- users | 2
-- View individual conflict rows for a table
SELECT * FROM dolt_conflicts_users;
-- base_rowid | base_value | our_rowid | our_value | their_rowid | their_value
-- Resolve individual conflicts by deleting them (keeps current working value)
DELETE FROM dolt_conflicts_users WHERE base_rowid = 5;
-- Or resolve all conflicts for a table at once
SELECT dolt_conflicts_resolve('--ours', 'users'); -- keep our values
SELECT dolt_conflicts_resolve('--theirs', 'users'); -- take their values
-- Commit is blocked while conflicts exist
SELECT dolt_commit('-A', '-m', 'msg');
-- Error: "cannot commit: unresolved merge conflicts"
```
### Cherry-Pick
Apply the changes from a specific commit onto the current branch:
```sql
SELECT dolt_cherry_pick('abc123...');
-- Returns new commit hash, or "Cherry-pick completed with N conflict(s)"
```
Cherry-pick works by computing the diff between the target commit and its
parent, then applying that diff to the current HEAD as a three-way merge.
Conflicts are handled the same way as `dolt_merge`.
### Revert
Create a new commit that undoes the changes from a specific commit:
```sql
SELECT dolt_revert('abc123...');
-- Returns new commit hash, or "Revert completed with N conflict(s)"
```
Revert computes the inverse of the target commit's changes and applies
them to the current HEAD. The new commit message is
`Revert ''`. Cannot revert the initial commit.
### Garbage Collection
Remove unreachable chunks from the store to reclaim space:
```sql
SELECT dolt_gc();
-- "12 chunks removed, 45 chunks kept"
```
Stop-the-world mark-and-sweep: walks all branches, tags, commit
history, catalogs, and prolly tree nodes to find reachable chunks,
then rewrites the file with only live data. Safe and idempotent.
### Merge Base
Find the common ancestor of two commits:
```sql
SELECT dolt_merge_base('abc123...', 'def456...');
```
### Remotes
Doltlite supports Git-like remotes for pushing, fetching, pulling, and cloning
between databases.
#### Filesystem Remotes
```sql
-- Add a remote
SELECT dolt_remote('add', 'origin', 'file:///path/to/remote.doltlite');
-- Push a branch
SELECT dolt_push('origin', 'main');
-- Clone a remote database
SELECT dolt_clone('file:///path/to/source.doltlite');
-- Fetch updates
SELECT dolt_fetch('origin', 'main');
-- Pull (fetch + fast-forward)
SELECT dolt_pull('origin', 'main');
-- List remotes
SELECT * FROM dolt_remotes;
```
#### HTTP Remotes
```sql
-- Add an HTTP remote (URL includes database name)
SELECT dolt_remote('add', 'origin', 'http://myserver:8080/mydb.db');
-- All operations work identically to file:// remotes
SELECT dolt_push('origin', 'main');
SELECT dolt_clone('http://myserver:8080/mydb.db');
SELECT dolt_fetch('origin', 'main');
SELECT dolt_pull('origin', 'main');
```
#### Remote Server (`doltlite-remotesrv`)
Doltlite includes a standalone HTTP server for serving databases over the
network. Build it alongside doltlite:
```
cd build
make doltlite-remotesrv
```
Start serving a directory of databases:
```
./doltlite-remotesrv -p 8080 /path/to/databases/
```
Every `.db` file in that directory becomes accessible at
`http://host:8080/filename.db`. The server supports push, fetch, pull, and
clone — multiple clients can collaborate on the same databases.
The server is also embeddable as a library (`doltliteServeAsync` in
`doltlite_remotesrv.h`) for applications that want to host remotes in-process.
#### How It Works
Content-addressed chunk transfer — only sends chunks the remote doesn't already
have. BFS traversal of the DAG with batch `HasMany` pruning.
## Using Existing SQLite Databases
Doltlite can ATTACH standard SQLite databases alongside its own prolly-tree
storage. This lets you keep versioned tables in doltlite and high-write
operational tables in standard SQLite, queried through a single connection.
Doltlite detects the file format automatically from the header — no
configuration needed. Standard SQLite files route to SQLite's original B-tree
engine; everything else uses the prolly tree.
### Basic ATTACH
```sql
-- Attach a standard SQLite database
ATTACH DATABASE '/path/to/events.sqlite' AS ops;
-- Query it (prefix table names with the alias)
SELECT * FROM ops.events WHERE type='click';
-- Main db tables need no prefix
SELECT * FROM threads;
-- Detach when done
DETACH DATABASE ops;
```
### Cross-Database JOINs
```sql
-- Join doltlite (versioned) tables with SQLite (attached) tables
SELECT t.title, e.type
FROM threads t
JOIN ops.events e ON t.id = e.thread_id;
```
### Migrating Data Between Formats
```sql
-- Copy from SQLite into doltlite (now versioned)
INSERT INTO threads SELECT * FROM ops.threads;
-- Copy from doltlite into SQLite (for export)
INSERT INTO ops.archive SELECT * FROM threads WHERE archived=1;
-- One-step copy with CREATE TABLE...AS
CREATE TABLE local_events AS SELECT * FROM ops.events;
```
### Hybrid Storage Pattern
Use doltlite for tables that benefit from version control, and standard SQLite
for high-throughput tables that don't need history:
```sql
-- Main DB: doltlite (versioned)
CREATE TABLE config(key TEXT PRIMARY KEY, val TEXT);
SELECT dolt_commit('-am', 'Add config table');
-- Attached: standard SQLite (high-write, no versioning overhead)
ATTACH DATABASE 'telemetry.sqlite' AS tel;
CREATE TABLE tel.events(seq INTEGER PRIMARY KEY, kind TEXT, payload TEXT);
-- Hot write path goes to standard SQLite
INSERT INTO tel.events VALUES(1, 'pageview', '{"url":"/home"}');
-- Analytics spans both databases
SELECT c.val, count(e.seq)
FROM config c
JOIN tel.events e ON e.kind = c.key
GROUP BY c.key;
-- Version control only applies to main db
SELECT * FROM dolt_diff('config');
```
## Per-Session Branching Architecture
Each connection gets its own `Btree` and `BtShared` (not shared across
connections). Doltlite stores the session's branch name, HEAD commit hash,
and staged catalog hash in the `Btree` struct.
- Each connection can be on a different branch. Cross-branch concurrent
access is safe — each branch's working catalog is stored independently
in a per-branch working state chunk, so one branch's autocommit never
corrupts another branch's reads.
- `dolt_checkout` reloads the table registry from the target branch's catalog.
- Write transactions (DML) are serialized via an exclusive file-level lock,
matching SQLite's standard behavior. Under that lock, the connection
refreshes from disk before writing. Multiple connections can read
concurrently; writes from one connection are immediately visible to
readers on the same branch.
- All commit graph mutations (`dolt_commit`, `dolt_merge`, `dolt_reset`,
`dolt_branch`, `dolt_tag`, push, pull) are also serialized via the
file-level lock, preventing silent data loss from concurrent commits.
## Performance
### Sysbench OLTP Benchmarks: Doltlite vs SQLite
Doltlite is a drop-in replacement for SQLite, so the natural question is: what
does version control cost?
Every PR runs a [sysbench-style benchmark](test/sysbench_compare.sh) comparing
doltlite against stock SQLite on 23 OLTP workloads. Results are posted as a PR
comment.
#### Reads
| Test | SQLite (ms) | Doltlite (ms) | Multiplier |
|------|-------------|---------------|------------|
| oltp_point_select | 145 | 89 | 0.61 |
| oltp_range_select | 38 | 36 | 0.95 |
| oltp_sum_range | 21 | 18 | 0.86 |
| oltp_order_range | 8 | 8 | 1.00 |
| oltp_distinct_range | 9 | 7 | 0.78 |
| oltp_index_scan | 15 | 11 | 0.73 |
| select_random_points | 39 | 48 | 1.23 |
| select_random_ranges | 17 | 10 | 0.59 |
| covering_index_scan | 20 | 28 | 1.40 |
| groupby_scan | 54 | 54 | 1.00 |
| index_join | 9 | 11 | 1.22 |
| index_join_scan | 3 | 6 | 2.00 |
| types_table_scan | 13 | 13 | 1.00 |
| table_scan | 1 | 2 | 2.00 |
| oltp_read_only | 340 | 259 | 0.76 |
#### Writes
| Test | SQLite (ms) | Doltlite (ms) | Multiplier |
|------|-------------|---------------|------------|
| oltp_bulk_insert | 32 | 39 | 1.22 |
| oltp_insert | 21 | 35 | 1.67 |
| oltp_update_index | 48 | 128 | 2.67 |
| oltp_update_non_index | 37 | 52 | 1.41 |
| oltp_delete_insert | 44 | 76 | 1.73 |
| oltp_write_only | 21 | 35 | 1.67 |
| types_delete_insert | 28 | 32 | 1.14 |
| oltp_read_write | 128 | 488 | 3.81 |
_10K rows, file-backed, Linux x64 (GitHub Actions). Run `test/sysbench_compare.sh` to reproduce._
### Algorithmic Complexity
All numbers below have automated assertions in CI (`test/doltlite_perf.sh` and `test/doltlite_structural.sh`).
- **O(log n) Point Operations** -- SELECT, UPDATE, and DELETE by primary key are O(log n), essentially constant time from 1K to 1M rows. Tested and asserted at 1K, 100K, and 1M rows.
- **O(n log n) Bulk Insert** -- Bulk INSERT inside BEGIN/COMMIT scales as O(n log n). 1M rows inserts in ~2 seconds. CTE-based inserts also scale linearly (5M rows in 11s).
- **O(changes) Diff** -- `dolt_diff` between two commits is proportional to the number of changed rows, not the table size. A single-row diff on a 1M-row table takes the same time as on a 1K-row table (~30ms).
- **Structural Sharing** -- The prolly tree provides structural sharing between versions. Changing 1 row in a 10K-row table adds only 1.9% to the file size (5.2KB on 273KB). Branch creation with 1 new row adds ~10% overhead.
- **Garbage Collection** -- `dolt_gc()` reclaims orphaned chunks. Deleting a branch with 1000 unique rows and running GC reclaims 53% of file size. GC is idempotent and preserves all reachable data.
## Running Tests
### SQLite Tcl Test Suite
87,000+ SQLite test cases pass with 0 correctness failures.
```bash
# Install Tcl (macOS)
brew install tcl-tk
# Configure with Tcl support
cd build
../configure --with-tcl=$(brew --prefix tcl-tk)/lib
# Build testfixture
make testfixture OPTS="-L$(brew --prefix)/lib"
# Run a single test file
./testfixture ../test/select1.test
# Run with timeout
perl -e 'alarm(60); exec @ARGV' ./testfixture ../test/select1.test
# Count passes
./testfixture ../test/func.test 2>&1 | grep -c "Ok$"
```
Stock SQLite testfixture for comparison:
```
make testfixture DOLTLITE_PROLLY=0 USE_AMALGAMATION=1
```
### Doltlite Shell Tests
31 test suites covering all features:
```bash
# Run all suites
cd build
bash ../test/run_doltlite_tests.sh
# Run individual suites
bash ../test/doltlite_parity.sh # SQLite compatibility (110 tests)
bash ../test/doltlite_commit.sh # Commits and log
bash ../test/doltlite_staging.sh # Add, status, staging
bash ../test/doltlite_branch.sh # Branching and checkout
bash ../test/doltlite_merge.sh # Three-way merge
bash ../test/doltlite_attach_sqlite.sh # ATTACH standard SQLite databases
```
### SQL Logic Test Suite
Doltlite passes 100% of the
[sqllogictest](https://www.sqlite.org/sqllogictest/) suite — the same
5.7 million-statement correctness corpus that SQLite itself uses. Every PR
runs the full suite in CI, comparing Doltlite's results against stock SQLite
as a reference. Zero failures, zero errors.
The test works by building the official
[sqllogictest C runner](https://www.sqlite.org/sqllogictest/) twice — once
linked against stock SQLite, once against Doltlite's amalgamation — and
running every `.test` file through both in `--verify` mode. Any result
divergence from stock SQLite is a failure.
```bash
# Build both runners and run the full suite (requires Fossil)
fossil clone https://www.sqlite.org/sqllogictest/ /tmp/sqllogictest.fossil
mkdir -p /tmp/sqllogictest && cd /tmp/sqllogictest && fossil open /tmp/sqllogictest.fossil
# Build stock runner (reference)
cd src
gcc -O2 -DSQLITE_NO_SYNC=1 -DSQLITE_THREADSAFE=0 \
-DSQLITE_OMIT_LOAD_EXTENSION -c md5.c sqlite3.c
gcc -O2 -o sqllogictest-stock sqllogictest.c md5.o sqlite3.o -lpthread -lm
# Build doltlite runner (replace amalgamation)
cp /path/to/doltlite/build/sqlite3.c sqlite3.c
cp /path/to/doltlite/build/sqlite3.h sqlite3.h
gcc -O2 -DSQLITE_NO_SYNC=1 -DSQLITE_THREADSAFE=0 \
-DSQLITE_OMIT_LOAD_EXTENSION -c sqlite3.c
gcc -O2 -o sqllogictest-doltlite sqllogictest.c md5.o sqlite3.o -lpthread -lm -lz
# Run the suite
bash test/run_sqllogictest.sh \
sqllogictest-doltlite sqllogictest-stock /tmp/sqllogictest/test
```
### Concurrent Branch Tests
C tests that verify cross-branch isolation — two connections on different
branches both write and read without corrupting each other:
```bash
cd build
gcc -o cross_branch_test ../test/cross_branch_test.c \
-I. -I../src libdoltlite.a -lz -lpthread
./cross_branch_test
```
## Architecture
### Prolly Tree Engine
| File | Purpose |
|------|---------|
| `prolly_hash.c/h` | xxHash32 content addressing |
| `prolly_node.c/h` | Binary node format (serialization, field access) |
| `prolly_cache.c/h` | LRU node cache |
| `prolly_cursor.c/h` | Tree cursor (seek, next, prev) |
| `prolly_mutmap.c/h` | Skip list write buffer for pending edits |
| `prolly_chunker.c/h` | Rolling hash tree builder |
| `prolly_mutate.c/h` | Merge-flush edits into tree |
| `prolly_diff.c/h` | Tree-level diff (drives `dolt_diff`) |
| `prolly_arena.c/h` | Arena allocator for tree operations |
| `prolly_btree.c` | `btree.h` API implementation (main integration point) |
| `sortkey.c/h` | Sort key encoding for memcmp-sortable index keys |
| `chunk_store.c` | Single-file content-addressed chunk storage |
| `pager_shim.c` | Pager facade (satisfies pager API without page-based I/O) |
| `btree_orig_*.c` | Original SQLite btree compiled with renamed symbols (for ATTACH) |
| `btree_orig_api.c/h` | Bridge API between prolly dispatch and original btree |
### Doltlite Feature Files
| File | Purpose |
|------|---------|
| `doltlite.c` | `dolt_add`, `dolt_commit`, `dolt_reset`, `dolt_merge`, registration |
| `doltlite_status.c` | `dolt_status` virtual table |
| `doltlite_log.c` | `dolt_log` virtual table |
| `doltlite_diff.c` | `dolt_diff` table-valued function |
| `doltlite_branch.c` | `dolt_branch`, `dolt_checkout`, `active_branch`, `dolt_branches` |
| `doltlite_tag.c` | `dolt_tag`, `dolt_tags` |
| `doltlite_merge.c` | Three-way catalog and row-level merge |
| `doltlite_conflicts.c` | `dolt_conflicts`, `dolt_conflicts_resolve` |
| `doltlite_ancestor.c` | Common ancestor search, `dolt_merge_base` |
| `doltlite_commit.h` | Commit object serialization/deserialization |
| `doltlite_ancestor.h` | Ancestor-finding API |
| `doltlite_history.c` | `dolt_history_` virtual table |
| `doltlite_at.c` | `dolt_at_` point-in-time query |
| `doltlite_schema_diff.c` | `dolt_schema_diff` virtual table |
| `doltlite_gc.c` | `dolt_gc` garbage collection |
| `doltlite_remote.c` | Remote management (`dolt_remote`, `dolt_push`, `dolt_fetch`, `dolt_clone`) |
| `doltlite_http_remote.c` | HTTP remote client (BSD sockets) |
| `doltlite_remotesrv.c` | Standalone HTTP server for remotes |
## Dolt vs Doltlite: Storage Engine Comparison
Doltlite implements the same prolly tree architecture as
[Dolt](https://github.com/dolthub/dolt), but adapted for SQLite's constraints
and C implementation. The core idea is identical — content-addressed immutable
nodes with rolling-hash-determined boundaries — but the details differ
significantly.
### Prolly Tree
Both use prolly trees (probabilistic B-trees) where node boundaries are
determined by a rolling hash over key bytes rather than fixed fan-out. This gives
content-defined chunking: identical subtrees produce identical hashes regardless
of where they appear, enabling structural sharing between versions.
| | Dolt | Doltlite |
|--|------|----------|
| **Language** | Go | C (inside SQLite) |
| **Node format** | FlatBuffers | Custom binary (header + offset arrays + data regions) |
| **Hash function** | xxhash, 20 bytes | xxHash32 with 5 seeds packed into 20 bytes |
| **Chunk target** | ~4KB | 4KB (512B min, 16KB max) |
| **Boundary detection** | Rolling hash, `(hash & pattern) == pattern` | Same algorithm |
### Key Encoding
**Dolt** uses a purpose-built tuple encoding: fields are serialized as contiguous
bytes with a trailing offset array and field count. Keys sort lexicographically,
so comparison is a single `memcmp`.
**Doltlite** uses sort key materialization for index (BLOBKEY) entries. Each
SQLite record is converted to a memcmp-sortable byte string at insert time:
integers and floats are encoded as IEEE 754 doubles with sign normalization,
text and blobs use NUL-byte escaping with double-NUL terminators. The sort key
is stored as the prolly tree key; the original SQLite record is stored as the
value (for reads). This enables `memcmp` comparison in the tree at the cost of
~2x index entry size. For INTKEY tables (rowid tables), keys are 8-byte
little-endian integers — comparison is trivial.
### Tree Mutation
**Dolt** uses a chunker with `advanceTo` boundary synchronization. Two cursors
track the old tree and new tree simultaneously. When the chunker fires a boundary
that aligns with an old tree node boundary, it skips the entire unchanged
subtree. This handles splits, merges, and boundary drift naturally within a
single bottom-up pass.
**Doltlite** uses a cursor-path-stack approach. For each edit, it seeks from root
to leaf, clones the leaf into a node builder, applies edits, serializes the new
leaf (with rolling-hash re-chunking for overflow/underflow), and rewrites
ancestors by walking up the path stack. Unchanged subtrees are never loaded. A
hybrid strategy falls back to a full O(N+M) merge-walk when the edit count is
large relative to tree size.
Both achieve O(M log N) for sparse edits. Dolt's approach is more elegant for
boundary maintenance; doltlite's is simpler to implement in C and integrates
naturally with SQLite's cursor-based API.
### Chunk Store
**Dolt** uses the Noms Block Store (NBS) format with multiple table files
organized into generations (oldgen/newgen). Writers append new table files;
readers see consistent snapshots. This enables MVCC-like concurrency with
optimistic locking at the manifest level.
**Doltlite** uses a single file with three regions: a 168-byte manifest header
at offset 0, a compacted chunk data region with sorted index (written by GC),
and a WAL region at the end of the file (append-only journal of new chunks).
Normal commits append to the WAL region at EOF. GC rewrites the entire file
with all chunks compacted (empty WAL region). Concurrency uses file-level
locking for serialization.
### Commits and Metadata
**Dolt** stores commits as FlatBuffer-serialized objects forming a DAG (directed
acyclic graph) with multiple parents for merge commits. Commits include a parent
closure for O(1) ancestor queries and a height field for efficient traversal.
**Doltlite** stores commits as custom binary objects forming a DAG with
multi-parent support (merge commits record both parents). Each branch has an
associated WorkingSet chunk that stores staged catalog and merge state
independently, plus a per-branch working catalog tracked in a separate working
state chunk (referenced by the manifest). This allows connections on different
branches to each find their own catalog on refresh without reading a stale
catalog from another branch. The catalog hash is purely data-derived (no
runtime metadata), enabling O(1) dirty checks via hash comparison. Branches
and tags are stored in a serialized refs chunk referenced by the manifest.
### Garbage Collection
Both use mark-and-sweep: walk all reachable chunks from branches, tags, and
commit history, then remove everything else. Dolt rewrites live data into new
table files and deletes old ones. Doltlite compacts in-place by rewriting the
single database file with only live chunks.