https://github.com/marcelocantos/sqlpipe
Streaming replication protocol for SQLite
- Host: GitHub
- URL: https://github.com/marcelocantos/sqlpipe
- Owner: marcelocantos
- License: apache-2.0
- Created: 2026-02-22T04:47:14.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2026-03-29T13:51:19.000Z (3 months ago)
- Last Synced: 2026-03-29T16:34:07.927Z (3 months ago)
- Topics: cpp, database, replication, sqlite, streaming
- Language: C
- Size: 3.07 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# sqlpipe
Unified SQLite library: replication, schema migration, and query transpilation.
sqlpipe is a C++ library that combines three capabilities in a single
two-file distribution (`dist/sqlpipe.h` / `dist/sqlpipe.cpp`):
- **sqlpipe core** — keeps SQLite databases in sync over any transport layer.
A **Master** tracks changes and produces compact binary changesets; a
**Replica** applies them, emitting per-row change events and query
subscription updates. A **Peer** wraps both behind a symmetric API for
bidirectional replication with table-level ownership. A **Relay** enables
chain replication (source → relay → sink).
- **sqlift** — declarative schema migration via structural diffing. The
`Database` constructor auto-migrates your schema on open; the
`generate_migration()` free function produces migration SQL from any two
DDL strings.
- **sqldeep** — JSON5-like SQL transpiler. All SQL passed through `Database`
methods is transpiled automatically. Extended syntax like
`SELECT {id, name}` (JSON object construction) works out of the box.
The `Database` class is the primary entry point for most users. `Master`,
`Replica`, and `Peer` are the lower-level building blocks for custom
replication topologies.
The library is transport-agnostic: it defines a message-in / message-out API.
You decide how messages travel between peers (TCP, WebSocket, QUIC, serial,
shared memory, datagrams, etc.). The convergence loop makes every message
regenerable, so the protocol works over pure datagrams — any message can be
lost and recovered by the next convergence round.
## Features
- **Unified Database class** — opens SQLite, auto-migrates the schema via
sqlift, provides `exec`/`query`/`subscribe` with automatic change
notification. The primary API for most use cases.
- **RAII Subscription** — `Database::subscribe` returns a `Subscription`
object that auto-unsubscribes on destruction.
- **sqldeep syntax everywhere** — all SQL in `Database` methods is transpiled
automatically. Write `SELECT {id, name} FROM items` and get back a JSON
object per row with no extra wiring.
- **Bundled distribution** — `dist/sqlpipe.h` + `dist/sqlpipe.cpp` include
sqlpipe core, sqlift, and sqldeep. No separate dependency installation
needed beyond SQLite itself.
- **Schema migration via sqlift** — `Database` constructor auto-migrates on
open; `Database::migration(from_ddl, to_ddl)` static method; and the
`generate_migration(old_ddl, new_ddl)` free function for offline use.
- **Convergence loop** — `Replica::converge()` provides stateless,
loss-tolerant sync. The replica computes bucket hashes and sends them
directly; the master responds with the delta. Works entirely over
datagrams. No handshake required. Call it periodically or on reconnect.
- **Bidirectional replication** via the Peer API — each side owns a disjoint
set of tables (glob patterns supported, e.g. `"*"` owns all tables), with
server-authoritative ownership negotiation
- **Chain replication** via the Relay class — source → relay → sink
topologies for fan-out or geographic distribution
- **Incremental replication** via SQLite's session extension (compact binary
changesets)
- **Efficient diff sync** on reconnect — bucketed row hashes identify what
differs, then only the delta is transferred
- **Changeset queue** — master retains recent changesets
(`changeset_queue_size`, default 64) for fast reconnect replay without
full diff sync
- **Predicate-aware query subscriptions** — register SQL queries on the
replica; receive updated result sets only when incoming changes match
extracted predicates. Queries are parsed via liteparser into relational
algebra; predicates are propagated through equijoins and evaluated by a
bytecode VM. Supports equality, inequality, range, IS NULL, IN, NOT IN,
BETWEEN, and OR-of-equalities.
- **Prediction API** — `begin_prediction`/`commit_prediction`/`rollback_prediction`
for optimistic local updates with automatic rollback on server response
- **Auto-flush** — `MasterConfig::on_flush` callback fires on commit, so
callers never need to call `flush()` explicitly
- **Per-row change events** (insert/update/delete) on the receiving side
- **Conflict callbacks** for custom resolution logic
- **LZ4 changeset compression** — automatic, with uncompressed fallback
- **Schema fingerprinting** via structural hashing (sqlift) to detect mismatches
- **Single header + source** (`sqlpipe.h` / `sqlpipe.cpp`) for easy integration
- **Formally verified** — the convergence protocol is modelled in
[TLA+](formal/Convergence.tla) and checked with TLC
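
To make the changeset queue feature above more concrete, here is a minimal sketch of a bounded replay queue. It is not sqlpipe's implementation: `QueuedChangeset`, `ChangesetQueue`, and `replay_after` are hypothetical names, and the sketch assumes contiguous sequence numbers starting at 1. It only illustrates the trade-off the feature describes: a reconnecting replica that is only slightly behind can be served from memory, while one that has fallen off the end of the queue needs a full diff sync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for a serialized changeset; not a sqlpipe type.
struct QueuedChangeset {
    std::uint64_t seq;    // sequence number assigned by the master
    std::string payload;  // opaque changeset bytes
};

// Hypothetical bounded history of recent changesets.
class ChangesetQueue {
public:
    explicit ChangesetQueue(std::size_t capacity) : capacity_(capacity) {}

    // Record the next changeset, evicting the oldest entry when full.
    void push(std::uint64_t seq, std::string payload) {
        assert(seq == last_seq_ + 1);  // sequence numbers are assumed contiguous
        last_seq_ = seq;
        if (capacity_ == 0) return;    // a capacity of 0 disables replay
        if (entries_.size() == capacity_) entries_.pop_front();
        entries_.push_back({seq, std::move(payload)});
    }

    // Changesets a replica at `replica_seq` still needs, or nullopt when the
    // queue cannot bridge the gap and a full diff sync is required instead.
    std::optional<std::vector<QueuedChangeset>> replay_after(std::uint64_t replica_seq) const {
        if (replica_seq > last_seq_) return std::nullopt;  // replica is ahead
        if (replica_seq == last_seq_) return std::vector<QueuedChangeset>{};
        if (entries_.empty() || entries_.front().seq > replica_seq + 1) return std::nullopt;
        std::vector<QueuedChangeset> out;
        for (const auto& e : entries_)
            if (e.seq > replica_seq) out.push_back(e);
        return out;
    }

private:
    std::size_t capacity_;
    std::uint64_t last_seq_ = 0;
    std::deque<QueuedChangeset> entries_;
};
```

A larger capacity lets replicas stay disconnected for longer before they fall back to diff sync, at the cost of holding more changesets in memory.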
## Language bindings
| Language | Location | Install |
|---|---|---|
| C++ | `dist/sqlpipe.h` + `dist/sqlpipe.cpp` | Copy two files (includes sqlpipe + sqlift + sqldeep) |
| Go | `go/sqlpipe/` | `go get github.com/marcelocantos/sqlpipe/go/sqlpipe` |
| Swift | `swift/` | SPM package with `CSqlpipe` and `Sqlpipe` targets |
| TypeScript/Wasm | `web/` | `npm install` (builds SQLite + sqlpipe to Wasm) |
## Requirements
- C++23 compiler
- SQLite 3 compiled with `-DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_PREUPDATE_HOOK`
All tables must have explicit `PRIMARY KEY`s (required by SQLite's session
extension). `WITHOUT ROWID` tables are not supported.
## Quick start
### Database API (recommended)
```cpp
#include <iostream>
#include <sqlpipe.h>
using namespace sqlpipe;
// Open database with schema (auto-creates/migrates via sqlift).
Database db(":memory:", "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)");
// Subscribe — fires immediately with initial result, then on every change.
// Subscription auto-unsubscribes when it goes out of scope (RAII).
auto sub = db.subscribe("SELECT {id, name, qty} FROM items ORDER BY id",
[](const QueryResult& r) {
for (auto& row : r.rows)
std::cout << std::get<std::string>(row[0]) << "\n"; // assumes the JSON object is held as std::string
});
// Insert — subscription fires automatically.
db.exec("INSERT INTO items VALUES (1, 'Widget', 10)");
// sqldeep syntax works everywhere.
auto r = db.query("SELECT {id, name} FROM items WHERE qty > 5");
```
### Replication with Database
`Database` integrates cleanly with `Master` and `Replica` by exposing the
underlying `sqlite3*` handle. After applying a changeset, call
`replica_db.notify()` to fire subscriptions.
```cpp
Database master_db(":memory:", schema);
Database replica_db(":memory:", schema);
Master master(master_db.handle());
Replica replica(replica_db.handle());
sync_handshake(master, replica);
// Subscribe on replica side.
auto sub = replica_db.subscribe("SELECT count(*) FROM items",
[](const QueryResult& r) { /* ... */ });
// Insert on master, flush, apply on replica.
master_db.exec("INSERT INTO items VALUES (1, 'hello')");
for (auto& msg : master.flush())
replica.handle_message(msg);
replica_db.notify(); // fires subscriptions registered on replica_db
```
### Lower-level API (Master/Replica directly)
```cpp
#include <sqlpipe.h>
using namespace sqlpipe;
// Open two databases with matching schemas.
sqlite3 *master_db, *replica_db;
sqlite3_open(":memory:", &master_db);
sqlite3_open(":memory:", &replica_db);
sqlite3_exec(master_db,
"CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)", 0, 0, 0);
sqlite3_exec(replica_db,
"CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)", 0, 0, 0);
// Create master and replica.
Master master(master_db);
Replica replica(replica_db);
// Handshake (exchange messages until replica reaches Live state).
sync_handshake(master, replica); // convenience for in-process use
// Make changes on the master, then flush.
sqlite3_exec(master_db, "INSERT INTO t VALUES (1, 'hello')", 0, 0, 0);
auto msgs = master.flush(); // returns std::vector<Message>
for (auto& msg : msgs) {
auto result = replica.handle_message(msg);
// result.messages — std::vector<Message> to send back
// result.changes — per-row ChangeEvents
// result.subscriptions — updated query results
}
// replica_db now has the row.
```
See [`examples/loopback.cpp`](examples/loopback.cpp) for a complete working
example including handshake and change event handling.
## Building
```sh
git clone --recurse-submodules https://github.com/marcelocantos/sqlpipe.git
cd sqlpipe
mk test # build and run tests (146 test cases)
mk example # build and run the loopback demo
mk wasm # build Wasm module (requires emscripten)
```
If you use an agentic coding tool (Claude Code, Cursor, Copilot, etc.), include
[`dist/sqlpipe-agents-guide.md`](dist/sqlpipe-agents-guide.md) in your project
context for a condensed API reference.
## Protocol overview
Two sync paths are available. The convergence loop is preferred for most use
cases; the legacy handshake is available for environments that require
ordered reliable delivery.
**Convergence loop** (preferred — stateless, works over datagrams):
```mermaid
sequenceDiagram
participant R as Replica
participant M as Master
Note over R: converge() — no prior hello needed
R->>M: BucketHashesMsg
Note over M: compare bucket hashes
M->>R: NeedBucketsMsg (skipped if all match)
R->>M: RowHashesMsg
Note over M: compute diff
M->>R: DiffReadyMsg (patchset + deletes)
R->>M: AckMsg
rect rgb(240, 248, 255)
Note over R,M: Live streaming
M->>R: ChangesetMsg (master.flush())
R->>M: AckMsg
end
```
Every message in the convergence loop is regenerable. If any message is
lost, call `converge()` again — the loop is idempotent. The master
processes `BucketHashesMsg` directly without requiring a prior `HelloMsg`.
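
The four steps in the diagram can be modelled with a toy in-memory sketch. This is not sqlpipe's wire format or hashing scheme: tables are plain `std::map`s, the hash is `std::hash`, the bucket size is 4, and every name below is hypothetical. The sketch only shows why bucket hashes narrow the diff to a few rows, and why repeating a round after a lost message is harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using Table = std::map<std::int64_t, std::string>;     // primary key -> row value
using Hashes = std::map<std::int64_t, std::uint64_t>;  // bucket or key -> hash

constexpr std::int64_t kBucketSize = 4;  // toy value, chosen for illustration

std::uint64_t row_hash(std::int64_t pk, const std::string& v) {
    return std::hash<std::string>{}(std::to_string(pk) + ":" + v);
}

// Step 1 (replica): one order-independent hash per bucket of primary keys.
Hashes bucket_hashes(const Table& t) {
    Hashes out;
    for (const auto& [pk, v] : t) {
        assert(pk >= 0);  // the toy bucketing assumes non-negative keys
        out[pk / kBucketSize] ^= row_hash(pk, v);
    }
    return out;
}

// Step 2 (master): buckets that differ, including buckets only one side has.
std::set<std::int64_t> mismatched_buckets(const Table& master, const Hashes& replica) {
    std::set<std::int64_t> need;
    const Hashes mine = bucket_hashes(master);
    for (const auto& [b, h] : mine) {
        auto it = replica.find(b);
        if (it == replica.end() || it->second != h) need.insert(b);
    }
    for (const auto& [b, h] : replica)
        if (mine.count(b) == 0) need.insert(b);
    return need;
}

// Step 3 (replica): per-row hashes, restricted to the requested buckets.
Hashes row_hashes(const Table& t, const std::set<std::int64_t>& buckets) {
    Hashes out;
    for (const auto& [pk, v] : t)
        if (buckets.count(pk / kBucketSize)) out[pk] = row_hash(pk, v);
    return out;
}

struct Patch {
    Table upserts;                      // rows missing or stale on the replica
    std::vector<std::int64_t> deletes;  // rows the master no longer has
};

// Step 4 (master): compute the delta for the mismatched buckets only.
Patch make_patch(const Table& master, const std::set<std::int64_t>& buckets,
                 const Hashes& replica_rows) {
    Patch p;
    for (const auto& [pk, v] : master) {
        if (buckets.count(pk / kBucketSize) == 0) continue;
        auto it = replica_rows.find(pk);
        if (it == replica_rows.end() || it->second != row_hash(pk, v)) p.upserts[pk] = v;
    }
    for (const auto& [pk, h] : replica_rows)
        if (master.count(pk) == 0) p.deletes.push_back(pk);
    return p;
}

// One complete round; returns the number of row changes applied.
std::size_t converge_once(const Table& master, Table& replica) {
    const auto need = mismatched_buckets(master, bucket_hashes(replica));
    if (need.empty()) return 0;  // everything matches: nothing to send
    const Patch p = make_patch(master, need, row_hashes(replica, need));
    for (const auto& [pk, v] : p.upserts) replica[pk] = v;
    for (auto pk : p.deletes) replica.erase(pk);
    return p.upserts.size() + p.deletes.size();
}
```

Because each step is computed purely from the current database contents, a second call to `converge_once` on an already synced pair finds no mismatched buckets and applies nothing, which is the idempotence property the paragraph above relies on.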
**Legacy handshake** (ordered reliable channel):
```mermaid
sequenceDiagram
participant R as Replica
participant M as Master
R->>M: HelloMsg
M->>R: HelloMsg (or ErrorMsg on schema mismatch)
R->>M: BucketHashesMsg
M->>R: NeedBucketsMsg
R->>M: RowHashesMsg
M->>R: DiffReadyMsg
R->>M: AckMsg
rect rgb(240, 248, 255)
Note over R,M: Live streaming
end
```
## API
### Database
The `Database` class is the recommended starting point. It wraps a `sqlite3*`
handle, auto-migrates on open, and wires up subscriptions. All SQL is
transpiled through sqldeep before execution.
```cpp
class Database {
public:
// Open (or create) a database at path, auto-migrating to schema_ddl.
Database(const std::string& path, const std::string& schema_ddl);
// Execute a statement (sqldeep transpiled). Fires notify() on success.
void exec(const std::string& sql);
// Run a query (sqldeep transpiled). Returns the full result set.
QueryResult query(const std::string& sql);
// Register a callback query (sqldeep transpiled). The callback fires
// immediately with the current result, then again after each exec() or
// notify() call that touches a table the query reads from.
// Returns a Subscription that unsubscribes on destruction.
Subscription subscribe(const std::string& sql,
std::function<void(const QueryResult&)> callback);
// Manually trigger subscription re-evaluation (all tables).
void notify();
// Trigger subscription re-evaluation for a specific set of tables.
void notify(const std::set<std::string>& tables);
// Access the raw sqlite3* handle (e.g. to pass to Master/Replica/Peer).
sqlite3* handle() const;
// Generate migration SQL from one DDL to another (uses sqlift).
static std::string migration(const std::string& from_ddl,
const std::string& to_ddl);
};
// Free function: generate migration SQL between two DDL strings.
std::string generate_migration(const std::string& old_ddl,
const std::string& new_ddl);
```
`Subscription` is a RAII handle: when it is destroyed, the underlying
subscription is automatically removed.
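
The README does not show how `Subscription` is implemented. As a rough illustration of the RAII pattern it describes, a move-only handle can own an "unsubscribe" callback and run it exactly once. `ScopedSubscription` and everything inside it are hypothetical names for this sketch, not sqlpipe types.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical move-only handle; sqlpipe's real Subscription may differ.
class ScopedSubscription {
public:
    ScopedSubscription() = default;
    explicit ScopedSubscription(std::function<void()> unsubscribe)
        : unsubscribe_(std::move(unsubscribe)) {}

    // Copying would unsubscribe twice, so only moves are allowed.
    ScopedSubscription(const ScopedSubscription&) = delete;
    ScopedSubscription& operator=(const ScopedSubscription&) = delete;

    ScopedSubscription(ScopedSubscription&& other) noexcept
        : unsubscribe_(std::exchange(other.unsubscribe_, nullptr)) {}

    ScopedSubscription& operator=(ScopedSubscription&& other) {
        if (this != &other) {
            reset();  // drop whatever this handle currently owns
            unsubscribe_ = std::exchange(other.unsubscribe_, nullptr);
        }
        return *this;
    }

    ~ScopedSubscription() { reset(); }

    // Unsubscribe early; calling it again is a no-op.
    void reset() {
        if (auto fn = std::exchange(unsubscribe_, nullptr)) fn();
    }

    bool active() const { return static_cast<bool>(unsubscribe_); }

private:
    std::function<void()> unsubscribe_;
};

// Usage: the live count returns to zero once the handle leaves scope.
inline int live_after_scope() {
    int live = 0;
    {
        ++live;
        ScopedSubscription sub([&live] { --live; });
        assert(sub.active());
    }
    return live;
}
```

In practice this means a subscription should be stored for as long as its callback is wanted: a handle held as a class member lives with the object, while a temporary that is discarded immediately unsubscribes right away.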
### Master
```cpp
struct MasterConfig {
std::optional<std::set<std::string>> table_filter; // nullopt = all tables
std::int64_t bucket_size = 1024;
ProgressCallback on_progress = nullptr;
SchemaMismatchCallback on_schema_mismatch = nullptr;
FlushCallback on_flush = nullptr; // auto-flush on commit (takes std::vector<Message>)
std::size_t changeset_queue_size = 64; // 0 = disable queue replay
LogCallback on_log = nullptr;
};
class Master {
public:
explicit Master(sqlite3* db, MasterConfig config = {});
void exec(const std::string& sql); // auto-flushes if on_flush set
std::vector<Message> flush(); // manual flush
std::vector<Message> handle_message(const Message& msg);
Seq current_seq() const;
SchemaVersion schema_version() const;
};
```
### Replica
```cpp
struct ReplicaConfig {
ConflictCallback on_conflict = nullptr;
std::optional<std::set<std::string>> table_filter;
std::int64_t bucket_size = 1024;
ProgressCallback on_progress = nullptr;
SchemaMismatchCallback on_schema_mismatch = nullptr;
LogCallback on_log = nullptr;
};
struct HandleResult {
std::vector<Message> messages; // protocol responses
std::vector<ChangeEvent> changes; // row-level changes applied
std::vector<QueryResult> subscriptions; // invalidated query results
};
class Replica {
public:
explicit Replica(sqlite3* db, ReplicaConfig config = {});
Message hello() const;
std::vector<Message> converge(); // stateless sync
HandleResult handle_message(const Message& msg);
HandleResult handle_messages(std::span<const Message> msgs); // batched
SubscriptionId subscribe(const std::string& sql);
void unsubscribe(SubscriptionId id);
void begin_prediction(); // optimistic local update
void commit_prediction(); // finalise prediction
void rollback_prediction(); // cancel prediction
void reset(); // back to Init; preserves subscriptions
Seq current_seq() const;
SchemaVersion schema_version() const;
State state() const; // Init, Handshake, DiffBuckets, DiffRows, Live, Error
};
```
### Peer (bidirectional)
```cpp
enum class PeerRole : std::uint8_t { Client, Server };
struct PeerConfig {
PeerRole role = PeerRole::Client;
std::set<std::string> owned_tables; // glob patterns; e.g. "*" owns all tables
std::optional<std::set<std::string>> table_filter;
ApproveOwnershipCallback approve_ownership = nullptr; // server only
ConflictCallback on_conflict = nullptr;
ProgressCallback on_progress = nullptr;
SchemaMismatchCallback on_schema_mismatch = nullptr;
LogCallback on_log = nullptr;
};
struct PeerHandleResult {
std::vector<PeerMessage> messages;
std::vector<ChangeEvent> changes;
std::vector<QueryResult> subscriptions;
};
class Peer {
public:
explicit Peer(sqlite3* db, PeerConfig config = {});
std::vector<PeerMessage> start(); // client initiates
std::vector<PeerMessage> flush();
PeerHandleResult handle_message(const PeerMessage& msg);
SubscriptionId subscribe(const std::string& sql);
void unsubscribe(SubscriptionId id);
void reset();
State state() const; // Init, Negotiating, Diffing, Live, Error
const std::set<std::string>& owned_tables() const;
const std::set<std::string>& remote_tables() const;
};
```
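
`owned_tables` holds glob patterns, and each side must own a disjoint set of tables. The README does not specify the glob dialect, so the sketch below assumes only `*` (any run of characters) and `?` (exactly one character). `glob_match`, `owns`, and `contested_tables` are hypothetical helpers, useful for checking a proposed ownership split before handing it to `PeerConfig`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Assumed dialect: '*' matches any run of characters, '?' matches exactly one.
bool glob_match(std::string_view pattern, std::string_view text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string_view::npos;  // position of the last '*' seen
    std::size_t resume = 0;                     // text position to retry from
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;
            resume = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string_view::npos) {
            p = star + 1;   // backtrack: let the last '*' absorb one more char
            t = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// True if any ownership pattern covers the table.
bool owns(const std::set<std::string>& patterns, std::string_view table) {
    for (const auto& pat : patterns)
        if (glob_match(pat, table)) return true;
    return false;
}

// Tables claimed by both sides; ownership is disjoint only if this is empty.
std::vector<std::string> contested_tables(const std::set<std::string>& client,
                                          const std::set<std::string>& server,
                                          const std::vector<std::string>& tables) {
    std::vector<std::string> out;
    for (const auto& t : tables)
        if (owns(client, t) && owns(server, t)) out.push_back(t);
    return out;
}

// Usage: a server that claims "*" overlaps with any client claim.
inline void ownership_example() {
    const std::set<std::string> client{"draft_*"};
    const std::set<std::string> server{"*"};
    const std::vector<std::string> tables{"draft_notes", "orders"};
    assert(contested_tables(client, server, tables).size() == 1);
}
```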
### Relay (chain replication)
```cpp
class Relay {
public:
explicit Relay(sqlite3* db, RelayConfig config = {});
std::size_t add_sink(SinkCallback cb); // register downstream (takes const Message&)
void remove_sink(std::size_t id);
Message hello(); // send to upstream
std::vector<Message> handle_upstream(const Message& msg);
std::vector<Message> handle_downstream(const Message& msg);
SubscriptionId subscribe(const std::string& sql);
void unsubscribe(SubscriptionId id);
void reset();
};
```
### Query subscriptions
Register SQL queries on the replica to receive updated results when incoming
changes match the query's predicates:
```cpp
auto id = replica.subscribe("SELECT id, val FROM t1 WHERE val > 10 ORDER BY id");
// After applying a changeset:
auto result = replica.handle_message(changeset_msg);
for (const auto& sub : result.subscriptions) {
// sub.id — which subscription fired
// sub.columns — column names
// sub.rows — the full updated result set
}
replica.unsubscribe(id);
```
Predicates are extracted from WHERE clauses and propagated through equijoins.
A bytecode VM evaluates predicates against changeset rows, so subscriptions
whose predicates don't match are skipped entirely — no SQL re-evaluation
needed.
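
The skip logic can be illustrated without the parser or the bytecode VM. The sketch below hard-codes the predicate from the example query (`val > 10`) and tests it against a simplified change record. `ValChange`, `GreaterThan`, `affects`, and `needs_requery` are hypothetical names and do not reflect how sqlpipe represents changes or predicates internally.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical change to the `val` column of one row; not sqlpipe's ChangeEvent.
struct ValChange {
    std::optional<std::int64_t> old_val;  // empty for inserts
    std::optional<std::int64_t> new_val;  // empty for deletes
};

// The predicate extracted from `WHERE val > 10`, with the bound as data.
struct GreaterThan {
    std::int64_t bound;
    bool matches(std::int64_t v) const { return v > bound; }
};

// A change can alter the result set only if the row satisfied the predicate
// before the change or satisfies it afterwards.
bool affects(const GreaterThan& pred, const ValChange& c) {
    assert(c.old_val || c.new_val);  // a change carries at least one row image
    return (c.old_val && pred.matches(*c.old_val)) ||
           (c.new_val && pred.matches(*c.new_val));
}

// Re-run the subscription's SQL only if some change in the changeset matters.
bool needs_requery(const GreaterThan& pred, const std::vector<ValChange>& changeset) {
    for (const auto& c : changeset)
        if (affects(pred, c)) return true;
    return false;
}
```

Checking both the old and the new value matters: an update from 15 to 2 removes a row from the result set even though the new value fails the predicate.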
### Prediction API
Optimistic local updates with automatic rollback:
```cpp
replica.begin_prediction();
// Write optimistically to the local database.
sqlite3_exec(replica_db, "INSERT INTO items VALUES (99, 'pending')", 0, 0, 0);
// Subscriptions now reflect the predicted state.
replica.commit_prediction();
// Send the corresponding action to the server.
// When the server's changeset arrives via handle_message(), the prediction
// savepoint is automatically rolled back and the server's state applied.
```
### Reconnection
**Convergence loop** (preferred): Call `converge()` at any time to sync
without a handshake. Works from any state — Init, Live, or after `reset()`.
```cpp
replica.reset();
auto msgs = replica.converge(); // returns BucketHashesMsg
// Send msgs to master, process responses normally.
// If a message is lost, just call converge() again.
```
**Legacy handshake**: For ordered reliable channels.
```cpp
replica.reset();
auto hello = replica.hello();
// ... exchange messages until Live ...
```
**Peer reconnection**:
```cpp
peer.reset(); // preserves table ownership
auto msgs = peer.start(); // re-initiate handshake
```
## Error handling
All operations may throw `sqlpipe::Error`, which carries an `ErrorCode` and a
human-readable message:
```cpp
try {
auto msgs = master.flush(); // std::vector<Message>
} catch (const sqlpipe::Error& e) {
// e.code() — ErrorCode enum
// e.what() — descriptive string
}
```
| ErrorCode | Meaning | Recommended action |
|---|---|---|
| `SqliteError` | An underlying SQLite call failed | Check the message; may indicate corruption or constraint violation |
| `ProtocolError` | Malformed or unexpected message | Disconnect and reconnect |
| `SchemaMismatch` | Master and replica schemas differ | Install `on_schema_mismatch`, or migrate offline and reconnect |
| `InvalidState` | Operation not valid in current state | Bug in calling code |
| `OwnershipRejected` | Peer ownership request rejected | Server's `approve_ownership` returned false |
| `WithoutRowidTable` | Table uses `WITHOUT ROWID` | Use regular rowid tables |
## Schema migration
`Database` handles migration automatically on construction. For custom
migration workflows, use `generate_migration()` or the static
`Database::migration()` method, which both delegate to sqlift's structural
schema diffing:
```cpp
// Generate migration SQL between two DDL strings.
auto sql = generate_migration(old_schema_ddl, new_schema_ddl);
sqlite3_exec(db, sql.c_str(), 0, 0, 0);
// Or, equivalently, via the static method:
auto same_sql = Database::migration(old_schema_ddl, new_schema_ddl);
```
For lower-level use, install an `on_schema_mismatch` callback to run
migrations on schema mismatch instead of erroring:
```cpp
ReplicaConfig rc;
rc.on_schema_mismatch = [&](SchemaVersion remote, SchemaVersion local,
const std::string& remote_schema_sql) {
// remote_schema_sql has the master's CREATE TABLE statements.
sqlite3_exec(replica_db, "ALTER TABLE t ADD COLUMN new_col TEXT", 0, 0, 0);
return true; // reset to Init; re-handshake
};
```
The same callback is available on `MasterConfig` and `PeerConfig`.
## Transport wiring
sqlpipe is transport-agnostic. The wire format is length-prefixed:
```cpp
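// Sending (POSIX sockets assumed; the last argument is the flags word):
auto out = sqlpipe::serialize(msg); // or serialize(peer_msg)
send(socket, out.data(), out.size(), 0);
// Receiving: read the 4-byte little-endian length, then the payload.
// recv() may return fewer bytes than requested, so production code should
// loop until each read is complete and check for errors.
uint8_t hdr[4];
recv(socket, hdr, 4, 0);
uint32_t len = uint32_t(hdr[0]) | (uint32_t(hdr[1]) << 8) |
(uint32_t(hdr[2]) << 16) | (uint32_t(hdr[3]) << 24);
std::vector<uint8_t> buf(4 + len);
memcpy(buf.data(), hdr, 4);
recv(socket, buf.data() + 4, len, 0);
auto msg = sqlpipe::deserialize(buf); // or deserialize_peer(buf)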
```
The Go wrapper provides a `Transport` interface in
`go/sqlpipe/transport` for pluggable transport implementations.
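
On a stream transport, reads rarely line up with message boundaries. A small reassembly buffer handles this, as in the sketch below. It is independent of sqlpipe: `frame`, `FrameReader`, and `kAssumedMaxMessageSize` are hypothetical, and the layout (a 4-byte little-endian length that excludes the header) is inferred from the receive snippet above rather than taken from a format specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Mirrors the documented 64 MB message limit; the constant name is made up.
constexpr std::uint32_t kAssumedMaxMessageSize = 64u * 1024u * 1024u;

// Prefix a payload with its length as a 4-byte little-endian integer.
Bytes frame(const Bytes& payload) {
    if (payload.size() > kAssumedMaxMessageSize) throw std::length_error("payload too large");
    const auto len = static_cast<std::uint32_t>(payload.size());
    Bytes out;
    out.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    out.insert(out.end(), payload.begin(), payload.end());
    assert(out.size() == 4 + payload.size());
    return out;
}

// Accumulates bytes from a stream and yields complete frames one at a time.
class FrameReader {
public:
    // Append whatever the transport delivered, however it was chunked.
    void feed(const std::uint8_t* data, std::size_t n) {
        buf_.insert(buf_.end(), data, data + n);
    }

    // Next complete frame (header included), or nullopt if more bytes are needed.
    std::optional<Bytes> next() {
        if (buf_.size() < 4) return std::nullopt;
        const std::uint32_t len = std::uint32_t{buf_[0]} | (std::uint32_t{buf_[1]} << 8) |
                                  (std::uint32_t{buf_[2]} << 16) | (std::uint32_t{buf_[3]} << 24);
        if (len > kAssumedMaxMessageSize) throw std::length_error("frame too large");
        const std::size_t total = 4 + static_cast<std::size_t>(len);
        if (buf_.size() < total) return std::nullopt;
        const auto end = buf_.begin() + static_cast<std::ptrdiff_t>(total);
        Bytes out(buf_.begin(), end);
        buf_.erase(buf_.begin(), end);
        return out;
    }

private:
    Bytes buf_;
};
```

Each frame returned by `next()` keeps its 4-byte header, matching the buffer layout that the receive snippet passes to `deserialize`. Rejecting oversized lengths before allocating keeps a corrupt or hostile header from triggering a huge allocation.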
## Thread safety
`Master`, `Replica`, `Peer`, `Relay`, and `Database` are **not thread-safe**.
Each instance must be accessed from a single thread at a time. The `sqlite3*`
handle must not be used concurrently during sqlpipe operations.
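
If several threads need to reach the same instance, one option is to funnel every access through a mutex-holding wrapper. The sketch below is a generic pattern, not part of sqlpipe: `Serialized` and `with` are hypothetical names, and the example wraps an `int` so that it stays self-contained.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical helper: owns a T and exposes it only while holding a mutex.
template <typename T>
class Serialized {
public:
    template <typename... Args>
    explicit Serialized(Args&&... args) : value_(std::forward<Args>(args)...) {}

    // Run fn(T&) with exclusive access and return whatever fn returns.
    template <typename Fn>
    decltype(auto) with(Fn&& fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<Fn>(fn)(value_);
    }

private:
    std::mutex mutex_;
    T value_;
};

// Usage: every access goes through with(), so concurrent callers take turns.
inline void serialized_example() {
    Serialized<int> counter(0);
    counter.with([](int& v) { v += 2; });
    assert(counter.with([](int& v) { return v; }) == 2);
}
```

For sqlpipe, the wrapped value would need to cover both the sqlpipe object and every direct use of its `sqlite3*` handle, since the handle must not be touched concurrently either. Callbacks passed to `with` should not call back into the same wrapper, because `std::mutex` is not recursive.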
## Message size limits
- **`kMaxMessageSize`** (64 MB) — maximum serialized message size
- **`kMaxArrayCount`** (10 M) — maximum elements in any array field
Messages exceeding these limits cause `deserialize()` to throw `ProtocolError`.
## Related projects
sqldeep and sqlift are bundled into the sqlpipe distribution and require no
separate installation. Their standalone repos remain available for use outside
of sqlpipe:
- **[sqldeep](https://github.com/marcelocantos/sqldeep)** — JSON5-like SQL syntax transpiler for SQLite JSON functions (bundled into sqlpipe)
- **[sqlift](https://github.com/marcelocantos/sqlift)** — Declarative SQLite schema migrations via structural diffing (bundled into sqlpipe)
## License
Apache 2.0. See [LICENSE](LICENSE) for details.
Third-party dependencies:
- **SQLite** — public domain
- **LZ4** — BSD 2-Clause
- **spdlog** — MIT
- **nlohmann/json** — MIT
- **liteparser** — MIT
- **sqlift** — Apache 2.0
- **sqldeep** — Apache 2.0
- **doctest** — MIT (test only)