# Marmot v2

[![Go Report Card](https://goreportcard.com/badge/github.com/maxpert/marmot)](https://goreportcard.com/report/github.com/maxpert/marmot)
[![Discord](https://badgen.net/badge/icon/discord?icon=discord&label=Marmot)](https://discord.gg/AWUwY66XsE)
![GitHub](https://img.shields.io/github/license/maxpert/marmot)

## What & Why?

Marmot v2 is a leaderless, distributed SQLite replication system built on a gossip-based protocol with distributed transactions and eventual consistency.

**Key Features:**
- **Leaderless Architecture**: No single point of failure - any node can accept writes
- **MySQL Protocol Compatible**: Connect with any MySQL client (DBeaver, MySQL Workbench, mysql CLI)
- **WordPress Compatible**: Full MySQL function support for running distributed WordPress
- **Distributed Transactions**: Percolator-style write intents with conflict detection
- **Multi-Database Support**: Create and manage multiple databases per cluster
- **DDL Replication**: Distributed schema changes with automatic idempotency and cluster-wide locking
- **Production-Ready SQL Parser**: Powered by rqlite/sql AST parser for MySQL→SQLite transpilation
- **CDC-Based Replication**: Row-level change data capture for consistent replication

## Why Marmot?

### The Problem with Traditional Replication

MySQL active-active requires careful setup of replication, conflict avoidance, and monitoring. Failover needs manual intervention. Split-brain scenarios demand operational expertise. This complexity doesn't scale to edge deployments.

### Marmot's Approach

- **Zero operational overhead**: Automatic recovery from split-brain via eventual consistency + anti-entropy
- **No leader election**: Any node accepts writes, no failover coordination needed
- **Direct SQLite access**: Clients can read the local SQLite file directly for maximum performance
- **Tunable consistency**: Choose ONE/QUORUM/ALL per your latency vs durability needs

### Why MySQL Protocol?

- Ecosystem compatibility - existing drivers, ORMs, GUI tools work out-of-box
- Battle-tested wire protocol implementations
- Run real applications like WordPress without modification
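
Because the wire protocol is stock MySQL, an off-the-shelf driver needs no Marmot-specific code. A minimal sketch in Go using `go-sql-driver/mysql` (port and credentials are taken from the Quick Start below; treat the DSN as a placeholder):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // standard MySQL driver, nothing Marmot-specific
)

func main() {
	// Connect to a local Marmot node exactly as you would to MySQL.
	db, err := sql.Open("mysql", "root@tcp(localhost:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var one int
	if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected, got:", one)
}
```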

### Ideal Use Cases

Marmot excels at **read-heavy edge scenarios**:

| Use Case | How Marmot Helps |
|----------|------------------|
| **Distributed WordPress** | Multi-region WordPress with replicated database |
| **Lambda/Edge sidecars** | Lightweight regional SQLite replicas, local reads |
| **Edge vector databases** | Distributed embeddings with local query |
| **Regional config servers** | Fast local config reads, replicated writes |
| **Product catalogs** | Geo-distributed catalog data, eventual sync |

### When to Consider Alternatives

- **Strong serializability required** → CockroachDB, Spanner
- **Single-region high-throughput** → PostgreSQL, MySQL directly
- **Large datasets (>100GB)** → Sharded solutions

## Quick Start

```bash
# Start a single-node cluster
./marmot-v2

# Or run as daemon (background)
./marmot-v2 -daemon -pid-file=/tmp/marmot/marmot.pid

# Connect with MySQL client
mysql -h localhost -P 3306 -u root

# Or use DBeaver, MySQL Workbench, etc.
```

### Testing Replication

```bash
# Test DDL and DML replication across a 2-node cluster
./scripts/test-ddl-replication.sh

# This script will:
# 1. Start a 2-node cluster
# 2. Create a table on node 1 and verify it replicates to node 2
# 3. Insert data on node 1 and verify it replicates to node 2
# 4. Update data on node 2 and verify it replicates to node 1
# 5. Delete data on node 1 and verify it replicates to node 2

# Manual cluster testing
./examples/start-seed.sh # Start seed node (port 8081, mysql 3307)
./examples/join-cluster.sh 2 localhost:8081 # Join node 2 (port 8082, mysql 3308)
./examples/join-cluster.sh 3 localhost:8081 # Join node 3 (port 8083, mysql 3309)

# Connect to any node and run queries
mysql --protocol=TCP -h localhost -P 3307 -u root
mysql --protocol=TCP -h localhost -P 3308 -u root

# Cleanup
pkill -f marmot-v2
```

## WordPress Support

Marmot can run **distributed WordPress** with full database replication across nodes. Each WordPress instance connects to its local Marmot node, and all database changes replicate automatically.

### MySQL Compatibility for WordPress

Marmot implements MySQL functions required by WordPress:

| Category | Functions |
|----------|-----------|
| **Date/Time** | NOW, CURDATE, DATE_FORMAT, UNIX_TIMESTAMP, DATEDIFF, YEAR, MONTH, DAY, etc. |
| **String** | CONCAT_WS, SUBSTRING_INDEX, FIND_IN_SET, LPAD, RPAD, etc. |
| **Math/Hash** | RAND, MD5, SHA1, SHA2, POW, etc. |
| **DML** | ON DUPLICATE KEY UPDATE (transformed to SQLite ON CONFLICT) |

### Quick Start: 3-Node WordPress Cluster

```bash
cd examples/wordpress-cluster
./run.sh up
```

This starts:
- **3 Marmot nodes** with QUORUM write consistency
- **3 WordPress instances** each connected to its local Marmot node

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ WordPress-1 │     │ WordPress-2 │     │ WordPress-3 │
│    :9101    │     │    :9102    │     │    :9103    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Marmot-1   │◄────┤  Marmot-2   │◄────┤  Marmot-3   │
│ MySQL: 9191 │     │ MySQL: 9192 │     │ MySQL: 9193 │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       └───────────────────┴───────────────────┘
                   QUORUM Replication
```

**Test it:**
1. Open http://localhost:9101 - complete WordPress installation
2. Open http://localhost:9102 or http://localhost:9103
3. See your content replicated across all nodes!

**Commands:**
```bash
./run.sh status # Check cluster health
./run.sh logs-m # Marmot logs only
./run.sh logs-wp # WordPress logs only
./run.sh down # Stop cluster
```

### Production Considerations for WordPress

- **Media uploads**: Use S3/NFS for shared media storage (files not replicated by Marmot)
- **Sessions**: Use Redis or database sessions for sticky-session-free load balancing
- **Caching**: Each node can use local object cache (Redis/Memcached per region)

## Architecture

Marmot v2 uses a fundamentally different architecture from other SQLite replication solutions:

**vs. rqlite/dqlite/LiteFS:**
- ❌ They require a primary node for all writes
- ✅ Marmot allows writes on **any node**
- ❌ They use leader election (Raft)
- ✅ Marmot uses **gossip protocol** (no leader)
- ❌ They require proxy layer or page-level interception
- ✅ Marmot uses **MySQL protocol** for direct database access

**How It Works:**
1. **Write Coordination**: 2PC (Two-Phase Commit) with configurable consistency (ONE, QUORUM, ALL)
2. **Conflict Resolution**: Last-Write-Wins (LWW) with HLC timestamps
3. **Cluster Membership**: SWIM-style gossip with failure detection
4. **Data Replication**: Full database replication - all nodes receive all data
5. **DDL Replication**: Cluster-wide schema changes with automatic idempotency
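
For the write path in step 1, the consistency level determines how many acknowledgements the coordinator waits for. A minimal sketch of that relationship (illustrative only, not Marmot's code):

```go
package main

import "fmt"

// acksRequired returns how many node acknowledgements a write must collect
// before it is reported committed, for a cluster of n nodes. This mirrors
// the ONE/QUORUM/ALL semantics described above.
func acksRequired(consistency string, n int) int {
	switch consistency {
	case "ONE":
		return 1
	case "ALL":
		return n
	default: // "QUORUM"
		return n/2 + 1 // strict majority
	}
}

func main() {
	for _, level := range []string{"ONE", "QUORUM", "ALL"} {
		fmt.Printf("%-6s on a 3-node cluster needs %d ack(s)\n", level, acksRequired(level, 3))
	}
}
```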

## Comparison with Alternatives

| Aspect | Marmot | MySQL Active-Active | rqlite | dqlite | TiDB |
|--------|--------|---------------------|--------|-------|------|
| **Leader** | None | None (but complex) | Yes (Raft) | Yes (Raft) | Yes (Raft) |
| **Failover** | Automatic | Manual intervention | Automatic | Automatic | Automatic |
| **Split-brain recovery** | Automatic (anti-entropy) | Manual | N/A (leader-based) | N/A (leader-based) | N/A |
| **Consistency** | Tunable (ONE/QUORUM/ALL) | Serializable | Tunable (ONE/QUORUM/Linearizable) | Strong | Strong |
| **Direct file read** | ✅ SQLite file | ❌ | ✅ SQLite file | ❌ | ❌ |
| **JS-safe AUTO_INCREMENT** | ✅ Compact mode (53-bit) | N/A | N/A | N/A | ❌ 64-bit breaks JS |
| **Edge-friendly** | ✅ Lightweight | ❌ Heavy | ✅ Lightweight | ⚠️ Moderate | ❌ Heavy |
| **Operational complexity** | Low | High | Low | Low | High |

## DDL Replication

Marmot v2 supports **distributed DDL (Data Definition Language) replication** without requiring master election:

### How It Works

1. **Cluster-Wide Locking**: Each DDL operation acquires a distributed lock per database (default: 30-second lease)
- Prevents concurrent schema changes on the same database
- Locks automatically expire if a node crashes
- Different databases can have concurrent DDL operations

2. **Automatic Idempotency**: DDL statements are automatically rewritten for safe replay
```sql
CREATE TABLE users (id INT)
→ CREATE TABLE IF NOT EXISTS users (id INT)

DROP TABLE users
→ DROP TABLE IF EXISTS users
```

3. **Schema Version Tracking**: Each database maintains a schema version counter
- Incremented on every DDL operation
- Exchanged via gossip protocol for drift detection
- Used by delta sync to validate transaction applicability

4. **Quorum-Based Replication**: DDL replicates like DML through the same 2PC mechanism
- No special master node needed
- Works with existing consistency levels (QUORUM, ALL, etc.)

### Configuration

```toml
[ddl]
# DDL lock lease duration (seconds)
lock_lease_seconds = 30

# Automatically rewrite DDL for idempotency
enable_idempotent = true
```

### Best Practices

- ✅ **Do**: Execute DDL from a single connection/node at a time
- ✅ **Do**: Use qualified table names (`mydb.users` instead of `users`)
- ⚠️ **Caution**: ALTER TABLE is less idempotent - avoid replaying failed ALTER operations
- ❌ **Don't**: Run concurrent DDL on the same database from multiple nodes

## CDC-Based Replication

Marmot v2 uses **Change Data Capture (CDC)** for replication instead of SQL statement replay:

### How It Works

1. **Row-Level Capture**: Instead of replicating SQL statements, Marmot captures the actual row data changes (INSERT/UPDATE/DELETE)
2. **Binary Data Format**: Row data is serialized as CDC messages with column values, ensuring consistent replication regardless of SQL dialect
3. **Deterministic Application**: Row data is applied directly to the target database, avoiding parsing ambiguities

### Benefits

- **Consistency**: Same row data applied everywhere, no SQL parsing differences
- **Performance**: Binary format is more efficient than SQL text
- **Reliability**: No issues with SQL syntax variations between MySQL and SQLite

### Row Key Extraction

For UPDATE and DELETE operations, Marmot automatically extracts row keys:
- Uses PRIMARY KEY columns when available
- Falls back to ROWID for tables without explicit primary key
- Handles composite primary keys correctly

## CDC Publisher

Marmot can publish CDC events to external messaging systems, enabling real-time data pipelines, analytics, and event-driven architectures. Events follow the **[Debezium](https://debezium.io/) specification** for maximum compatibility with existing CDC tooling.

### Features

- **Debezium-Compatible Format**: Events conform to the [Debezium event structure](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-events), compatible with Kafka Connect, Flink, Spark, and other CDC consumers
- **Multi-Sink Support**: Publish to multiple destinations simultaneously (Kafka, NATS)
- **Glob-Based Filtering**: Filter which tables and databases to publish
- **Automatic Retry**: Exponential backoff with configurable limits
- **Persistent Cursors**: Survives restarts without losing position

### Configuration

```toml
[publisher]
enabled = true

[[publisher.sinks]]
name = "kafka-main"
type = "kafka" # "kafka" or "nats"
format = "debezium" # Debezium-compatible JSON format
brokers = ["localhost:9092"] # Kafka broker addresses
topic_prefix = "marmot.cdc" # Topics: {prefix}.{database}.{table}
filter_tables = ["*"] # Glob patterns (e.g., "users", "order_*")
filter_databases = ["*"] # Glob patterns (e.g., "prod_*")
batch_size = 100 # Events per poll cycle
poll_interval_ms = 10 # Polling interval

# NATS sink example
[[publisher.sinks]]
name = "nats-events"
type = "nats"
format = "debezium"
nats_url = "nats://localhost:4222"
topic_prefix = "marmot.cdc"
filter_tables = ["*"]
filter_databases = ["*"]
```

### Event Format

Events follow the [Debezium envelope structure](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-change-events-value):

```json
{
"schema": { ... },
"payload": {
"before": null,
"after": {"id": 1, "name": "alice", "email": "alice@example.com"},
"source": {
"version": "2.0.0",
"connector": "marmot",
"name": "marmot",
"ts_ms": 1702500000000,
"db": "myapp",
"table": "users"
},
"op": "c",
"ts_ms": 1702500000000
}
}
```

**Operation Types** (per [Debezium spec](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-create-events)):
| Operation | `op` | `before` | `after` |
|-----------|------|----------|---------|
| INSERT | `c` (create) | `null` | row data |
| UPDATE | `u` (update) | old row | new row |
| DELETE | `d` (delete) | old row | `null` |

### Topic Naming

Topics follow the pattern: `{topic_prefix}.{database}.{table}`

Examples:
- `marmot.cdc.myapp.users`
- `marmot.cdc.myapp.orders`
- `marmot.cdc.analytics.events`
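
As a sketch of the consumer side, the snippet below reads one of these topics with `segmentio/kafka-go` and decodes the documented envelope fields (the library choice, consumer group, and struct definition are assumptions for illustration, not part of Marmot):

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

// Envelope models the documented Debezium payload fields we care about.
type Envelope struct {
	Payload struct {
		Before map[string]any `json:"before"`
		After  map[string]any `json:"after"`
		Op     string         `json:"op"` // "c", "u", "d"
		TsMs   int64          `json:"ts_ms"`
	} `json:"payload"`
}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "marmot.cdc.myapp.users", // {prefix}.{database}.{table}
		GroupID: "cdc-example",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		var ev Envelope
		if err := json.Unmarshal(msg.Value, &ev); err != nil {
			log.Printf("skip malformed event: %v", err)
			continue
		}
		log.Printf("op=%s before=%v after=%v", ev.Payload.Op, ev.Payload.Before, ev.Payload.After)
	}
}
```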

### Use Cases

- **Real-Time Analytics**: Stream changes to data warehouses (Snowflake, BigQuery, ClickHouse)
- **Event-Driven Microservices**: Trigger actions on data changes
- **Cache Invalidation**: Keep caches in sync with database changes
- **Audit Logging**: Capture all changes for compliance
- **Search Indexing**: Keep Elasticsearch/Algolia in sync

For more details, see the [Integrations documentation](https://maxpert.github.io/marmot/integrations).

## Edge Deployment Patterns

### Lambda Sidecar

Deploy Marmot as a lightweight regional replica alongside Lambda functions:
- Local SQLite reads (sub-ms latency)
- Writes replicate to cluster
- No cold-start database connections

### Read-Only Regional Replicas

Scale reads globally with replica mode and transparent failover:

```toml
[replica]
enabled = true
follow_addresses = ["central-cluster-1:8080", "central-cluster-2:8080", "central-cluster-3:8080"]
replicate_databases = [] # Filter databases (empty = all, supports glob patterns)
database_discovery_interval_seconds = 10 # Poll for new databases (default: 10)
discovery_interval_seconds = 30 # Poll for cluster membership (default: 30)
failover_timeout_seconds = 60 # Failover timeout (default: 60)
snapshot_concurrency = 3 # Parallel snapshot downloads (default: 3)
snapshot_cache_ttl_seconds = 30 # Snapshot cache TTL (default: 30)
```

- Discovers cluster nodes automatically via `GetClusterNodes` RPC
- Transparent failover when current source becomes unavailable
- Automatic discovery of new databases with configurable polling
- Per-database selective replication with glob pattern support
- Parallel snapshot downloads with configurable concurrency
- Snapshot caching for performance optimization
- Zero cluster participation overhead
- Auto-reconnect with exponential backoff

### Hybrid: Edge Reads, Central Writes

- Deploy full cluster in central region
- Deploy read replicas at edge locations
- Application routes writes to central, reads to local replica

## SQL Statement Compatibility

Marmot supports a wide range of MySQL/SQLite statements through its MySQL protocol server. The following table shows compatibility for different statement types:

| Statement Type | Support | Replication | Notes |
|---------------|---------|-------------|-------|
| **DML - Data Manipulation** |
| `INSERT` / `REPLACE` | ✅ Full | ✅ Yes | Includes qualified table names (db.table) |
| `UPDATE` | ✅ Full | ✅ Yes | Includes qualified table names |
| `DELETE` | ✅ Full | ✅ Yes | Includes qualified table names |
| `SELECT` | ✅ Full | N/A | Read operations |
| `LOAD DATA` | ✅ Full | ✅ Yes | Bulk data loading |
| **DDL - Data Definition** |
| `CREATE TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `ALTER TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `DROP TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `TRUNCATE TABLE` | ✅ Full | ✅ Yes | |
| `RENAME TABLE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP INDEX` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP VIEW` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `CREATE/DROP TRIGGER` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| **Database Management** |
| `CREATE DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `DROP DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `ALTER DATABASE` | ✅ Full | ✅ Yes | Replicated with cluster-wide locking |
| `SHOW DATABASES` | ✅ Full | N/A | Metadata query |
| `SHOW TABLES` | ✅ Full | N/A | Metadata query |
| `USE database` | ✅ Full | N/A | Session state |
| **Transaction Control** |
| `BEGIN` / `START TRANSACTION` | ✅ Full | N/A | Transaction boundary |
| `COMMIT` | ✅ Full | ✅ Yes | Commits distributed transaction |
| `ROLLBACK` | ✅ Full | ✅ Yes | Aborts distributed transaction |
| `SAVEPOINT` | ✅ Full | ✅ Yes | Nested transaction support |
| **Locking** |
| `LOCK TABLES` | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| `UNLOCK TABLES` | ✅ Parsed | ❌ No | Requires distributed locking coordination |
| **Session Configuration** |
| `SET` statements | ✅ Parsed | ❌ No | Session-local, not replicated |
| `SET marmot_transpilation` | ✅ Full | ❌ No | Toggle MySQL→SQLite transpilation |
| `LOAD EXTENSION` | ✅ Full | ❌ No | Load SQLite extension (requires config) |
| **XA Transactions** |
| `XA START/END/PREPARE` | ✅ Parsed | ❌ No | Marmot uses its own 2PC protocol |
| `XA COMMIT/ROLLBACK` | ✅ Parsed | ❌ No | Not compatible with Marmot's model |
| **DCL - Data Control** |
| `GRANT` / `REVOKE` | ✅ Parsed | ❌ No | User management not replicated |
| `CREATE/DROP USER` | ✅ Parsed | ❌ No | User management not replicated |
| `ALTER USER` | ✅ Parsed | ❌ No | User management not replicated |
| **Administrative** |
| `OPTIMIZE TABLE` | ✅ Parsed | ❌ No | Node-local administrative command |
| `REPAIR TABLE` | ✅ Parsed | ❌ No | Node-local administrative command |

### Legend
- ✅ **Full**: Fully supported and working
- ✅ **Parsed**: Statement is parsed and recognized
- ⚠️ **Limited**: Works but has limitations in distributed context
- ❌ **No**: Not supported or not replicated
- **N/A**: Not applicable (read-only or session-local)

### Important Notes

1. **Schema Changes (DDL)**: DDL statements are fully replicated with cluster-wide locking and automatic idempotency. See the DDL Replication section for details.

2. **XA Transactions**: Marmot has its own distributed transaction protocol based on 2PC. MySQL XA transactions are not compatible with Marmot's replication model.

3. **User Management (DCL)**: User and privilege management statements are local to each node. For production deployments, consider handling authentication at the application or proxy level.

4. **Table Locking**: `LOCK TABLES` statements are recognized but not enforced across the cluster. Use application-level coordination for distributed locking needs.

5. **Qualified Names**: Marmot fully supports qualified table names (e.g., `db.table`) in DML and DDL operations.

## SQLite Extensions

Marmot supports loading SQLite extensions to add custom functions, virtual tables, and other capabilities. This enables use cases like vector search (sqlite-vss), full-text search, and custom aggregations.

### Configuration

```toml
[extensions]
# Directory containing extension libraries (.so, .dylib, .dll)
directory = "/opt/sqlite/extensions"

# Extensions loaded automatically into every connection
always_loaded = ["sqlite-vector"]
```

### Dynamic Loading

Extensions can be loaded at runtime via SQL:

```sql
-- Load an extension by name (resolved from extensions.directory)
LOAD EXTENSION sqlite-vector;

-- Verify the extension is loaded
SELECT vec_version();
```

**Extension Resolution:**
- Searches the configured `directory` for the extension
- Tries multiple patterns: `name`, `name.so`, `name.dylib`, `libname.so`, `libname.dylib`
- Platform-appropriate suffix is preferred (.dylib on macOS, .so on Linux, .dll on Windows)

**Security:**
- Extension names cannot contain path separators (no path traversal)
- Resolved paths must stay within the configured directory
- Only extensions in the configured directory can be loaded
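
An illustrative sketch of that resolution and the path checks (not Marmot's actual code): try the documented filename patterns inside the configured directory and reject anything that would escape it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveExtension mimics the search described above: no path separators in
// the name, candidate filenames tried inside dir, result must stay within dir.
func resolveExtension(dir, name string) (string, error) {
	if strings.ContainsAny(name, `/\`) {
		return "", errors.New("extension name must not contain path separators")
	}
	candidates := []string{name, name + ".so", name + ".dylib", "lib" + name + ".so", "lib" + name + ".dylib"}
	for _, c := range candidates {
		p := filepath.Join(dir, c)
		if rel, err := filepath.Rel(dir, p); err != nil || strings.HasPrefix(rel, "..") {
			continue // would escape the extensions directory
		}
		if _, err := os.Stat(p); err == nil {
			return p, nil
		}
	}
	return "", fmt.Errorf("extension %q not found in %s", name, dir)
}

func main() {
	path, err := resolveExtension("/opt/sqlite/extensions", "sqlite-vector")
	fmt.Println(path, err)
}
```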

### Use Cases

| Extension | Purpose |
|-----------|---------|
| **sqlite-vss** | Vector similarity search for AI/ML embeddings |
| **sqlite-vec** | Lightweight vector search |
| **fts5** | Full-text search (usually built-in) |
| **json1** | JSON functions (usually built-in) |

## Session-Level Transpilation Toggle

Marmot normally transpiles MySQL syntax to SQLite. You can disable this per-session to send raw SQLite SQL directly.

### Usage

```sql
-- Disable MySQL→SQLite transpilation (raw SQLite mode)
SET marmot_transpilation = OFF;

-- Now you can use SQLite-specific syntax
SELECT sqlite_version();
PRAGMA table_info(users);

-- Re-enable transpilation
SET marmot_transpilation = ON;

-- Back to MySQL mode
SELECT NOW(); -- Transpiled to SQLite's datetime functions
```

### When to Use

- **Raw SQLite queries**: Use SQLite-specific syntax like PRAGMA, ATTACH, etc.
- **Performance testing**: Compare transpiled vs raw query performance
- **Debugging**: See exactly what SQLite receives
- **Advanced SQLite features**: Access features not available through MySQL syntax

**Note:** Transpilation is enabled by default. The setting is per-session and not persisted.

## MySQL Protocol & Metadata Queries

Marmot includes a MySQL-compatible protocol server, allowing you to connect using any MySQL client (DBeaver, MySQL Workbench, mysql CLI, etc.). The server supports:

### Metadata Query Support

Marmot provides full support for MySQL metadata queries, enabling GUI tools like DBeaver to browse databases, tables, and columns:

- **SHOW Commands**: `SHOW DATABASES`, `SHOW TABLES`, `SHOW COLUMNS FROM table`, `SHOW CREATE TABLE`, `SHOW INDEXES`
- **INFORMATION_SCHEMA**: Queries against `INFORMATION_SCHEMA.TABLES`, `INFORMATION_SCHEMA.COLUMNS`, `INFORMATION_SCHEMA.SCHEMATA`, and `INFORMATION_SCHEMA.STATISTICS`
- **Type Conversion**: Automatic SQLite-to-MySQL type mapping for compatibility

These metadata queries are powered by the **rqlite/sql AST parser**, providing production-grade MySQL query compatibility.

### Connecting with MySQL Clients

```bash
# Using mysql CLI
mysql -h localhost -P 3306 -u root

# Connection string for applications
mysql://root@localhost:3306/marmot
```

## Recovery Scenarios

Marmot handles various failure and recovery scenarios automatically:

### Network Partition (Split-Brain)

| Scenario | Behavior |
|----------|----------|
| **Minority partition** | Writes **fail** - cannot achieve quorum |
| **Majority partition** | Writes **succeed** - quorum achieved |
| **Partition heals** | Delta sync + LWW merges divergent data |

**How it works:**
1. During partition, only the majority side can commit writes (quorum enforcement)
2. When partition heals, nodes exchange transaction logs via `StreamChanges` RPC
3. Conflicts resolved using Last-Writer-Wins (LWW) with HLC timestamps
4. Higher node ID breaks ties for simultaneous writes

### Node Failure & Recovery

| Scenario | Recovery Method |
|----------|-----------------|
| **Brief outage** | Delta sync - replay missed transactions |
| **Extended outage** | Snapshot transfer + delta sync |
| **New node joining** | Full snapshot from existing node |

**Anti-Entropy Background Process:**

Marmot v2 includes an automatic anti-entropy system that continuously monitors and repairs replication lag across the cluster:

1. **Lag Detection**: Every 30 seconds (configurable), each node queries peers for their replication state
2. **Smart Recovery Decision**:
- **Delta Sync** if lag < 10,000 transactions AND < 1 hour: Streams missed transactions incrementally
- **Snapshot Transfer** if lag exceeds thresholds: Full database file transfer for efficiency
3. **Gap Detection**: Detects when transaction logs have been GC'd and automatically falls back to snapshot
4. **Multi-Database Support**: Tracks and syncs each database independently
5. **GC Coordination**: Garbage collection respects peer replication state - logs aren't deleted until all peers have applied them
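
The recovery decision in step 2 reduces to two thresholds. A minimal sketch using the documented defaults (illustrative only, not Marmot's code):

```go
package main

import (
	"fmt"
	"time"
)

// chooseRepair picks delta sync vs. snapshot transfer for a lagging peer,
// mirroring the thresholds described above (10,000 transactions, 1 hour).
func chooseRepair(lagTxns int64, lagAge time.Duration) string {
	const (
		deltaThresholdTxns = 10_000
		deltaThresholdAge  = time.Hour
	)
	if lagTxns < deltaThresholdTxns && lagAge < deltaThresholdAge {
		return "delta-sync" // stream missed transactions incrementally
	}
	return "snapshot" // full database file transfer
}

func main() {
	fmt.Println(chooseRepair(250, 5*time.Minute))    // delta-sync
	fmt.Println(chooseRepair(50_000, 5*time.Minute)) // snapshot: too many transactions
	fmt.Println(chooseRepair(250, 3*time.Hour))      // snapshot: lag too old
}
```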

**Delta Sync Process:**
1. Lagging node queries `last_applied_txn_id` for each peer/database
2. Requests transactions since that ID via `StreamChanges` RPC
3. **Gap Detection**: Checks if first received txn_id has a large gap from requested ID
- If gap > delta_sync_threshold_txns, indicates missing (GC'd) transactions
- Automatically falls back to snapshot transfer to prevent data loss
4. Applies changes using LWW conflict resolution
5. Updates replication state tracking (per-database)
6. Progress logged every 100 transactions

**GC Coordination with Anti-Entropy:**
- Transaction logs are retained with a two-tier policy:
- **Min retention** (2 hours): Must be >= delta sync threshold, respects peer lag
- **Max retention** (24 hours): Force delete after this time to prevent unbounded growth
- Config validation enforces: `gc_min >= delta_threshold` and `gc_max >= 2x delta_threshold`
- Each database tracks replication progress per peer
- GC queries minimum applied txn_id across all peers before cleanup
- **Gap detection** prevents data loss if GC runs while nodes are offline

### Consistency Guarantees

| Write Consistency | Behavior |
|-------------------|----------|
| `ONE` | Returns after 1 node ACK (fast, less durable) |
| `QUORUM` | Returns after majority ACK (default, balanced) |
| `ALL` | Returns after all nodes ACK (slow, most durable) |

**Conflict Resolution:**
- All conflicts resolved via LWW using HLC timestamps
- No data loss - later write always wins deterministically
- Tie-breaker: higher node ID wins for equal timestamps
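
A minimal sketch of that rule, with a simplified stand-in for the HLC timestamp (Marmot's actual types and field layout will differ):

```go
package main

import "fmt"

// Version is a simplified hybrid logical clock stamp attached to a row write.
type Version struct {
	HLC    uint64 // hybrid logical clock timestamp
	NodeID uint64 // origin node, used only to break ties
}

// wins reports whether candidate should replace current under the
// Last-Writer-Wins rule described above.
func wins(candidate, current Version) bool {
	if candidate.HLC != current.HLC {
		return candidate.HLC > current.HLC // later write wins
	}
	return candidate.NodeID > current.NodeID // equal timestamps: higher node ID wins
}

func main() {
	a := Version{HLC: 100, NodeID: 1}
	b := Version{HLC: 100, NodeID: 2}
	fmt.Println(wins(b, a)) // true: same timestamp, node 2 breaks the tie
}
```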

## Limitations

- **No Selective Table Replication**: All tables in a database are replicated; per-table filtering is not supported.
- **WAL Mode Required**: SQLite must use WAL mode for reliable multi-process changes.
- **Eventually Consistent**: Rows may sync out of order. `SERIALIZABLE` transaction assumptions may not hold across nodes.
- **Concurrent DDL**: Avoid running concurrent DDL operations on the same database from multiple nodes (protected by cluster-wide lock with 30s lease).

## Auto-Increment & ID Generation

### The Problem with Distributed IDs

Distributed databases need globally unique IDs, but traditional solutions cause problems:

| Solution | Issue |
|----------|-------|
| **UUID** | 128-bit, poor index performance, not sortable |
| **Snowflake/HLC 64-bit** | Exceeds JavaScript's `Number.MAX_SAFE_INTEGER` (2^53-1) |
| **TiDB AUTO_INCREMENT** | Returns 64-bit IDs that **break JavaScript clients** silently |

**The JavaScript Problem:**

```javascript
// 64-bit ID from TiDB or other distributed DBs
const id = 7318624812345678901;
console.log(id); // 7318624812345679000 - WRONG! Precision lost!

// JSON parsing also breaks
JSON.parse('{"id": 7318624812345678901}'); // {id: 7318624812345679000}
```

TiDB's answer? "Use strings." But that breaks ORMs, existing application code, and type safety.

### Marmot's Solution: Compact ID Mode

Marmot offers **two ID generation modes** to solve this:

| Mode | Bits | Range | Use Case |
|------|------|-------|----------|
| `extended` | 64-bit | Full HLC timestamp | New systems, non-JS clients |
| `compact` | 53-bit | JS-safe integers | **Legacy systems, JavaScript, REST APIs** |

```toml
[mysql]
auto_id_mode = "compact" # Safe for JavaScript (default)
# auto_id_mode = "extended" # Full 64-bit for new systems
```

**Compact Mode Guarantees:**
- IDs stay under `Number.MAX_SAFE_INTEGER` (9,007,199,254,740,991)
- Still globally unique across all nodes
- Still monotonically increasing (per node)
- No silent precision loss in JSON/JavaScript
- Works with existing ORMs expecting integer IDs

**With Marmot compact mode:**
```javascript
const id = 4503599627370496;
console.log(id); // 4503599627370496 - Correct!
JSON.parse('{"id": 4503599627370496}'); // {id: 4503599627370496} - Correct!
```
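
The same guarantee can be checked from an ordinary MySQL driver. A sketch in Go (the table, DSN, and the assumption that the server reports the generated ID in the usual `LastInsertId` field are placeholders for illustration):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(localhost:3306)/myapp")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Insert with 0 for the auto-increment column; Marmot injects the ID.
	res, err := db.Exec("INSERT INTO users (id, name) VALUES (0, 'alice')")
	if err != nil {
		log.Fatal(err)
	}
	id, _ := res.LastInsertId()

	const maxSafeInteger = 1<<53 - 1 // Number.MAX_SAFE_INTEGER
	log.Printf("generated id=%d js-safe=%v", id, id <= maxSafeInteger)
}
```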

### How Auto-Increment Works

> **Note:** Marmot automatically converts `INT AUTO_INCREMENT` to `BIGINT` to support distributed ID generation.

1. **DDL Transformation**: When you create a table with `AUTO_INCREMENT`:
```sql
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))
-- Becomes internally:
CREATE TABLE users (id BIGINT PRIMARY KEY, name TEXT)
```

2. **DML ID Injection**: When inserting with `0` or `NULL` for an auto-increment column:
```sql
INSERT INTO users (id, name) VALUES (0, 'alice')
-- Becomes internally (compact mode):
INSERT INTO users (id, name) VALUES (4503599627370496, 'alice')
```

3. **Explicit IDs Preserved**: If you provide an explicit non-zero ID, it is used as-is.

**Schema-Based Detection:**

Marmot automatically detects auto-increment columns by querying SQLite schema directly:
- Single-column `INTEGER PRIMARY KEY` (SQLite rowid alias)
- Single-column `BIGINT PRIMARY KEY` (Marmot's transformed columns)

No registration required - columns are detected from schema at runtime, works across restarts, and works with existing databases.
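
A sketch of what such schema-based detection looks like against a plain SQLite file, using `PRAGMA table_info` (illustrative only; the driver, file path, and helper name are assumptions, not Marmot's implementation):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"strings"

	_ "github.com/mattn/go-sqlite3"
)

// autoIncrementColumn returns the single INTEGER/BIGINT primary-key column
// of a table, or "" if the table has no such column (per the rules above).
func autoIncrementColumn(db *sql.DB, table string) (string, error) {
	rows, err := db.Query(fmt.Sprintf("PRAGMA table_info(%q)", table))
	if err != nil {
		return "", err
	}
	defer rows.Close()

	var pkName, pkType string
	pkCount := 0
	for rows.Next() {
		var cid, notNull, pk int
		var name, ctype string
		var dflt sql.NullString
		if err := rows.Scan(&cid, &name, &ctype, &notNull, &dflt, &pk); err != nil {
			return "", err
		}
		if pk > 0 {
			pkCount++
			pkName, pkType = name, strings.ToUpper(ctype)
		}
	}
	if pkCount == 1 && (pkType == "INTEGER" || pkType == "BIGINT") {
		return pkName, nil
	}
	return "", nil // composite or non-integer key: not auto-increment
}

func main() {
	db, err := sql.Open("sqlite3", "./marmot-data/myapp.db") // path is an assumption
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	col, err := autoIncrementColumn(db, "users")
	fmt.Println(col, err)
}
```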

## Configuration

Marmot v2 uses a TOML configuration file (default: `config.toml`). All settings have sensible defaults.

### Core Configuration

```toml
node_id = 0 # 0 = auto-generate
data_dir = "./marmot-data"
```

### Transaction Manager

```toml
[transaction]
heartbeat_timeout_seconds = 10 # Transaction timeout without heartbeat
conflict_window_seconds = 10 # Conflict resolution window
lock_wait_timeout_seconds = 50 # Lock wait timeout (MySQL: innodb_lock_wait_timeout)
```

**Note**: Transaction log garbage collection is managed by the replication configuration to coordinate with anti-entropy. See `replication.gc_min_retention_hours` and `replication.gc_max_retention_hours`.

### Connection Pool

```toml
[connection_pool]
pool_size = 4 # Number of SQLite connections
max_idle_time_seconds = 10 # Max idle time before closing
max_lifetime_seconds = 300 # Max connection lifetime (0 = unlimited)
```

### gRPC Client

```toml
[grpc_client]
keepalive_time_seconds = 10 # Keepalive ping interval
keepalive_timeout_seconds = 3 # Keepalive ping timeout
max_retries = 3 # Max retry attempts
retry_backoff_ms = 100 # Retry backoff duration
```

### Coordinator

```toml
[coordinator]
prepare_timeout_ms = 2000 # Prepare phase timeout
commit_timeout_ms = 2000 # Commit phase timeout
abort_timeout_ms = 2000 # Abort phase timeout
```

### Cluster

```toml
[cluster]
grpc_bind_address = "0.0.0.0"
grpc_port = 8080
seed_nodes = [] # List of seed node addresses
cluster_secret = "" # PSK for cluster authentication (see Security section)
gossip_interval_ms = 1000 # Gossip interval
gossip_fanout = 3 # Number of peers to gossip to
suspect_timeout_ms = 5000 # Suspect timeout
dead_timeout_ms = 10000 # Dead timeout
```

### Security

Marmot supports Pre-Shared Key (PSK) authentication for cluster communication. **This is strongly recommended for production deployments.**

```toml
[cluster]
# All nodes in the cluster must use the same secret
cluster_secret = "your-secret-key-here"
```

**Environment Variable (Recommended):**

For production, use the environment variable to avoid storing secrets in config files:

```bash
export MARMOT_CLUSTER_SECRET="your-secret-key-here"
./marmot
```

The environment variable takes precedence over the config file.

**Generating a Secret:**

```bash
# Generate a secure random secret
openssl rand -base64 32
```

**Behavior:**
- If `cluster_secret` is empty and `MARMOT_CLUSTER_SECRET` is not set, authentication is disabled
- A warning is logged at startup when authentication is disabled
- All gRPC endpoints (gossip, replication, snapshots) are protected when authentication is enabled
- Nodes with mismatched secrets will fail to communicate (connection rejected with "invalid cluster secret")

### Cluster Membership Management

Marmot provides admin HTTP endpoints for managing cluster membership (requires `cluster_secret` to be configured):

**Node Lifecycle:**
- New/restarted nodes **auto-join** via gossip - no manual intervention needed
- Nodes marked REMOVED via admin API **cannot auto-rejoin** - must be explicitly allowed
- This prevents decommissioned nodes from accidentally rejoining the cluster

```bash
# View cluster members and quorum info
curl -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/members

# Remove a node from the cluster (excludes from quorum, blocks auto-rejoin)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/remove/2

# Allow a removed node to rejoin (node must then restart to join)
curl -X POST -H "X-Marmot-Secret: your-secret" http://localhost:8080/admin/cluster/allow/2
```

See the [Operations documentation](https://maxpert.github.io/marmot/operations) for detailed usage and examples.

### Replica Mode

For read-only replicas that follow cluster nodes with transparent failover:

```toml
[replica]
enabled = true # Enable read-only replica mode
follow_addresses = ["node1:8080", "node2:8080", "node3:8080"] # Seed nodes for discovery
secret = "replica-secret" # PSK for authentication (required)
replicate_databases = [] # Filter databases (empty = all, supports glob patterns like "prod_*")
database_discovery_interval_seconds = 10 # Poll for new databases (default: 10)
discovery_interval_seconds = 30 # Poll for cluster membership (default: 30)
failover_timeout_seconds = 60 # Max time to find alive node during failover (default: 60)
reconnect_interval_seconds = 5 # Reconnect delay on disconnect (default: 5)
reconnect_max_backoff_seconds = 30 # Max reconnect backoff (default: 30)
initial_sync_timeout_minutes = 30 # Timeout for initial snapshot (default: 30)
snapshot_concurrency = 3 # Parallel snapshot downloads (default: 3)
snapshot_cache_ttl_seconds = 30 # Snapshot cache TTL in seconds (default: 30)
```

You can also specify follow addresses via CLI:
```bash
./marmot --config=replica.toml --follow-addresses=node1:8080,node2:8080,node3:8080
```

**Per-Database Selective Replication:**

Control which databases are replicated using the `replicate_databases` filter:

```toml
[replica]
# Replicate only specific databases
replicate_databases = ["myapp", "analytics"]

# Replicate databases matching glob patterns
replicate_databases = ["prod_*", "staging_*"]

# Replicate all databases (default)
replicate_databases = []
```
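
The filter semantics can be sketched as follows (illustrative; Marmot's matcher may differ): an empty list replicates every database, otherwise a database is replicated when any pattern matches.

```go
package main

import (
	"fmt"
	"path"
)

// shouldReplicate applies the replicate_databases filter described above.
func shouldReplicate(patterns []string, database string) bool {
	if len(patterns) == 0 {
		return true // empty filter = replicate all databases
	}
	for _, p := range patterns {
		if ok, _ := path.Match(p, database); ok {
			return true
		}
	}
	return false
}

func main() {
	filter := []string{"prod_*", "staging_*"}
	fmt.Println(shouldReplicate(filter, "prod_users"))  // true
	fmt.Println(shouldReplicate(filter, "dev_scratch")) // false
	fmt.Println(shouldReplicate(nil, "anything"))       // true
}
```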

The system database (`__marmot_system`) is never replicated - each replica maintains its own independent system database.

**Snapshot Caching:**

Replicas use an LRU cache to avoid redundant snapshot creation:
- Cache TTL controlled by `snapshot_cache_ttl_seconds` (default: 30 seconds)
- Cached snapshots served from temp storage until expiration
- Background cleanup runs automatically
- Improves performance when multiple replicas bootstrap simultaneously

**Parallel Snapshot Downloads:**

Control download concurrency with `snapshot_concurrency`:
- Downloads multiple database snapshots in parallel (default: 3)
- Uses worker pool pattern to limit resource usage
- Partial failure handling: continues even if some databases fail
- Failed databases retry in background with exponential backoff
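
The download strategy described above is a standard bounded worker pool; the sketch below illustrates the pattern only (function names and signatures are hypothetical, not Marmot's API):

```go
package main

import (
	"fmt"
	"sync"
)

// downloadAll fetches snapshots for the given databases with at most
// `concurrency` downloads in flight, continuing past individual failures.
func downloadAll(databases []string, concurrency int, fetch func(db string) error) map[string]error {
	sem := make(chan struct{}, concurrency) // bounded worker pool
	var mu sync.Mutex
	var wg sync.WaitGroup
	failed := map[string]error{}

	for _, db := range databases {
		wg.Add(1)
		sem <- struct{}{}
		go func(db string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fetch(db); err != nil {
				mu.Lock()
				failed[db] = err // partial failure: record and keep going
				mu.Unlock()
			}
		}(db)
	}
	wg.Wait()
	return failed // callers retry these in the background
}

func main() {
	errs := downloadAll([]string{"myapp", "analytics", "logs"}, 3, func(db string) error {
		fmt.Println("downloading snapshot for", db)
		return nil
	})
	fmt.Println("failed:", errs)
}
```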

**Note:** Replica mode is mutually exclusive with cluster mode. A replica receives all data via streaming replication but cannot accept writes. It automatically discovers cluster nodes and fails over to another node if the current source becomes unavailable.

### Replication

```toml
[replication]
default_write_consistency = "QUORUM" # Write consistency level: ONE, QUORUM, ALL
default_read_consistency = "LOCAL_ONE" # Read consistency level
write_timeout_ms = 5000 # Write operation timeout
read_timeout_ms = 2000 # Read operation timeout

# Anti-Entropy: Background healing for eventual consistency
# - Detects and repairs divergence between replicas
# - Uses delta sync for small lags, snapshot for large lags
# - Includes gap detection to prevent incomplete data after GC
enable_anti_entropy = true # Enable automatic catch-up for lagging nodes
anti_entropy_interval_seconds = 30 # How often to check for lag (default: 30s)
gc_interval_seconds = 60 # GC interval (MUST be >= anti_entropy_interval)
delta_sync_threshold_transactions = 10000 # Delta sync if lag < 10K txns
delta_sync_threshold_seconds = 3600 # Snapshot if lag > 1 hour

# Garbage Collection: Reclaim disk space by deleting old transaction records
# - gc_interval must be >= anti_entropy_interval (validated at startup)
# - gc_min must be >= delta_sync_threshold (validated at startup)
# - gc_max should be >= 2x delta_sync_threshold (recommended)
# - Set gc_max = 0 for unlimited retention
gc_min_retention_hours = 2 # Keep at least 2 hours (>= 1 hour delta threshold)
gc_max_retention_hours = 24 # Force delete after 24 hours
```

**Anti-Entropy Tuning:**
- **Small clusters (2-3 nodes)**: Use default settings (30s AE, 60s GC)
- **Large clusters (5+ nodes)**: Consider increasing AE interval to 60-120s and GC to 2x that value
- **High write throughput**: Increase `delta_sync_threshold_transactions` to 50000+
- **Long-running clusters**: Keep `gc_max_retention_hours` at 24+ to handle extended outages

**GC Configuration Rules (Validated at Startup):**
- `gc_min_retention_hours` must be >= `delta_sync_threshold_seconds` (in hours)
- `gc_max_retention_hours` should be >= 2x `delta_sync_threshold_seconds`
- Violating these rules will cause startup failure with helpful error messages

### Query Pipeline

```toml
[query_pipeline]
transpiler_cache_size = 10000 # LRU cache for MySQL→SQLite transpilation
validator_pool_size = 8 # SQLite connection pool for validation
```

### MySQL Protocol Server

```toml
[mysql]
enabled = true
bind_address = "0.0.0.0"
port = 3306
max_connections = 1000
unix_socket = "" # Unix socket path (empty = disabled)
unix_socket_perm = 0660 # Socket file permissions
auto_id_mode = "compact" # "compact" (53-bit, JS-safe) or "extended" (64-bit)
```

**Unix Socket Connection** (lower latency than TCP):
```bash
mysql --socket=/tmp/marmot/mysql.sock -u root
```

### CDC Publisher

```toml
[publisher]
enabled = false # Enable CDC publishing to external systems

[[publisher.sinks]]
name = "kafka-main" # Unique sink name
type = "kafka" # "kafka" or "nats"
format = "debezium" # Debezium-compatible JSON (only option)
brokers = ["localhost:9092"] # Kafka broker addresses
topic_prefix = "marmot.cdc" # Topic pattern: {prefix}.{db}.{table}
filter_tables = ["*"] # Glob patterns for table filtering
filter_databases = ["*"] # Glob patterns for database filtering
batch_size = 100 # Events to read per poll cycle
poll_interval_ms = 10 # Polling interval (default: 10ms)
retry_initial_ms = 100 # Initial retry delay on failure
retry_max_ms = 30000 # Max retry delay (30 seconds)
retry_multiplier = 2.0 # Exponential backoff multiplier
```

See the [Integrations documentation](https://maxpert.github.io/marmot/integrations) for details on event format, Kafka/NATS configuration, and use cases.

### SQLite Extensions

```toml
[extensions]
directory = "/opt/sqlite/extensions" # Search path for extensions
always_loaded = ["sqlite-vector"] # Auto-load into every connection
```

**Loading Extensions at Runtime:**
```sql
LOAD EXTENSION sqlite-vector;
SELECT vec_version();
```

### Logging

```toml
[logging]
verbose = false # Enable verbose logging
format = "json" # Log format: "console" or "json" (json is faster)
file = "" # Log file path (empty = stdout only)
max_size_mb = 100 # Max size in MB before rotation
max_backups = 5 # Number of old log files to keep
compress = true # Compress rotated files with gzip
```

**File Logging with Rotation:**

When `file` is set, logs are written to both stdout and the specified file. The file is automatically rotated when it reaches `max_size_mb`, keeping the last `max_backups` files. Rotated files are optionally compressed with gzip.

```toml
[logging]
file = "/var/log/marmot/marmot.log"
max_size_mb = 100
max_backups = 5
compress = true
```

### Prometheus Metrics

```toml
[prometheus]
enabled = true # Metrics served on gRPC port at /metrics endpoint
```

**Accessing Metrics:**
```bash
# Metrics are multiplexed with gRPC on the same port
curl http://localhost:8080/metrics

# Prometheus scrape config
scrape_configs:
  - job_name: 'marmot'
    static_configs:
      - targets: ['node1:8080', 'node2:8080', 'node3:8080']
```

See `config.toml` for complete configuration reference with detailed comments.

## Benchmarks

Performance benchmarks on a local development machine (Apple M-series, 3-node cluster, single machine):

### Test Configuration

| Parameter | Value |
|-----------|-------|
| Nodes | 3 (ports 3307, 3308, 3309) |
| Threads | 16 |
| Batch Size | 10 ops/transaction |
| Consistency | QUORUM |

### Load Phase (INSERT-only)

| Metric | Value |
|--------|-------|
| Throughput | **4,175 ops/sec** |
| TX Throughput | **417 tx/sec** |
| Records Loaded | 200,000 |
| Errors | 0 |

### Mixed Workload

| Metric | Value |
|--------|-------|
| Throughput | **3,370 ops/sec** |
| TX Throughput | **337 tx/sec** |
| Duration | 120 seconds |
| Total Operations | 404,930 |
| Errors | 0 |
| Retries | 37 (0.09%) |

**Operation Distribution:**
- READ: 20%
- UPDATE: 30%
- INSERT: 35%
- DELETE: 5%
- UPSERT: 10%

### Latency (Mixed Workload)

| Percentile | Latency |
|------------|---------|
| P50 | 4.3ms |
| P90 | 14.0ms |
| P95 | 36.8ms |
| P99 | 85.1ms |

### Replication Verification

All 3 nodes maintained identical row counts (346,684 rows) throughout the test, confirming consistent replication.

> **Note**: These benchmarks are from a local development machine with all nodes on the same host. Production deployments across multiple machines will have different characteristics based on network latency. Expect P99 latencies of 50-200ms for cross-region QUORUM writes.

## Backup & Disaster Recovery

### Option 1: Litestream (Recommended)

Marmot's SQLite files are standard WAL-mode databases, compatible with [Litestream](https://litestream.io/):

```bash
litestream replicate /path/to/marmot-data/*.db s3://bucket/backup
```

### Option 2: CDC to External Storage

Enable CDC publisher to stream changes to Kafka/NATS, then archive to your preferred storage.

### Option 3: Filesystem Snapshots

Since Marmot uses SQLite with WAL mode, you can safely snapshot the data directory during operation.

## Stargazers over time
[![Stargazers over time](https://starchart.cc/maxpert/marmot.svg?variant=adaptive)](https://starchart.cc/maxpert/marmot)

## FAQs & Community

- For FAQs visit [this page](https://maxpert.github.io/marmot/intro#faq)
- For community visit our [discord](https://discord.gg/AWUwY66XsE) or discussions on GitHub