https://github.com/henryqingmo/kubekv

Go implementation of the Raft consensus algorithm with simulated RPC, leader election, and fault-tolerant log replication.

# KubeKV

A fault-tolerant, linearizable key-value store built on a from-scratch Raft consensus implementation in Go. Long-term goal: a minimal etcd — the kind of thing Kubernetes uses for cluster state.

## What it does

- **Raft consensus** — leader election, log replication, and term-based safety
- **Log compaction** — snapshotting with `InstallSnapshot` so logs don't grow unbounded
- **Linearizable KV** — versioned `Put`/`Get` operations; stale reads are rejected
- **Partition tolerance** — correctly handles split-brain, heals on reconnect, checks term + leadership before committing
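The term check in the last bullet likely corresponds to the commit rule in the Raft paper (§5.4.2): a leader advances its commit index only to an entry that a majority has stored and that was written in the leader's own term. The following is a minimal sketch of that rule, not code from this repository; `advanceCommit`, `matchIndex`, and `termAt` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns the leader's new commit index.
// matchIndex holds one entry per server (including the leader's own last
// log index); termAt returns the term of the log entry at an index.
// All names here are illustrative assumptions.
func advanceCommit(commitIndex, currentTerm int, matchIndex []int, termAt func(int) int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	// In descending order, the element at len/2 is stored on a majority.
	n := sorted[len(sorted)/2]

	// Only entries from the leader's current term may be committed by counting replicas.
	if n > commitIndex && termAt(n) == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	terms := []int{0, 1, 1, 2, 2, 2, 3} // terms[i] is the term of log index i
	termAt := func(i int) int { return terms[i] }

	// A majority holds index 4, but it is from term 2 while the leader is in term 3.
	fmt.Println(advanceCommit(2, 3, []int{5, 4, 3}, termAt)) // prints 2
	// A majority holds index 6, which was written in term 3.
	fmt.Println(advanceCommit(2, 3, []int{6, 6, 4}, termAt)) // prints 6
}
```

Because terms never decrease along the log, checking only the majority-replicated index is enough: if that entry is from an older term, so is every entry before it.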

## Architecture

```mermaid
flowchart TD
subgraph client["Client Layer"]
C["Clerk\nclient.go"]
end

subgraph kv["KV Layer — kvraft1/"]
KVS["KVServer\nserver.go"]
RSM["RSM reader loop\nrsm/rsm.go"]
Store["In-memory KV store\nversioned map[key]→{val, version}"]
end

subgraph raft["Raft Layer — raft1/"]
RL["Raft leader\nraft.go"]
RF1["Raft follower"]
RF2["Raft follower"]
P["Persister\nraft state + snapshot"]
end

C -->|"Get / Put RPC"| KVS
KVS -->|"Submit(op)\nblocks until committed"| RSM
RSM -->|"Start(cmd)\nstartCh → immediate AppendEntries"| RL
RL -->|"AppendEntries RPC"| RF1
RL -->|"AppendEntries RPC"| RF2
RL -.->|"InstallSnapshot RPC\nlagging follower"| RF1
RL -->|"applyCh ApplyMsg\ncommitted index + command"| RSM
RSM -->|"DoOp()"| Store
RSM -.->|"Snapshot()\nwhen log ≥ 90% maxraftstate"| RL
RL -->|"persist()"| P
```

Each write goes through the Raft log. Reads block until the server confirms it is still the leader, so a deposed leader cannot serve stale data. The RSM layer owns the `applyCh` drain loop: it executes committed ops, wakes blocked `Submit()` callers via a per-index channel, and triggers a snapshot when the log reaches 90% of `maxraftstate`. At that point the KV layer serializes its state, Raft discards the log prefix the snapshot covers, and followers that have fallen behind that prefix receive the snapshot via `InstallSnapshot`.
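The hand-off between `Submit()` and the `applyCh` reader can be sketched as a map of per-index wait channels. This is an illustration under stated assumptions, not the repository's code: the `Op` and `ApplyMsg` shapes, the `start` stand-in for Raft's `Start()`, and the `demo` helper are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Op and ApplyMsg are simplified, hypothetical shapes.
type Op struct {
	ID  int64
	Cmd string
}

type ApplyMsg struct {
	Index int
	Op    Op
}

type RSM struct {
	mu      sync.Mutex
	waiters map[int]chan Op      // log index -> channel for the blocked Submit
	start   func(Op) (int, bool) // stand-in for Raft's Start(): (index, isLeader)
}

// Submit proposes op and blocks until the entry at its index is applied.
// It reports false if this server is not the leader or if a different op
// ended up at that index (for example after a leader change).
func (r *RSM) Submit(op Op) bool {
	r.mu.Lock()
	index, isLeader := r.start(op)
	if !isLeader {
		r.mu.Unlock()
		return false
	}
	ch := make(chan Op, 1) // buffered so the reader never blocks on a waiter
	r.waiters[index] = ch
	r.mu.Unlock()

	applied := <-ch
	return applied.ID == op.ID
}

// readLoop drains applyCh, applies each op, and wakes the matching waiter.
func (r *RSM) readLoop(applyCh <-chan ApplyMsg, doOp func(Op)) {
	for msg := range applyCh {
		doOp(msg.Op)
		r.mu.Lock()
		ch, ok := r.waiters[msg.Index]
		delete(r.waiters, msg.Index)
		r.mu.Unlock()
		if ok {
			ch <- msg.Op
		}
	}
}

// demo wires the RSM to a fake Raft that commits every op immediately.
func demo() (bool, []string) {
	applyCh := make(chan ApplyMsg, 8)
	defer close(applyCh)
	next := 0
	r := &RSM{waiters: make(map[int]chan Op)}
	r.start = func(op Op) (int, bool) {
		next++
		applyCh <- ApplyMsg{Index: next, Op: op}
		return next, true
	}
	var applied []string
	go r.readLoop(applyCh, func(op Op) { applied = append(applied, op.Cmd) })
	ok := r.Submit(Op{ID: 1, Cmd: "put x=1"})
	return ok, applied
}

func main() {
	fmt.Println(demo())
}
```

The sketch leaves out one thing a real server needs: if leadership is lost before the entry commits, pending waiters must be woken with an error rather than left blocked.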

## Key design decisions

**Versioned puts over CAS:** every key carries a monotonically increasing version. `Put(key, val, version)` is rejected unless `version` matches the key's current version, giving clients optimistic concurrency without distributed locks. This is close in spirit to etcd's revision model, where a transaction can compare a key's version or modification revision before writing.
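A minimal sketch of the versioned-put check is shown below. The error names and the convention that version `0` creates a missing key are assumptions made for illustration; the repository's actual semantics may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values; the repository's names may differ.
var (
	ErrNoKey   = errors.New("no such key")
	ErrVersion = errors.New("version mismatch")
)

type entry struct {
	val     string
	version uint64
}

// Store is an in-memory versioned map. It is assumed to be driven by a
// single apply loop, so it does no locking of its own.
type Store struct {
	data map[string]entry
}

func NewStore() *Store { return &Store{data: make(map[string]entry)} }

// Get returns the value and its current version.
func (s *Store) Get(key string) (string, uint64, error) {
	e, ok := s.data[key]
	if !ok {
		return "", 0, ErrNoKey
	}
	return e.val, e.version, nil
}

// Put writes val only if version matches the key's current version.
// Assumed convention: version 0 creates a key that does not exist yet.
func (s *Store) Put(key, val string, version uint64) error {
	e, ok := s.data[key]
	if !ok {
		if version != 0 {
			return ErrNoKey
		}
		s.data[key] = entry{val: val, version: 1}
		return nil
	}
	if e.version != version {
		return ErrVersion
	}
	s.data[key] = entry{val: val, version: version + 1}
	return nil
}

func main() {
	s := NewStore()
	fmt.Println(s.Put("k", "a", 0)) // creates k at version 1
	fmt.Println(s.Put("k", "b", 0)) // rejected: caller's version is stale
	fmt.Println(s.Put("k", "b", 1)) // accepted: k moves to version 2
	fmt.Println(s.Get("k"))
}
```

A client that loses a race sees `ErrVersion`, re-reads the key, and retries with the new version, which is the usual optimistic-concurrency loop.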

**Snapshot-driven log truncation** — when the KV layer signals a snapshot, Raft replaces the log prefix with `(lastIncludedIndex, lastIncludedTerm)` and ships the snapshot to lagging followers via `InstallSnapshot`. This is the same mechanism etcd uses to onboard new members without replaying the full history.
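One common way to implement this truncation is to keep a sentinel entry at position 0 that records `lastIncludedTerm`, and to translate between log indices and slice positions with an offset. The sketch below assumes that layout; `raftLog`, `compact`, and the field names are hypothetical and not taken from this repository.

```go
package main

import "fmt"

type LogEntry struct {
	Term    int
	Command string
}

// raftLog stores only the entries after the last snapshot.
// entries[0] is a sentinel whose Term is lastIncludedTerm.
type raftLog struct {
	lastIncludedIndex int
	entries           []LogEntry
}

// lastIndex returns the log index of the newest entry.
func (l *raftLog) lastIndex() int { return l.lastIncludedIndex + len(l.entries) - 1 }

// term returns the term at a log index in [lastIncludedIndex, lastIndex].
func (l *raftLog) term(index int) int { return l.entries[index-l.lastIncludedIndex].Term }

// compact drops every entry up to and including index, which the snapshot
// now covers. It returns false if index is already compacted or not in the log.
func (l *raftLog) compact(index int) bool {
	if index <= l.lastIncludedIndex || index > l.lastIndex() {
		return false
	}
	offset := index - l.lastIncludedIndex
	// Copy the tail so the old backing array can be garbage collected.
	tail := make([]LogEntry, len(l.entries)-offset)
	copy(tail, l.entries[offset:])
	tail[0].Command = "" // the sentinel keeps only the term
	l.entries = tail
	l.lastIncludedIndex = index
	return true
}

func main() {
	l := &raftLog{entries: []LogEntry{{0, ""}, {1, "a"}, {1, "b"}, {2, "c"}}}
	l.compact(2)
	// lastIncludedIndex, lastIncludedTerm, last log index, entries kept
	fmt.Println(l.lastIncludedIndex, l.term(2), l.lastIndex(), len(l.entries)) // prints 2 1 3 2
}
```

Keeping the sentinel means the `AppendEntries` consistency check on `prevLogIndex`/`prevLogTerm` still works when `prevLogIndex` equals `lastIncludedIndex`.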

**Fast path on `Start()`** — `Start()` signals a dedicated channel that immediately triggers `AppendEntries` to all peers, rather than waiting for the next heartbeat tick. Cuts commit latency under load.
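The fast path can be sketched as a replication loop that selects on both a heartbeat ticker and a one-slot signal channel. This is an assumption-laden illustration: `Replicator`, `Signal`, `broadcast`, and `fastPathFires` are hypothetical names, and `broadcast` stands in for sending `AppendEntries` to all peers.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Replicator is a hypothetical stand-in for the leader's replication loop.
type Replicator struct {
	startCh   chan struct{} // capacity 1: bursts of Start() calls coalesce
	broadcast func()        // stand-in for sending AppendEntries to all peers
}

func NewReplicator(broadcast func()) *Replicator {
	return &Replicator{startCh: make(chan struct{}, 1), broadcast: broadcast}
}

// Signal is what Start() would call after appending to the log.
// It never blocks: if a signal is already pending, the new one is dropped.
func (r *Replicator) Signal() {
	select {
	case r.startCh <- struct{}{}:
	default:
	}
}

// Run broadcasts on every heartbeat tick and immediately on every signal.
func (r *Replicator) Run(ctx context.Context, heartbeat time.Duration) {
	ticker := time.NewTicker(heartbeat)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-r.startCh:
			r.broadcast()
		case <-ticker.C:
			r.broadcast()
		}
	}
}

// fastPathFires reports whether a signal triggers a broadcast within one
// second even though the heartbeat interval is an hour.
func fastPathFires() bool {
	sent := make(chan struct{}, 1)
	r := NewReplicator(func() {
		select {
		case sent <- struct{}{}:
		default:
		}
	})
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go r.Run(ctx, time.Hour)

	r.Signal()
	select {
	case <-sent:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(fastPathFires())
}
```

The one-slot buffer is the design choice that matters here: many concurrent `Start()` calls collapse into a single pending signal, so the loop sends one batched `AppendEntries` round instead of one per command.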

## Running

```bash
cd src/raft1
go test -run TestBasicAgree
go test -run TestSnapshotInstall
go test -run TestConcurrentClients
```

## Where this is headed

The immediate target is a standalone gRPC server that speaks a subset of the etcd v3 API — enough to swap in as a Kubernetes datastore for a small cluster. That requires:

- [ ] gRPC transport replacing the in-process RPC shim
- [ ] Persistent WAL on disk (currently in-memory via `Persister`)
- [ ] Watch streams (key-range notifications on commit)
- [ ] Membership changes (Raft joint consensus, §6 of the paper)
- [ ] Read index / lease-based reads for lower-latency queries

## References

- [In Search of an Understandable Consensus Algorithm](https://raft.github.io/raft.pdf) — Ongaro & Ousterhout
- [etcd internals](https://etcd.io/docs/v3.5/learning/design-learner/) — membership and storage design