https://github.com/henryqingmo/kubekv

Go implementation of the Raft consensus algorithm with simulated RPC, leader election, and fault-tolerant log replication.




Host: GitHub
URL: https://github.com/henryqingmo/kubekv
Owner: henryqingmo
Created: 2026-04-03T23:46:06.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-05-07T06:21:23.000Z (about 1 month ago)
Last Synced: 2026-05-07T08:24:25.211Z (about 1 month ago)
Topics: concensus, etcd, fault-tolerance, go, k8s, kubernetes, raft, raft-consensus-algorithm
Language: Go
Homepage:
Size: 8.56 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# KubeKV

A fault-tolerant, linearizable key-value store built on a from-scratch Raft consensus implementation in Go. Long-term goal: a minimal etcd — the kind of thing Kubernetes uses for cluster state.

## What it does

- **Raft consensus** — leader election, log replication, and term-based safety

- **Log compaction** — snapshotting with `InstallSnapshot` so the log doesn't grow without bound

- **Linearizable KV** — versioned `Put`/`Get` operations; a `Put` carrying a stale version is rejected, and reads never return stale data

- **Partition tolerance** — correctly handles split-brain, heals on reconnect, and checks term and leadership before committing
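
One concrete piece of the term check before committing is the leader's commit rule from the Raft paper (§5.4.2): advance `commitIndex` to the highest index stored on a majority, but only if that entry is from the leader's current term. The sketch below is a minimal illustration, assuming a dummy entry at index 0; `LogEntry` and `advanceCommitIndex` are hypothetical names, not necessarily what `raft.go` uses.

```go
package main

import (
	"fmt"
	"sort"
)

// LogEntry is a hypothetical entry type; only Term matters for this rule.
type LogEntry struct {
	Term    int
	Command interface{}
}

// advanceCommitIndex returns the leader's new commitIndex.
// entries[0] is a dummy, so log index i is entries[i]. matchIndex has one
// slot per server, including the leader (set to its own last log index).
func advanceCommitIndex(entries []LogEntry, matchIndex []int, commitIndex, currentTerm int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	// In descending order, sorted[len/2] is held by at least len/2+1
	// servers, i.e. by a majority.
	n := sorted[len(sorted)/2]

	// Only an entry from the leader's own term may be committed by counting
	// replicas. Terms never decrease along the log, so if entries[n] is
	// from an older term, every earlier index is as well.
	if n > commitIndex && entries[n].Term == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	entries := []LogEntry{{}, {Term: 1}, {Term: 1}, {Term: 2}, {Term: 2}}

	// Leader and one follower hold index 4, the third server holds index 2.
	fmt.Println(advanceCommitIndex(entries, []int{4, 4, 2}, 0, 2)) // 4

	// Same replication, but a newly elected term-3 leader has not yet
	// replicated an entry of its own, so nothing new commits.
	fmt.Println(advanceCommitIndex(entries, []int{4, 4, 2}, 1, 3)) // 1
}
```

The second call is why a leader that reconnects after a partition cannot commit old-term entries just by counting copies: it must first get an entry from its own term onto a majority.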

## Architecture

```mermaid

flowchart TD

    subgraph client["Client Layer"]

        C["Clerk\nclient.go"]

    end

    subgraph kv["KV Layer  — kvraft1/"]

        KVS["KVServer\nserver.go"]

        RSM["RSM  reader loop\nrsm/rsm.go"]

        Store["In-memory KV store\nversioned map[key]→{val, version}"]

    end

    subgraph raft["Raft Layer  — raft1/"]

        RL["Raft leader\nraft.go"]

        RF1["Raft follower"]

        RF2["Raft follower"]

        P["Persister\nraft state + snapshot"]

    end

    C -->|"Get / Put RPC"| KVS

    KVS -->|"Submit(op)\nblocks until committed"| RSM

    RSM -->|"Start(cmd)\nstartCh → immediate AppendEntries"| RL

    RL -->|"AppendEntries RPC"| RF1

    RL -->|"AppendEntries RPC"| RF2

    RL -.->|"InstallSnapshot RPC\nlagging follower"| RF1

    RL -->|"applyCh ApplyMsg\ncommitted index + command"| RSM

    RSM -->|"DoOp()"| Store

    RSM -.->|"Snapshot()\nwhen log ≥ 90% maxraftstate"| RL

    RL -->|"persist()"| P

```

Each write goes through the Raft log. Reads block until the server confirms it is still the leader, so they never return stale data. The RSM layer owns the `applyCh` drain loop: it executes committed ops, wakes blocked `Submit()` callers through a per-index channel, and triggers a snapshot once the log reaches 90% of `maxraftstate`. When that happens, the KV layer serializes its state, Raft discards the log prefix the snapshot covers, and lagging followers are brought up to date with `InstallSnapshot`.
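
The reader loop described above can be sketched roughly as follows. This is a minimal sketch, not the code in `rsm/rsm.go`: `Op`, `ApplyMsg`, `Register`, `Reader`, `raftSize` and `snapshot` are hypothetical names, `DoOp()` is reduced to a plain map write, and the 90% threshold is taken from the diagram. Each op carries a unique ID so that a caller whose index was taken over by another leader's entry is told it failed instead of receiving someone else's result.

```go
package main

import (
	"fmt"
	"sync"
)

// Op is a hypothetical client operation. ID lets a waiter check that the
// entry committed at its index is really the op it submitted.
type Op struct {
	ID  int64
	Key string
	Val string
}

// ApplyMsg is a simplified committed-entry message from Raft.
type ApplyMsg struct {
	CommandValid bool
	Command      Op
	CommandIndex int
}

type result struct {
	ok  bool // false if another leader's op was committed at this index
	val string
}

type waiter struct {
	id int64
	ch chan result
}

type RSM struct {
	mu           sync.Mutex
	waiters      map[int]waiter    // log index -> blocked Submit() caller
	store        map[string]string // stand-in for the versioned KV store
	maxraftstate int               // <= 0 disables snapshots
	raftSize     func() int        // hypothetical: current Raft state size
	snapshot     func(index int)   // hypothetical: serialize store, hand to Raft
}

// Register is what Submit() would call with the index returned by Start(op).
// A real Submit must register before the entry can be applied.
func (r *RSM) Register(index int, id int64) <-chan result {
	ch := make(chan result, 1) // buffered so the reader loop never blocks
	r.mu.Lock()
	r.waiters[index] = waiter{id: id, ch: ch}
	r.mu.Unlock()
	return ch
}

// Reader is the applyCh drain loop.
func (r *RSM) Reader(applyCh <-chan ApplyMsg) {
	for msg := range applyCh {
		if !msg.CommandValid {
			continue // snapshot delivery is omitted in this sketch
		}
		op := msg.Command
		r.mu.Lock()
		r.store[op.Key] = op.Val // DoOp(), reduced to a plain write
		if w, ok := r.waiters[msg.CommandIndex]; ok {
			w.ch <- result{ok: w.id == op.ID, val: op.Val}
			delete(r.waiters, msg.CommandIndex)
		}
		r.mu.Unlock()

		// Snapshot once Raft state reaches 90% of maxraftstate.
		if r.maxraftstate > 0 && r.raftSize()*10 >= r.maxraftstate*9 {
			r.snapshot(msg.CommandIndex)
		}
	}
}

func main() {
	applyCh := make(chan ApplyMsg)
	snapped := -1
	r := &RSM{
		waiters:      map[int]waiter{},
		store:        map[string]string{},
		maxraftstate: 1000,
		raftSize:     func() int { return 950 },
		snapshot:     func(i int) { snapped = i },
	}
	mine := r.Register(1, 42)
	lost := r.Register(2, 7)

	done := make(chan struct{})
	go func() { r.Reader(applyCh); close(done) }()
	applyCh <- ApplyMsg{CommandValid: true, Command: Op{ID: 42, Key: "x", Val: "1"}, CommandIndex: 1}
	applyCh <- ApplyMsg{CommandValid: true, Command: Op{ID: 99, Key: "y", Val: "2"}, CommandIndex: 2}
	close(applyCh)
	<-done

	fmt.Println(<-mine, <-lost, snapped) // {true 1} {false 2} 2
}
```

In this sketch a caller still blocks forever if nothing ever commits at its index (for example after the leader loses its term); a real `Submit()` would also need a timeout or a term check.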

## Key design decisions

**Versioned puts over CAS** — every key carries a monotonic version. `Put(key, val, version)` is rejected if the version doesn't match the stored one, giving clients optimistic concurrency without distributed locks. This maps cleanly onto etcd's revision model, where a transaction can compare a key's `version` or `mod_revision` before writing.
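
Assuming the usual semantics for this style of store (a missing key is created by a `Put` with version 0, and each successful `Put` bumps the version by one), the store side of a versioned put might look like the sketch below. `Store`, `ErrNoKey` and `ErrVersion` are hypothetical names. Because this logic only runs on committed log entries, every replica reaches the same accept or reject decision.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNoKey   = errors.New("no such key")
	ErrVersion = errors.New("version mismatch")
)

type entry struct {
	val     string
	version uint64
}

// Store is a hypothetical versioned map[key] -> {val, version}.
type Store struct {
	m map[string]entry
}

func NewStore() *Store { return &Store{m: make(map[string]entry)} }

// Get returns the value together with its current version.
func (s *Store) Get(key string) (string, uint64, error) {
	e, ok := s.m[key]
	if !ok {
		return "", 0, ErrNoKey
	}
	return e.val, e.version, nil
}

// Put writes val only if version equals the key's current version.
// ASSUMPTION: a missing key is created by passing version 0, and each
// successful Put bumps the version by one.
func (s *Store) Put(key, val string, version uint64) error {
	e, ok := s.m[key]
	switch {
	case !ok && version != 0:
		return ErrNoKey
	case ok && e.version != version:
		return ErrVersion
	}
	s.m[key] = entry{val: val, version: version + 1}
	return nil
}

func main() {
	s := NewStore()
	fmt.Println(s.Put("leader", "n1", 0)) // <nil>: created at version 1

	_, v, _ := s.Get("leader")
	fmt.Println(s.Put("leader", "n2", v)) // <nil>: version 1 -> 2
	fmt.Println(s.Put("leader", "n3", v)) // version mismatch: v is now stale
}
```

The third call is the optimistic-concurrency case: a client that read version 1 loses the race to a writer that already moved the key to version 2, and must re-read before retrying.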

**Snapshot-driven log truncation** — when the KV layer signals a snapshot, Raft replaces the log prefix with `(lastIncludedIndex, lastIncludedTerm)` and ships the snapshot to lagging followers via `InstallSnapshot`. This is the same mechanism etcd uses to onboard new members without replaying the full history.
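
The index bookkeeping behind this can be sketched by keeping `(lastIncludedIndex, lastIncludedTerm)` in a sentinel at slot 0, so absolute log indices keep working after the prefix is dropped. `Log`, `Compact` and `InstallSnapshot` below are hypothetical names; the follower-side rule (keep the suffix if the entry at `lastIncludedIndex` matches, otherwise discard the whole log) follows the `InstallSnapshot` receiver rules in the Raft paper.

```go
package main

import "fmt"

type LogEntry struct {
	Term    int
	Command interface{}
}

// Log keeps only entries after the last snapshot. entries[0] is a sentinel
// whose Term is lastIncludedTerm, so absolute index i lives at
// entries[i-lastIncludedIndex].
type Log struct {
	lastIncludedIndex int
	entries           []LogEntry
}

func NewLog() *Log { return &Log{entries: []LogEntry{{Term: 0}}} }

func (l *Log) LastIndex() int        { return l.lastIncludedIndex + len(l.entries) - 1 }
func (l *Log) LastIncludedTerm() int { return l.entries[0].Term }
func (l *Log) Term(i int) int        { return l.entries[i-l.lastIncludedIndex].Term }
func (l *Log) Append(e LogEntry)     { l.entries = append(l.entries, e) }

// Compact drops every entry up to and including index, which the caller
// must guarantee is committed and covered by the snapshot.
func (l *Log) Compact(index int) {
	if index <= l.lastIncludedIndex || index > l.LastIndex() {
		return // already compacted, or beyond the end of the log
	}
	term := l.Term(index)
	kept := l.entries[index-l.lastIncludedIndex+1:]
	// Copy into a fresh slice so the old backing array can be collected.
	fresh := make([]LogEntry, 1, 1+len(kept))
	fresh[0] = LogEntry{Term: term}
	l.entries = append(fresh, kept...)
	l.lastIncludedIndex = index
}

// InstallSnapshot applies a leader's snapshot metadata on a follower.
func (l *Log) InstallSnapshot(idx, term int) {
	if idx <= l.lastIncludedIndex {
		return // stale snapshot, nothing to do
	}
	if idx <= l.LastIndex() && l.Term(idx) == term {
		l.Compact(idx) // matching entry: keep the suffix after it
		return
	}
	l.entries = []LogEntry{{Term: term}} // otherwise discard the whole log
	l.lastIncludedIndex = idx
}

func main() {
	l := NewLog()
	for i, t := range []int{1, 1, 2, 2, 3} {
		l.Append(LogEntry{Term: t, Command: i + 1})
	}
	l.Compact(3)
	fmt.Println(l.lastIncludedIndex, l.LastIncludedTerm(), l.LastIndex(), l.Term(4)) // 3 2 5 2

	l.InstallSnapshot(7, 4) // follower is behind the snapshot: log replaced
	fmt.Println(l.lastIncludedIndex, l.LastIncludedTerm(), l.LastIndex()) // 7 4 7
}
```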

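**Fast path on `Start()`** — `Start()` signals a dedicated channel that immediately triggers `AppendEntries` to all peers instead of waiting for the next heartbeat tick. A new entry starts replicating right away rather than sitting idle for up to one heartbeat interval, which cuts commit latency under load.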

## Running

```bash

cd src/raft1

go test -run TestBasicAgree

go test -run TestSnapshotInstall

go test -run TestConcurrentClients

```

## Where this is headed

The immediate target is a standalone gRPC server that speaks a subset of the etcd v3 API — enough to swap in as a Kubernetes datastore for a small cluster. That requires:

- [ ] gRPC transport replacing the in-process RPC shim

- [ ] Persistent WAL on disk (currently in-memory via `Persister`)

- [ ] Watch streams (key-range notifications on commit)

- [ ] Membership changes (Raft joint consensus, §6 of the paper)

- [ ] Read index / lease-based reads for lower-latency queries

## References

- [In Search of an Understandable Consensus Algorithm](https://raft.github.io/raft.pdf) — Ongaro & Ousterhout

- [etcd learner design](https://etcd.io/docs/v3.5/learning/design-learner/) — membership changes via non-voting learner members
