https://github.com/aalhour/beachdb

🏖️ 🪨 Toy distributed NoSQL database in Go

database distributed go golang nosql raft storage-engine


Host: GitHub
URL: https://github.com/aalhour/beachdb
Owner: aalhour
License: apache-2.0
Created: 2026-01-04T23:51:04.000Z (6 months ago)
Default Branch: main
Last Pushed: 2026-01-25T12:58:09.000Z (5 months ago)
Last Synced: 2026-01-27T02:39:58.494Z (5 months ago)
Topics: database, distributed, go, golang, nosql, raft, storage-engine
Language: Makefile
Homepage:
Size: 4.05 MB
Stars: 14
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

**BeachDB is a toy distributed NoSQL database. Built for learning and education, not production.**

It starts life as a small, inspectable storage engine, then deliberately grows “real-system bones”: a server API, a failure model, and a Raft-replicated core. The point isn’t to win benchmarks — it’s to understand, measure, and explain what’s actually happening.

### Backstory

I’ve been fond of distributed systems and databases for a long time. I wrote my first Hadoop and Apache Spark pipeline back in 2016, then went on to solve hairy stream-processing problems at Shopify, and later worked on Apache HBase at HubSpot where I helped build and operate database infrastructure on top of Kubernetes at massive scale.

BeachDB is my attempt to re-learn the fundamentals by building them from scratch in Go. I’m prioritizing **simplicity, clarity, and understanding** over scalability, speed, and micro-optimizations.

## Architecture

- **LSM storage engine** (WAL → memtable → SSTables → compaction)

- **Single-node API** (server wrapper for Get/Put/Delete/Scan with timeouts + backpressure)

- **Distributed replication with Raft** (single group: leader writes + leader reads; log entry == `WriteBatch`)

- **Inspectability-first** (dump tools + crash tests as part of the architecture)
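
To make the LSM bullet concrete, here is a minimal in-memory sketch of the write and read paths. It is not BeachDB's code: the `Engine`, `run`, and `entry` types are hypothetical names invented for illustration, the "WAL" is a plain slice standing in for an append-only file, and each "SSTable" is a sorted in-memory run. It assumes the usual LSM rule that newer data shadows older data and that a delete is recorded as a tombstone.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is either a live value or a tombstone marking a deleted key.
type entry struct {
	value     string
	tombstone bool
}

// run stands in for an immutable SSTable: sorted keys, never modified once built.
type run struct {
	keys []string
	vals []entry
}

// get binary-searches the sorted keys of the run.
func (r run) get(key string) (entry, bool) {
	i := sort.SearchStrings(r.keys, key)
	if i < len(r.keys) && r.keys[i] == key {
		return r.vals[i], true
	}
	return entry{}, false
}

// Engine is a hypothetical toy LSM: a log, one mutable memtable, and runs (oldest first).
type Engine struct {
	wal      []string // stand-in for an append-only log file
	memtable map[string]entry
	runs     []run
}

func NewEngine() *Engine {
	return &Engine{memtable: map[string]entry{}}
}

// Put logs the write first, then applies it to the memtable.
func (e *Engine) Put(key, value string) {
	e.wal = append(e.wal, "put "+key+" "+value)
	e.memtable[key] = entry{value: value}
}

// Delete writes a tombstone instead of removing the key in place.
func (e *Engine) Delete(key string) {
	e.wal = append(e.wal, "del "+key)
	e.memtable[key] = entry{tombstone: true}
}

// Flush freezes the memtable into a sorted run and starts a fresh memtable.
func (e *Engine) Flush() {
	if len(e.memtable) == 0 {
		return
	}
	keys := make([]string, 0, len(e.memtable))
	for k := range e.memtable {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	vals := make([]entry, len(keys))
	for i, k := range keys {
		vals[i] = e.memtable[k]
	}
	e.runs = append(e.runs, run{keys: keys, vals: vals})
	e.memtable = map[string]entry{}
	e.wal = nil // flushed data no longer depends on the log
}

// Get checks the memtable, then runs from newest to oldest; the first hit wins.
func (e *Engine) Get(key string) (string, bool) {
	if ent, ok := e.memtable[key]; ok {
		return ent.value, !ent.tombstone
	}
	for i := len(e.runs) - 1; i >= 0; i-- {
		if ent, ok := e.runs[i].get(key); ok {
			return ent.value, !ent.tombstone
		}
	}
	return "", false
}

func main() {
	e := NewEngine()
	e.Put("shell", "conch")
	e.Flush()
	e.Delete("shell") // the tombstone in the memtable shadows the flushed value
	_, found := e.Get("shell")
	fmt.Println("shell found after delete:", found)
}
```

Compaction is left out on purpose: in this sketch the old value and its tombstone both stay around, which is exactly the garbage a compaction pass would later merge away.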

## Key features (shipped as a checklist)

> This list is ordered to match the build + blog sequence. I’ll tick these off as they land.

### Engine (storage truth)

- [x] **Scope + semantics contract** (snapshots, iterators, durability), see: [intro blog post](https://aalhour.com/posts/building-beachdb/)

- [x] **WAL v1**: checksums + deterministic crash recovery (**fsync per committed batch**), see: [durability blog post](https://aalhour.com/posts/beachdb-wal-v1-milestone/)

- [ ] **Crash-loop harness**: kill mid-write, reopen, validate invariants

- [ ] **Memtable v1**: sorted structure + tombstones

- [ ] **Reference-model randomized tests** (model vs implementation)

- [ ] **SSTables v1**: immutable sorted files + `sst_dump`

- [ ] **Merge iterators** (memtable + SSTs) + **snapshot reads** (seqno-based)

- [ ] **Manifest/versioning** + `manifest_dump` (startup reconstruction)

- [ ] **Read path acceleration**: block index + bloom filters + benchmark evidence

- [ ] **Compaction v1**: one strategy, minimal knobs + amplification measurements

- [ ] **Adversarial testing**: fault injection + fuzzing (WAL/SST decode paths)
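
As an illustration of the "checksums + deterministic crash recovery" idea behind the WAL item, here is a small sketch of checksummed log records and a replay loop that stops at the first damaged record. The record layout (CRC-32, then length, then payload) and the function names `appendRecord`, `readRecord`, and `replay` are assumptions made for this example, not BeachDB's actual on-disk format, and a byte slice stands in for the log file.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Hypothetical record layout for this sketch:
//   4 bytes  CRC-32 (IEEE) over the length field and payload, little-endian
//   4 bytes  payload length, little-endian
//   N bytes  payload
const headerSize = 8

var errCorrupt = errors.New("wal: corrupt or truncated record")

// appendRecord appends one checksummed record to the in-memory log.
func appendRecord(log []byte, payload []byte) []byte {
	var hdr [headerSize]byte
	binary.LittleEndian.PutUint32(hdr[4:8], uint32(len(payload)))
	sum := crc32.Update(crc32.ChecksumIEEE(hdr[4:8]), crc32.IEEETable, payload)
	binary.LittleEndian.PutUint32(hdr[0:4], sum)
	log = append(log, hdr[:]...)
	return append(log, payload...)
}

// readRecord decodes the record at the front of log and returns the remaining bytes.
func readRecord(log []byte) (payload, rest []byte, err error) {
	if len(log) < headerSize {
		return nil, nil, errCorrupt
	}
	sum := binary.LittleEndian.Uint32(log[0:4])
	n := binary.LittleEndian.Uint32(log[4:8])
	if uint64(n) > uint64(len(log)-headerSize) {
		return nil, nil, errCorrupt // length points past the end: torn write
	}
	end := headerSize + int(n)
	if crc32.ChecksumIEEE(log[4:end]) != sum {
		return nil, nil, errCorrupt
	}
	return log[headerSize:end], log[end:], nil
}

// replay returns every record up to the first bad one. The same bytes always
// yield the same prefix, which is what makes recovery deterministic.
func replay(log []byte) [][]byte {
	var out [][]byte
	for len(log) > 0 {
		payload, rest, err := readRecord(log)
		if err != nil {
			break // treat the damaged tail as never committed
		}
		out = append(out, payload)
		log = rest
	}
	return out
}

func main() {
	log := appendRecord(nil, []byte("put k1 v1"))
	log = appendRecord(log, []byte("put k2 v2"))
	torn := log[:len(log)-3] // simulate a crash midway through the last record
	fmt.Println("records recovered from torn log:", len(replay(torn)))
}
```

A crash-loop harness in this spirit would truncate or corrupt the log at many offsets and check that replay always returns a clean prefix of the committed records.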

### Server (systems truth)

- [ ] **Binary protocol** (framed) + timeouts + backpressure

- [ ] **Load generator** + p50/p99 latency reporting

- [ ] **Metrics/tracing hooks** that make performance explainable
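
For readers unfamiliar with "framed" protocols, the sketch below shows one common approach: a 4-byte big-endian length prefix in front of every message, with a size cap so a peer cannot make the reader allocate unbounded memory. The names `writeFrame`, `readFrame`, `encodeAll`, `decodeAll`, and the 1 MiB `maxFrameSize` are hypothetical choices for this example; the README does not specify BeachDB's wire format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxFrameSize is an assumed cap; rejecting oversized frames is a crude form of backpressure.
const maxFrameSize = 1 << 20

var errFrameTooLarge = errors.New("frame exceeds maxFrameSize")

// writeFrame writes a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxFrameSize {
		return errFrameTooLarge
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one frame. It returns io.EOF only at a clean frame boundary.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxFrameSize {
		return nil, errFrameTooLarge
	}
	payload := make([]byte, n)
	if _, err := io.ReadFull(r, payload); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header promised bytes that never arrived
		}
		return nil, err
	}
	return payload, nil
}

// encodeAll frames each payload into one byte stream (helper for the demo).
func encodeAll(payloads ...[]byte) ([]byte, error) {
	var buf bytes.Buffer
	for _, p := range payloads {
		if err := writeFrame(&buf, p); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// decodeAll reads frames until a clean end of stream.
func decodeAll(data []byte) ([][]byte, error) {
	r := bytes.NewReader(data)
	var frames [][]byte
	for {
		f, err := readFrame(r)
		if err == io.EOF {
			return frames, nil
		}
		if err != nil {
			return nil, err
		}
		frames = append(frames, f)
	}
}

func main() {
	data, err := encodeAll([]byte("GET k1"), []byte("PUT k2 v2"))
	if err != nil {
		panic(err)
	}
	frames, err := decodeAll(data)
	fmt.Println(len(frames), err)
}
```

On a real connection, timeouts would typically be layered on top by setting deadlines on the `net.Conn` before each read or write; that part is omitted here to keep the example runnable without a network.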

### Replication (distributed truth)

- [ ] **Raft (single group)** where a log entry == serialized `WriteBatch`

- [ ] **Deterministic apply** + restart safety

- [ ] **Snapshotting** for fast catch-up
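
The "log entry == serialized `WriteBatch`" and "deterministic apply" items can be sketched without any Raft machinery: if every replica decodes the same bytes and applies them in the same order, every replica ends in the same state. Everything below is a hypothetical illustration. The `Op` and `StateMachine` types, the byte encoding, and the `lastApplied` check are assumptions for this example; only the `WriteBatch` name comes from the text above, and its real shape in BeachDB may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Op is one mutation inside a batch (hypothetical shape).
type Op struct {
	Delete     bool
	Key, Value string
}

// WriteBatch is the unit that would be proposed as a single log entry.
type WriteBatch struct{ Ops []Op }

var (
	errMalformed = errors.New("malformed batch")
	errGap       = errors.New("log index gap")
)

// Encode serializes each op as: kind byte, uvarint key length, key, uvarint value length, value.
func (b WriteBatch) Encode() []byte {
	var buf []byte
	for _, op := range b.Ops {
		kind := byte(0)
		if op.Delete {
			kind = 1
		}
		buf = append(buf, kind)
		buf = binary.AppendUvarint(buf, uint64(len(op.Key)))
		buf = append(buf, op.Key...)
		buf = binary.AppendUvarint(buf, uint64(len(op.Value)))
		buf = append(buf, op.Value...)
	}
	return buf
}

// DecodeBatch is the inverse of Encode and rejects truncated input.
func DecodeBatch(data []byte) (WriteBatch, error) {
	var b WriteBatch
	for len(data) > 0 {
		op := Op{Delete: data[0] == 1}
		data = data[1:]
		var fields [2]string
		for i := range fields {
			n, w := binary.Uvarint(data)
			if w <= 0 || n > uint64(len(data)-w) {
				return WriteBatch{}, errMalformed
			}
			fields[i] = string(data[w : w+int(n)])
			data = data[w+int(n):]
		}
		op.Key, op.Value = fields[0], fields[1]
		b.Ops = append(b.Ops, op)
	}
	return b, nil
}

// StateMachine applies committed entries strictly in index order.
type StateMachine struct {
	data        map[string]string
	lastApplied uint64
}

func NewStateMachine() *StateMachine {
	return &StateMachine{data: map[string]string{}}
}

// Apply skips entries that were already applied, so replaying the log after a
// restart is harmless, and refuses to skip ahead over a missing index.
func (s *StateMachine) Apply(index uint64, entry []byte) error {
	if index <= s.lastApplied {
		return nil
	}
	if index != s.lastApplied+1 {
		return errGap
	}
	batch, err := DecodeBatch(entry)
	if err != nil {
		return err
	}
	for _, op := range batch.Ops {
		if op.Delete {
			delete(s.data, op.Key)
		} else {
			s.data[op.Key] = op.Value
		}
	}
	s.lastApplied = index
	return nil
}

func main() {
	entry := WriteBatch{Ops: []Op{{Key: "a", Value: "1"}, {Delete: true, Key: "a"}}}.Encode()
	sm := NewStateMachine()
	fmt.Println(sm.Apply(1, entry), len(sm.data))
}
```

In a real deployment `lastApplied` would have to be persisted together with the data it describes; keeping it only in memory, as this sketch does, is not restart-safe on its own.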

### Sequel teaser (maybe)

- [ ] **Tables & Regions**: table-ish encoding + scans + key-range routing (minimal, no rabbit holes)

## Non-goals (by design)

To keep BeachDB small and finishable, these are intentionally out of scope for Season 1:

- Production readiness, multi-year maintenance guarantees, or compatibility promises

- Multi-writer concurrency in the engine (single-writer early on)

- Background compaction early on (added only after invariants are rock-solid)

- SQL, query planner, joins, secondary indexes

- Full transactions / serializable isolation

- Auto sharding, region split/merge, rebalancing, quorum reads, gossip/repair

## Philosophy

> Every chapter ends with evidence: a dump tool, a crash test, a benchmark, or a diagram.

See [docs/principles.md](docs/principles.md) for how I'm keeping this project from turning into a second job :)

## License

Apache 2.0 (see: [LICENSE](LICENSE))
