https://github.com/flynnfc/bagginsdb
⚡A Cassandra-inspired distributed wide-column NoSQL database⚡
cassandra database distrubuted-systems go nosql
- Host: GitHub
- URL: https://github.com/flynnfc/bagginsdb
- Owner: FlynnFc
- License: mit
- Created: 2024-12-18T13:19:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-09T19:46:08.000Z (about 1 year ago)
- Last Synced: 2025-06-09T20:32:32.058Z (about 1 year ago)
- Topics: cassandra, database, distrubuted-systems, go, nosql
- Language: Go
- Homepage: https://pkg.go.dev/github.com/flynnfc/bagginsdb/pkg/bagginsdb/db
- Size: 90.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Baggins DB
Baggins DB is a simple Cassandra-inspired wide-column database. It is not production-ready; it is an educational project for exploring low-level database internals, concurrency control, and performance-tuning techniques.
Learn more about how it's made here

---
## Features
- **Memtable (In-Memory Index):**
  Stores recently written data in a sorted skiplist for quick insertion and retrieval. Once it reaches a certain size threshold, it is flushed to disk as an immutable SSTable.
- **SSTables (On-Disk Storage):**
  Writes are organised into append-only, immutable files known as SSTables. Each SSTable is sorted by key and includes:
  - A Bloom filter to quickly determine if a key might exist.
  - A sparse index to jump near the desired key without scanning the entire file.
- **Compaction:**
  Over time, multiple SSTables are merged and deduplicated into a single larger SSTable. This process, known as compaction, reduces storage fragmentation and stabilises read performance by limiting the number of SSTables that must be searched.
- **TrueTime Integration (Mocked):**
  The code incorporates a `truetime` component, simulating reliable timestamp generation, similar in spirit to [Google’s TrueTime](https://cloud.google.com/spanner/docs/true-time-external-consistency), though far simpler and not distributed. This allows the program to avoid distributed locks and choose the newest value during compactions.
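To make the Bloom filter idea concrete, here is a minimal sketch of how a filter can answer "definitely absent" or "possibly present" for a key. It is not taken from the Baggins DB source: the `bloom` type, its methods, and the choice of FNV hashing with double hashing are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a hypothetical fixed-size Bloom filter (not the Baggins DB type).
type bloom struct {
	bits []bool
	k    int // number of bit positions set per key
}

func newBloom(m, k int) *bloom {
	return &bloom{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from one 64-bit FNV-1a hash
// by combining its two 32-bit halves (double hashing).
func (b *bloom) positions(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	out := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		out[i] = int((h1 + uint32(i)*h2) % uint32(len(b.bits)))
	}
	return out
}

// Add marks every position for key.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MightContain returns false only if key was definitely never added.
// A true result can be a false positive.
func (b *bloom) MightContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 3)
	f.Add("frodo")
	fmt.Println(f.MightContain("frodo")) // true: added keys are never missed
}
```

A read path would typically consult such a filter first and skip the SSTable entirely when `MightContain` returns false.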
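The sparse index lookup described above can be sketched as follows. This is an illustrative in-memory model, assuming the table is a sorted slice and the index samples every `stride`-th key; the names `entry`, `indexEntry`, `buildSparseIndex`, and `lookup` are hypothetical and do not come from the project.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one key/value record in a sorted, immutable table.
type entry struct {
	Key, Value string
}

// indexEntry maps a sampled key to its position in the table.
type indexEntry struct {
	Key string
	Pos int
}

// buildSparseIndex samples every stride-th key of a sorted table.
// stride must be greater than zero.
func buildSparseIndex(table []entry, stride int) []indexEntry {
	var idx []indexEntry
	for i := 0; i < len(table); i += stride {
		idx = append(idx, indexEntry{Key: table[i].Key, Pos: i})
	}
	return idx
}

// lookup jumps to the last sampled key at or before the target,
// then scans forward only until the next sampled position.
func lookup(table []entry, idx []indexEntry, key string) (string, bool) {
	// Index of the first sampled key strictly greater than the target.
	i := sort.Search(len(idx), func(i int) bool { return idx[i].Key > key })
	if i == 0 {
		return "", false // target sorts before every key in the table
	}
	start := idx[i-1].Pos
	end := len(table)
	if i < len(idx) {
		end = idx[i].Pos
	}
	for _, e := range table[start:end] {
		if e.Key == key {
			return e.Value, true
		}
	}
	return "", false
}

func main() {
	table := []entry{{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}, {"e", "5"}}
	idx := buildSparseIndex(table, 2)
	v, ok := lookup(table, idx, "d")
	fmt.Println(v, ok) // prints "4 true"
}
```

With a stride of 2, finding `"d"` scans only the two records between the sampled keys `"c"` and `"e"`. In an on-disk table the positions would be byte offsets rather than slice indices.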
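The combination of compaction and timestamps can be illustrated with a small sketch: several tables are merged, and when a key appears more than once, the record with the newest timestamp wins. This is a simplified model, not the project's implementation. The `record` type and `compact` function are hypothetical, and the merge uses a map followed by a sort rather than a streaming merge over sorted files.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical timestamped key/value pair.
type record struct {
	Key       string
	Value     string
	Timestamp int64
}

// compact merges any number of tables into one sorted table,
// keeping only the record with the highest timestamp per key.
func compact(tables ...[]record) []record {
	newest := make(map[string]record)
	for _, t := range tables {
		for _, r := range t {
			if cur, ok := newest[r.Key]; !ok || r.Timestamp > cur.Timestamp {
				newest[r.Key] = r
			}
		}
	}
	out := make([]record, 0, len(newest))
	for _, r := range newest {
		out = append(out, r)
	}
	// Restore key order so the result is a valid sorted table.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	older := []record{{"bilbo", "shire", 1}, {"frodo", "shire", 2}}
	newer := []record{{"frodo", "mordor", 5}, {"sam", "shire", 4}}
	for _, r := range compact(older, newer) {
		fmt.Printf("%s=%s@%d\n", r.Key, r.Value, r.Timestamp)
	}
	// Prints:
	// bilbo=shire@1
	// frodo=mordor@5
	// sam=shire@4
}
```

Because each record carries its own timestamp, the merge needs no coordination between writers to decide which value survives, which is the property the `truetime` component is meant to provide.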
## Project Structure
- `pkg/bagginsdb/db`
  Contains the core database logic, including:
  - `database.go`: The `Database` struct that ties together memtables, the SSTableManager, and timing.
  - `memtable.go`, `skiplist.go`: In-memory skiplist for quick writes and reads.
  - `sstable.go`, `sstable_manager.go`: Handle on-disk SSTables: building them from memtables, indexing them, and merging them during compaction.
- `pkg/bagginsdb/truetime`
  Mock time service that provides timestamps for record inserts.
- `logger/`
  A simple logging wrapper configured to produce structured logs via `zap`.
## Performance Tuning and Improvements
I've opted to track and log performance at quite a granular level. You can find saved graphs and performance notes in the [Performance](performance) folder.
## Roadmap
- Add delete support (tombstones).
- Implement improved error handling and recovery after crashes.
- Integrate more benchmarks and profiling tools to guide optimisations.
## License
This project is distributed under the MIT License. See `LICENSE` for details.