Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/flynnfc/bagginsdb
⚡A cassandra inspired distrubuted wide column nosql database⚡
https://github.com/flynnfc/bagginsdb
cassandra database distrubuted-systems go nosql
Last synced: 2 days ago
JSON representation
⚡A cassandra inspired distrubuted wide column nosql database⚡
- Host: GitHub
- URL: https://github.com/flynnfc/bagginsdb
- Owner: FlynnFc
- License: mit
- Created: 2024-12-18T13:19:12.000Z (26 days ago)
- Default Branch: main
- Last Pushed: 2025-01-07T18:51:13.000Z (5 days ago)
- Last Synced: 2025-01-07T19:50:41.322Z (5 days ago)
- Topics: cassandra, database, distrubuted-systems, go, nosql
- Language: Go
- Homepage:
- Size: 85.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Baggins DB
Baggins DB is a simple Cassandra-inspired wide-column db. While not production-ready, it serves as an educational project to explore low-level database internals, concurrency control, and performance-tuning techniques.
---
## Features
- **Memtable (In-Memory Index):**
Stores recently written data in a sorted skiplist for quick insertion and retrieval. Once it reaches a certain size threshold, it is flushed to disk as an immutable SSTable.- **SSTables (On-Disk Storage):**
Writes are organized into append-only, immutable files known as SSTables. Each SSTable is sorted by key and includes:
- A Bloom filter to quickly determine if a key might exist.
- A sparse index to jump near the desired key without scanning the entire file.
- **Compaction:**
Over time, multiple SSTables are merged and deduplicated into a single larger SSTable. This process, known as compaction, reduces storage fragmentation and keeps read performance stable by limiting the number of SSTables that must be searched.- **TrueTime Integration (Mocked):**
The code incorporates a `truetime` component, simulating reliable timestamp generation, similar in spirit to [Google’s TrueTime](https://cloud.google.com/spanner/docs/true-time-external-consistency), though far simpler and not distributed. This allows the system to track record versions and choose the newest value during compactions.## Project Structure
- `internal/database/`
Contains the core database logic including:- `database.go`: The `Database` struct that ties together mem-tables, SSTableManager, and timing.
- `memtable.go`, `skiplist.go`: In-memory skiplist for quick writes and reads.
- `sstable.go`, `sstable_manager.go`: Handling on-disk SSTables, building them from memtables, indexing, and merging them during compaction.- `internal/truetime/`
Mock time service that provides timestamps for record inserts.- `simulation/`
Contains load-testing and simulation scripts that run various scenarios:
- Bulk inserts and reads.
- Mixed workloads (reads/writes).
- Realistic Cassandra-like loads with random keys, values, and read/write ratios.
- `logger/`
A simple logging wrapper configured to produce structured logs via `zap`.## Performance tuning and improvements
I've opted to track and log performance at quite a granular level. You can find saved graphs and performance notes in the [Performance](performance) folder## Roadmap
- Add delete support (tombstones).
- Implement proper error handling and recovery after crashes.
- Infinite scale!
- Integrate more benchmarks and profiling tools to guide optimizations.## License
This project is distributed under the MIT License. See `LICENSE` for details.