https://github.com/mentalblood0/lawn

🧃 Key-value store with mutable data files without trees
https://github.com/mentalblood0/lawn

database-engine key-value-store

Last synced: 20 days ago
JSON representation

🧃 Key-value store with mutable data files without trees

Host: GitHub
URL: https://github.com/mentalblood0/lawn
Owner: mentalblood0
Created: 2025-10-08T12:13:17.000Z (6 months ago)
Default Branch: main
Last Pushed: 2026-02-24T18:17:37.000Z (26 days ago)
Last Synced: 2026-02-24T20:50:39.430Z (26 days ago)
Topics: database-engine, key-value-store
Language: Rust
Homepage:
Size: 1.32 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

## lawn

[![tests](https://github.com/mentalblood0/lawn/actions/workflows/tests.yml/badge.svg)](https://github.com/mentalblood0/lawn/actions/workflows/tests.yml)

Key-value store with mutable data files without trees

### Rationale

Key-value stores are often used as primary data store in database. Usually they are implemented as follows

To store key-value pair, reliably write it to file on disk (this file can be called 'Write Ahead Log') and then write it to in-memory tree structure ('memtable')

To recover key-value pairs on next program launch, read each one of them from WAL and write to memtable

Store recovery can take significant time for large databases, so most of key-value pairs eventually stored not in WAL as they were added, but in more search-and-iteration-friendly data structures on disk. Almost always it is just list of key-value pairs, sorted by keys and called 'Sorted Strings Table'

SST does not eliminate WAL, but key-value pairs from WAL periodically 'checkpointed' into SST. Checkpointing implementation is crucial for amount of writes on disk, as it involves rewriting of sorted lists. In lawn SST is split into index part and data part

Index part is just file filled with records of fixed size, sorted by key. First byte of each record indicates in which data-part file related key-value pair is stored. The remaining bytes form number which is the index of key-value pair in indicated file

Data part consists of 256 files. Each file filled with containers of fixed size. Size of each container determined by logarithm-like splitting of scale from 2 to maximum key-value pair size. Maximum key-value pair size is stated by user in configuration file of store

Each data part file is managed as pool: in the beginning of it there is a pointer to the last freed container. Just the same way the last freed container can point to the second last etc.. Freed containers are reused to keep fragmentation under control

Split into index part and data part reduces disk writes drastically: checkpointing do not mean overwriting entire data, just pointers to it stored in index. But it also increases time needed to access to key-value pair: even when iterating them sequentially, we need one random read for each of them

Need for random reads to retrieve data makes traditional memtable-SST merging strategy inefficient as it lineary depends on amount of key-value pairs stored on disk. Something like insertion sort using binary search may be better: `M` key-value pairs in memtable and `D` key-value pairs on disk mean maximum `2 * M * log2(D)` random disk accesses. lawn uses upgraded version of this algorithm called 'sparsed merge': first we look for the place to insert the middle element, then as we know that all elements before it will be placed on the left of the place we just found, we look for the place to insert the middle-of-memtable-first-half element into index-before-place-for-first-element. Effectively recursive nature of this algorithm not only reduces random accesses amount by half for random data, but also it gives the less random accesses the more 'grouped' and 'cornered' inserted key-value pairs appear in resulted list whereas simple insertion sort does the opposite

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mentalblood0/lawn

Awesome Lists containing this project

README