{"id":15056330,"url":"https://github.com/ckampfe/b2","last_synced_at":"2026-01-02T00:17:38.021Z","repository":{"id":238233718,"uuid":"796153971","full_name":"ckampfe/b2","owner":"ckampfe","description":"An implementation of Bitcask as a Rust library","archived":false,"fork":false,"pushed_at":"2024-05-23T03:23:32.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-23T03:40:38.794Z","etag":null,"topics":["bitcask","erlang","kv","riak","rust","storage-engine"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ckampfe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-05T05:10:54.000Z","updated_at":"2024-05-28T01:13:49.939Z","dependencies_parsed_at":"2024-05-05T06:22:48.400Z","dependency_job_id":"56896bc0-6899-42db-a471-cac7454c3476","html_url":"https://github.com/ckampfe/b2","commit_stats":null,"previous_names":["ckampfe/b2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fb2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fb2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fb2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fb2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ckampfe","download_url":"https://codeload.github.com/ckampfe/b2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243532562,"owners_count":20306156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitcask","erlang","kv","riak","rust","storage-engine"],"created_at":"2024-09-24T21:49:57.616Z","updated_at":"2026-01-02T00:17:37.967Z","avatar_url":"https://github.com/ckampfe.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# b2\n\nAn implementation of [Bitcask](https://riak.com/assets/bitcask-intro.pdf) as a Rust library.\n\n## when would something like this make sense?\n\nFrom a user's perspective, you can think of B2 (and Bitcask) as a disk-backed hashmap.\nOperationally, there is more to it than that, but \"hashmap, but on disk\" is a good first approximation of how you use it.\n\nThis kind of storage model might make sense for you if:\n- a simple key/value information model is enough for your domain\n- pure in-memory storage does not work for you; values must be stored on disk\n- you require type flexibility; values can be any types that implement `Serialize` and `DeserializeOwned`\n- read and write latency are important to your domain\n- your keyspace (all keys your database knows about) fits in system memory\n\n## what can it do\n\nThis is the public API:\n\n```rust\npub async fn new(db_directory: \u0026Path, options: Options) -\u003e Result\u003cSelf\u003e\npub async fn get\u003cV: Serialize + DeserializeOwned + Send\u003e(\u0026self, key: \u0026K) -\u003e Result\u003cOption\u003cV\u003e\u003e\npub async fn insert\u003cV: Serialize + DeserializeOwned + Send\u003e(\u0026self, k: K, v: V) -\u003e Result\u003c()\u003e\npub async fn remove(\u0026self, k: K) -\u003e Result\u003c()\u003e\npub async fn keys(\u0026self) -\u003e Vec\u003cK\u003e\npub async fn contains_key(\u0026self, k: \u0026K) -\u003e bool\npub async fn merge(\u0026self) -\u003e Result\u003c()\u003e\npub async fn flush(\u0026self) -\u003e Result\u003c()\u003e\npub fn db_directory(\u0026self) -\u003e \u0026Path\n```\n\nFor a given database, keys must all be the same type (i.e., all `String`, or whatever other type can implement `Serialize` and `DeserializeOwned`). This may be relaxed at some point.\n\nValues can vary arbitrarily, again as long as they can be serialized and deserialized. This means that for values, B2 is effectively dynamically typed/late bound. Values on disk are just bytes, and they are given a type when you insert/get them.\n\nIn terms of concurrency, right now B2 uses a coarse-grained `tokio::sync::RwLock`, so there can be: `(N readers) XOR (1 writer)`. Given Bitcask's model, it should be possible to relax this so that there can be `(N readers) AND (1 writer)`, and I might do that in the future.\n\nBy default B2 flushes every write to disk. This is slow, but leads to predictable read-after-write semantics. You can relax this (and increase write throughput at the expense of read-after-write serializability) by changing an option.\n\nSee the Bitcask paper to understand in more detail why Bitcask's particular conception of a key/value store is unique and interesting and why it might or might not make sense for your requirements.\n\n## is it any good? should I use it?\n\nRight now, probably not! From what I can tell, B2 is API complete with respect to the Bitcask paper. This does not mean it functions correctly. It is undertested. It uses a simple `tokio::sync::RwLock` internally so its concurrency story is weaker than it could be. There are probably other problems with it. Nonetheless, it is a tiny amount of code in comparison to other database systems (\u003c1500 lines), so you can probably actually understand what this does just by reading the source.\n\n## why\n\nI have known about Bitcask for a while, and I wanted to learn it by building a working implementation.\n\n## todo\n\n- [ ] better testing (in general)\n- [ ] better testing (around merging, specifically)\n- [ ] allow concurrent reading and writing (relax RwLock)\n- [ ] clean up merging code\n- [ ] clean up datamodel around records/entrypointers/mergepointers\n- [ ] more research into how async drop interacts with disk writes/buffer flushes\n- [x] investigate a better, less ambiguous tombstone value\n- [x] move more of write_insert and write_delete into Record\n- [ ] improve error contexts reported to callers (e.g. with `snafu` or improving use of `thiserror`)\n- [ ] error handling and reporting in the event of a corrupt record\n- [ ] investigate allowing the access of old values rather than having the keydir refer to only the most recent value\n- [x] investigate relaxing `K` to callsite-level rather than database-level (decision: not right now. this would require either making `K` be `Box\u003cdyn KeydirKey\u003e` or serializing `K` and having every access of the keydir require a serialization, at minimum)\n- [x] file_id to FileId(u32)\n- [x] key_size to KeySize(u16)\n- [x] value_size to ValueSize(u32)\n- [ ] tx_id to `time::Time`?\n- [x] use crc32 instead of blake3\n- [x] make internal write buffer size configurable\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fckampfe%2Fb2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fckampfe%2Fb2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fckampfe%2Fb2/lists"}