{"id":17227018,"url":"https://github.com/dfeneyrou/litecask","last_synced_at":"2026-04-29T08:02:34.593Z","repository":{"id":205183746,"uuid":"713594822","full_name":"dfeneyrou/litecask","owner":"dfeneyrou","description":"A high performance single-header embeddable persistent key-value store with indexing capabilities ","archived":false,"fork":false,"pushed_at":"2025-10-25T16:57:48.000Z","size":350,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-25T18:33:34.176Z","etag":null,"topics":["bitcask","database","embedded","key-value-store","kv-store","single-header"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dfeneyrou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-02T20:58:07.000Z","updated_at":"2025-10-25T16:57:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"3fb4b954-7ca6-425d-b35b-aa487faac5f5","html_url":"https://github.com/dfeneyrou/litecask","commit_stats":null,"previous_names":["dfeneyrou/litecask"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dfeneyrou/litecask","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfeneyrou%2Flitecask","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfeneyrou%2Flitecask/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfeneyrou%2Flitecask/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfe
neyrou%2Flitecask/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dfeneyrou","download_url":"https://codeload.github.com/dfeneyrou/litecask/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfeneyrou%2Flitecask/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32416146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitcask","database","embedded","key-value-store","kv-store","single-header"],"created_at":"2024-10-15T04:17:52.345Z","updated_at":"2026-04-29T08:02:34.576Z","avatar_url":"https://github.com/dfeneyrou.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![litecasks-logo](https://github.com/dfeneyrou/litecask/blob/main/doc/images/litecask-logo.png)\n\n[![Build and check](https://github.com/dfeneyrou/litecask/actions/workflows/build.yml/badge.svg)](https://github.com/dfeneyrou/litecask/actions/workflows/build.yml)\n\n## Litecask: a C++ single-header persistent key-value store\n - Based on [`Bitcask`](https://riak.com/assets/bitcask-intro.pdf) principles\n - High performance\n   - Insertion rate bottleneck is the 
disk I/O saturation\n   - Lookup throughput benefits from a scalable **concurrent hashtable** and a **built-in memory cache**\n - **Crash friendliness**: being architected as a log-structured file system, only the data not yet flushed to disk is lost\n - Ability to handle datasets much larger than RAM without degradation: only the keys reside in memory\n - Easy software integration: copying 1 header file is enough, **no external dependencies**\n - Ease of backup and restore: backing up 1 flat directory is enough\n - Support of **indexation using parts of the keys**\n - Support of entry lifetime\n\n\u003cins\u003eLitecask is:\u003c/ins\u003e\n - an efficient embedded database as a **building block** for your application\n - a **lean library** with an opinionated set of features, to maximize control and limit entropy\n   - tracked internal performance (hashtable, allocator, cache, startup time...)\n   - consistency by using ASAN, TSAN, clang-format, clang-tidy, and custom tests based on [`doctest`](https://github.com/doctest/doctest)\n - an enhanced implementation of [`Bitcask`](https://riak.com/assets/bitcask-intro.pdf), additionally featuring:\n   - a built-in cache\n   - indexing capabilities\n - a C++17 single-header file, including:\n   - a [`TLSF`](http://www.gii.upv.es/tlsf) heap-based memory allocator\n   - a concurrent hashtable with optimistic locking (inspired by [`memC3`](https://github.com/efficient/memc3) and [this analysis](https://memcached.org/blog/paper-review-memc3/))\n   - a lock-free RW-lock (aka shared mutex) reducing both false sharing and kernel access\n   - a memory cache with segmented LRU (inspired by [`memcached`](https://memcached.org/blog/modern-lru))\n\n\u003cins\u003eLitecask is not:\u003c/ins\u003e\n - a remote database\n   - it lacks some layers: a high performance network part (asio, evpp...) with a client-server communication protocol\n - a billion-entry database. 
Indeed, a fundamental principle of [`Bitcask`](https://riak.com/assets/bitcask-intro.pdf) is to keep the key directory in memory.\n   - 100 million entries is however within its range, depending on average key size and available RAM\n     - value sizes do not matter as they are not kept in memory, cache excepted\n   - scaling horizontally would imply making the database remote and sharded\n\n## Getting started\n\nA simple example inserting and retrieving a value is shown below:\n```C++\n// example.cpp . Place the file litecask.h in the same folder\n// Build with: 'c++ --std=c++17 example.cpp -o example' (Linux) or 'cl.exe /std:c++17 /EHsc example.cpp' (Windows)\n\n#include \u003ccassert\u003e\n#include \u003cvector\u003e\n\n#include \"litecask.h\"\n\nint main(int argc, char** argv)\n{\n    litecask::Datastore store;\n    store.open(\"/tmp/my_temp_db\");\n\n    // Store an entry\n    std::vector\u003cuint8_t\u003e value{1,2,3,4,5,6,7,8};\n    store.put(\"my key identifier\", value);\n\n    // Retrieve the entry\n    std::vector\u003cuint8_t\u003e retrievedValue;\n    store.get(\"my key identifier\", retrievedValue);\n    assert(retrievedValue==value);\n\n    store.close();\n}\n```\n\n## Benchmarks\n\nPerformance is highly dependent on the hardware (CPU quantity, CPU caches, memory bandwidth, disk speed, OS...).  \nThe results in this section correspond to a laptop with 16 CPUs (i7-11800H @2.30GHz) on Linux, obtained with the built-in benchmarks (matplotlib is required to draw the graphs in the optional last command):\n```sh\nmkdir build\ncd build\ncmake ..\nmake -j $(nproc)\n./bin/litecask_test benchmark -ll\n../ci/benchmark -n\n```\n\n### Access\n\n#### Monothread performance\n\n![litecasks-logo](https://github.com/dfeneyrou/litecask/blob/main/doc/images/litecask_benchmark_throughput_monothread.png)\n\nResults for a 1 million entry database with 8-byte keys, values in cache and a Zipf-1.0 access distribution.\n\nThe memory throughput graph on the right is deduced from the left graph and the size of the value.  
\nThe lower rate when writing is due to disk I/O bottleneck, highlighted by the asymptote when value size grows.\n\n#### Multithread performance\n![litecasks-logo](https://github.com/dfeneyrou/litecask/blob/main/doc/images/litecask_benchmark_throughput_multithread.png)\n\nResult for a 1 million entries database, 8 bytes keys, 256 bytes values, values in cache and Zipf-1.0 access distribution.\n\nThe read access scales well with the thread quantity thanks to the concurrent hashtable and shared lock implementations.  \nThe full write does not scale with multi-threading due to the one-writer constraint (log-structured design).\n\n### Other characteristics\n\nMeasures for a 30 million entries datastore with 8 bytes keys and 16 bytes value size:\n| Characteristic       |   Performance   |\n|-------------------|-----------------|\n| Total startup load time   | 3.975 seconds (regardless of value sizes) |\n| Total Used memory         | 2064 MB  (regardless of value sizes) |\n| Startup load rate   | 7.5 million entries / second |\n| RAM / entry         | 61 bytes + the key size + 2 bytes per index (global averaged overhead) |\n| Disk size / entry   | 16 bytes + the key size + 2 bytes per index + the value size |\n\n\u003cdetails\u003e\n\u003csummary\u003eEffect of deferred write\u003c/summary\u003e\n\n\u003cbr/\u003e\nDeferred write is simply writing in an intermediate memory buffer to avoid costly kernel disk write calls. 
This buffer is flushed to disk only when it is full, or when it is non-empty after a configurable timeout.\n\nIn case of process disruption, a drawback is the potential loss of these not-yet-on-disk entries.\n\nConditions: *monothread, key size 8 bytes, value size 8 bytes, write buffer size 100KB*\n| Write kind       |   Performance   |\n|-------------------|-----------------|\n| Deferred sync  | 4.170 Mop/s  |\n| Forced   sync  | 1.139 Mop/s  |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eEffect of value cache\u003c/summary\u003e\n\n\u003cbr/\u003e\nA cache keeps the value of some entries in memory so that their access does not require a costly disk read.\nThe strategy of selecting which value to keep or not is implementation dependent.\n\nConditions:  *monothread, key size 8 bytes, value size 256 bytes, 1 million entries, zipf-1.0 access pattern*\n| Cache size percentage       |   Performance   | Cache hit |\n|-------------------|-----------------|--------------------|\n| 0%   | 2.594 Mop/s  | 0.0%  |\n| 25%   | 6.132 Mop/s  | 96.0%  |\n| 50%   | 6.167 Mop/s  | 96.2%  |\n| 90%   | 6.226 Mop/s  | 96.8%   |\n| 100%   | 7.217 Mop/s  | 100%  |\n\nNote: the cache effect is even bigger with multithreaded access.\n\u003c/details\u003e\n\n## Indexation and query\n\n### Principle\n\nBy design and without any upper layer library, a key-value store has limited capabilities to scan and query:\n - Log Structured Hash tables scan by examining each entry.\n - Log Structured Merge trees internally sort keys, allowing key-range based scans.\n\nLitecask, under the first category, proposes a different approach: **use parts of the key as indexes**.  
\nA high-level view of this internal behavior is a dedicated hash table that takes an array of bytes as input and returns a set of unique keys as output.\n\n### Example of use\n\nLet's consider the following entry with a text-based key:\n```C++\nstore.put(\"UJohn Doe/CUS/TTax document/0001\", value);\n```\n\nThe text key can be visually split into:\n - `UJohn Doe` from byte 0, length 9\n - `CUS` from byte 10, length 3\n - `TTax document` from byte 14, length 13\n\nThese chunks of the key can be used as indexes by extending the previous insertion command as follows:\n```C++\nstore.put(\"UJohn Doe/CUS/TTax document/0001\", value, { {0,9}, {10, 3}, {14, 13} });\n```\n\nThanks to these lightweight indexes, it is now possible to query for entries matching the user `UJohn Doe`, the country `CUS`, or the type `TTax document`, or an intersection of these.\n```C++\nstd::vector\u003cstd::vector\u003cuint8_t\u003e\u003e matchingKeys;\n\n// Query for user\nstore.query(\"UJohn Doe\", matchingKeys);\nassert(matchingKeys.size()==1);\n\n// Query for country\nstore.query(\"CUS\", matchingKeys);\nassert(matchingKeys.size()==1);\n\n// Query for user AND country (implicit AND. OR can be performed by additional queries and removing duplicates)\nstore.query({std::string(\"UJohn Doe\"), std::string(\"CUS\")}, matchingKeys);\nassert(matchingKeys.size()==1);\n```\n\nNote: in this example, the prefixes `U`, `C` and `T` prevent mixing \"columns\" when they hold the same content.  
\nThe separating '/' is purely for human readability.\n\n## API\n\n#### Datastore\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eDatastore::Datastore(...)\u003c/code\u003e - Datastore instance creation \u003c/summary\u003e\n\n```C++\nDatastore::Datastore(size_t cacheBytes = 256 * 1024 * 1024);\n```\n\n| Parameter name              |   Description                       |\n|-------------------|-------------------------------------|\n| `cacheBytes`   | Defines the maximum memory usage of the value cache in bytes. Default is 256 MB. \u003cbr/\u003e Once this size is reached, values are evicted to allow insertion of new ones. \u003cbr/\u003e Value cache greatly improves read performance by avoiding disk access. |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::open(...)\u003c/code\u003e - Datastore opening \u003c/summary\u003e\n\n```C++\nStatus Datastore::open(std::filesystem::path dbDirectoryPath, bool doCreateIfNotExist = true);\n ```\n\n| Parameter name              |   Description                         |\n|-------------------|-------------------------------------|\n| `dbDirectoryPath`   | The path of the litecask datastore |\n| `doCreateIfNotExist`   | Boolean to decide to create a non existing datastore (default) |\n\u003cbr/\u003e\n\n| Return code             |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The datastore was successfully opened |\n| `Status::StoreAlreadyOpen`   | The datastore instance is already in opened state |\n| `Status::CannotOpenStore`   | The provided path does not correspond to a litecask store |\n| `Status::StoreAlreadyInUse` | The datastore files are already in use by another process |\n| `Status::BadDiskAccess` | The datastore cannot be opened due to file access issues |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n \u003csummary\u003e\u003ccode\u003eStatus Datastore::close(...)\u003c/code\u003e - 
Datastore closing \u003c/summary\u003e\n\n```C++\nStatus Datastore::close();\n ```\n\n| Return code             |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The datastore was successfully closed |\n| `Status::StoreNotOpen`   | The datastore instance was not in the open state |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n \u003csummary\u003e\u003ccode\u003evoid Datastore::sync(...)\u003c/code\u003e - Flush the write buffer \u003c/summary\u003e\n\n```C++\nvoid Datastore::sync();\n ```\n\nThis synchronization of the written data to the disk is performed:\n - When this function `sync()` is called explicitly\n - When an entry is inserted with the `forceDiskSync` flag set to `true`\n - When the write buffer is full\n - Automatically with a configurable period\n\nNote: this synchronization is at the application level, protecting against loss when the application crashes.\nIf a sudden shutdown of the machine occurs, the content of the OS disk cache that is not yet written may still be lost.\n\n\u003c/details\u003e\n\n#### Put\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::put(...)\u003c/code\u003e - Entry insertion \u003c/summary\u003e\n\n```C++\n// Key and value as pointer plus size\nStatus Datastore::put(const void* key, size_t keySize,\n                      const void* value, size_t valueSize,\n                      const std::vector\u003cKeyIndex\u003e\u0026 keyIndexes = {},\n                      uint32_t ttlSec = 0,\n                      bool forceDiskSync = false);\n\n// Variant 1: key as vector\nStatus Datastore::put(const std::vector\u003cuint8_t\u003e\u0026 key,\n                      const void* value, size_t valueSize,\n                      const std::vector\u003cKeyIndex\u003e\u0026 keyIndexes = {},\n                      uint32_t ttlSec = 0,\n                      bool forceDiskSync = false);\n\n// Variant 2: key as string. 
Note that the null termination is not part of the key\nStatus Datastore::put(const std::string\u0026 key,\n                      const void* value, size_t valueSize,\n                      const std::vector\u003cKeyIndex\u003e\u0026 keyIndexes = {},\n                      uint32_t ttlSec = 0,\n                      bool forceDiskSync = false);\n\n// Variant 3: key as vector and value as vector\nStatus Datastore::put(const std::vector\u003cuint8_t\u003e\u0026 key,\n                      const std::vector\u003cuint8_t\u003e\u0026 value,\n                      const std::vector\u003cKeyIndex\u003e\u0026 keyIndexes = {},\n                      uint32_t ttlSec = 0,\n                      bool forceDiskSync = false);\n\n// Variant 4: key as string and value as vector\nStatus Datastore::put(const std::string\u0026 key,\n                      const std::vector\u003cuint8_t\u003e\u0026 value,\n                      const std::vector\u003cKeyIndex\u003e\u0026 keyIndexes = {},\n                      uint32_t ttlSec = 0,\n                      bool forceDiskSync = false);\n\n// This structure defines a part of the key [start index; size[ to use as an index.\n// An array of key indexes MUST be sorted by increasing startIdx, then size if startIdx are equal.\n// Except this constraint on sorting, key indexes can overlap.\nstruct KeyIndex {\n    uint8_t startIdx; // Relative to the start of the key\n    uint8_t size;\n};\n```\n\n| Parameter name |   Description                      |\n|---------------|-------------------------------------|\n| `key`         | The pointer or the structure of the key |\n| `keySize`     | In case of key as a pointer, the size of the key in bytes. Maximum accepted size is 65534 bytes |\n| `value`       | The pointer or the structure of the value |\n| `valueSize`   | In case of value as a pointer, the size of the value in bytes |\n| `keyIndexes`  | An array of KeyIndex structures which defines the parts of the key to use as an index. 
Default is no index |\n| `ttlSec`      | The 'Time To Live' of the entry in seconds. Default is zero which means no lifetime limit |\n| `forceDiskSync` | Boolean to force the write on disk of the full write buffer after this entry. Default is false. \u003cbr/\u003e Note that it covers just the application cache, not the OS one. |\n\n\u003cbr/\u003e\n\n| Return code       |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The entry was successfully stored |\n| `Status::BadKeySize`   | The key size is bigger than 65535 |\n| `Status::InconsistentKeyIndex` | The number of key indexes is bigger than 64, or an index addresses bytes outside of the key |\n| `Status::UnorderedKeyIndex` | The indexes are not in order |\n| `Status::BadValueSize` | The value size is bigger than 4294901760 bytes |\n| `Status::StoreNotOpen` | The datastore is not open |\n| `Status::OutOfMemory` | The system is running out of memory |\n\n\u003c/details\u003e\n\n#### Remove\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::remove(...)\u003c/code\u003e - Entry removal \u003c/summary\u003e\n\n```C++\n// Key pointer plus size\nStatus Datastore::remove(const void* key, size_t keySize,\n                         bool forceDiskSync = false);\n\n// Variant 1: key as vector\nStatus Datastore::remove(const std::vector\u003cuint8_t\u003e\u0026 key,\n                         bool forceDiskSync = false);\n\n// Variant 2: key as string\nStatus Datastore::remove(const std::string\u0026 key,\n                         bool forceDiskSync = false);\n ```\n\n| Parameter name    |   Description                         |\n|-------------------|-------------------------------------|\n| `key`       | The pointer or the structure of the key |\n| `keySize`   | In case of key as a pointer, the size of the key in bytes. 
Maximum accepted size is 65534 bytes |\n| `forceDiskSync` | Boolean to force the write on disk of the full write buffer after this entry removal. Default is false. \u003cbr/\u003e Note that it covers just the application cache, not the OS one. |\n\n\u003cbr/\u003e\n\n| Return code       |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The entry was successfully removed |\n| `Status::BadKeySize`   | The key size is bigger than 65535 |\n| `Status::StoreNotOpen` | The datastore is not open |\n| `Status::EntryNotFound` | The key was not found in the datastore |\n\n\u003c/details\u003e\n\n#### Get\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::get(...)\u003c/code\u003e - Entry retrieval \u003c/summary\u003e\n\n```C++\n// Key pointer and size\nStatus Datastore::get(const void* key, size_t keySize,\n                      std::vector\u003cuint8_t\u003e\u0026 value);\n\n// Variant 1: key as vector\nStatus Datastore::get(const std::vector\u003cuint8_t\u003e\u0026 key,\n                      std::vector\u003cuint8_t\u003e\u0026 value);\n\n// Variant 2: key as string\nStatus Datastore::get(const std::string\u0026 key,\n                      std::vector\u003cuint8_t\u003e\u0026 value);\n ```\n\n| Parameter name    |   Description                         |\n|-------------------|-------------------------------------|\n| `key`       | The pointer or the structure of the key |\n| `keySize`   | In case of key as a pointer, the size of the key in bytes. 
Maximum accepted size is 65534 bytes |\n| `value` | The output array structure for the retrieved value |\n\n\u003cbr/\u003e\n\n| Return code             |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The entry was successfully retrieved |\n| `Status::BadKeySize`   | The key size is bigger than 65535 |\n| `Status::StoreNotOpen` | The datastore is not open |\n| `Status::EntryNotFound` | The key was not found in the datastore |\n| `Status::EntryCorrupted` | The entry was retrieved from disk and the checksum is incorrect (i.e. corrupted entry) |\n\n\u003c/details\u003e\n\n#### Query\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::query(...)\u003c/code\u003e - Entries query based on index \u003c/summary\u003e\n\n```C++\n// Query variant 1: single key part as vector\nStatus query(const std::vector\u003cuint8_t\u003e\u0026 keyPart,\n             std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 matchingKeys);\n\n// Query variant 2: single key part as string\nStatus query(const std::string\u0026 keyPart,\n             std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 matchingKeys);\n\n// Query variant 3: multiple key part as vector of vector\nStatus query(const std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 keyParts,\n             std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 matchingKeys);\n\n// Query variant 4: key part as vector of string\nStatus query(const std::vector\u003cstd::string\u003e\u0026 keyParts,\n             std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 matchingKeys);\n\n// Query variant 5: single key part as vector, with arena allocator for output array of keys\nStatus query(const std::vector\u003cuint8_t\u003e\u0026 keyPart,\n             std::vector\u003cQueryResult\u003e\u0026 arenaMatchingKeys,\n             ArenaAllocator\u0026 allocator);\n\n// Query variant 6: single key part as string, 
with arena allocator for output array of keys\nStatus query(const std::string\u0026 keyPart,\n             std::vector\u003cQueryResult\u003e\u0026 arenaMatchingKeys,\n             ArenaAllocator\u0026 allocator);\n\n// Query variant 7: multiple key parts as vector, with arena allocator for output array of keys\nStatus query(const std::vector\u003cstd::vector\u003cuint8_t\u003e\u003e\u0026 keyParts,\n             std::vector\u003cQueryResult\u003e\u0026 arenaMatchingKeys,\n             ArenaAllocator\u0026 allocator);\n\n// Query variant 8: multiple key parts as string, with arena allocator for output array of keys\nStatus query(const std::vector\u003cstd::string\u003e\u0026 keyParts,\n             std::vector\u003cQueryResult\u003e\u0026 arenaMatchingKeys,\n             ArenaAllocator\u0026 allocator);\n\n// This structure defines a 'query result' by providing a memory span, when using an arena allocator API.\nstruct QueryResult {\n    uint8_t* ptr;\n    uint16_t size;\n};\n\n// 'minAllocChunkBytes' is the performed allocation size if the requested amount is smaller than this value.\n// For efficiency reasons, it should be several orders of magnitude larger than the typical allocation size\nclass ArenaAllocator {\n  ArenaAllocator(size_t minAllocChunkBytes = 1024 * 1024);\n  uint8_t* allocate(size_t bytes);  // Used by the query API\n  size_t getAllocatedBytes() const;\n  void reset(); // Frees all allocations, but keeps the memory chunks internally. Invalidates QueryResult content.\n};\n ```\n\n| Parameter name    |   Description                         |\n|-------------------|-------------------------------------|\n| `keyPart`       | A string or array of bytes to use as a single index to query |\n| `keyParts`   | An array of strings or arrays of bytes to use as multiple indexes to query. 
\u003cbr/\u003e The resulting keys shall match **all** indexes ('AND' operation) |\n| `matchingKeys` | The output array of array of bytes holding the matching keys |\n| `allocator` | An arena allocator which improves efficiency of large queries by batching allocations \u003cbr/\u003e Used only with the arena allocator APIs and in association with the QueryResult structure |\n\n\u003cbr/\u003e\n\n| Return code       |   Comment                         |\n|-------------------|-------------------------------------|\n| `Status::Ok`   | The query was successfully performed |\n| `Status::BadKeySize`   | One of the provided key chunks has a size bigger than 65535 |\n\n\u003c/details\u003e\n\n#### Configuration\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::setConfig(...)\u003c/code\u003e - Set datastore configuration \u003c/summary\u003e\n\n```C++\nStatus Datastore::setConfig(const Config\u0026 config);\n\nstruct Config {\n    // General store parameters\n    // ========================\n\n    //   'dataFileMaxBytes' defines the maximum byte size of a data file before switching to a new one.\n    //   It implicitly limits the maximum size of the database as there can be at most 65535 data files.\n    //   Bigger data files make the total size bigger (up to 65535 * 4 GiB, about 256 TiB)\n    //   Smaller data files make the merge time shorter\n    uint32_t dataFileMaxBytes = 100'000'000;\n\n    //   'mergeCyclePeriodMs' defines the merge period for the database, in milliseconds.\n    //   This merge cycle first checks if the 'merge' process is needed. If so, the eligible data files\n    //   are selected and compacted into defragmented and smaller files which eventually replace the old ones.\n    uint32_t mergeCyclePeriodMs = 60'000;\n\n    //   'upkeepCyclePeriodMs' defines the upkeep period for the internal structures, in milliseconds.\n    //   It copes mainly with the cache eviction and the KeyDir resizing. 
The latter does not wait for the end of\n    //   the cycle and starts working immediately\n    uint32_t upkeepCyclePeriodMs = 1000;\n\n    //   'writeBufferFlushPeriodMs' defines the maximum time for the write buffer to be flushed on disk.\n    //   This limits the amount of data that can be lost in case of sudden interruption of the program, while\n    //   avoiding costly disk access at each write operation.\n    //   Note that the effective period is the maximum of upkeepCyclePeriodMs and writeBufferFlushPeriodMs.\n    //   Note also that the \"put\" API offers to force-flush directly on disk (with a performance cost).\n    uint32_t writeBufferFlushPeriodMs = 5000;\n\n    //   'upkeepKeyDirBatchSize' defines the quantity of KeyDir entries to update in a row.\n    //   This includes both KeyDir resizing and data file compaction mechanisms.\n    //   A higher quantity of entries will make the transition finish earlier, at the price of higher spikes of\n    //   latency on entry write or update. Too low a value could paradoxically induce a forced resizing of the\n    //   remaining part of the KeyDir if the next resize arrives before the end of the previous one.\n    uint32_t upkeepKeyDirBatchSize = 100'000;\n\n    //   'upkeepValueCacheBatchSize' defines the quantity of cached value entries to update in a row in the LRU.\n    //   A higher quantity of entries will make the background task finish earlier, at the price of higher spikes of\n    //   latency on entry write or update. Too low a value could paradoxically induce a forced task to clean and\n    //   evict cached values at insertion time.\n    uint32_t upkeepValueCacheBatchSize = 10000;\n\n    //   'valueCacheTargetMemoryLoadPercentage' configures the target load for the cache, so that the remaining free\n    //   space ensures a performant insertion in the cache. The eviction required to meet this target load is deferred\n    //   to a background task. 
Too low a value wastes cache memory, too high a value prevents the insertion of a new entry\n    //   in the cache because of lack of free space.\n    uint32_t valueCacheTargetMemoryLoadPercentage = 90;\n\n    // Merge Triggers\n    // ==============\n    // They determine the conditions under which merging will be invoked. They fall into two basic categories:\n\n    //   'mergeTriggerDataFileFragmentationPercentage' describes the percentage of dead keys to total keys in\n    //   a file that will trigger merging.\n    //   Increasing this value will cause merging to occur less often.\n    uint32_t mergeTriggerDataFileFragmentationPercentage = 60;\n\n    //   'mergeTriggerDataFileDeadByteThreshold' describes how much data stored for dead keys in a single file triggers\n    //   merging. Increasing the value causes merging to occur less often, whereas decreasing the value causes merging\n    //   to happen more often.\n    uint32_t mergeTriggerDataFileDeadByteThreshold = 51'200'000;\n\n    // Merge data file selection\n    // =========================\n    // These parameters determine which files will be selected for inclusion in a merge operation.\n\n    //  'mergeSelectDataFileFragmentationPercentage' describes which percentage of dead keys to total keys in a file\n    //  causes it to be included in the merge.\n    //  Note: this value shall be less than the corresponding trigger threshold.\n    uint32_t mergeSelectDataFileFragmentationPercentage = 40;\n\n    //  'mergeSelectDataFileDeadByteThreshold' describes the minimum amount of data occupied by dead keys\n    //  in a file that causes it to be included in the merge.\n    //  Note: this value shall be less than the corresponding trigger threshold.\n    uint32_t mergeSelectDataFileDeadByteThreshold = 12'800'000;\n\n    //  'mergeSelectDataFileSmallSizeTheshold' describes the minimum size below which a file is included in the merge.\n    //  The purpose is to reduce the quantity of small data files to 
keep open file quantity low.\n    uint32_t mergeSelectDataFileSmallSizeTheshold = 10'000'000;\n};\n\n ```\n\n| Parameter name    |   Description             |\n|-------------------|-------------------------------------|\n| `config`       | The configuration structure |\n\n\u003cbr/\u003e\n\n| Return code             |   Comment                      |\n|-------------------|-------------------------------------|\n| `Status::Ok`      | The configuration was successfully applied |\n| `Status::BadParameterValue`   | A parameter has a bad value. Check the logs for details (warning) |\n| `Status::InconsistentParameterValues` | Some parameter values are incompatible. Check the logs for details (warning) |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eConfig Datastore::getConfig(...)\u003c/code\u003e - Get datastore configuration \u003c/summary\u003e\n\n```C++\nConfig Datastore::getConfig() const;\n ```\n\n| Return value      |   Comment                           |\n|-------------------|-------------------------------------|\n| `config`          | The current configuration structure |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eStatus Datastore::setWriteBufferBytes(...)\u003c/code\u003e - Write buffer configuration \u003c/summary\u003e\n\n```C++\nStatus Datastore::setWriteBufferBytes(uint32_t writeBufferBytes);\n ```\n\n| Parameter name              |   Description             |\n|-------------------|-------------------------------------|\n| `writeBufferBytes` | The size of the write buffer in bytes. Default is 100 KB. 
\u003cbr/\u003e Note: beyond a few tens of times the typical entry size, increasing this value further should not have a big impact on performance |\n\n\u003cbr/\u003e\n\n| Return code             |   Comment                      |\n|-------------------|-------------------------------------|\n| `Status::Ok`      | The write buffer size was successfully set |\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003ebool Datastore::setLogLevel(...)\u003c/code\u003e - Set the logging level \u003c/summary\u003e\n\n```C++\nbool Datastore::setLogLevel(LogLevel level);\n\nenum class LogLevel { Debug = 0, Info = 1, Warn = 2, Error = 3, Fatal = 4, None = 5 };\n```\n\n| Parameter name              |   Description             |\n|-------------------|-------------------------------------|\n| `level` | The minimum log level. Logs with a strictly lower level are filtered out. Default is `LogLevel::Info` |\n\n\u003cbr/\u003e\n\n| Return code             |   Comment                      |\n|-------------------|-------------------------------------|\n| `boolean`      | True if the provided log level is within the defined range |\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003evoid Datastore::setLogHandler(...)\u003c/code\u003e - Override the default log handler \u003c/summary\u003e\n\n```C++\nvoid Datastore::setLogHandler(const std::function\u003cvoid(LogLevel, const char*, bool)\u003e\u0026 logHandler);\n```\n\n| Parameter name              |   Description             |\n|-------------------|-------------------------------------|\n| `logHandler` | The new log handler function. 
The provided parameters are:\u003cbr/\u003e - the log level (after filtering) \u003cbr/\u003e - the message string \u003cbr/\u003e - a boolean that signals the termination of the logging process when `true` |\n\nThe default logger simply writes to rolling files `litecask\u003cN\u003e.log` at the root of the datastore folder.\n\u003cbr/\u003e\n\n\u003c/details\u003e\n\n#### Compaction control\n\nTo remove dead entries, a merge/compaction of the data files is performed automatically in the background.\nThe API in this section is not required for the datastore to function properly.  \nIt is however possible to manually force such a process, for instance to target a period where the load is low.  \nSuch compaction is neither systematic nor global; please refer to the configuration structure for further details.  \n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003ebool Datastore::requestMerge(...)\u003c/code\u003e - Explicit merge/compaction request \u003c/summary\u003e\n\n```C++\nbool Datastore::requestMerge();\n```\n\n| Return code       |   Comment                      |\n|-------------------|--------------------------------|\n| `boolean`      |  Returns `true` if the request has been properly processed. 
\u003cbr/\u003e If a merge/compaction process is already ongoing, the returned value is `false` |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003ebool Datastore::isMergeOnGoing(...)\u003c/code\u003e - Get the state of the merge/compaction process \u003c/summary\u003e\n\n```C++\nbool Datastore::isMergeOnGoing() const;\n```\n\n| Return code       |   Comment                      |\n|-------------------|--------------------------------|\n| `boolean`      |  Returns `true` if a merge/compaction is ongoing |\n\n\u003c/details\u003e\n\n\n#### Observability\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003euint64_t Datastore::getEstimatedUsedMemoryBytes(...)\u003c/code\u003e - Get datastore global used memory \u003c/summary\u003e\n\n```C++\nuint64_t Datastore::getEstimatedUsedMemoryBytes(bool withCache = false) const;\n```\n\n| Parameter name              |   Description             |\n|-------------------|-------------------------------------|\n| `withCache`       | Boolean selecting whether the memory used by the value cache is included. 
Default is `false` |\n\n\u003cbr/\u003e\n\n| Return value       |   Comment                      |\n|-------------------|--------------------------------|\n| `uint64_t`      |  The estimated byte quantity used by the datastore |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003euint64_t Datastore::getValueCacheMaxAllocatableBytes(...)\u003c/code\u003e - Get configured value cache size \u003c/summary\u003e\n\n```C++\nuint64_t Datastore::getValueCacheMaxAllocatableBytes() const;\n```\n\n| Return value       |   Comment                      |\n|-------------------|--------------------------------|\n| `uint64_t`      |  Returns the configured value cache size in bytes |\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003euint64_t Datastore::getValueCacheAllocatedBytes(...)\u003c/code\u003e - Get used value cache size \u003c/summary\u003e\n\n```C++\nuint64_t Datastore::getValueCacheAllocatedBytes() const;\n```\n\n| Return value       |   Comment                      |\n|-------------------|--------------------------------|\n| `uint64_t`      |  Returns the currently allocated value cache size in bytes |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003econst DatastoreCounters\u0026 Datastore::getCounters(...)\u003c/code\u003e - Get datastore counters \u003c/summary\u003e\n\n```C++\nconst DatastoreCounters\u0026 Datastore::getCounters() const;\n\n// All fields are monotonic counters\nstruct DatastoreCounters {\n    // API calls\n    std::atomic\u003cuint64_t\u003e openCallQty;\n    std::atomic\u003cuint64_t\u003e openCallFailedQty;\n    std::atomic\u003cuint64_t\u003e closeCallQty;\n    std::atomic\u003cuint64_t\u003e closeCallFailedQty;\n    std::atomic\u003cuint64_t\u003e putCallQty;\n    std::atomic\u003cuint64_t\u003e putCallFailedQty;\n    std::atomic\u003cuint64_t\u003e removeCallQty;\n    std::atomic\u003cuint64_t\u003e removeCallNotFoundQty;\n    
std::atomic\u003cuint64_t\u003e removeCallFailedQty;\n    std::atomic\u003cuint64_t\u003e getCallQty;\n    std::atomic\u003cuint64_t\u003e getCallNotFoundQty;\n    std::atomic\u003cuint64_t\u003e getCallCorruptedQty;\n    std::atomic\u003cuint64_t\u003e getCallFailedQty;\n    std::atomic\u003cuint64_t\u003e getWriteBufferHitQty;\n    std::atomic\u003cuint64_t\u003e getCacheHitQty;\n    std::atomic\u003cuint64_t\u003e queryCallQty;\n    std::atomic\u003cuint64_t\u003e queryCallFailedQty;\n    // Data files\n    std::atomic\u003cuint64_t\u003e dataFileCreationQty;\n    std::atomic\u003cuint64_t\u003e dataFileMaxQty;\n    std::atomic\u003cuint64_t\u003e activeDataFileSwitchQty;\n    // Index\n    std::atomic\u003cuint64_t\u003e indexArrayCleaningQty;\n    std::atomic\u003cuint64_t\u003e indexArrayCleanedEntries;\n    // Maintenance (merge / compaction)\n    std::atomic\u003cuint64_t\u003e mergeCycleQty;\n    std::atomic\u003cuint64_t\u003e mergeCycleWithMergeQty;\n    std::atomic\u003cuint64_t\u003e mergeGainedDataFileQty;\n    std::atomic\u003cuint64_t\u003e mergeGainedBytes;\n    std::atomic\u003cuint64_t\u003e hintFileCreatedQty;\n};\n```\n\n| Return value         |   Comment                      |\n|----------------------|--------------------------------|\n| `DatastoreCounters`  |  A constant reference to the internal datastore counters structure |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003econst ValueCacheCounters\u0026 Datastore::getValueCacheCounters(...)\u003c/code\u003e - Get value cache counters \u003c/summary\u003e\n\n```C++\nconst ValueCacheCounters\u0026 Datastore::getValueCacheCounters() const;\n\n// All fields are monotonic counters, except 'currentInCacheValueQty'\nstruct ValueCacheCounters {\n    std::atomic\u003cuint64_t\u003e insertCallQty;\n    std::atomic\u003cuint64_t\u003e getCallQty;\n    std::atomic\u003cuint64_t\u003e removeCallQty;\n    std::atomic\u003cuint32_t\u003e currentInCacheValueQty; // Not a 
monotonic counter\n    std::atomic\u003cuint64_t\u003e hitQty;\n    std::atomic\u003cuint64_t\u003e missQty;\n    std::atomic\u003cuint64_t\u003e evictedQty;\n};\n```\n\n| Return value         |   Comment                      |\n|----------------------|--------------------------------|\n| `ValueCacheCounters`  |  A constant reference to the internal value cache counters structure |\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003eDataFileStats Datastore::getFileStats(...)\u003c/code\u003e - Get data file statistics \u003c/summary\u003e\n\n```C++\nDataFileStats Datastore::getFileStats() const;\n\n// Aggregated statistics over all data files\nstruct DataFileStats {\n    uint64_t fileQty;\n    uint64_t entries; // Counts all entries, including tomb and dead\n    uint64_t entryBytes;\n    uint64_t tombBytes;\n    uint64_t tombEntries; // A 'tomb' entry marks an entry deletion\n    uint64_t deadBytes;\n    uint64_t deadEntries; // An obsolete entry superseded by a newer one\n};\n```\n\n| Return value         |   Comment                      |\n|----------------------|--------------------------------|\n| `DataFileStats`      |  A data file statistics structure |\n\n\u003c/details\u003e\n\n## Misc\n\n### Support\n\nSupported OS:\n - Linux\n - Windows\n\nNote: in our tests, performance on Windows is lower than on Linux.\n\n### Limits\n\n| Description       |   Limit                |\n|-------------------|------------------------|\n| Maximum key size | 65534 bytes |\n| Maximum entry qty | System memory dependent \u003cbr/\u003e Approximate cost per entry is: key size + 60 bytes |\n| Maximum value size | 4294901760 (0xFFFF0000) bytes |\n| Maximum datastore size | File system dependent \u003cbr/\u003e - Data file handles: maximum 65535 data files \u003cbr/\u003e - Disk space: each data file can be up to 4 GB, for a total of 256 TB |\n| Maximum index quantity per entry | 64 |\n| Indexable part of the key | First 256 bytes (storage 
efficiency reasons) |\n\nAlso this project is young, feedback is welcome!\n\n### License\n\nLitecask source code is available under the [MIT license](https://github.com/dfeneyrou/litecask/blob/main/LICENSE)\n\nAssociated components:\n - Hash function: [`Wyhash`](https://github.com/wangyi-fudan/wyhash)\n   - Selected for its [good non-cryptographic properties, speed and small code size](https://github.com/rurban/smhasher#summary)\n   - Released in the [public domain](http://unlicense.org/)\n - Test framework: [`doctest`](https://github.com/doctest/doctest)\n   - Released under the [MIT license](https://github.com/doctest/doctest/blob/master/LICENSE.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfeneyrou%2Flitecask","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdfeneyrou%2Flitecask","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfeneyrou%2Flitecask/lists"}