{"id":19783650,"url":"https://github.com/intersectmbo/lsm-tree","last_synced_at":"2025-04-30T22:31:39.199Z","repository":{"id":189925984,"uuid":"681572050","full_name":"IntersectMBO/lsm-tree","owner":"IntersectMBO","description":"A Haskell library for on-disk tables based on LSM-Trees","archived":false,"fork":false,"pushed_at":"2024-10-29T12:11:09.000Z","size":2605,"stargazers_count":27,"open_issues_count":31,"forks_count":7,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-10-29T12:11:39.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntersectMBO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-22T09:51:50.000Z","updated_at":"2024-10-25T12:29:57.000Z","dependencies_parsed_at":"2023-08-22T13:05:54.166Z","dependency_job_id":"9e94ffec-bd00-4714-846c-4cad65b9b510","html_url":"https://github.com/IntersectMBO/lsm-tree","commit_stats":null,"previous_names":["input-output-hk/lsm-tree"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntersectMBO%2Flsm-tree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntersectMBO%2Flsm-tree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntersectMBO%2Flsm-tree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntersectMBO%2Flsm-tree/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntersectMBO","download_url":"https://codeload.github.com/IntersectMBO/lsm-tree/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224225212,"owners_count":17276435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T06:08:55.023Z","updated_at":"2025-04-30T22:31:39.190Z","avatar_url":"https://github.com/IntersectMBO.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- WARNING: Do not edit this file. This file is generated by `scripts/build-readme.hs`. --\u003e\n\n# lsm-tree\n\n[![Cardano Handbook](https://img.shields.io/badge/policy-Cardano%20Engineering%20Handbook-informational)](https://input-output-hk.github.io/cardano-engineering-handbook)\n[![Build](https://img.shields.io/github/actions/workflow/status/IntersectMBO/lsm-tree/ci.yml?label=Build)](https://github.com/IntersectMBO/lsm-tree/actions/workflows/ci.yml)\n[![Haddocks](https://img.shields.io/badge/documentation-Haddocks-purple)](https://IntersectMBO.github.io/lsm-tree/)\n\n\u003e :warning: **This library is in active development**: there is currently no release schedule!\n\nThis package is developed by Well-Typed LLP on behalf of Input Output Global, Inc. 
(IOG) and INTERSECT.\nThe main contributors are Duncan Coutts, Joris Dral, Matthias Heinzel, Wolfgang Jeltsch, Wen Kokke, and Alex Washburn.\n\n## Description\n\nThis package contains an efficient implementation of on-disk key–value\nstorage, implemented as a log-structured merge-tree or LSM-tree. An\nLSM-tree is a data structure for key–value mappings, similar to\n`Data.Map`, but optimized for large tables with a high insertion volume.\nIt has support for:\n\n- Basic key–value operations, such as lookup, insert, and delete.\n\n- Range lookups, which efficiently retrieve the values for all keys in a\n  given range.\n\n- Monoidal upserts which combine the stored and new values.\n\n- BLOB storage which associates a large auxiliary BLOB with a key.\n\n- Durable on-disk persistence and rollback via named snapshots.\n\n- Cheap table duplication where all duplicates can be independently\n  accessed and modified.\n\n- High-performance lookups on SSDs using I/O batching and parallelism.\n\nThis package exports two modules:\n\n- `Database.LSMTree.Simple`\n\n  This module exports a simplified API which picks sensible defaults for\n  a number of configuration parameters.\n\n  It does not support upserts or BLOBs, due to their unintuitive\n  interaction; see [Upsert and BLOB](#upsertandblob \"#upsertandblob\").\n\n  If you are looking at this package for the first time, it is strongly\n  recommended that you start by reading this module.\n\n- `Database.LSMTree`\n\n  This module exports the full API.\n\n### Upsert and BLOB \u003cspan id=\"upsertandblob\" class=\"anchor\"\u003e\u003c/span\u003e\n\nThe interaction between upserts and BLOBs is unintuitive. An upsert\nupdates the value associated with the key by combining the new and old\nvalues with a user-specified function. 
However, any BLOB associated with\nthe key is simply deleted.\n\n### Portability \u003cspan id=\"portability\" class=\"anchor\"\u003e\u003c/span\u003e\n\n- This package only supports 64-bit, little-endian systems.\n\n- On Windows, the package has only been tested with NTFS filesystems.\n\n- On Linux, executables using this package, including test and benchmark\n  suites, must be compiled with the\n  [`-threaded`](https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html#ghc-flag-threaded \"https://downloads.haskell.org/ghc/latest/docs/users_guide/phases.html#ghc-flag-threaded\")\n  RTS option enabled.\n\n### Concurrency \u003cspan id=\"concurrency\" class=\"anchor\"\u003e\u003c/span\u003e\n\nLSM-trees can be used concurrently, but with a few restrictions:\n\n- Each session locks its session directory. This means that a database\n  cannot be accessed from different processes at the same time.\n\n- Tables can be used concurrently and concurrent use of read operations\n  such as lookups is deterministic. However, concurrent use of write\n  operations such as insert or delete with any other operation results\n  in a race condition.\n\n### Performance \u003cspan id=\"performance\" class=\"anchor\"\u003e\u003c/span\u003e\n\nThe worst-case behaviour of the library is described using [big-O\nnotation](http://en.wikipedia.org/wiki/Big_O_notation \"http://en.wikipedia.org/wiki/Big_O_notation\").\nThe documentation provides two measures of complexity:\n\n- The time complexity of operations is described in terms of the number\n  of disk I/O operations and referred to as the disk I/O complexity. In\n  practice, the time of the operations on LSM-trees is dominated by the\n  number of disk I/O actions.\n\n- The space complexity of tables is described in terms of the in-memory\n  size of an LSM-tree table. 
Both the in-memory and on-disk size of an\n  LSM-tree table scale linearly with the number of physical entries.\n  However, the in-memory size of an LSM-tree table is smaller than its\n  on-disk size by a significant constant factor. This is discussed in detail\n  below, under [In-memory size of\n  tables](#performance_size \"#performance_size\").\n\nThe complexities are described in terms of the following variables and\nconstants:\n\n- The variable *n* refers to the number of *physical* table entries. A\n  *physical* table entry is any key–operation pair, e.g., `Insert k v`\n  or `Delete k`, whereas a *logical* table entry is determined by all\n  physical entries with the same key. If the variable *n* is used to\n  describe the complexity of an operation that involves multiple tables,\n  it refers to the sum of all table entries.\n\n- The variable *o* refers to the number of open tables and cursors in\n  the session.\n\n- The variable *s* refers to the number of snapshots in the session.\n\n- The variable *b* usually refers to the size of a batch of\n  inputs/outputs. Its precise meaning is explained for each occurrence.\n\n- The constant *B* refers to the size of the write buffer, which is a\n  configuration parameter.\n\n- The constant *T* refers to the size ratio of the table, which is a\n  configuration parameter.\n\n- The constant *P* refers to the average number of key–value pairs\n  that fit in a page of memory.\n\n#### Disk I/O cost of operations \u003cspan id=\"performance_time\" class=\"anchor\"\u003e\u003c/span\u003e\n\nThe following table summarises the cost of the operations on LSM-trees\nmeasured in the number of disk I/O operations. 
If the cost depends on\nthe merge policy, the table contains one entry for each merge policy.\nOtherwise, the merge policy is listed as N/A.\n\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eResource\u003c/th\u003e\n\u003cth\u003eOperation\u003c/th\u003e\n\u003cth\u003eMerge policy\u003c/th\u003e\n\u003cth\u003eCost in disk I/O operations\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd\u003eSession\u003c/td\u003e\n\u003ctd\u003eCreate/Open\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e\u003cem\u003eO\u003c/em\u003e(1)\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eClose\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(o \\: T \\: \\log_T\n\\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eTable\u003c/td\u003e\n\u003ctd\u003eCreate\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e\u003cem\u003eO\u003c/em\u003e(1)\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eClose\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eLookup\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eRange 
Lookup\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B} +\n\\frac{b}{P})$\u003c/span\u003e\n*\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eInsert/Delete/Update\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(\\frac{1}{P} \\: \\log_T\n\\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eDuplicate\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e\u003cem\u003eO\u003c/em\u003e(0)\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eUnion\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(\\frac{n}{P})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eSnapshot\u003c/td\u003e\n\u003ctd\u003eSave\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eOpen\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(\\frac{n}{P})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eDelete\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T 
\\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eList\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e\u003cem\u003eO\u003c/em\u003e(\u003cem\u003es\u003c/em\u003e)\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eCursor\u003c/td\u003e\n\u003ctd\u003eCreate\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eClose\u003c/td\u003e\n\u003ctd\u003e\u003ccode\u003eMergePolicyLazyLevelling\u003c/code\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(T \\: \\log_T \\frac{n}{B})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003c/td\u003e\n\u003ctd\u003eRead next entry\u003c/td\u003e\n\u003ctd\u003eN/A\u003c/td\u003e\n\u003ctd\u003e\u003cspan class=\"math inline\"\u003e$O(\\frac{1}{P})$\u003c/span\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n(\\*The variable *b* refers to the number of entries retrieved by the\nrange lookup.)\n\nTODO: Document the average-case behaviour of lookups.\n\n#### In-memory size of tables \u003cspan id=\"performance_size\" class=\"anchor\"\u003e\u003c/span\u003e\n\nThe in-memory size of an LSM-tree is described in terms of the variable\n*n*, which refers to the number of *physical* database entries. 
A\n*physical* database entry is any key–operation pair, e.g., `Insert k v`\nor `Delete k`, whereas a *logical* database entry is determined by all\nphysical entries with the same key.\n\nThe worst-case in-memory size of an LSM-tree is *O*(*n*).\n\n- The worst-case in-memory size of the write buffer is *O*(*B*).\n\n  The maximum size of the write buffer depends on the write buffer allocation\n  strategy, which is determined by the `confWriteBufferAlloc` field of\n  `TableConfig`. Regardless of write buffer allocation strategy, the\n  size of the write buffer may never exceed 4 GiB.\n\n  `AllocNumEntries maxEntries`  \n  The maximum size of the write buffer is the maximum number of entries\n  multiplied by the average size of a key–operation pair.\n\n- The worst-case in-memory size of the Bloom filters is *O*(*n*).\n\n  The total in-memory size of all Bloom filters is the number of bits\n  per physical entry multiplied by the number of physical entries. The\n  required number of bits per physical entry is determined by the Bloom\n  filter allocation strategy, which is determined by the\n  `confBloomFilterAlloc` field of `TableConfig`.\n\n  `AllocFixed bitsPerPhysicalEntry`  \n  The number of bits per physical entry is specified as\n  `bitsPerPhysicalEntry`.\n\n  `AllocRequestFPR requestedFPR`  \n  The number of bits per physical entry is determined by the requested\n  false-positive rate, which is specified as `requestedFPR`.\n\n  The false-positive rate decreases exponentially with the number of bits\n  per entry:\n\n  | False-positive rate | Bits per entry |\n  |---------------------|----------------|\n  | 1 in 10             |  ≈ 4.77        |\n  | 1 in 100            |  ≈ 9.85        |\n  | 1 in 1,000          |  ≈ 15.79       |\n  | 1 in 10,000         |  ≈ 22.58       |\n  | 1 in 100,000        |  ≈ 30.22       |\n\n- The worst-case in-memory size of the indexes is *O*(*n*).\n\n  The total in-memory size of all indexes depends on the index type,\n  which is determined by 
the `confFencePointerIndex` field of\n  `TableConfig`. The in-memory size of the various indexes is described\n  in reference to the size of the database in [*memory\n  pages*](https://en.wikipedia.org/wiki/Page_%28computer_memory%29 \"https://en.wikipedia.org/wiki/Page_%28computer_memory%29\").\n\n  `OrdinaryIndex`  \n  An ordinary index stores the maximum serialised key for each memory\n  page. The total in-memory size of all indexes is proportional to the\n  average size of one serialised key per memory page.\n\n  `CompactIndex`  \n  A compact index stores the 64 most significant bits of the minimum\n  serialised key for each memory page, as well as 1 bit per memory page\n  to resolve clashes, 1 bit per memory page to mark overflow pages, and\n  a negligible amount of memory for tie breakers. The total in-memory\n  size of all indexes is approximately 66 bits per memory page.\n\nThe total size of an LSM-tree must not exceed 2\u003csup\u003e41\u003c/sup\u003e physical\nentries. Violation of this condition *is* checked and will throw a\n`TableTooLargeError`.\n\n### Implementation\n\nThe implementation of LSM-trees in this package draws inspiration from:\n\n- Chris Okasaki. 1998. \"Purely Functional Data Structures\"\n  [doi:10.1017/CBO9780511530104](https://doi.org/10.1017/CBO9780511530104 \"https://doi.org/10.1017/CBO9780511530104\")\n\n- Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. \"Monkey:\n  Optimal Navigable Key-Value Store.\"\n  [doi:10.1145/3035918.3064054](https://doi.org/10.1145/3035918.3064054 \"https://doi.org/10.1145/3035918.3064054\")\n\n- Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, and Manos\n  Athanassoulis. 2021. 
\"Constructing and analyzing the LSM compaction\n  design space.\"\n  [doi:10.14778/3476249.3476274](https://doi.org/10.14778/3476249.3476274 \"https://doi.org/10.14778/3476249.3476274\")\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintersectmbo%2Flsm-tree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintersectmbo%2Flsm-tree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintersectmbo%2Flsm-tree/lists"}