# libhamt

A hash array-mapped trie (HAMT) implementation in C99. A HAMT is a data
structure that can be used to efficiently implement
[*persistent*][wiki_persistent_data_structure] associative arrays (aka maps,
dicts) and sets; see the [Introduction](#introduction). The implementation here
loosely follows Bagwell's 2000 paper[[1]][bagwell_00_ideal], with a focus on
code clarity.

What prompted the somewhat detailed writeup was the realization that there is
not a lot of in-depth documentation for HAMTs beyond the original Bagwell
paper[[1]][bagwell_00_ideal]. Some of the more helpful posts are [Karl Krukow's
intro to Clojure's `PersistentHashMap`][krukov_09_understanding], [C. S. Lim's
C++ template implementation][chaelim_hamt], [Adrian Coyler's morning paper
post][coyler_15_champ] and the original [Steindoerfer/Vinju compressed HAMT
article it summarizes][steindoerfer_15_optimizing].
The rest mostly seems to be bits and pieces, and this document is an attempt
to (partially) improve that situation.

*FIXME: Complete docs (removal, persistence, path copying)*

## Quickstart

To build the library and run the tests:

```bash
$ git clone git@github.com:mkirchner/hamt.git
$ cd hamt
$ make
$ make test
$ make runtest
```

In order to use `libhamt` in your own projects, copy the required sources to
your own source tree.

### Benchmarks

![Benchmark results](./doc/img/benchmark.png)

The current HAMT implementation consistently outperforms the `libavl` AVL tree
and red-black tree implementations by 5x for querying, 1.5x-4x for node
insertion, and 1.5x-5x for node removal. The persistent insert and remove
implementations scale roughly like the classic trees, with more favorable
scaling behavior for the HAMT. Where table caching is an option, the
persistent HAMT implementation reaches better insert performance than
red-black trees and better removal performance than red-black and AVL trees
at approximately 10^5 elements.

Compared to hash tables, HAMT query times start at 2x vs. GLib's `HashTable`
and 20x vs. `hsearch(3)` (the latter still being investigated) and then get
progressively worse. This makes sense, given the O(1) vs. O(log N)
expectations for the respective approaches.

Note that benchmarking and optimization are an ongoing effort, so please take
all numbers with a pinch of salt.
All measurements have so far been collected
on a single system (Apple MBP M2 Max under Ventura 13.4.1).

For a detailed performance comparison with the AVL and red-black trees (from
`libavl`) and the `HashTable` from GLib, see [the benchmarking
repo][hamt_bench_github].


# Introduction

A *hash array mapped trie (HAMT)* is a data structure that can be used to
implement [associative arrays][wiki_associative_array] (aka maps) and
[sets][wiki_set_adt].

Structurally, HAMTs are [hash trees][wiki_hash_tree] that combine favorable
characteristics of [hash tables][wiki_hash_table] and array-mapped
[tries][wiki_trie], namely almost hash table-like time complexity
guarantees[[1]][bagwell_00_ideal] (O(log<sub>32</sub> n)) and economic use of
memory.

An additional benefit, and a key motivation for the work presented here, is
that augmenting HAMTs with path copying and garbage collection allows for a
straightforward and efficient implementation of [persistent][wiki_persistent]
versions of maps and sets.

The remaining documentation starts with a description of the `libhamt` API and
two examples that demonstrate the use of a HAMT as an ephemeral and a
persistent data structure, respectively. It then details the implementation:
starting from the foundational data structures and the helper code required
for hash exhaustion and table management, we cover search, insertion, removal,
and iterators. The final implementation section introduces path copying and
explains the changes required to support persistent insert and remove
operations. It closes with an outlook and an appendix.

# API

## HAMT lifecycle

The core data type exported in the `libhamt` interface is `struct hamt`.
In order to
create a `struct hamt` instance, one must call `hamt_create()`, which requires
a hash function of type `hamt_key_hash_fn` to hash keys, a comparison function
of type `hamt_cmp_fn` to compare keys, and a pointer to a `hamt_allocator`
instance. `hamt_delete()` deletes `struct hamt` instances that were created
with `hamt_create()`.

```c
/* The libhamt core data structure is a handle to a hash array-mapped trie */

/* Function signature definitions for key comparison and hashing */
typedef int (*hamt_cmp_fn)(const void *lhs, const void *rhs);
typedef uint32_t (*hamt_key_hash_fn)(const void *key, const size_t gen);

/* API functions for lifecycle management */
struct hamt *hamt_create(hamt_key_hash_fn key_hash, hamt_cmp_fn key_cmp,
                         struct hamt_allocator *ator);
void hamt_delete(struct hamt *);
```

The `hamt_key_hash_fn` takes a `key` and a generation `gen`. The expectation
is that the supplied hash function returns different hashes for the same key
under different generations. Depending on the choice of hash function, this
can be implemented by using `gen` as a seed or by modifying a copy of `key` on
the fly. See the [examples](#examples) section for a `murmur3`-based
implementation and the [hashing](#hashing) section for more information on
suitable hash functions.


### Memory management

`libhamt` exports its internal memory management API through the
`hamt_allocator` struct. The struct specifies the functions that the HAMT
implementation uses to allocate, re-allocate and deallocate system memory.
The API provides a default
`hamt_allocator_default` which refers to the standard `malloc()`, `realloc()`
and `free()` functions.

```c
struct hamt_allocator {
    void *(*malloc)(const size_t size);
    void *(*realloc)(void *chunk, const size_t size);
    void (*free)(void *chunk);
};

extern struct hamt_allocator hamt_allocator_default;
```

Exporting the `libhamt` memory management API enables library clients to make
use of alternate memory management solutions, most notably garbage collection
(e.g. the [Boehm-Demers-Weiser GC][boehm_gc]), which is required when using
the HAMT as a persistent data structure (see the [structural sharing
example](#example-2-garbage-collected-persistent-hamts)).


## Query

```c
size_t hamt_size(const struct hamt *trie);
const void *hamt_get(const struct hamt *trie, void *key);
```

The `hamt_size()` function returns the size of the HAMT in O(1). Querying the
HAMT (i.e. searching for a key) is done with `hamt_get()`, which takes a
pointer to a key and returns a result in O(log<sub>32</sub> n), or `NULL` if
the key does not exist in the HAMT.

### Iterators

The API also provides key/value pair access through the `hamt_iterator`
struct.

Iterators are tied to a specific HAMT and are created using the
`hamt_it_create()` function, passing the HAMT instance the iterator should
refer to. Iterators can be advanced with the `hamt_it_next()` function, and as
long as `hamt_it_valid()` returns `true`, the `hamt_it_get_key()` and
`hamt_it_get_value()` functions return pointers to the current key/value pair.
In order to delete an existing and/or exhausted iterator, call
`hamt_it_delete()`.

```c
typedef struct hamt_iterator_impl *hamt_iterator;

hamt_iterator hamt_it_create(const struct hamt *trie);
void hamt_it_delete(hamt_iterator it);
bool hamt_it_valid(hamt_iterator it);
hamt_iterator hamt_it_next(hamt_iterator it);
const void *hamt_it_get_key(hamt_iterator it);
const void *hamt_it_get_value(hamt_iterator it);
```

Iterators maintain state about their traversal path; changes to the HAMT that
an iterator refers to implicitly invalidate the iteration (i.e. cause
undefined behavior).

The order in which iterators return the key/value pairs is fully defined by
the structure of the trie, which, in turn, is completely defined by the choice
of hash function and (where applicable) seed.


## Insert &amp; Remove

`libhamt` supports ephemeral and [persistent][wiki_persistent] HAMTs through
two different interfaces: `hamt_set()` and `hamt_remove()` for ephemeral use,
and their `p`-versions `hamt_pset()` and `hamt_premove()` for persistent use.

### Ephemeral modification

```c
const void *hamt_set(struct hamt *trie, void *key, void *value);
void *hamt_remove(struct hamt *trie, void *key);
```

`hamt_set()` takes a pair of `key` and `value` pointers and adds the pair to
the HAMT, returning a pointer to the `value`. If the `key` already exists,
`hamt_set()` updates the pointer to the `value`.

`hamt_remove()` takes a `key` and removes the key/value pair with the
respective `key` from the HAMT, returning a pointer to the `value` that was
just removed. If the `key` does not exist, `hamt_remove()` returns `NULL`.

### Persistent HAMTs

The semantics of persistent HAMTs are different from their ephemeral
counterparts: since every modification creates a new version of a HAMT, the
modification functions return a new HAMT.
Modification of a persistent HAMT
therefore requires a reassignment idiom if the goal is modification only:

```c
const struct hamt *h = hamt_create(...);
...
/* Set a value and drop the reference to the old HAMT; the GC
 * will take care of cleaning up remaining unreachable allocations.
 */
h = hamt_pset(h, some_key, some_value);
...
```

This seems wasteful at first glance, but the respective functions implement
structural sharing such that the overhead is limited to
*~log<sub>32</sub>(N)* nodes (where *N* is the number of nodes in the trie).

```c
const struct hamt *hamt_pset(const struct hamt *trie, void *key, void *value);
const struct hamt *hamt_premove(const struct hamt *trie, void *key);
```

`hamt_pset()` inserts or updates the `key` with `value` and returns an opaque
handle to the new HAMT. The new HAMT is guaranteed to contain the new
key/value pair.

`hamt_premove()` attempts to remove the value with the key `key`. It is *not*
an error if the key does not exist; the new HAMT is guaranteed to not contain
the key `key`.

## Examples

### Example 1: ephemeral HAMT w/ standard allocation

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "hamt.h"
#include "murmur3.h"


static uint32_t hash_string(const void *key, const size_t gen)
{
    return murmur3_32((uint8_t *)key, strlen((const char *)key), gen);
}

int main(int argc, char *argv[])
{
    enum { N = 5 };
    struct {
        char *country;
        char *capital;
    } cities[N] = {
        {"Germany", "Berlin"},
        {"Spain", "Madrid"},
        {"Italy", "Rome"},
        {"France", "Paris"},
        {"Romania", "Bucharest"}
        /* ... */
    };

    struct hamt *t;

    /* create table */
    t = hamt_create(hash_string, (hamt_cmp_fn)strcmp,
                    &hamt_allocator_default);
    /* load table */
    for (size_t i = 0; i < N; i++) {
        hamt_set(t, cities[i].country, cities[i].capital);
    }

    /* query table */
    for (size_t i = 0; i < N; i++) {
        printf("%s has capital %s\n", cities[i].country,
               (const char *)hamt_get(t, cities[i].country));
    }
    /* cleanup */
    hamt_delete(t);
    return 0;
}
```

### Example 2: Garbage-collected persistent HAMTs

The key to making use of structural sharing is to provide `libhamt` with a
`struct hamt_allocator` instance that implements garbage collection.

The example below uses the [Boehm-Demers-Weiser][boehm_gc] GC. For
GC installation, compilation and linking instructions, please refer to the GC
documentation.

In brief, the Boehm GC provides a `gc.h` include file and drop-in replacements
for the standard memory management functions, including `malloc`, `realloc`
and `free`.

The following snippet illustrates the required changes:

```c
...
#include "gc.h"  /* Boehm-Demers-Weiser GC */

...

inline void nop(void *_) { return; }

int main(int argc, char *argv[]) {
    ...
    /*
    Set up garbage collection. We point the `free` slot at a no-op
    function to avoid explicit freeing of memory.
    */
    struct hamt_allocator gc_alloc = {GC_malloc, GC_realloc, nop};
    const struct hamt *t = hamt_create(hash_string, (hamt_cmp_fn)strcmp,
                                       &gc_alloc);
    ...
}
```

We set the `gc_alloc.free` function pointer to point to `nop()`, a
no-operation function. This ensures that deallocation is left entirely to the
garbage collector. If we were to provide a pointer to `GC_free()` (i.e.
GC's drop-in
replacement for the `free()` function), we would still implement explicit
deallocation, just with a different free function.

### Example 3: Iterators

The following snippet illustrates how to create, test, exhaust and dispose of
an iterator. We first create the iterator using `hamt_it_create()`, jump into
a `while` loop and advance the iterator using `hamt_it_next()` while the
iterator is valid. In every iteration we print the current key/value pair to
`stdout`. Once we exit the loop, we clean up using `hamt_it_delete()`.

```c
    ...
    struct hamt *t = hamt_create(hash_string, (hamt_cmp_fn)strcmp,
                                 &hamt_allocator_default);

    /* load table */
    ...

    /* create iterator */
    hamt_iterator it = hamt_it_create(t);
    while (hamt_it_valid(it)) {
        printf("(%s, %s)\n", (char *)hamt_it_get_key(it),
                             (char *)hamt_it_get_value(it));
        hamt_it_next(it);
    }
    /* clean up */
    hamt_it_delete(it);

    ...
    hamt_delete(t);
    ...
```

This concludes the description of the `libhamt` interface; we now move on to
detailed implementation notes.

# Implementation

## Prelude: Setup

### Project structure

The `hamt` source tree has the following structure:

```
hamt/
  build/         Out-of-source build destination
  include/       Header files that are part of the interface
  src/           Source and header files
  test/          Test and utility headers & sources
  Makefile
```

Sources are organized in three folders: the `include` folder, for all header
files that are part of the public interface; the `src` folder, for the actual
implementation and private header files; and the `test` folder, for all test
code, including headers and sources for testing utilities (e.g.
data loading and benchmarking functions).

The build process is governed by a single `Makefile` in the project root
directory.

### Programming Style

### Building the project

To build the library and run the tests:

```
$ make && make test
```

## Design

### Introduction

**Hash tables.** A common and practical answer to efficient value retrieval
from a collection given a key is to "use a *hash table*". This is good
advice. *Hash tables* provide insertion, modification, and retrieval in
amortized constant average time, using space linear in the number of elements
they store. They have been the subject of intensive research and
optimization and are part of [every][sedgewick_11_algorithms]
[introductory][cormen_09_introduction] CS textbook. Chances are that the
standard library of the language at hand contains a readily available, tried
and tested implementation.

For instance, `std::unordered_set` and `std::unordered_map` (and their
`*_multi*` cousins) are hash table implementations for C++ <sup
id="ac_hash_table_cpp">[1](#fn_hash_table_cpp)</sup>; for C, multiple
[libc][wiki_libc] implementations (e.g. [glibc][wiki_glibc], [musl][musl],
[BSD libc][wiki_bsd_libc]) provide POSIX-compliant `hsearch` facilities;
GNOME's [GLib][wiki_glib]
and others provide [hash table][glib_hashtable] implementations<sup
id="ac_hash_table_c">[2](#fn_hash_table_c)</sup>. Python has the `dict` type
for associative arrays, which [is implemented as a hash
table][python_dict_pre36]<sup
id="ac_hash_table_python">[3](#fn_hash_table_python)</sup>.
Java has
`Hashtable`, `HashMap`, and `HashSet`<sup
id="ac_hash_table_java">[4](#fn_hash_table_java)</sup> and JavaScript has
[`Map`][js_map].

One property of the classical hash table implementations is that they do not
provide support for *persistence* (in the sense of [persistent data
structures][wiki_persistent], not persistent storage). They are a
[place-oriented][hickey_value_of_values] solution to associative storage and
make destructive modifications to the data structure when the data changes
(note that this is independent of any particular conflict resolution and
capacity maintenance strategy).

Persistent associative containers require a different approach.

**Persistent data structures.** *(Full) persistence* is the property of a data
structure to always preserve (all) previous versions of itself under
modification. The property is related to
[immutability][wiki_immutable_object]: from the perspective of the client,
every update yields a new copy, making instances practically immutable. This
is a huge conceptual change: if data structures are immutable, functions using
these data structures are pure (i.e. side effect-free). That in turn enables
[value semantics][wiki_value_semantics], [referential
transparency][wiki_referential_transparency] and, consequently, a substantial
reduction in programming complexity when dealing with parallelism and
synchronization (see e.g. Rich Hickey's presentations on [*The Value of
Values*][hickey_value_of_values] and [*Are We There
Yet?*][hickey_are_we_there_yet]).

The catch is that classical hash tables set a high bar in terms of time and
space performance characteristics, and persistent data structures need to
approximate that bar.

**Efficient persistence.** Persistent associative data structures need to
minimize the memory overhead introduced by value semantics (i.e.
returning copies as opposed to modified originals) and, at
the same time, provide practically constant average-time insert, retrieve and
delete capabilities to minimize the performance gap to classical hash tables.

It turns out that the data structure of choice to tackle these challenges is a
*tree*. Trees support [*structural sharing*][wiki_structural_sharing]
strategies for efficient memory management and, if they are *balanced* and
have *large branching factors*, provide O(log<sub>k</sub> N) average
performance guarantees.

*Persistent hash array-mapped tries* are, in essence, a sophisticated,
practical implementation of such a data structure.


### Persistent Hash Array-Mapped Tries

One way to understand hash array-mapped tries is to look at them as an
evolution of *k*-ary trees (Fig. 1) that follows from a series of real-world
tradeoffs.

<p align="center">
<img src="doc/img/hamt-trees.png" width="600"></img>
</p>
<p class="image-caption"><b>Figure 1:</b> *k*-ary tree, hash tree, and
hash array-mapped trie.</p>

In classic *k*-ary trees Ⓐ, internal and leaf nodes have different types:
internal nodes point to up to *k* internal or leaf nodes, and leaf nodes hold
or point to data (i.e. the key/value pairs). In their basic form, *k*-ary
trees (just like binary trees) are not balanced and their performance
characteristics can easily degrade from *O(log<sub>k</sub> n)* to *O(n)* for
degenerate input sequences.

One approach to balanced trees is the explicit implementation of tree
rebalancing (as in e.g.
[Red-black
trees][wiki_red_black_trees], [AVL trees][wiki_avl_trees], or
[B-trees][wiki_b_trees]).

Another option is to use a [*hash tree*][wiki_hash_tree] Ⓑ: as the name
implies, it uses the *hash* of the key, interpreted as a sequence of *b*-bit
groups, to determine the location of the leaf node that stores the key/value
pair. The group size *b* determines the branching factor 2<sup><i>b</i></sup>,
i.e. for *b*=5, every node can have 2<sup>5</sup>=32 child nodes.
Instead of implementing explicit tree rebalancing, hash trees rely on the
distributional properties of a (good) hash function to place nodes uniformly.
While this saves some effort for rebalancing, note that hash trees *do*
require a strategy to deal with *hash exhaustion*, a topic covered below.

The challenge with vanilla hash trees is that they reserve space for *k*
children in every internal node. If the tree is sparsely populated, this
causes significant memory overhead and impacts performance due to cache
misses.

For that reason, HAMTs implement *array mapping* Ⓒ: instead of reserving space
for *k* pointers to children in each internal node, the node stores a bitmap
that indicates which children are present and allocates only the memory
required to refer to its actual children. This is an important optimization
that makes trees with a high branching factor more memory-efficient and
cache-friendly.

In order to implement a *persistent* map or set, every modification operation
must return a modified copy and leave the source data structure intact. Yet
returning actual copies is prohibitively expensive in time and memory.

This, finally, is where HAMTs really shine, and the true reason why we build
them in the first place.

HAMTs are trees, and trees are compatible with
[structural sharing][wiki_persistent_data_structure] strategies.
Common
techniques are copy-on-write, fat nodes and [path
copying][wiki_persistent_structural_sharing], and there are [complex
combinations of the three][driscoll_86_making]. Path copying is simple,
efficient and general, and is therefore the technique of choice for
`libhamt`: instead of returning an actual copy of the tree during an insert,
update or delete operation, we follow the search path to the item in
question, copying the nodes along the way, make our modification along this
copied path and return it to the caller.

Note that enabling persistence *requires* the use of a garbage collection
strategy. Under standard `malloc()` memory management, there is no way for
the HAMT nodes to know how many versions of a HAMT refer to them.

### Documentation structure and implementation strategy

In the following, we address these concepts in turn: we first define the
foundational data structure used to build a tree and introduce the concept of
an *anchor*. We then dive into hash functions and the *hash state management*
required to make hashing work for trees of arbitrary depth and in the
presence of hash collisions. We then turn to *table management*,
introducing a set of functions used to create, modify, query and dispose of
mapped arrays. With these pieces in place, we are ready to implement the
insert/update, query, and delete functions for non-persistent HAMTs. And
lastly, we introduce the concept of path copying and close with the
implementation of persistent insert/update and delete functions for HAMTs.


### Foundational data structures
<!--
<p align="center">
<img src="doc/img/hamt-overview.png" width="600"></img>
</p>
<p class="image-caption"><b>Figure 1:</b> HAMT data structure.
<code>libhamt</code> implements
HAMTs using linked, heap-allocated tables.
Table rows hold
either an index vector and pointer to a subtable, or pointers to key and
value (one pair of key/value pointers illustrated in blue, and implicit in all
empty table fields).</p>
-->

`libhamt` uses different types to implement internal and leaf nodes.

Leaf nodes contain two fields, called `value` and `key` (the rationale for the
reverse ordering of the two fields will become evident shortly).

```c
struct {
    void *value;
    void *key;
} kv;
```

Both fields are defined as `void*` pointers to support referring to arbitrary
data types via type casting
<sup id="ac_cpp_virtual_method_table">[5](#fn_cpp_virtual_method_table)</sup>.

`libhamt`'s internal nodes are where the magic happens, based on Bagwell's
*[Ideal Hash Trees][bagwell_00_ideal]* paper and according to the design
principles outlined above.

With a branching factor *k*, internal nodes have at most *k* successors but
can be sparsely populated. To allow for a memory-efficient representation,
internal nodes have a pointer `ptr` that points to a right-sized *array* of
child nodes (also known as a *table*) and a *k*-bit `index` bitmap field that
keeps track of the size and occupancy of that array.

`libhamt` uses *k*=32, and because `index` is a 32-bit bitmap field, the
number of one-bits in `index` (also known as the *population count* or
`popcount()` of `index`) yields the size of the array that `ptr` points to.

This suggests an initial (incomplete) definition along the following lines:

```c
struct {
    struct T *ptr;  /* incomplete */
    uint32_t index;
} table;
```

The specification of `T` must provide the ability for that data type to point
to internal and leaf nodes alike, using only a single pointer type.
A solution is to wrap the two types into a `union` (and then to wrap
the `union` into a `typedef` for convenience):

```c
typedef struct hamt_node {
    union {
        struct {
            void *value;
            void *key;
        } kv;
        struct {
            struct hamt_node *ptr;
            uint32_t index;
        } table;
    } as;
} hamt_node;
```

With this structure, given a pointer `hamt_node *p` to a `hamt_node`
instance, `p->as.kv` addresses the leaf node, `p->as.table` addresses the
internal node, and `p->as.kv.value`, `p->as.kv.key`, `p->as.table.ptr`, and
`p->as.table.index` provide access to the respective fields.

To maintain sanity, we define the following convenience macros:

```c
#define TABLE(node) node->as.table.ptr
#define INDEX(node) node->as.table.index
#define VALUE(node) node->as.kv.value
#define KEY(node)   node->as.kv.key
```

<p align="center">
<img src="doc/img/hamtnode-table.png" width="450"></img>
</p>
<p class="image-caption"><b>Figure 2:</b>
Memory structure of an internal node. If <code>node</code> is a pointer
to an internal node, <code>TABLE(node)</code> (or, equivalently, <code>
node->as.table.ptr</code>) points to the first field of the successor table.
</p>

### The Anchor

The `libhamt` codebase makes liberal use of the concept of an *anchor*. An
*anchor* is a `hamt_node*` pointer to an internal node (i.e.
`is_value(VALUE(anchor))` evaluates to false). An anchor provides access to
all information relevant to managing the table of child nodes:
`INDEX(anchor)` returns the bitmap that encodes the array mapping, applying a
popcount to the bitmap gives the size of the table, and indexing is
implemented using partial popcounts. Table elements can be accessed through
`TABLE(anchor)[i]`, where `i` must be in the valid range.


### Pointer tagging

The definition of `hamt_node` enables the construction of trees with a mix of
internal and leaf nodes.
What the definition does not provide is a way to
determine whether a concrete `hamt_node*` pointer points to an internal or a
leaf node. One solution would be to specify an `enum` that indicates the type
(i.e. `NODE_LEAF`, etc.) and to add a `type` field to `struct hamt_node`.
While valid, this would increase the size of the struct by 50% just to
maintain a single bit of information. Luckily, there is a more
memory-efficient solution: pointer tagging.

Since pointers need to be word-aligned, the lower 3 bits of all pointers on
64-bit architectures are always set to zero. It is possible to make use of
these bits under two conditions: (1) we must know that we are looking at a
pointer (a non-pointer value can have its bottom three bits set to zero,
too); and (2) we must carefully mask the bits in question whenever we
actually use the pointer (since it would point to the wrong location
otherwise). The first is not a problem since we own the code; the second
requires diligence and some helper functions:

```c
#define HAMT_TAG_MASK 0x3
#define HAMT_TAG_VALUE 0x1
#define tagged(__p) (hamt_node *)((uintptr_t)__p | HAMT_TAG_VALUE)
#define untagged(__p) (hamt_node *)((uintptr_t)__p & ~HAMT_TAG_MASK)
#define is_value(__p) (((uintptr_t)__p & HAMT_TAG_MASK) == HAMT_TAG_VALUE)
```

In order to mark a leaf node as such, we set `key` as usual and tag the value
pointer before assigning it to `value`:

```c
    p->as.kv.key = key_ptr;
    p->as.kv.value = tagged(value_ptr);
```

Given a pointer to a leaf (e.g.
a search result), we untag `value` before
returning it:

```c
    ...
    if (status == SEARCH_SUCCESS) {
        return untagged(p->as.kv.value);
    }
    ...
```

And, in order to determine what we are looking at, we use `is_value`:

```c
    if (is_value(p->as.kv.value)) {
        /* this is a leaf */
        ...
    } else {
        /* this is an internal node */
        ...
    }
```

Pointer tagging is the reason why the `value` and `key`
fields in the `struct kv` struct are ordered the way they are.
The `union` in `hamt_node` causes the
memory locations of the `struct kv` and `struct table` structs to overlap. Since
the `table.index` field is *not* a pointer (and the bottom-three-bits-are-zero
guarantee does not apply), its storage location cannot be used for pointer
tagging, leaving `table.ptr` to the task. Putting `kv.value` first aligns
the value field with `table.ptr`. The reverse order would work, but since the
`kv.key` pointer is dereferenced much more often in the code, it is more
convenient to tag `kv.value` and leave `kv.key` untagged.


## Array mapping

The principal idea behind array mapping is to project a sparse bitmap index
onto the index of a dense array, where the size of the array corresponds to
the number of non-zero bits in the bitmap index.

Given `hamt_node *p` is a valid pointer to a node, `INDEX(p)` corresponds to a
sparse bitmap index. The dense array is located at `TABLE(p)` and its size is
determined by the [*population count*][wiki_popcount] of `INDEX(p)`.

The mapping itself is conceptually trivial: to determine the dense index for
every non-zero bit in the bitmap index, count the number of non-zero bits to
the right of it. In other words, the first set bit goes to index 0, the second
to index 1, and so forth.

Efficiently implementing population counting (also known as the Hamming
weight) of a bitset is [not trivial][wiki_popcount].
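
Before reaching for compiler support, it is instructive to see what the
operation looks like written by hand. The following sketch (not part of
`libhamt`) uses Kernighan's bit-clearing trick, which loops once per set bit:

```c
#include <stdint.h>

/* Illustrative only: count set bits by repeatedly clearing the
 * lowest set bit; the loop body executes once per set bit. */
static inline int popcount_portable(uint32_t n)
{
    int count = 0;
    while (n) {
        n &= n - 1; /* clears the lowest set bit */
        count++;
    }
    return count;
}
```

For example, `popcount_portable(0x2c)` (binary `101100`) returns 3.
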
`libhamt` falls back on a
GCC/Clang intrinsic:

```c
static inline int get_popcount(uint32_t n) { return __builtin_popcount(n); }
```

With `get_popcount()` available, determining the position (i.e. dense index)
for a sparse index in a bitmap reduces to calculating the population count of
the bitmap masked off above the sparse index:

```c
static inline int get_pos(uint32_t sparse_index, uint32_t bitmap)
{
    return get_popcount(bitmap & ((1 << sparse_index) - 1));
}
```

Lastly, to determine if a node has a child at a particular index `index`, we
check if the bit at that index is set in the bitmap:

```c
static inline bool has_index(const hamt_node *anchor, size_t index)
{
    return INDEX(anchor) & (1 << index);
}
```

## Hashing

A [*hash function*][wiki_hash_function] is a function that takes data of
arbitrary size and maps it to a fixed-size value (often machine word sizes).
*Good* hash functions are fast to compute and produce *uniform* output: they
map their inputs as evenly as possible over the output range. If it is
practically infeasible to invert the mapping (i.e. determine which input
corresponds to which hash value), the hash function is called a [cryptographic
hash function][wiki_cryptographic_hash_function].

For the purpose of implementing a HAMT, cryptographic security is not a
design goal. However, the uniformity of the hash function has direct impact on
the balance of the tree: it is the hash that pre-determines all key positions
in the fully populated tree and it is its distribution properties that
determine the number of collisions (and hence depth extensions) we introduce.

`libhamt` does not force clients to use a particular hash function.
The
library exposes a hash function signature of the form

```c
typedef uint32_t (*hamt_key_hash_fn)(const void *key, const size_t gen);
```

and expects users to provide a suitable function pointer as part of the call to
`hamt_create()` which, among other parameters, takes a hash function:

```c
/* ... see below for a practical definition of my_keyhash_string */

    struct hamt *t = hamt_create(my_keyhash_string, my_keycmp_string,
                                 &hamt_allocator_default);
```

There are multiple [good, practical choices][why_simple_hash_functions_work]
for the HAMT. By default, `libhamt` includes its [own][hamt_src_murmur],
[tested][hamt_src_test_murmur] implementation of 32-bit
[MurmurHash3][wiki_murmurhash]:

```c
/* from include/murmur3.h */

uint32_t murmur3_32(const uint8_t *key, size_t len, uint32_t seed);
```

This declares the *murmur* hash function. In its standard form, `murmur3_32`
takes a pointer `key` to byte-sized objects, a count `len` that specifies
the number of bytes to hash, and a random seed `seed`.

In order to use murmur3 as a `hamt` hash function, we need to wrap it into a
helper function:

```c
static uint32_t my_keyhash_string(const void *key, const size_t gen)
{
    uint32_t hash = murmur3_32((uint8_t *)key, strlen((const char *)key), gen);
    return hash;
}
```

Here, the wrapper makes use of `strlen(3)`, assuming valid C strings as keys.
Note the use of `gen` as a seed for the hash (see below for the hash exhaustion
discussion).

Here is a full example:

```c
#include "murmur3.h"

/* ... */

static uint32_t my_keyhash_string(const void *key, const size_t gen)
{
    uint32_t hash = murmur3_32((uint8_t *)key, strlen((const char *)key), gen);
    return hash;
}

/* ...
*/

    struct hamt *t = hamt_create(my_keyhash_string, my_keycmp_string,
                                 &hamt_allocator_default);

```


### Hash exhaustion: hash generations and state management

For a hash trie, the number of elements in the trie is limited by the total number
of hashes that fit into a 32-bit `uint32_t`, i.e. 2<sup>32</sup>. Since the HAMT only
uses 30 bits (in 6 chunks of 5 bits), the number of unique keys in the trie is
limited to 2<sup>30</sup> = 1,073,741,824 keys.
At the same time, since every layer of the
tree uses 5 bits of the hash, the trie depth is limited to 30/5 = 6 layers.
Neither the hard limit on the number of elements in the trie
nor the inability to build a trie beyond a depth of 6 is a desirable property.

To address both issues, `libhamt` recalculates the hash with a different seed every
6 layers. This requires a bit of state management and motivates the
existence of the `hash_state` data type and functions that operate on it:

```c
typedef struct hash_state {
    const void *key;
    hamt_key_hash_fn hash_fn;
    uint32_t hash;
    size_t depth;
    size_t shift;
} hash_state;
```
The struct maintains the pointers `key` to the key that is being hashed and
`hash_fn` to the hash function used to calculate the current hash `hash`. At
the same time, it tracks the current depth `depth` in the tree (from which the
*hash generation* is derived) and the bitshift `shift` of the current 5-bit
hash chunk.

The interface provides two functions: the means to step from the current 5-bit
hash to the next in `hash_next()`; and the ability to query the current index of a
key at the current trie depth in `hash_get_index()`.

`hash_next()` takes a pointer to a `hash_state` instance and steps that instance
from the current to the next chunk.
Taking a step involves increasing the
`depth` and `shift`, and initiating a rehash if the `shift` indicates
that the hash has been exhausted:

```c
static inline hash_state *hash_next(hash_state *h)
{
    h->depth += 1;
    h->shift += 5;
    if (h->shift > 25) {
        h->hash = h->hash_fn(h->key, h->depth / 5);
        h->shift = 0;
    }
    return h;
}
```

The index of a hash at its current depth corresponds to the integer
value of the current chunk. To determine the current chunk,
we right-shift the hash by `h->shift` to right-align the desired
chunk and then mask with `0b11111`, which equals `0x1f`:

```c
static inline uint32_t hash_get_index(const hash_state *h)
{
    return (h->hash >> h->shift) & 0x1f;
}
```


## Table management

In order to facilitate memory management for tables (aka the internal nodes),
`libhamt` defines a set of helper functions. Each of these functions takes a
`hamt_allocator` and calls the user-supplied allocation, re-allocation and
deallocation functions as appropriate.

We start by defining a simple memory abstraction (it would also be correct to use real functions
instead of preprocessor macros for this):

```c
#define mem_alloc(ator, size) (ator)->malloc(size)
#define mem_realloc(ator, ptr, size) (ator)->realloc(ptr, size)
#define mem_free(ator, ptr) (ator)->free(ptr)
```

This will make it easier to add optimizations (e.g. table caching) in the
future.
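
One immediate benefit of routing all table memory through the allocator is
that allocators can be instrumented. The sketch below is illustrative and not
part of `libhamt`; the `struct hamt_allocator` declaration here is inferred
from the `mem_*` macros above (the authoritative declaration lives in the
`hamt` headers):

```c
#include <stdlib.h>

/* Inferred from the mem_* macros; see the hamt headers for the
 * authoritative declaration. */
struct hamt_allocator {
    void *(*malloc)(size_t size);
    void *(*realloc)(void *ptr, size_t size);
    void (*free)(void *ptr);
};

/* Number of outstanding allocations, useful for leak checks in tests */
static size_t live_allocations = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        live_allocations++;
    return p;
}

static void *counting_realloc(void *ptr, size_t size)
{
    /* resizing does not change the number of live allocations */
    return realloc(ptr, size);
}

static void counting_free(void *ptr)
{
    if (ptr)
        live_allocations--;
    free(ptr);
}

static struct hamt_allocator counting_allocator = {
    counting_malloc, counting_realloc, counting_free};
```

Passing `&counting_allocator` instead of `&hamt_allocator_default` would then
let a test assert that `live_allocations` drops back to zero once the trie is
torn down.
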
On top of these macros, table lifecycle management is accomplished
with a few dedicated allocation and de-allocation functions.


### Simple allocation and deallocation

`table_allocate()` allocates a table of `size` rows and returns a pointer to
the newly allocated table.

```c
hamt_node *table_allocate(struct hamt_allocator *ator, size_t size)
{
    return (hamt_node *)mem_alloc(ator, (size * sizeof(hamt_node)));
}
```

`table_free()` deallocates the allocation referenced by `ptr`. It also
supports taking a `size` parameter for future extension (e.g. to provide a hint
for allocation pool management) that is currently ignored by the underlying
`mem_free()` implementation.

```c
void table_free(struct hamt_allocator *ator, hamt_node *ptr, size_t size)
{
    mem_free(ator, ptr);
}
```

### Specialized table resize operations

While it is possible to implement table re- and right-sizing with the
two functions introduced above, it makes a lot of sense to provide specialized
functionality for the key allocation/de-allocation use cases: extending,
shrinking and gathering a table.

**Table extension.** Since the tables in a HAMT are right-sized to minimize
memory overhead, item insertion must necessarily add an additional row to an
existing table.
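
The copy pattern at the heart of table extension, duplicating an array while
leaving a gap for a new row, can be sketched in isolation (illustrative only,
using `int` instead of `hamt_node`):

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: copy an n-element array into a new array of
 * n + 1 elements, leaving an uninitialized gap at position pos. */
static int *copy_with_gap(const int *src, size_t n, size_t pos)
{
    int *dst = malloc((n + 1) * sizeof *dst);
    if (!dst)
        return NULL;
    /* entries before the gap */
    memcpy(dst, src, pos * sizeof *dst);
    /* entries after the gap; a no-op when pos == n */
    memcpy(dst + pos + 1, src + pos, (n - pos) * sizeof *dst);
    return dst;
}
```
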
As illustrated in figure 3, the table extension function takes an anchor for an existing
table, allocates a new table with increased size, copies over the existing
entries (leaving a gap at the appropriate position for the new row), assigns
the new key and value to the fields in the new row, updates the anchor with
the new memory location of the table and the new index, and eventually frees the
memory of the old table.

<p align="center">
<img src="doc/img/table-extend.png" width="450"></img>
</p>
<p class="image-caption"><b>Figure 3:</b>
Extending a table creates a new copy of the existing table with an additional
row for the new node.
</p>

Looking at the code, this is implemented verbatim in the `table_extend()`
function. `table_extend()` takes an `anchor` pointer to a table of
size `n_rows`, then uses the allocator `ator` to create a new table of size `n_rows + 1`
with an empty row at position `pos` and the bitmap index bit `index` set.
It
uses `memcpy()` to copy memory ranges into the appropriate positions in
the new allocation, frees the old table and assigns the new table `ptr` and
`index` in the anchor:


```c
hamt_node *table_extend(struct hamt_allocator *ator, hamt_node *anchor,
                       size_t n_rows, uint32_t index, uint32_t pos)
{
    hamt_node *new_table = table_allocate(ator, n_rows + 1);
    if (!new_table)
        return NULL;
    if (n_rows > 0) {
        /* copy over table */
        memcpy(&new_table[0], &TABLE(anchor)[0], pos * sizeof(hamt_node));
        /* note: this works since (n_rows - pos) == 0 for cases
         * where we're adding the new k/v pair at the end and memcpy(a, b, 0)
         * is a nop */
        memcpy(&new_table[pos + 1], &TABLE(anchor)[pos],
               (n_rows - pos) * sizeof(hamt_node));
    }
    table_free(ator, TABLE(anchor), n_rows);
    TABLE(anchor) = new_table;
    INDEX(anchor) |= (1 << index);
    return anchor;
}
```

**Shrinking a table.** Shrinking a table is the inverse operation of table
extension: since we maintain right-sized tables as an invariant, we need to
adjust table sizes the moment the client deletes a key/value pair from the
HAMT.

Figure 4 illustrates the concept: given an anchor, the shrinking function
returns a new table with the specified row removed.


<p align="center">
<img src="doc/img/table-shrink.png" width="450"></img>
</p>
<p class="image-caption"><b>Figure 4:</b>
Shrinking a table creates a new copy of the table with the specified row
removed.
</p>

In the code, this is what `table_shrink()` does.
In the same way as
`table_extend()`, the function takes a pointer `ator` to the global allocator,
a pointer `anchor` to the current anchor, the size of the current table as
`n_rows`, and the pair of one-hot bitmap index `index` and storage array
position `pos`. And, in analogy to table extension, the function allocates a
right-sized table, copies the entries that are kept using range copies with
`memcpy()`, frees up the old table and updates the anchor to reflect the
changes.

```c
hamt_node *table_shrink(struct hamt_allocator *ator, hamt_node *anchor,
                       size_t n_rows, uint32_t index, uint32_t pos)
{
    hamt_node *new_table = NULL;
    uint32_t new_index = 0;
    if (n_rows > 0) {
        new_table = table_allocate(ator, n_rows - 1);
        if (!new_table)
            return NULL;
        new_index = INDEX(anchor) & ~(1 << index);
        memcpy(&new_table[0], &TABLE(anchor)[0], pos * sizeof(hamt_node));
        memcpy(&new_table[pos], &TABLE(anchor)[pos + 1],
               (n_rows - pos - 1) * sizeof(hamt_node));
    }
    table_free(ator, TABLE(anchor), n_rows);
    INDEX(anchor) = new_index;
    TABLE(anchor) = new_table;
    return anchor;
}
```

**Table gathering.** As we are deleting entries from the HAMT, we may end up
with the table structure shown in Figure 5: a table in which one of the
entries is a single-row table.
What we want to do in these cases is to replace
the table entry in `TABLE(anchor)[1]` with the key/value pair from
`TABLE(TABLE(anchor)[1])` and *gather* the one-row table into its parent.
While this comes at additional computational cost upon delete, it maintains
the logarithmic depth properties as the HAMT changes its size.

<p align="center">
<img src="doc/img/table-gather.png" width="450"></img>
</p>
<p class="image-caption"><b>Figure 5:</b>
Gathering pulls a one-row-sized table into its parent table (essentially
converting an internal node into a leaf node).
</p>

The code is straightforward: we take the allocator `ator`, the `anchor`
pointer, and the position `pos` of the single-row table inside the parent
table, copy over the key and value from the child table to the parent
(maintaining a temporary handle on the child) and then free the child table:

```c
hamt_node *table_gather(struct hamt_allocator *ator, hamt_node *anchor,
                       uint32_t pos)
{
    int n_rows = get_popcount(INDEX(anchor));
    hamt_node *table = TABLE(anchor);
    KEY(anchor) = table[pos].as.kv.key;
    VALUE(anchor) = table[pos].as.kv.value; /* already tagged */
    table_free(ator, table, n_rows);
    return anchor;
}
```

**Table duplication.** Lastly, table duplication.
This will be required for path
copying when we implement persistence and it is so straightforward that there
is no diagram: given an anchor, `table_dup()` determines the size of the table
that the anchor points to, allocates the required memory and performs a range
copy using `memcpy()` to duplicate the table contents:

```c
hamt_node *table_dup(struct hamt_allocator *ator, hamt_node *anchor)
{
    int n_rows = get_popcount(INDEX(anchor));
    hamt_node *new_table = table_allocate(ator, n_rows);
    if (new_table) {
        memcpy(&new_table[0], &TABLE(anchor)[0], n_rows * sizeof(hamt_node));
    }
    return new_table;
}
```

## Putting it all together

The following subsections detail the implementations of search, insertion and
removal of key/value pairs in our HAMT implementation. Note that, while the
implementations shown here have been thoroughly tested and are deemed correct,
they may have been replaced by faster or more capable implementations in the
actual `libhamt` source.
An attempt is being made to keep this section up to
date with the actual implementation, but the emphasis here is on conceptual
clarity, so the text will not necessarily cover every implementation detail.
PRs welcome.

### Example data

| key | key hash | binary key hash                          | 5-bit ints           |
|-----|----------|------------------------------------------|----------------------|
| "0" | d271c07f | `11 01001 00111 00011 10000 00011 11111` | [ 31  3 16  3  7 9 ] |
| "2" | 0129e217 | `00 00000 10010 10011 11000 10000 10111` | [ 23 16 24 19 18 0 ] |
| "4" | e131cc88 | `11 10000 10011 00011 10011 00100 01000` | [  8  4 19 3 19 16 ] |
| "7" | 23ea8628 | `00 10001 11110 10101 00001 10001 01000` | [  8 17 1 21 30 17 ] |
| "8" | bd920017 | `10 11110 11001 00100 00000 00000 10111` | [ 23 0  0  4 25 30 ] |

### Search: internal API

Search plays a double role: finding a HAMT entry is a fundamental part of the
HAMT interface (exposed by `hamt_get()`); and the first step in the insert and remove
functions is finding the anchors to operate on.

It is therefore desirable to approach the search implementation from a
more generic perspective such that we do not need to re-invent the
wheel for each of these use cases. To that end, we define an internal search
function

```c
static ... search_recursive(...);
```

that is called from internal functions and API functions alike. As the
name implies, we implement search in a recursive manner (this is for clarity;
conversion to an iterative solution is straightforward).

When we search for a key in the HAMT, there are two fundamental outcomes: the
key is either there, or it is not (note that these are exactly the semantics
of the user-facing `hamt_get()` function: it either returns a pointer to the
value stored under the key or it returns `NULL`).
However, looking more
closely, searches can fail for two reasons: the search can be unsuccessful
because a key does not exist in the HAMT *or* it can be unsuccessful because
there is a key/value pair that happens to have the same partial hash but a
different key (i.e. there is a hash collision or the hash has not been
sufficiently exhausted to differentiate between the two keys). And each of
these three cases is meaningful (the latter two corresponding directly to the
two different insertion strategies described below).

A good approach here is to define a ternary return value (as opposed to
the usual, binary use-`NULL`-as-a-failure-indicator approach that is
prevalent in C code) to allow us to signal each of these cases clearly.

We create a suitable three-value `enum` called `search_status`

```c
typedef enum {
    SEARCH_SUCCESS,
    SEARCH_FAIL_NOTFOUND,
    SEARCH_FAIL_KEYMISMATCH
} search_status;
```

where `SEARCH_SUCCESS` indicates that the key in question was
found, `SEARCH_FAIL_NOTFOUND` indicates a search failure due to a missing key,
and `SEARCH_FAIL_KEYMISMATCH` signals a hash conflict.

In order to return the result of a search (and not only its status), we
introduce a search result data type that is a bit more heavy-weight:

```c
struct search_result {
    search_status status;
    hamt_node *anchor;
    hamt_node *value;
    hash_state *hash;
};
```
Here, `anchor` always points to the anchor at which the search was terminated;
if the search was successful, `value` points to the table row that holds the
key/value pair with matching key; if it was unsuccessful with a key mismatch,
`value` points to the mismatching key/value pair; and if it was unsuccessful
because the key did not exist, `value` equals `NULL`.
Depending on the depth
that the search reached, we may have hit hash exhaustion and the hash may have
been recalculated, so we are returning this here, too.

Given `struct search_result`, the return value of `search_recursive()`
becomes:

```c
static struct search_result search_recursive(...)
{
    // ...
}
```

With these prerequisites out of the way, we can tackle the actual search
algorithm:

        search_recursive(anchor, hash, eq, key, ...):
            if the current 5-bit sub-hash is a valid index in the current table:
                if the index refers to a key/value pair:
                    if the key matches the search key:
                        return SEARCH_SUCCESS
                    else:
                        return SEARCH_FAIL_KEYMISMATCH
                else (i.e. it refers to a sub-table):
                    return search_recursive(sub-table, hash_next(hash), eq, key)
            else:
                return SEARCH_FAIL_NOTFOUND

The basic idea is to start from the root of the HAMT and then, at every level,
test if the current sub-hash of the key is present in the current sub-trie. If
not, bail and report failure immediately. If yes, check whether the entry
refers to a key/value pair or to another table. If it is a key/value pair,
check if the keys match and return success or failure accordingly. If the
entry refers to a sub-table, repeat the search at the level of the sub-table.

With the conceptual approach outlined, let's get into the implementation
details.
We start with deriving the table index for the current search level from the
hash.
This is accomplished using
`hash_get_index()`, which encapsulates the bit-fiddling required to extract
the correct 5-bit hash for the current search level and returns the index as
an unsigned integer.

```c
static search_result search_recursive(hamt_node *anchor,
                                      hash_state *hash,
                                      hamt_cmp_fn cmp_eq,
                                      const void *key, ...)
{
    uint32_t expected_index = hash_get_index(hash);
    ...
}
```

The code then checks if the `expected_index` exists in the current table:

```c
    ...
    if (has_index(anchor, expected_index)) {
    ...
    }
```

Here, `has_index()` is a simple helper function that checks if
the `INDEX(anchor)` bitfield has the bit set at `expected_index`:

```c
static inline bool has_index(const hamt_node *anchor, size_t expected_index)
{
    return INDEX(anchor) & (1 << expected_index);
}
```

If `has_index()` evaluates to false, the key does not exist in the HAMT and we
can immediately fail the search and return the result:

```c
{
    uint32_t expected_index = hash_get_index(hash);
    if (has_index(anchor, expected_index)) {
        ...
        ...
\n        ...\n    }\n    search_result result = {.status = SEARCH_FAIL_NOTFOUND,\n                            .anchor = anchor,\n                            .value = NULL,\n                            .hash = hash};\n    return result;\n}\n```\n\nIf `has_index()` evaluates to true, we find the array index using\n`get_pos()` (see above), store it into `pos` and then acquire a pointer to the\n`next` node by addressing `pos` indices into the `anchor`'s table.\n\n```c\n{\n    ...\n    if (has_index(anchor, expected_index)) {\n        /* If yes, get the compact index to address the array */\n        int pos = get_pos(expected_index, INDEX(anchor));\n        /* Index into the table */\n        hamt_node *next = \u0026TABLE(anchor)[pos];\n        ...\n    }\n    ...\n}\n```\n\nIf the `next` node is not a value, we advance the hash state and recurse the\nsearch. If it is, we compare the keys and return success or failure\naccordingly:\n\n```c\n{\n        ...\n        /* Index into the table */\n        hamt_node *next = \u0026TABLE(anchor)[pos];\n        /* Are we looking at a value or another level of tables? 
*/\n        if (is_value(VALUE(next))) {\n            if ((*cmp_eq)(key, KEY(next)) == 0) {\n                /* Found: keys match */\n                search_result result = {.status = SEARCH_SUCCESS,\n                                        .anchor = anchor,\n                                        .value = next,\n                                        .hash = hash};\n                return result;\n            }\n            /* Not found: same hash but different key */\n            search_result result = {.status = SEARCH_FAIL_KEYMISMATCH,\n                                    .anchor = anchor,\n                                    .value = next,\n                                    .hash = hash};\n            return result;\n        } else {\n            /* For table entries, recurse to the next level */\n            return search_recursive(next, hash_next(hash), cmp_eq, key);\n        }\n```\n\nThat concludes the implementation of the recursive search function and the\ncomplete implementation looks like this:\n\n```c\nstatic search_result search_recursive(hamt_node *anchor, hash_state *hash,\n                                      hamt_cmp_fn cmp_eq, const void *key)\n{\n    /* Determine the expected index in table */\n    uint32_t expected_index = hash_get_index(hash);\n    /* Check if the expected index is set */\n    if (has_index(anchor, expected_index)) {\n        /* If yes, get the compact index to address the array */\n        int pos = get_pos(expected_index, INDEX(anchor));\n        /* Index into the table */\n        hamt_node *next = \u0026TABLE(anchor)[pos];\n        /* Are we looking at a value or another level of tables? 
*/\n        if (is_value(VALUE(next))) {\n            if ((*cmp_eq)(key, KEY(next)) == 0) {\n                /* Found: keys match */\n                search_result result = {.status = SEARCH_SUCCESS,\n                                        .anchor = anchor,\n                                        .value = next,\n                                        .hash = hash};\n                return result;\n            }\n            /* Not found: same hash but different key */\n            search_result result = {.status = SEARCH_FAIL_KEYMISMATCH,\n                                    .anchor = anchor,\n                                    .value = next,\n                                    .hash = hash};\n            return result;\n        } else {\n            /* For table entries, recurse to the next level */\n            return search_recursive(next, hash_next(hash), cmp_eq, key);\n        }\n    }\n    /* Not found: expected index is not set, key does not exist */\n    search_result result = {.status = SEARCH_FAIL_NOTFOUND,\n                            .anchor = anchor,\n                            .value = NULL,\n                            .hash = hash};\n    return result;\n}\n```\n\n### Search: external API\n\nThe external API for search is `hamt_get(trie, key)` which takes a `trie`\nand attempts to find (and return) a key/value pair specified by `key`. 
Its
implementation uses `search_recursive()` from above:

```c
const void *hamt_get(const struct hamt *trie, void *key)
{
    hash_state *hash = &(hash_state){.key = key,
                                     .hash_fn = trie->key_hash,
                                     .hash = trie->key_hash(key, 0),
                                     .depth = 0,
                                     .shift = 0};
    search_result sr = search_recursive(trie->root, hash, trie->key_cmp, key,
                                        NULL, trie->ator);
    if (sr.status == SEARCH_SUCCESS) {
        /* sr.VALUE(value) expands to sr.value->as.kv.value */
        return untagged(sr.VALUE(value));
    }
    return NULL;
}
```

In order to use `search_recursive()`, it is necessary to set up the hash state
management, initializing it with the `key`, the hashed `key`, and starting
search from level `0` (corresponding to a shift of `0`). If the search is
not successful, the function returns `NULL`; if it is successful, it returns
a `void` pointer to the value that corresponds to `key`. Note the *untagging*
of the `value` field since we're using it as a *tagged pointer* to indicate
node types.


### Insert: internal functions

`libhamt` does not provide an explicit insertion function; all insertions into
the HAMT are *upserts*, i.e. after calling `hamt_set()` the API guarantees
that the requested key/value pair exists, irrespective of potential previous
entries that may have had the same key but a different value.

The internal function that implements this behavior is `set()`:

```c
static const hamt_node *set(struct hamt *h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
                            hamt_cmp_fn cmp_fn, void *key, void *value)
```

`set()` takes a HAMT, an anchor in that HAMT, hashing and comparison
functions as well as a key/value pair. After initializing the hash state, the
function makes use of `search_recursive` to find the specified `key`.
It deals
with three different search outcomes: (1) if the search is successful, the
value of `key` gets replaced with the new `value`; (2) if the search is
unsuccessful because the key does not exist, it attempts to insert a new
key/value pair at the appropriate position; and (3) if the search fails due to
a key mismatch (i.e. there is an entry at the expected hash position but its
key does not equal `key`), it extends the hash trie until the new key/value
pair can be placed correctly. Cases (2) and (3) are covered by the
`insert_kv()` and `insert_table()` helper functions, respectively.

```c
static const hamt_node *set(struct hamt *h, hamt_node *anchor, hamt_key_hash_fn hash_fn,
                            hamt_cmp_fn cmp_fn, void *key, void *value)
{
    hash_state *hash = &(hash_state){.key = key,
                                     .hash_fn = hash_fn,
                                     .hash = hash_fn(key, 0),
                                     .depth = 0,
                                     .shift = 0};
    search_result sr =
        search_recursive(anchor, hash, cmp_fn, key, NULL, h->ator);
    const hamt_node *inserted;
    switch (sr.status) {
    case SEARCH_SUCCESS:
        sr.VALUE(value) = tagged(value); /* sr.value->as.kv.value */
        inserted = sr.value;
        break;
    case SEARCH_FAIL_NOTFOUND:
        if ((inserted = insert_kv(sr.anchor, sr.hash, key, value, h->ator)) !=
            NULL) {
            h->size += 1;
        }
        break;
    case SEARCH_FAIL_KEYMISMATCH:
        if ((inserted = insert_table(sr.value, sr.hash, key, value, h->ator)) !=
            NULL) {
            h->size += 1;
        }
        break;
    }
    return inserted;
}
```

If the call to `search_recursive()` fails with `SEARCH_FAIL_NOTFOUND`, we know
that the table of `sr.anchor` does not yet have a row for the key's current
sub-hash.
To insert the new
`key`/`value` pair, `insert_kv()` calculates the position of the `key` in the
current table: it extracts the 0-31 index position for the current key and
stores it in `ix`, extends the existing `INDEX(anchor)` index bitmap to
include the new key by setting the `ix`-th bit, and then calculates the dense
index position of the new entry via `get_pos()`. It then uses `table_extend()`
to extend the table to the correct size and populates the `key` and `value`
entries to reflect the new key/value pair. Note the pointer tagging on the
value field to mark it as a key/value row in the table (as opposed to a row
that points to a sub-table).

```c
static const hamt_node *insert_kv(hamt_node *anchor, hash_state *hash,
                                  void *key, void *value,
                                  struct hamt_allocator *ator)
{
    /* calculate position in new table */
    uint32_t ix = hash_get_index(hash);
    uint32_t new_index = INDEX(anchor) | (1 << ix);
    int pos = get_pos(ix, new_index);
    /* extend table */
    size_t n_rows = get_popcount(INDEX(anchor));
    anchor = table_extend(ator, anchor, n_rows, ix, pos);
    if (!anchor)
        return NULL;
    hamt_node *new_table = TABLE(anchor);
    /* set new k/v pair */
    new_table[pos].as.kv.key = key;
    new_table[pos].as.kv.value = tagged(value);
    /* return a pointer to the inserted k/v pair */
    return &new_table[pos];
}
```

When the call to `search_recursive()` in `set()` fails with
`SEARCH_FAIL_KEYMISMATCH`, the situation is different: there is another entry
(either a key/value pair or a reference to a sub-table) in the HAMT that
currently occupies a transitional trie location for `key`. This is expected
to happen regularly: keys are always inserted with the shortest possible trie
path that resolves hashing conflicts between *existing* keys.
As more and more entries are added to the HAMT, these paths must necessarily grow in length. This situation is handled by `insert_table()`:

```c
static const hamt_node *insert_table(hamt_node *anchor, hash_state *hash,
                                     void *key, void *value,
                                     struct hamt_allocator *ator)
{
    /* Collect everything we know about the existing value */
    hash_state *x_hash =
        &(hash_state){.key = KEY(anchor),
                      .hash_fn = hash->hash_fn,
                      .hash = hash->hash_fn(KEY(anchor), hash->depth / 5),
                      .depth = hash->depth,
                      .shift = hash->shift};
    void *x_value = VALUE(anchor); /* tagged (!) value ptr */
    /* increase depth until the hashes diverge, building a list
     * of tables along the way */
    hash_state *next_hash = hash_next(hash);
    hash_state *x_next_hash = hash_next(x_hash);
    uint32_t next_index = hash_get_index(next_hash);
    uint32_t x_next_index = hash_get_index(x_next_hash);
    while (x_next_index == next_index) {
        TABLE(anchor) = table_allocate(ator, 1);
        INDEX(anchor) = (1 << next_index);
        next_hash = hash_next(next_hash);
        x_next_hash = hash_next(x_next_hash);
        next_index = hash_get_index(next_hash);
        x_next_index = hash_get_index(x_next_hash);
        anchor = TABLE(anchor);
    }
    /* the hashes are different, let's allocate a table with two
     * entries to store the existing and new values */
    TABLE(anchor) = table_allocate(ator, 2);
    INDEX(anchor) = (1 << next_index) | (1 << x_next_index);
    /* determine the proper position in the allocated table */
    int x_pos = get_pos(x_next_index, INDEX(anchor));
    int pos = get_pos(next_index, INDEX(anchor));
    /* fill in the existing value; no need to tag the value pointer
     * since it is already tagged. */
    TABLE(anchor)[x_pos].as.kv.key = (void *)x_hash->key;
    TABLE(anchor)[x_pos].as.kv.value = x_value;
    /* fill in the new key/value pair, tagging the pointer to the
     * new value to mark it as a value ptr */
    TABLE(anchor)[pos].as.kv.key = key;
    TABLE(anchor)[pos].as.kv.value = tagged(value);

    return &TABLE(anchor)[pos];
}
```

`insert_table()` works in three stages: (1) it initializes the `hash_state` for the current anchor; (2) it creates a series of single-entry tables until the hashes of the current and new keys diverge; and (3) it finally creates a new table of size 2 that holds the old entry as well as the new key/value pair.

### Insert: external API

The implementation of the external API for inserting and updating values in the HAMT is straightforward:

```c
const void *hamt_set(struct hamt *trie, void *key, void *value)
{
    const hamt_node *n =
        set(trie, trie->root, trie->key_hash, trie->key_cmp, key, value);
    return VALUE(n);
}
```

`hamt_set()` makes a vanilla call to the internal `set()` function and returns a pointer to the value of the new key.

### Remove

### Iterators

## Persistent data structures and structural sharing

### Path copying

### Insert

### Remove

# Appendix

## Unit testing

For testing, `hamt` uses a variant of [John Brewer's `minunit` testing framework][brewer_xx_minunit].
Minunit is extremely minimalistic and its header-only implementation easily fits on a single page:

```c
// test/minunit.h
#ifndef MINUNIT_H
#define MINUNIT_H

#define MU_ASSERT(test, message)                                               \
    do {                                                                       \
        if (!(test))                                                           \
            return message;                                                    \
    } while (0)
#define MU_RUN_TEST(test)                                                      \
    do {                                                                       \
        char *message = test();                                                \
        mu_tests_run++;                                                        \
        if (message)                                                           \
            return message;                                                    \
    } while (0)

#define MU_TEST_CASE(name) static char *name()
#define MU_TEST_SUITE(name) static char *name()

extern int mu_tests_run;

#endif /* !MINUNIT_H */
```

With `minunit`, every unit test is a `MU_TEST_CASE`. Within a test case, we use `MU_ASSERT` to check the test invariants. Test cases are grouped into `MU_TEST_SUITE`s as sequential calls to `MU_RUN_TEST`. When an assertion fails, the `return` statement in `MU_ASSERT` short-circuits test execution and returns a non-null pointer to the respective `message` (generally a static string). This, in turn, causes `MU_RUN_TEST` to issue a `return` call with the string pointer, short-circuiting the remaining test suite.
The header also declares a global variable `mu_tests_run` that keeps track of the total number of executed tests.

The following listing illustrates the basic structure of unit test implementations with `minunit`; check the [actual tests](test/test_hamt.c) for a full listing.

```c
// test/test_hamt.c
#include <stdio.h>

#include "minunit.h"
#include "../src/hamt.c"

int mu_tests_run = 0;

MU_TEST_CASE(test_dummy)
{
    /* do something here */
    MU_ASSERT(0 == 0, "Oops X-{");
    return 0;
}

MU_TEST_SUITE(test_suite)
{
    /* Add tests here */
    MU_RUN_TEST(test_dummy);
    /*
     * ... many more ...
     */
    return 0;
}

int main()
{
    printf("---=[ Hash array mapped trie tests\n");
    char *result = test_suite();
    if (result != 0) {
        printf("%s\n", result);
    } else {
        printf("All tests passed.\n");
    }
    printf("Tests run: %d\n", mu_tests_run);
    return result != 0;
}
```

Note that the test setup `include`s the `hamt.c` implementation file. This is a common trick used in unit testing to gain easy access to `static` functions that would otherwise be inaccessible for testing since they are local to the `hamt.c` compilation unit. It requires some care in the Makefile setup in order to avoid symbol duplication.


# Footnotes

<b id="fn_hash_table_cpp">[1]</b>
The `std::unordered_*` containers implement open hashing (aka separate chaining), with the hash table being an array of buckets, each pointing to the head of a linked list. This is a deliberate and reasonable compromise for general use; gaining an order of magnitude in speed for specialized use cases (e.g. append-only workloads with guaranteed high-quality hash functions) is possible.
See [this stackoverflow post][cpp_unordered_map_impl] for a summary of the [standard proposal][austern_03_proposal].
[↩](#ac_hash_table_cpp)

<b id="fn_hash_table_c">[2]</b>
`musl` provides a `hsearch` implementation that uses closed hashing with quadratic probing for conflict resolution. The [documentation][musl_libc_hsearch] states that it uses powers of two for table sizing, which seems questionable given the impact on the modulo operation (table sizes should ideally be prime). The GLib `GHashTable` has surprisingly little documentation of its implementation details but [appears to use][glib_hashtable] a separate chaining approach similar to the C++ solution.
[↩](#ac_hash_table_c)

<b id="fn_hash_table_python">[3]</b> Python's `dict` implementation uses closed hashing (aka open addressing) with pseudo-random probing to mitigate the poor hashing properties of the standard Python `hash()` function for some data types (from [here][python_dict_pre36]). Python keeps the load factor below 0.66; this avoids the gradual performance degradation associated with high load factors in closed hashing but comes at the cost of an increased memory footprint. The [codebase][python_dictobj] was refactored in 3.6 to split the actual data from the hash table, resulting in better memory efficiency and GC friendliness (see [here][python_dict_impl36] and [here][python_dict_impl36_2]).
[↩](#ac_hash_table_python)

<b id="fn_hash_table_java">[4]</b> Java provides `Hashtable<K,V>` and `HashMap<K,V>`, both of which implement the `Map` interface; in addition, `Hashtable` is synchronized. The `HashSet` type internally uses a `HashMap`.
`Hashtable` and `HashMap` implement open hashing (separate chaining) with a default load factor of 0.75. The OpenJDK implementation of `HashMap` converts between linked list and tree representations in the hash buckets, depending on bucket size; see [the source][openjdk_java_util_hashmap].
[↩](#ac_hash_table_java)

<b id="fn_cpp_virtual_method_table">[5]</b>
There are alternative approaches to enable (somewhat) type-safe templating in C, mainly by implementing what basically amounts to virtual method tables using the C preprocessor. See e.g. [here][cpp_vmts] for a useful stackoverflow summary or [here][c_templating] for a more in-depth treatise.
[↩](#ac_cpp_virtual_method_table)

[cpp_vmts]: https://stackoverflow.com/questions/10950828/simulation-of-templates-in-c-for-a-queue-data-type/11035347
[c_templating]: http://blog.pkh.me/p/20-templating-in-c.html


[austern_03_proposal]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
[bagwell_00_ideal]: https://lampwww.epfl.ch/papers/idealhashtrees.pdf
[boehm_gc]: https://www.hboehm.info/gc/
[brewer_xx_minunit]: http://www.jera.com/techinfo/jtns/jtn002.html
[chaelim_hamt]: https://github.com/chaelim/HAMT
[cormen_09_introduction]: https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press/dp/0262033844/ref=zg_bs_491298_1/147-2375898-2942653?pd_rd_i=0262033844&psc=1
[coyler_15_champ]: https://blog.acolyer.org/2015/11/27/hamt/
[cpp_unordered_map_impl]: https://stackoverflow.com/a/31113618
[driscoll_86_making]: https://www.cs.cmu.edu/~sleator/papers/another-persistence.pdf
[glib_hashtable]: https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/ghash.c
[hamt_bench_github]: https://github.com/mkirchner/hamt-bench
[hickey_are_we_there_yet]: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/AreWeThereYet.md
[hickey_value_of_values]: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/ValueOfValues.md
[js_map]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
[krukov_09_understanding]: http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice.html
[musl]: https://www.musl-libc.org
[musl_libc_hsearch]: https://git.musl-libc.org/cgit/musl/tree/src/search/hsearch.c
[openjdk_java_util_hashmap]: https://github.com/openjdk/jdk17/blob/74007890bb9a3fa3a65683a3f480e399f2b1a0b6/src/java.base/share/classes/java/util/HashMap.java
[python_dict_impl36]: https://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html
[python_dict_impl36_2]: https://mail.python.org/pipermail/python-dev/2012-December/123028.html
[python_dict_pre36]: https://stackoverflow.com/a/9022835
[python_dictobj]: https://github.com/python/cpython/blob/main/Objects/dictobject.c
[sedgewick_11_algorithms]: https://www.amazon.com/Algorithms-4th-Robert-Sedgewick/dp/032157351X
[steindoerfer_15_optimizing]: https://michael.steindorfer.name/publications/oopsla15.pdf
[stutter]: https://github.com/mkirchner/stutter
[wiki_associative_array]: https://en.wikipedia.org/wiki/Associative_array
[wiki_avl_trees]: https://en.wikipedia.org/wiki/AVL_tree
[wiki_b_trees]: https://en.wikipedia.org/wiki/B-tree
[wiki_bsd_libc]: https://en.wikipedia.org/wiki/C_standard_library#BSD_libc
[wiki_cryptographic_hash_function]: https://en.wikipedia.org/wiki/Cryptographic_hash_function
[wiki_glib]: https://en.wikipedia.org/wiki/GLib
[wiki_glibc]: https://en.wikipedia.org/wiki/Glibc
[wiki_hash_function]: https://en.wikipedia.org/wiki/Hash_function
[wiki_hash_table]: https://en.wikipedia.org/wiki/Hash_table
[wiki_hash_tree]: https://en.wikipedia.org/wiki/Hash_tree_(persistent_data_structure)
[wiki_immutable_object]: https://en.wikipedia.org/wiki/Immutable_object
[wiki_libc]: https://en.wikipedia.org/wiki/C_standard_library
[wiki_persistent]: https://en.wikipedia.org/wiki/Persistent_data_structure
[wiki_persistent_data_structure]: https://en.wikipedia.org/wiki/Persistent_data_structure
[wiki_persistent_structural_sharing]: https://en.wikipedia.org/wiki/Persistent_data_structure#Techniques_for_preserving_previous_versions
[wiki_popcount]: https://en.wikipedia.org/wiki/Hamming_weight
[wiki_red_black_trees]: https://en.wikipedia.org/wiki/Red–black_tree
[wiki_referential_transparency]: https://en.wikipedia.org/wiki/Referential_transparency
[wiki_set_adt]: https://en.wikipedia.org/wiki/Set_(abstract_data_type)
[wiki_structural_sharing]: https://en.wikipedia.org/wiki/Persistent_data_structure#Trees
[wiki_trie]: https://en.wikipedia.org/wiki/Trie
[wiki_value_semantics]: https://en.wikipedia.org/wiki/Value_semantics
[why_simple_hash_functions_work]: https://theoryofcomputing.org/articles/v009a030/v009a030.pdf
[hamt_src_murmur]: ./src/murmur3.c
[hamt_src_test_murmur]: https://github.com/mkirchner/hamt/blob/62a24e5501d72d5fb505d3c642113015f46904d3/test/test_hamt.c#L92
[wiki_murmurhash]: https://en.wikipedia.org/wiki/MurmurHash