{"id":15047643,"url":"https://github.com/martinus/unordered_dense","last_synced_at":"2025-05-14T21:07:47.333Z","repository":{"id":39249911,"uuid":"506872145","full_name":"martinus/unordered_dense","owner":"martinus","description":"A fast \u0026 densely stored hashmap and hashset based on robin-hood backward shift deletion","archived":false,"fork":false,"pushed_at":"2025-02-02T08:20:34.000Z","size":1634,"stargazers_count":1051,"open_issues_count":14,"forks_count":83,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-04-13T18:44:34.091Z","etag":null,"topics":["c-plus-plus","cpp","cpp17","hash","hash-tables","header-only-library","no-dependencies","stl-containers","unordered-map","unordered-set"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/martinus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["martinus"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2022-06-24T04:03:59.000Z","updated_at":"2025-04-13T06:30:36.000Z","dependencies_parsed_at":"2023-12-22T18:28:54.066Z","dependency_job_id":"0e7b783c-786d-41a2-96bb-30949714e744","html_url":"https://github.com/martinus/unordered_dense","commit_stats":null,"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/martinus%2Funordered_dense","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinus%2Funordered_dense/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinus%2Funordered_dense/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinus%2Funordered_dense/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/martinus","download_url":"https://codeload.github.com/martinus/unordered_dense/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254227613,"owners_count":22035670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","cpp","cpp17","hash","hash-tables","header-only-library","no-dependencies","stl-containers","unordered-map","unordered-set"],"created_at":"2024-09-24T21:02:06.808Z","updated_at":"2025-05-14T21:07:42.322Z","avatar_url":"https://github.com/martinus.png","language":"C++","readme":"\u003ca id=\"top\"\u003e\u003c/a\u003e\n\n[![Release](https://img.shields.io/github/release/martinus/unordered_dense.svg)](https://github.com/martinus/unordered_dense/releases)\n[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/martinus/unordered_dense/main/LICENSE)\n[![meson_build_test](https://github.com/martinus/unordered_dense/actions/workflows/main.yml/badge.svg)](https://github.com/martinus/unordered_dense/actions)\n[![CII Best 
Practices](https://bestpractices.coreinfrastructure.org/projects/6220/badge)](https://bestpractices.coreinfrastructure.org/projects/6220)\n[![Sponsors](https://img.shields.io/github/sponsors/martinus?style=social)](https://github.com/sponsors/martinus)\n\n# 🚀 ankerl::unordered_dense::{map, set} \u003c!-- omit in toc --\u003e\n\nA fast \u0026 densely stored hashmap and hashset based on robin-hood backward shift deletion for C++17 and later.\n\nThe classes `ankerl::unordered_dense::map` and `ankerl::unordered_dense::set` are (almost) drop-in replacements for `std::unordered_map` and `std::unordered_set`. While they don't have as strong iterator / reference stability guarantees, they are typically *much* faster.\n\nAdditionally, there are `ankerl::unordered_dense::segmented_map` and `ankerl::unordered_dense::segmented_set`, which have lower peak memory usage and stable iterators/references on insert.\n\n- [1. Overview](#1-overview)\n- [2. Installation](#2-installation)\n  - [2.1. Installing using cmake](#21-installing-using-cmake)\n- [3. Usage](#3-usage)\n  - [3.1. Modules](#31-modules)\n  - [3.2. Hash](#32-hash)\n    - [3.2.1. Simple Hash](#321-simple-hash)\n    - [3.2.2. High Quality Hash](#322-high-quality-hash)\n    - [3.2.3. Specialize `ankerl::unordered_dense::hash`](#323-specialize-ankerlunordered_densehash)\n    - [3.2.4. Heterogeneous Overloads using `is_transparent`](#324-heterogeneous-overloads-using-is_transparent)\n    - [3.2.5. Automatic Fallback to `std::hash`](#325-automatic-fallback-to-stdhash)\n    - [3.2.6. Hash the Whole Memory](#326-hash-the-whole-memory)\n  - [3.3. Container API](#33-container-api)\n    - [3.3.1. `auto extract() \u0026\u0026 -\u003e value_container_type`](#331-auto-extract----value_container_type)\n    - [3.3.2. `extract()` single Elements](#332-extract-single-elements)\n    - [3.3.3. 
`[[nodiscard]] auto values() const noexcept -\u003e value_container_type const\u0026`](#333-nodiscard-auto-values-const-noexcept---value_container_type-const)\n    - [3.3.4. `auto replace(value_container_type\u0026\u0026 container)`](#334-auto-replacevalue_container_type-container)\n  - [3.4. Custom Container Types](#34-custom-container-types)\n  - [3.5. Custom Bucket Types](#35-custom-bucket-types)\n    - [3.5.1. `ankerl::unordered_dense::bucket_type::standard`](#351-ankerlunordered_densebucket_typestandard)\n    - [3.5.2. `ankerl::unordered_dense::bucket_type::big`](#352-ankerlunordered_densebucket_typebig)\n- [4. `segmented_map` and `segmented_set`](#4-segmented_map-and-segmented_set)\n- [5. Design](#5-design)\n  - [5.1. Inserts](#51-inserts)\n  - [5.2. Lookups](#52-lookups)\n  - [5.3. Removals](#53-removals)\n- [6. Real World Usage](#6-real-world-usage)\n\n## 1. Overview\n\nThe chosen design has a few advantages over `std::unordered_map`:\n\n* Perfect iteration speed: data is stored in a `std::vector`, so all data is contiguous!\n* Very fast insertion \u0026 lookup speed, in the same ballpark as [`absl::flat_hash_map`](https://abseil.io/docs/cpp/guides/container)\n* Low memory usage\n* Full support for standard allocators and [polymorphic allocators](https://en.cppreference.com/w/cpp/memory/polymorphic_allocator). There are `ankerl::unordered_dense::pmr` typedefs available.\n* Customizable storage type: with a template parameter you can e.g. switch from `std::vector` to `boost::interprocess::vector` or any other compatible random-access container.\n* Better debugging: the underlying data can be easily seen in any debugger that can show an `std::vector`.\n\nThere's no free lunch, so there are a few disadvantages:\n\n* Deletion speed is relatively slow. 
This needs two lookups: one for the element to delete, and one for the element that is moved onto the newly empty spot.\n* No `const Key` in `std::pair\u003cKey, Value\u003e`.\n* Iterators and references are not stable on insert or erase.\n\n## 2. Installation\n\n\u003c!-- See https://github.com/bernedom/SI/blob/main/doc/installation-guide.md --\u003e\nThe default installation location is `/usr/local`.\n\n### 2.1. Installing using cmake\n\nClone the repository and run these commands in the cloned folder:\n\n```sh\nmkdir build \u0026\u0026 cd build\ncmake ..\ncmake --build . --target install\n```\n\nConsider setting an install prefix if you do not want to install `unordered_dense` system-wide, like so:\n\n```sh\nmkdir build \u0026\u0026 cd build\ncmake -DCMAKE_INSTALL_PREFIX:PATH=${HOME}/unordered_dense_install ..\ncmake --build . --target install\n```\n\nTo make use of the installed library, add this to your project:\n\n```cmake\nfind_package(unordered_dense CONFIG REQUIRED)\ntarget_link_libraries(your_project_name unordered_dense::unordered_dense)\n```\n\n## 3. Usage\n\n### 3.1. Modules\n\n`ankerl::unordered_dense` supports C++20 modules. Simply compile `src/ankerl.unordered_dense.cpp` and use the resulting module, e.g. like so:\n\n```sh\nclang++ -std=c++20 -I include --precompile -x c++-module src/ankerl.unordered_dense.cpp\nclang++ -std=c++20 -c ankerl.unordered_dense.pcm\n```\n\nTo use the module, e.g. in `module_test.cpp`, write\n\n```cpp\nimport ankerl.unordered_dense;\n```\n\nand compile with e.g.\n\n```sh\nclang++ -std=c++20 -fprebuilt-module-path=. ankerl.unordered_dense.o module_test.cpp -o main\n```\n\nA simple demo script can be found in `test/modules`.\n\n### 3.2. Hash\n\n`ankerl::unordered_dense::hash` is a fast and high quality hash, based on [wyhash](https://github.com/wangyi-fudan/wyhash). 
The `ankerl::unordered_dense` map/set differentiates between hashes of high quality (good [avalanching effect](https://en.wikipedia.org/wiki/Avalanche_effect)) and bad quality. Hashes with good quality contain a special marker:\n\n```cpp\nusing is_avalanching = void;\n```\n\nThis is the case for the specializations `bool`, `char`, `signed char`, `unsigned char`, `char8_t`, `char16_t`, `char32_t`, `wchar_t`, `short`, `unsigned short`, `int`, `unsigned int`, `long`, `long long`, `unsigned long`, `unsigned long long`, `T*`, `std::unique_ptr\u003cT\u003e`, `std::shared_ptr\u003cT\u003e`, `enum`, `std::basic_string\u003cC\u003e`, and `std::basic_string_view\u003cC\u003e`.\n\nHashes that do not contain such a marker are assumed to be of bad quality and receive an additional mixing step inside the map/set implementation.\n\n#### 3.2.1. Simple Hash\n\nConsider a simple custom key type:\n\n```cpp\nstruct id {\n    uint64_t value{};\n\n    auto operator==(id const\u0026 other) const -\u003e bool {\n        return value == other.value;\n    }\n};\n```\n\nThe simplest implementation of a hash is this:\n\n```cpp\nstruct custom_hash_simple {\n    auto operator()(id const\u0026 x) const noexcept -\u003e uint64_t {\n        return x.value;\n    }\n};\n```\n\nThis can be used, e.g., like this:\n\n```cpp\nauto ids = ankerl::unordered_dense::set\u003cid, custom_hash_simple\u003e();\n```\n\nSince `custom_hash_simple` doesn't have a `using is_avalanching = void;` marker, it is considered to be of bad quality, and additional mixing of `x.value` is automatically provided inside the set.\n\n#### 3.2.2. 
High Quality Hash\n\nBack to the `id` example, we can easily implement a higher quality hash:\n\n```cpp\nstruct custom_hash_avalanching {\n    using is_avalanching = void;\n\n    auto operator()(id const\u0026 x) const noexcept -\u003e uint64_t {\n        return ankerl::unordered_dense::detail::wyhash::hash(x.value);\n    }\n};\n```\n\nWe know `wyhash::hash` is of high quality, so we can add `using is_avalanching = void;`, which makes the map/set use the returned value directly.\n\n#### 3.2.3. Specialize `ankerl::unordered_dense::hash`\n\nInstead of creating a new class, you can also specialize `ankerl::unordered_dense::hash`:\n\n```cpp\ntemplate \u003c\u003e\nstruct ankerl::unordered_dense::hash\u003cid\u003e {\n    using is_avalanching = void;\n\n    [[nodiscard]] auto operator()(id const\u0026 x) const noexcept -\u003e uint64_t {\n        return detail::wyhash::hash(x.value);\n    }\n};\n```\n\n#### 3.2.4. Heterogeneous Overloads using `is_transparent`\n\nThis map/set supports heterogeneous overloads as described in [P2363 Extending associative containers with the remaining heterogeneous overloads](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2363r3.html), which is [targeted for C++26](https://wg21.link/p2077r2). There are overloads for `find`, `count`, `contains`, `equal_range` (see [P0919R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r3.html)), `erase` (see [P2077R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2077r2.html)), and `try_emplace`, `insert_or_assign`, `operator[]`, `at`, and `insert` \u0026 `emplace` for sets (see [P2363R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2363r3.html)).\n\nFor heterogeneous overloads to take effect, both `hasher` and `key_equal` need to have the attribute `is_transparent` set.\n\nHere is an example implementation that's usable with any string type that is convertible to `std::string_view` (e.g. 
`char const*` and `std::string`):\n\n```cpp\nstruct string_hash {\n    using is_transparent = void; // enable heterogeneous overloads\n    using is_avalanching = void; // mark class as high quality avalanching hash\n\n    [[nodiscard]] auto operator()(std::string_view str) const noexcept -\u003e uint64_t {\n        return ankerl::unordered_dense::hash\u003cstd::string_view\u003e{}(str);\n    }\n};\n```\n\nTo make use of this hash, you'll need to specify it as the hasher type, and also a `key_equal` with `is_transparent` like [std::equal_to\u003c\u003e](https://en.cppreference.com/w/cpp/utility/functional/equal_to_void):\n\n```cpp\nauto map = ankerl::unordered_dense::map\u003cstd::string, size_t, string_hash, std::equal_to\u003c\u003e\u003e();\n```\n\nFor more information, see the examples in `test/unit/transparent.cpp`.\n\n#### 3.2.5. Automatic Fallback to `std::hash`\n\nWhen an implementation of `std::hash` for a custom type is available, it is used automatically and assumed to be of bad quality (thus `std::hash` is used, but an additional mixing step is performed).\n\n#### 3.2.6. Hash the Whole Memory\n\nWhen the type [has a unique object representation](https://en.cppreference.com/w/cpp/types/has_unique_object_representations) (no padding, trivially copyable), one can just hash the object's memory. Consider a simple class:\n\n```cpp\nstruct point {\n    int x{};\n    int y{};\n\n    auto operator==(point const\u0026 other) const -\u003e bool {\n        return x == other.x \u0026\u0026 y == other.y;\n    }\n};\n```\n\nA fast and high quality hash can be easily provided like so:\n\n```cpp\nstruct custom_hash_unique_object_representation {\n    using is_avalanching = void;\n\n    [[nodiscard]] auto operator()(point const\u0026 f) const noexcept -\u003e uint64_t {\n        static_assert(std::has_unique_object_representations_v\u003cpoint\u003e);\n        return ankerl::unordered_dense::detail::wyhash::hash(\u0026f, sizeof(f));\n    }\n};\n```\n\n### 3.3. 
Container API\n\nIn addition to the standard `std::unordered_map` API (see https://en.cppreference.com/w/cpp/container/unordered_map), there is an additional API that is somewhat similar to the node API, but leverages the fact that a random-access container is used internally:\n\n#### 3.3.1. `auto extract() \u0026\u0026 -\u003e value_container_type`\n\nExtracts the internally used container. `*this` is emptied.\n\n#### 3.3.2. `extract()` single Elements\n\nSimilar to `erase()`, there is an API call `extract()`. It behaves exactly the same as `erase`, except that the return value is the moved element that is removed from the container:\n\n* `auto extract(const_iterator it) -\u003e value_type`\n* `auto extract(Key const\u0026 key) -\u003e std::optional\u003cvalue_type\u003e`\n* `template \u003cclass K\u003e auto extract(K\u0026\u0026 key) -\u003e std::optional\u003cvalue_type\u003e`\n\nNote that the `extract(key)` API returns an `std::optional\u003cvalue_type\u003e` that is empty when the key is not found.\n\n#### 3.3.3. `[[nodiscard]] auto values() const noexcept -\u003e value_container_type const\u0026`\n\nExposes the underlying values container.\n\n#### 3.3.4. `auto replace(value_container_type\u0026\u0026 container)`\n\nDiscards the internally held container and replaces it with the one passed. Non-unique elements are removed, and the container will be partly reordered when non-unique elements are found.\n\n### 3.4. Custom Container Types\n\n`unordered_dense` accepts a custom allocator, but you can also specify a custom container for that template argument. That way, it is possible to replace the internally used `std::vector` with e.g. `std::deque` or any other container like `boost::interprocess::vector`. This supports fancy pointers (e.g. [offset_ptr](https://www.boost.org/doc/libs/1_80_0/doc/html/interprocess/offset_ptr.html)), so the container can be used with e.g. shared memory provided by `boost::interprocess`.\n\n### 3.5. 
Custom Bucket Types\n\nThe map/set supports two different bucket types. The default should be good for pretty much everyone.\n\n#### 3.5.1. `ankerl::unordered_dense::bucket_type::standard`\n\n* Up to 2^32 = 4.29 billion elements.\n* 8 bytes overhead per bucket.\n\n#### 3.5.2. `ankerl::unordered_dense::bucket_type::big`\n\n* Up to 2^63 = 9223372036854775808 elements.\n* 12 bytes overhead per bucket.\n\n## 4. `segmented_map` and `segmented_set`\n\n`ankerl::unordered_dense` provides a custom container implementation that has lower memory requirements than the default `std::vector`. Memory is not contiguous, but it can allocate segments without having to reallocate and move all the elements. In summary, this leads to:\n\n* Much smoother memory usage: memory usage increases continuously.\n* No high peak memory usage.\n* Faster insertion because elements never need to be moved to newly allocated blocks.\n* Slightly slower indexing compared to `std::vector` because an additional indirection is needed.\n\nHere is a comparison against `absl::flat_hash_map` and the `ankerl::unordered_dense::map` when inserting 10 million entries:\n\n![allocated memory](doc/allocated_memory.png)\n\nAbseil is fastest for this simple insertion test, taking a bit over 0.8 seconds. Its peak memory usage is about 430 MB. Note how the memory usage goes down after the last peak; when it goes down to ~290 MB, it has finished rehashing and could free the previously used memory block.\n\n`ankerl::unordered_dense::segmented_map` doesn't have these peaks, and instead has a smooth increase of memory usage. Note there are still sudden drops \u0026 increases in memory because the indexing data structure still needs to grow by a fixed factor. But because the data is held in a separate container, we are able to first free the old indexing structure and then allocate a new, bigger one; thus we do not have peaks.\n\n## 5. 
Design\n\nThe map/set has two data structures:\n* `std::vector\u003cvalue_type\u003e`, which holds all data. Map/set iterators are just `std::vector\u003cvalue_type\u003e::iterator`!\n* An indexing structure (bucket array), which is a flat array with 8-byte buckets.\n\n### 5.1. Inserts\n\nWhenever an element is added, it is appended to the vector with `emplace_back`. The key is hashed, and an entry (bucket) is added at the corresponding location in the bucket array. The bucket has this structure:\n\n```cpp\nstruct Bucket {\n    uint32_t dist_and_fingerprint;\n    uint32_t value_idx;\n};\n```\n\nEach bucket stores 3 things:\n* The distance of that value from the original hashed location (3 most significant bytes in `dist_and_fingerprint`)\n* A fingerprint: 1 byte of the hash (least significant byte in `dist_and_fingerprint`)\n* The index in the vector where the actual data is stored.\n\nThis structure is specifically designed for robin-hood hashing with backward shift deletion as the collision resolution strategy.\n\n### 5.2. Lookups\n\nThe key is hashed, and the bucket array is searched for an entry at that location with a matching fingerprint. When one is found, the key in the data vector is compared, and if it is equal, the value is returned.\n\n### 5.3. Removals\n\nSince all data is stored in a vector, removals are a bit more complicated:\n\n1. First, look up the element to delete in the bucket array.\n2. When found, replace that element in the vector with the last element in the vector.\n3. Update *two* locations in the bucket array: first, remove the bucket of the removed element.\n4. Then, update the `value_idx` of the moved element. This requires another lookup.\n\n## 6. Real World Usage\n\nOn 2023-09-10 I did a quick search on GitHub to see if this map is used in any popular open-source projects. Here are some of the projects I found. 
Please send me a note if you want your project on that list!\n\n* [PrusaSlicer](https://github.com/prusa3d/PrusaSlicer) - G-code generator for 3D printers (RepRap, Makerbot, Ultimaker, etc.)\n* [Kismet](https://github.com/kismetwireless/kismet) - Wi-Fi, Bluetooth, RF, and more. Kismet is a sniffer, WIDS, and wardriving tool for Wi-Fi, Bluetooth, Zigbee, RF, and more, which runs on Linux and macOS\n* [Rspamd](https://github.com/rspamd/rspamd) - Fast, free and open-source spam filtering system.\n* [kallisto](https://github.com/pachterlab/kallisto) - Near-optimal RNA-Seq quantification\n* [Slang](https://github.com/shader-slang/slang) - Slang is a shading language that makes it easier to build and maintain large shader codebases in a modular and extensible fashion.\n* [CyberFSR2](https://github.com/PotatoOfDoom/CyberFSR2) - Drop-in DLSS replacement with FSR 2.0 for various games such as Cyberpunk 2077.\n* [ossia score](https://github.com/ossia/score) - A free, open-source, cross-platform intermedia sequencer for precise and flexible scripting of interactive scenarios. 
\n* [HiveWE](https://github.com/stijnherfst/HiveWE) - A Warcraft III World Editor (WE) that focuses on speed and ease of use.\n* [opentxs](https://github.com/Open-Transactions/opentxs) - The Open-Transactions project is a collaborative effort to develop a robust, commercial-grade, fully-featured, free-software toolkit implementing the OTX protocol as well as a full-strength financial cryptography library, API, GUI, command-line interface, and prototype notary server.\n* [LuisaCompute](https://github.com/LuisaGroup/LuisaCompute) - High-Performance Rendering Framework on Stream Architectures\n* [Lethe](https://github.com/lethe-cfd/lethe) - Lethe (pronounced /ˈliːθiː/) is open-source computational fluid dynamics (CFD) software which uses high-order continuous Galerkin formulations to solve the incompressible Navier–Stokes equations (among others).\n* [PECOS](https://github.com/amzn/pecos) - PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval.\n* [Operon](https://github.com/heal-research/operon) - A modern C++ framework for symbolic regression that uses genetic programming to explore a hypothesis space of possible mathematical expressions in order to find the best-fitting model for a given regression target.\n* [MashMap](https://github.com/marbl/MashMap) - A fast approximate aligner for long DNA sequences\n* [minigpt4.cpp](https://github.com/Maknee/minigpt4.cpp) - Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with 
GGML)\n","funding_links":["https://github.com/sponsors/martinus"],"categories":["Containers"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinus%2Funordered_dense","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmartinus%2Funordered_dense","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinus%2Funordered_dense/lists"}