{"id":15715341,"url":"https://github.com/cloudflare/trie-hard","last_synced_at":"2025-05-14T13:08:13.103Z","repository":{"id":256398018,"uuid":"853269684","full_name":"cloudflare/trie-hard","owner":"cloudflare","description":"Novel implementation of a Trie data structure optimized for small, sparse maps","archived":false,"fork":false,"pushed_at":"2024-11-07T16:44:51.000Z","size":1084,"stargazers_count":557,"open_issues_count":4,"forks_count":13,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-04-12T14:15:14.389Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-06T10:20:11.000Z","updated_at":"2025-04-08T18:23:15.000Z","dependencies_parsed_at":"2024-12-05T23:01:20.021Z","dependency_job_id":"f5670a51-c1da-404c-98e8-b4ba02e4d716","html_url":"https://github.com/cloudflare/trie-hard","commit_stats":null,"previous_names":["cloudflare/trie-hard"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Ftrie-hard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Ftrie-hard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Ftrie-hard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Ftrie-hard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudflare","download_url":"https://codeload.github.com/cloudflare/trie-hard/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254149960,"owners_count":22022851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T21:41:01.371Z","updated_at":"2025-05-14T13:08:08.087Z","avatar_url":"https://github.com/cloudflare.png","language":"Rust","funding_links":[],"categories":["Rust","Frameworks \u0026 Libraries"],"sub_categories":["Utilities \u0026 Processing"],"readme":"[![Crates.io](https://img.shields.io/crates/v/trie-hard.svg)](https://crates.io/crates/trie-hard)\n[![Documentation](https://docs.rs/trie-hard/badge.svg)](https://docs.rs/trie-hard/)\n\n# Don't just Trie, Trie Hard\n\nThis crate is an implementation of the [trie](https://en.wikipedia.org/wiki/Trie) data structure that is optimized for reading from small maps where a large number of misses are expected. This is still a work in progress in the sense that as none of the other features you would expect to find in a trie (like prefix search), but it is used in production in Cloudflare's [Pingora](https://blog.cloudflare.com/pingora-open-source) to detect and remove specific headers from 30 million requests every second before they are proxied to their final destinations.\n\n## Performance\n\nThere are several other trie implementations for rust that are more full-featured, so if you are looking for a more robust tool, you will probably want to check out [`radix_trie`](https://crates.io/crates/radix_trie) which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!\n\nHere is a chart showing the time taken to read 10k entries from a map that consists of 119 entries containing only lower-case characters, numbers, and `-`. As you can see, when miss rate gets above 50% the performance of trie-hard surpasses `std::HashMap` and improves as miss rates get higher. \n\n![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/HeadersVsHashMap.png?raw=true \"Header Read vs HashMap Benchmark\")\n\nThe improvement of performance as miss rate grows is one of the features of tries, and you can see the same trend in the performance of `radix-trie`. \n\n![Trie Hard read is faster than RadixTrie but follows the same curve](https://github.com/cloudflare/trie-hard/blob/main/resources/HeadersVsRadixTrie.png?raw=true \"Header Read vs RadixTrie Benchmark\")\n\nThis also shows the difference in performance between radix-trie and trie-hard, but this should not be interpreted as a reason to use trie-hard over radix-trie. radix-trie is a full-featured trie and balances the read performance with write performance. For fairness, here is a breakdown of the loading performance of the two trie implementations.\n\n| Item       | Time to load 15.5k words |\n| ---------- | ------------------------ |\n| Trie Hard  | 11.92 ms                 |\n| Radix Trie | 3.49 ms                  |\n\nFor insertion radix is ~3x times faster and also supports incremental changes whereas trie-hard is only designed for bulk loading.\n\n## How Does it Work?\n\nTrie Hard achieves its speed in 2 ways.\n\n1. All node and edge information is kept in contiguous regions on the heap. This prevents jumping around in memory during gets, and maximizes the chance of child nodes already appearing in cache.\n2. Relationships between nodes and edges is encoded into individual bits in unsigned integers.\n\n### Example\n\nLet's say we want to store the (uninteresting) strings, \"and\", \"ant\", \"dad\", \"do\", \"dot\" in our trie. Let's work through an example of how the data is organized and how we can read from it.\n\n#### Construction\n\nTrie-hard only supports bulk loading so, we run \n\n```rust\nlet trie = [\"and\", \"ant\", \"dad\", \"do\", \"dot\"].into_iter().collect::\u003cTrieHard\u003c'_, _\u003e\u003e();\n```\n\nThe first thing trie-hard does is find all the used bytes in the entries. It assigns each a unique 1-bit mask. For our simple case, this will look like\n\n| Byte   | Mask      |\n| ------ | --------- |\n| `b'a'` | `0b00001` |\n| `b'd'` | `0b00010` |\n| `b'n'` | `0b00100` |\n| `b'o'` | `0b01000` |\n| `b't'` | `0b10000` |\n\nThe number of masks required determines the underlying integer type used to represent the mask. In our case, we only have 5 bits, so the underlying type will be `u8`. The implication of needing a bit for each unique byte is that potentially we would require a 256-bit integer (`u256`), so an implementation of `U256` is provided as part of this crate. _Note_ The `U256` type should not be used directly. It only implements (and tests) the few operations needed to work with `trie-hard`. The use case `trie-hard` was designed for only needs to support at most 37 unique bytes, but in practice only 30 unique bytes appear in the stored map. This means we are storing our underlying tree information in `u32`s.\n\nNext we will construct the graph representing the tree starting with the bytes that appear first in the input strings. Only `a` and `d` appear in the first position, so for the node representing the first byte (and also the root of the trie), we create a mask indicating only `a` and `d` are allowed.\n\n```rust\nlet root = Node {\n    // a (0b00001) + d (0b00010) = 0b00011\n    mask: 0b11,\n    // ..\n}\n```\n\nThis tells us that if a byte other than `a` or `d` appears in the first position, the key being tested does not appear in the trie. This ability to make an exclusion decision at every step is what makes tries more appealing than even hashmaps in some cases. Searching for a string in a hashmap requires hashing the entire string whereas a trie can potentially determine that a string is not part of a set within a single byte.\n\nIf the byte is `a` or `d` we still need to know which node to go to next. All nodes in the graph are stored in a contiguous vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with `a` and 2 is the node for keys starting with `d`. It is important that these child nodes are ordered by their corresponding byte. \n\n```rust\nlet root = Node {\n    mask: 0b11,\n    // 1 -\u003e keys that start with b'a'\n    // 2 -\u003e keys that start with b'd'\n    children: vec![1, 2],\n    // ..\n}\n```\n\n\u003e _Note_. In the final map, the child indices are not stored in per-node vecs. They are instead stored in a central vec, and only the index into that vec corresponding with this node is stored in the node.\n\nAt this point we can visualize the conceptual trie and trie-hard like this\n\n| Conceptual                                                                                                                                                    | Trie-Hard                                                                                                                                                                                                       |\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ![First Layer Conceptual Trie](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerVanilla.png?raw=true \"Header Read vs HashMap Benchmark\") | ![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerTrieHard.png?raw=true \"Header Read vs HashMap Benchmark\") |\n\nBecause of the recursive nature of a trie, we can repeat the same process of creating a mask based on allowed bytes at each node and preparing a set of children for each node. When we reach a complete word that appears in the initial set, we need to signify that the node is a valid word. Visually we will mark them with green, but in rust they just appear as a different enum variant of `TrieNode`.\n\nAfter repeating for one more layer, we can visualize the trie like the this.\n\n| Conceptual                                                                                                                                                     | Trie-Hard                                                                                                                                                                                                        |\n| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ![First Layer Conceptual Trie](https://github.com/cloudflare/trie-hard/blob/main/resources/SecondLayerVanilla.png?raw=true \"Header Read vs HashMap Benchmark\") | ![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/SecondLayerTrieHard.png?raw=true \"Header Read vs HashMap Benchmark\") |\n\nNotice that `do` shows up as green because it is a complete word found in the original collection.\n\nFinally, we add the last layer and complete this small trie.\n\n\n| Conceptual                                                                                                                                          | Trie-Hard                                                                                                                                                                                             |\n| --------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ![First Layer Conceptual Trie](https://github.com/cloudflare/trie-hard/blob/main/resources/Vanilla.png?raw=true \"Header Read vs HashMap Benchmark\") | ![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/TrieHard.png?raw=true \"Header Read vs HashMap Benchmark\") |\n\n#### Reading \n\nWhen reading from the trie, we are checking a key to see if it is contained in the trie. As stated earlier, the benefit of a trie is that it lets lookups fail as fast as possible. Let's walk through the process of reading from trie-hard to see why. \n\nTrie's work on a per-byte basis, so each step of the read process takes a single byte from the input key, and checks the current node in the trie to see if is allowed. The first thing we do with the input byte is convert it from its `u8` value into its mask using the lookup table we created while constructing the trie. Let's say we want to do a lookup `dot`. We start with the first byte `d` and the current (root) node. \n\n`d` converts to a mask of `00010`. To determine if `d` is allowed in the root node, we take a bit-wise `\u0026` of their masks. If the result \u003e 0, then the byte is allowed. The mask for the root node is `00011`, and `00011 \u0026 00010 = 00010 \u003e 0`, so `d` is allowed. \n\nNext we need to determine which node to traverse for our input. This involves some more bit-wise arithmetic in the form of this **_one weird trick_**. We can count the number of set bits in the node's mask that are less-significant than the set bit in the input byte's mask. The number of set bits gives us the index in to the array of children. The formula/rust code for this is below. Notice it uses all bit-wise operations and intrinsics, so this operation equates to just a few cpu instructions in the compiled code.\n\n```rust\nlet child_index = ((input_mask - 1) \u0026 node.mask).count_ones()\n```\n\nApplying this formula to our current input mask and node, we get\n\n```rust\n//            ((input_mask - 1) \u0026 node.mask).count_ones()              \nchild_index = ((  0b00010  - 1) \u0026  0b00011 ).count_ones() // = 1\n```\n\nThe next node is the one at **index = 1** in the current node's child array.\n\nThe reason this works may actually easier to understand with a larger example. Consider this node from a larger trie (unrelated to our original example).\n\n```rust\nTrieNode {\n    // bits for - n   h  fd  a\n    mask: 0b000000100010011001,\n    // 5 possible children for 5 set bits in mask\n    children: [\n        3,  // \u003c-- a\n        6,  // \u003c-- d\n        9,  // \u003c-- f\n        10, // \u003c-- h\n        12  // \u003c-- n\n    ]\n}\n```\n\nIf we want to know which child to go to after receiving the byte `h`, we first need `h`'s mask = `000000000010000000`. Now if we step through the formula we see that subtracting 1 from the `h`'s mask gives us:\n\n```\n    000000000010000000\n  -                  1\n ----------------------\n    000000000001111111\n```\nNotice this gives us a new mask where _all_ bit with lower significance than `h`'s are set. Continuing to follow the formula, we apply a big-wise `\u0026` between the new mask and the node's mask, we get:\n\n```\n\n    000000100010011001\n  \u0026 000000000001111111\n ----------------------\n    000000000000011001\n```\n\nThe bits in this new mask are exactly what we wanted: only the set bits from the node's mask that have a lower significance than `h`'s. The last step is to call [`count_ones`](https://doc.rust-lang.org/std/primitive.u32.html#method.count_ones) which is provided as an intrinsic by llvm. This gives us the answer of **3**.\n\nGoing back to our original example, we now repeat the same steps for the next input byte and the new current node. This process continues until we run out of input or encounter a node that will not accept the current byte. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Ftrie-hard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudflare%2Ftrie-hard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Ftrie-hard/lists"}