{"id":27106409,"url":"https://github.com/bytehamster/consensusrecsplit","last_synced_at":"2026-01-04T17:10:10.056Z","repository":{"id":285387186,"uuid":"901847230","full_name":"ByteHamster/ConsensusRecSplit","owner":"ByteHamster","description":"ConsensusRecSplit is a perfect hash function with very small space consumption based on Combined Search and Encoding of Successful Seeds (Consensus)","archived":false,"fork":false,"pushed_at":"2025-03-31T12:17:06.000Z","size":226,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-31T13:33:57.713Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteHamster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-11T12:31:52.000Z","updated_at":"2025-03-31T12:17:09.000Z","dependencies_parsed_at":"2025-03-31T13:33:59.809Z","dependency_job_id":"5bcb62fb-5d0b-4ed1-8bea-307709a2c98a","html_url":"https://github.com/ByteHamster/ConsensusRecSplit","commit_stats":null,"previous_names":["bytehamster/consensusrecsplit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FConsensusRecSplit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FConsensusRecSplit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FConsensusRecSplit/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FConsensusRecSplit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ByteHamster","download_url":"https://codeload.github.com/ByteHamster/ConsensusRecSplit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247543581,"owners_count":20955865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-06T19:55:48.380Z","updated_at":"2026-01-04T17:10:10.050Z","avatar_url":"https://github.com/ByteHamster.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ConsensusRecSplit\n\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n![Build status](https://github.com/ByteHamster/ConsensusRecSplit/actions/workflows/build.yml/badge.svg)\n\nA minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions.\nPerfect hash functions have applications in databases, bioinformatics, and as a building block of various space-efficient data structures.\n\nConsensusRecSplit is a perfect hash function with very small space consumption.\nIt is based on *Combined Search and Encoding of Successful Seeds* (Consensus), applied to\nthe recursive splitting idea of [RecSplit](https://github.com/vigna/sux/blob/master/sux/function/RecSplit.hpp).\nCompared to previous approaches, ConsensusRecSplit achieves a space consumption that is orders of magnitude closer to the lower bound.\nOn 100 million keys and about an hour of 
construction time, it achieves a stunning 1.4448 bits per key, while the lower bound is 1.4427 bits per key.\nRecSplit achieves 1.6127 bits per key in the same construction time.\n\nWhile the RecSplit tree with Consensus has polynomial running time, the first splittings touch a large number of keys, hurting cache locality.\nThis is why we combine it with a simple [threshold-based k-perfect hash function](https://arxiv.org/abs/2310.14959).\nWe then perform combined search and encoding on the splitting seeds, while also combining the k-perfect buckets with one another.\nThe k-perfect hash function itself does not currently use Consensus; doing so in the future should improve space efficiency.\nThe bucket size (k) gives a trade-off between query performance, construction performance, and space consumption.\nRather large values of k, such as 32768, work best in our experiments.\n\n### Construction Performance with 100M Keys\n\n![Plot](plot.png)\n\n### Library usage\n\nClone this repository (with submodules) and add the following to your `CMakeLists.txt`.\n\n```cmake\nadd_subdirectory(path/to/ConsensusRecSplit)\ntarget_link_libraries(YourTarget PRIVATE ConsensusRecSplit)\n```\n\nYou can construct a ConsensusRecSplit perfect hash function as follows.\n\n```cpp\nstd::vector\u003cstd::string\u003e keys = {\"abc\", \"def\", \"123\", \"456\"};\nconsensus::ConsensusRecSplit\u003c/* k */ 4096, /* overhead */ 0.01\u003e hashFunc(keys);\nstd::cout \u003c\u003c hashFunc(\"abc\") \u003c\u003c std::endl;\n```\n\n### Licensing\nThis code is licensed under the [GPLv3](/LICENSE).\n\nIf you use this work in an academic context or publication, please cite our paper:\n\n```\n@article{lehmann2025consensus,\n  author = {Hans-Peter Lehmann and Peter Sanders and Stefan Walzer and Jonatan Ziegler},\n  title = {Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing},\n  journal = {CoRR},\n  volume = {abs/2502.05613},\n  year = {2025},\n  doi = 
{10.48550/ARXIV.2502.05613}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fconsensusrecsplit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytehamster%2Fconsensusrecsplit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fconsensusrecsplit/lists"}