{"id":17179254,"url":"https://github.com/bytehamster/shockhash","last_synced_at":"2025-10-16T07:34:56.687Z","repository":{"id":189178518,"uuid":"680152770","full_name":"ByteHamster/ShockHash","owner":"ByteHamster","description":"Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force","archived":false,"fork":false,"pushed_at":"2025-02-08T16:20:44.000Z","size":1083,"stargazers_count":10,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-27T07:47:49.340Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteHamster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-18T13:19:23.000Z","updated_at":"2025-02-08T16:20:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"e30e49cc-0e29-4381-a502-4debd0cb89b5","html_url":"https://github.com/ByteHamster/ShockHash","commit_stats":{"total_commits":191,"total_committers":3,"mean_commits":"63.666666666666664","dds":"0.010471204188481686","last_synced_commit":"8031b1663f57d939347e1f85926e18012e985e9f"},"previous_names":["bytehamster/shockhash"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FShockHash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FShockHash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FShockHash/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FShockHash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ByteHamster","download_url":"https://codeload.github.com/ByteHamster/ShockHash/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248743772,"owners_count":21154738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T00:25:21.052Z","updated_at":"2025-10-16T07:34:51.637Z","avatar_url":"https://github.com/ByteHamster.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ShockHash\n\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n![Build status](https://github.com/ByteHamster/ShockHash/actions/workflows/build.yml/badge.svg)\n\nA minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions.\nPerfect hash functions have applications in databases, bioinformatics, and as a building block of various space-efficient data structures.\n\nShockHash (**s**mall, **h**eavily **o**verloaded **c**uc**k**oo **hash** tables) is an MPHF that achieves space very close to the lower bound,\nwhile still being fast to construct.\nIn contrast to the simple brute-force approach that needs to try e^n = 2.72^n different hash function seeds,\nShockHash significantly reduces the search space.\nInstead of sampling hash functions hoping for them to be minimal perfect, it samples random graphs,\nhoping for them to be a pseudoforest.\nIn 
its most space-efficient variant, it can reduce the running time to just 1.16^n,\nwhile still being asymptotically space optimal.\n\nBecause ShockHash is still an exponential-time algorithm, we integrate it into several partitioning frameworks.\nOur implementation inside the [RecSplit](https://github.com/vigna/sux/blob/master/sux/function/RecSplit.hpp) framework achieves the best space efficiency.\nUsing ShockHash inside our novel k-perfect hash function achieves fast queries\nwhile still being faster to construct and more space efficient than any previous approach.\n\n### Library Usage\n\nClone this repo and add the following to your `CMakeLists.txt`.\nNote that the repo has submodules, so either use `git clone --recursive` or `git submodule update --init --recursive`.\n\n```cmake\nadd_subdirectory(path/to/ShockHash)\ntarget_link_libraries(YourTarget PRIVATE ShockHash)\n```\n\nThen use one of the following classes:\n\n- [ShockHash](https://github.com/ByteHamster/ShockHash/blob/main/include/ShockHash.h) is the original ShockHash algorithm integrated into the RecSplit framework.\n- [SIMDShockHash](https://github.com/ByteHamster/ShockHash/blob/main/include/SIMDShockHash.hpp) is the SIMD-parallel version of the original ShockHash algorithm. Both ShockHash and the RecSplit framework are SIMD-parallelized. If this implementation is used on a machine without SIMD support, it is slower than the non-SIMD version because SIMD operations are emulated.\n- [ShockHash2](https://github.com/ByteHamster/ShockHash/blob/main/include/ShockHash2.h) is the bipartite ShockHash algorithm. Only the inner ShockHash loop is SIMD-parallel; the RecSplit framework is not. If this implementation is used on a machine without SIMD support, it uses sequential operations without explicitly emulating SIMD. 
To turn off SIMD, change to SIMD lanes of size 1 in [ShockHash2-internal.h](https://github.com/ByteHamster/ShockHash/blob/main/include/ShockHash2-internal.h).\n\nConstructing a ShockHash perfect hash function is then straightforward:\n\n```cpp\nstd::vector\u003cstd::string\u003e keys = {\"abc\", \"def\", \"123\", \"456\"};\nshockhash::ShockHash\u003c30, false\u003e shockHash(keys, 2000); // ShockHash base case size n=30, bucket size b=2000\nstd::cout \u003c\u003c shockHash(\"abc\") \u003c\u003c \" \" \u003c\u003c shockHash(\"def\") \u003c\u003c \" \"\n          \u003c\u003c shockHash(\"123\") \u003c\u003c \" \" \u003c\u003c shockHash(\"456\") \u003c\u003c std::endl;\n// Output: 1 3 2 0\n```\n\nWe also give the base-case implementations without the RecSplit framework, which makes it easier to understand the main idea.\n\n- Original [ShockHash](https://github.com/ByteHamster/ShockHash/blob/main/benchmark/bijections/ShockHash1.h).\n- [Bipartite ShockHash](https://github.com/ByteHamster/ShockHash/blob/main/include/ShockHash2-internal.h). 
The outer loop, which also appears in the pseudocode of the paper, can be found in `BijectionsShockHash2::findSeed`.\n\n### Construction performance\n\n[![Plots preview](https://raw.githubusercontent.com/ByteHamster/ShockHash/main/plots.png)](https://arxiv.org/abs/2310.14959)\n\n### Licensing\nShockHash is licensed exactly like `libstdc++` (GPLv3 + GCC Runtime Library Exception), which essentially means you can use it everywhere, exactly like `libstdc++`.\nYou can find details in the [COPYING](/COPYING) and [COPYING.RUNTIME](/COPYING.RUNTIME) files.\n\nIf you use [ShockHash](https://arxiv.org/abs/2308.09561) or [bipartite ShockHash](https://arxiv.org/abs/2310.14959) in an academic context or publication, please cite our papers:\n\n```\n@inproceedings{lehmann2023shockhash,\n  author = {Hans-Peter Lehmann and\n    Peter Sanders and\n    Stefan Walzer},\n  title = {{ShockHash}: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force},\n  booktitle = {{ALENEX}},\n  pages = {194--206},\n  publisher = {{SIAM}},\n  year = {2024},\n  doi = {10.1137/1.9781611977929.15}\n}\n\n@article{lehmann2023towardsArxiv,\n  author = {Hans-Peter Lehmann and\n    Peter Sanders and\n    Stefan Walzer},\n  title = {{ShockHash}: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force},\n  journal = {CoRR},\n  volume = {abs/2308.09561},\n  year = {2023},\n  doi = {10.48550/ARXIV.2308.09561}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fshockhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytehamster%2Fshockhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fshockhash/lists"}