{"id":17179292,"url":"https://github.com/bytehamster/gpurecsplit","last_synced_at":"2026-01-04T15:16:13.689Z","repository":{"id":64961822,"uuid":"552030573","full_name":"ByteHamster/GpuRecSplit","owner":"ByteHamster","description":"Parallel space-efficient minimal perfect hash function on SIMD and GPU","archived":false,"fork":false,"pushed_at":"2025-03-31T12:18:47.000Z","size":1163,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-13T01:42:50.020Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteHamster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-15T16:55:39.000Z","updated_at":"2025-03-31T12:18:51.000Z","dependencies_parsed_at":"2023-11-13T09:27:23.851Z","dependency_job_id":"85924653-e522-4c45-8a7f-dc0d9a778eb1","html_url":"https://github.com/ByteHamster/GpuRecSplit","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FGpuRecSplit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FGpuRecSplit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FGpuRecSplit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FGpuRecSplit/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/ByteHamster","download_url":"https://codeload.github.com/ByteHamster/GpuRecSplit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248750109,"owners_count":21155687,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T00:25:28.652Z","updated_at":"2026-01-04T15:16:13.647Z","avatar_url":"https://github.com/ByteHamster.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GpuRecSplit / SimdRecSplit\n\nWe greatly improve the construction time of the [RecSplit](https://arxiv.org/abs/1910.06416) Minimal Perfect Hash Function using two orthogonal approaches.\n*Rotation fitting* hashes the objects in each leaf to two sets and tries to combine them to a bijection by cyclically shifting one set to fill the holes in the other.\nIn addition, we harness parallelism on the level of bits, vectors, cores, and GPUs.\nThe code in this repository achieves significant speedups on SIMD machines and GPUs, compared\nto the original [RecSplit implementation](https://github.com/vigna/sux/blob/master/sux/function/RecSplit.hpp).\n\n### Construction performance\n\n\u003cimg src=\"https://raw.githubusercontent.com/ByteHamster/GpuRecSplit/main/plots.png\" alt=\"Plots preview\" /\u003e\n\n| l | b | Method | Threads | B/Object | us/Object | Speedup |\n|---:|---:|:---|---:|---:|---:|---:|\n| 16 | 2000 | RecSplit [\\[ALENEX'20\\]](https://arxiv.org/abs/1910.06416) | 1 | 1.560 | 1175.4 |  |\n| 16 | 2000 | SimdRecSplit | 1 | 1.560 | 138.0 | 8 |\n| 16 | 2000 | 
SimdRecSplit | 16 | 1.560 | 27.9 | 42 |\n| 16 | 2000 | GpuRecSplit | 4 | 1.5601 | 1.0 | 1173 |\n| 18 | 50 | RecSplit [\\[ALENEX'20\\]](https://arxiv.org/abs/1910.06416) | 1 | 1.707 | 2942.9 |  |\n| 18 | 50 | SimdRecSplit | 1 | 1.709 | 58.3 | 50 |\n| 18 | 50 | SimdRecSplit | 16 | 1.708 | 12.3 | 239 |\n| 18 | 50 | GpuRecSplit | 4 | 1.709 | 0.5 | 5438 |\n| 24 | 2000 | GpuRecSplit | 4 | 1.498 | 467.9 |  |\n\nIn the space-efficient configurations here, we use n = 5 million objects (strong scaling).\nFor more detailed measurements, refer to [our paper](https://arxiv.org/abs/2212.09562).\n\n### Library Usage\n\nClone (with submodules, `git clone --recursive`) this repo and add it to your `CMakeLists.txt`:\n\n```\nadd_subdirectory(path/to/GpuRecSplit)\ntarget_link_libraries(YourTarget PRIVATE RecSplit SIMDRecSplit GPURecSplit) # or a subset of the targets\n```\n\n### Reproducing Experiments\n\nThis repository contains the source code and our reproducibility artifacts for the benchmarks specific to GpuRecSplit/SimdRecSplit.\nBenchmarks that compare SimdRecSplit to competitors are available in a different repository: https://github.com/ByteHamster/MPHF-Experiments\n\nWe provide an easy-to-use Docker image to quickly reproduce our results.\nAlternatively, you can look at the `Dockerfile` to see all libraries, tools, and commands necessary to compile.\n\n#### Cloning the Repository\n\nThis repository contains submodules.\nTo clone the repository including submodules, use the following command:\n\n```\ngit clone --recursive https://github.com/ByteHamster/GpuRecSplit.git\n```\n\n#### Building the Docker Image\n\nRun the following command to build the Docker image.\nBuilding the image takes about 10 minutes, as some packages (including LaTeX for the plots) have to be installed.\n\n```bash\ndocker build -t gpurecsplit --no-cache .\n```\n\nSome compiler warnings (red) are expected when building dependencies and will not prevent building the image or running the experiments.\nPlease 
ignore them!\n\n#### Running the Experiments\nDue to the long total running time of all experiments in our paper, we provide run scripts for a slightly simplified version of the experiments.\nThey run fewer iterations and output fewer data points.\n\nYou can modify the benchmark scripts in `scripts/dockerVolume` if you want to change the number of runs or data points.\nThis does not require rebuilding the Docker image.\nDifferent experiments can be started by using the following command:\n\n```bash\ndocker run --interactive --tty -v \"$(pwd)/scripts/dockerVolume:/opt/dockerVolume\" gpurecsplit /opt/dockerVolume/\u003cscript\u003e.sh\n```\n\n`\u003cscript\u003e` depends on the experiment you want to run.\n\n| Figure                                                               | Launch command                                | Estimated runtime  |\n| :------------------------------------------------------------------- | :-------------------------------------------- | :----------------- |\n| Figure 3 \u003cbr /\u003e\u003cimg src=\"preview-gpurecsplit-figure-3.png\" width=\"300\"/\u003e | /opt/dockerVolume/brute-force-vs-rotations.sh | 30 minutes         |\n\nThe resulting plots can be found in `scripts/dockerVolume` and have the file extension `.pdf`.\nMore experiments comparing GpuRecSplit with competitors can be found in a different repository: https://github.com/ByteHamster/MPHF-Experiments\n\n### Licensing\nGpuRecSplit is licensed exactly like `libstdc++` (GPLv3 + GCC Runtime Library Exception), which essentially means you can use it everywhere, exactly like `libstdc++`.\nYou can find details in the [COPYING](/COPYING) and [COPYING.RUNTIME](/COPYING.RUNTIME) files.\n\nIf you use the project in an academic context or publication, please cite [our paper](https://arxiv.org/abs/2212.09562):\n\n```\n@inproceedings{bez2022high,\n  author = {Dominik Bez and\n    Florian Kurpicz and\n    Hans{-}Peter Lehmann and\n    Peter Sanders},\n  title = {High Performance 
Construction of {RecSplit} Based Minimal Perfect Hash\n    Functions},\n  booktitle = {{ESA}},\n  series = {LIPIcs},\n  volume = {274},\n  pages = {19:1--19:16},\n  publisher = {Schloss Dagstuhl - Leibniz-Zentrum f{\\\"{u}}r Informatik},\n  year = {2023},\n  doi = {10.4230/LIPICS.ESA.2023.19}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fgpurecsplit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytehamster%2Fgpurecsplit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fgpurecsplit/lists"}