{"id":17179246,"url":"https://github.com/bytehamster/sichash","last_synced_at":"2025-04-13T16:31:03.504Z","repository":{"id":65274367,"uuid":"528333228","full_name":"ByteHamster/SicHash","owner":"ByteHamster","description":"A (Minimal) Perfect Hash Function based on irregular cuckoo hashing, retrieval, and overloading.","archived":false,"fork":false,"pushed_at":"2024-10-01T11:59:47.000Z","size":2359,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-31T21:43:26.634Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteHamster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-24T08:29:34.000Z","updated_at":"2024-09-25T07:18:05.000Z","dependencies_parsed_at":"2024-08-23T09:50:55.818Z","dependency_job_id":"5d9f82a2-4d56-4696-bf19-90e99c21a57e","html_url":"https://github.com/ByteHamster/SicHash","commit_stats":{"total_commits":173,"total_committers":1,"mean_commits":173.0,"dds":0.0,"last_synced_commit":"ffeeb0083561ce8592ac9b9f3e87b25df5057f58"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FSicHash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FSicHash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FSicHash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FSicHash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ByteHamster","download_url":"https://codeload.github.com/ByteHamster/SicHash/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223597074,"owners_count":17170872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T00:25:19.356Z","updated_at":"2025-04-13T16:31:03.497Z","avatar_url":"https://github.com/ByteHamster.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SicHash\n\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n![Build status](https://github.com/ByteHamster/SicHash/actions/workflows/build.yml/badge.svg)\n\nA perfect hash function (PHF) maps a set S of n keys to the first m integers without collisions.\nIt is called _minimal_ perfect (MPHF) if m=n.\nPerfect hash functions have applications in databases, bioinformatics, and as a building block of various space-efficient data structures.\n\nSicHash is a (minimal) perfect hash function based on irregular cuckoo hashing, retrieval, and overloading.\nEach input key has a small number of choices for output positions.\nUsing cuckoo hashing, SicHash determines a mapping from each key to one of its choices,\nsuch that there are no collisions between keys.\nIt then stores the mapping from keys to their candidate index space-efficiently using\nthe [BuRR](https://github.com/lorenzhs/BuRR) retrieval data structure.\n\nSicHash offers a very good trade-off between construction performance, query performance, and space consumption.\n\n### Library Usage\n\nClone this repo and add the following to your `CMakeLists.txt`.\nNote that the repo has submodules, so either use `git clone --recursive` or `git submodule update --init --recursive`.\n\n```\nadd_subdirectory(path/to/SicHash)\ntarget_link_libraries(YourTarget PRIVATE SicHash)\n```\n\nConstructing a SicHash perfect hash function is then straightforward:\n\n```cpp\nstd::vector\u003cstd::string\u003e keys = {\"abc\", \"def\", \"123\", \"456\"};\nsichash::SicHashConfig config;\nsichash::SicHash\u003ctrue\u003e hashFunc(keys, config);\nstd::cout \u003c\u003c hashFunc(\"abc\") \u003c\u003c std::endl;\n```\n\n### Construction Performance\n\n[![Plots preview](https://raw.githubusercontent.com/ByteHamster/SicHash/main/plots-construction.png)](https://arxiv.org/pdf/2210.01560)\n\n### Query Performance\n\n[![Plots preview](https://raw.githubusercontent.com/ByteHamster/SicHash/main/plots-query.png)](https://arxiv.org/pdf/2210.01560)\n\n### Reproducing Experiments\n\nThis repository contains the source code and our reproducibility artifacts for the benchmarks specific to SicHash.\nBenchmarks that compare SicHash to competitors are available in a different repository: https://github.com/ByteHamster/MPHF-Experiments\n\nWe provide an easy to use Docker image to quickly reproduce our results.\nAlternatively, you can look at the `Dockerfile` to see all libraries, tools, and commands necessary to compile SicHash.\n\n#### Building the Docker Image\n\nRun the following command to build the Docker image.\nBuilding the image takes about 5 minutes, as some packages (including LaTeX for the plots) have to be installed.\n\n```bash\ndocker build -t sichash --no-cache .\n```\n\nSome compiler warnings (red) are expected when building competitors and will not prevent building the image or running the experiments.\nPlease ignore them!\n\n#### Running the Experiments\nDue to the long total running time of all experiments in our paper, we provide run scripts for a slightly simplified version of the experiments.\nThey run fewer iterations and output fewer data points.\n\nYou can modify the benchmarks scripts in `scripts/dockerVolume` if you want to change the number of runs or data points.\nThis does not require the Docker image to recompile.\nDifferent experiments can be started by using the following command:\n\n```bash\ndocker run --interactive --tty -v \"$(pwd)/scripts/dockerVolume:/opt/dockerVolume\" sichash /opt/dockerVolume/figure-1.sh\n```\n\nThe number also refers to the figure in the paper.\n\n| Figure in paper | Launch command                | Estimated runtime  |\n| :-------------- | :---------------------------- | :----------------- |\n| 1               | /opt/dockerVolume/figure-1.sh | 10 minutes         |\n\nThe resulting plots can be found in `scripts/dockerVolume` and are called `figure-\u003cnumber\u003e.pdf`.\nMore experiments comparing SicHash with competitors can be found in a different repository: https://github.com/ByteHamster/MPHF-Experiments\n\n### License\n\nThis code is licensed under the [GPLv3](/LICENSE).\nIf you use the project in an academic context or publication, please cite [our paper](https://doi.org/10.1137/1.9781611977561.ch15):\n\n```\n@inproceedings{lehmann2023sichash,\n  author       = {Hans{-}Peter Lehmann and\n                  Peter Sanders and\n                  Stefan Walzer},\n  title        = {SicHash - Small Irregular Cuckoo Tables for Perfect Hashing},\n  booktitle    = {{ALENEX}},\n  pages        = {176--189},\n  publisher    = {{SIAM}},\n  year         = {2023},\n  doi          = {10.1137/1.9781611977561.CH15}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fsichash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytehamster%2Fsichash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Fsichash/lists"}