{"id":17179274,"url":"https://github.com/bytehamster/lemonhash","last_synced_at":"2025-04-13T16:21:46.769Z","repository":{"id":155075897,"uuid":"561816083","full_name":"ByteHamster/LeMonHash","owner":"ByteHamster","description":"Learned Monotone Minimal Perfect Hashing","archived":false,"fork":false,"pushed_at":"2025-04-01T08:04:12.000Z","size":561,"stargazers_count":25,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-11T23:16:13.585Z","etag":null,"topics":["data-structures","hashing","learned-index"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ByteHamster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-04T14:56:11.000Z","updated_at":"2025-04-01T08:04:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"23c3bf89-1805-4c34-b33a-e26bf464db3d","html_url":"https://github.com/ByteHamster/LeMonHash","commit_stats":{"total_commits":204,"total_committers":2,"mean_commits":102.0,"dds":"0.38235294117647056","last_synced_commit":"a4e394b1c5502a539f6ee34fcc5dc348b934067f"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FLeMonHash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FLeMonHash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FLeMonHash/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ByteHamster%2FLeMonHash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ByteHamster","download_url":"https://codeload.github.com/ByteHamster/LeMonHash/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248741846,"owners_count":21154386,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-structures","hashing","learned-index"],"created_at":"2024-10-15T00:25:25.310Z","updated_at":"2025-04-13T16:21:46.763Z","avatar_url":"https://github.com/ByteHamster.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learned Monotone Minimal Perfect Hashing\n\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"img/lemon_wordmark_dark.png\"\u003e\n  \u003cimg src=\"img/lemon_wordmark.png\" width=\"400\" alt=\"Logo\"\u003e\n\u003c/picture\u003e\n\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n![Build status](https://github.com/ByteHamster/LeMonHash/actions/workflows/build.yml/badge.svg)\n\nA monotone minimal perfect hash function (MMPHF) maps a set S of n input keys to the first n integers without collisions.\nAt the same time, it respects the natural order of the input universe.\nIn other words, it maps each input key to its rank.\nMMPHFs have many applications in databases and space-efficient data structures.\n\nLeMonHash (**Le**arned **Mon**otone Minimal Perfect **Hash**ing) is a novel MMPHF that **learns** about 
regularities in the input data\nto achieve significant space and performance improvements.\nIt uses the [PGM-Index](https://github.com/gvinciguerra/PGM-index) to calculate a learned rank estimate for each key\nand then resolves collisions between these estimates using the retrieval data structure [BuRR](https://github.com/lorenzhs/BuRR).\nCompared to competitors that are mostly based on tree-like data structures, LeMonHash is much flatter and therefore faster to query.\nLeMonHash dominates most competitors in terms of construction throughput, query throughput, and space consumption -- simultaneously.\nWe also give a variant for variable-length strings that achieves significantly faster queries than competitors.\n\n### Usage\n\nRequirements:\n\n- GCC 11 or later\n- Boost\n\nClone the repository (as a submodule) and add the following to your `CMakeLists.txt`.\n\n```cmake\nadd_subdirectory(path/to/LeMonHash)\ntarget_link_libraries(YourTarget PRIVATE LeMonHash)\n```\n\nThen you can use the straightforward interface of LeMonHash:\n\n```cpp\nstd::vector\u003cuint64_t\u003e inputData {0, 1, 7, 15, 23, 42, 250};\nlemonhash::LeMonHash\u003c\u003e hashFunc(inputData);\nfor (uint64_t x : inputData) {\n    std::cout \u003c\u003c x \u003c\u003c \": \\t\" \u003c\u003c hashFunc(x) \u003c\u003c std::endl;\n}\n```\n\n### Query performance\n\n[![Plots preview](img/plots.png)](https://arxiv.org/pdf/2304.11012)\n\n### License\n\nThis code is licensed under the [GPLv3](/LICENSE).\nIf you use the project in an academic context or publication, please cite our [paper](https://arxiv.org/pdf/2304.11012):\n\n```bibtex\n@inproceedings{DBLP:conf/esa/FerraginaL0V23,\n  author       = {Paolo Ferragina and\n                  Hans{-}Peter Lehmann and\n                  Peter Sanders and\n                  Giorgio Vinciguerra},\n  title        = {Learned Monotone Minimal Perfect Hashing},\n  booktitle    = {{ESA}},\n  series       = {LIPIcs},\n  volume       = {274},\n  pages        = 
{46:1--46:17},\n  publisher    = {Schloss Dagstuhl - Leibniz-Zentrum f{\\\"{u}}r Informatik},\n  year         = {2023},\n  doi          = {10.4230/LIPICS.ESA.2023.46}\n}\n```\n\nThe code of the experiments comparing LeMonHash to competitors from the literature is available [here](https://github.com/ByteHamster/MMPHF-Experiments).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Flemonhash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytehamster%2Flemonhash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytehamster%2Flemonhash/lists"}