{"id":31447975,"url":"https://github.com/ashvardanian/hashevals","last_synced_at":"2025-11-09T11:03:18.522Z","repository":{"id":317443630,"uuid":"1065326055","full_name":"ashvardanian/HashEvals","owner":"ashvardanian","description":"Minimalistic Rust toolkit for hash function quality analysis. Tests avalanche effect, differential patterns, and statistical distribution across variable-length n-grams.","archived":false,"fork":false,"pushed_at":"2025-10-06T16:30:52.000Z","size":65,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-14T20:54:42.221Z","etag":null,"topics":["benchmark","bioinformatics","bloom-filter","evaluation","hash","hashing","hashmap","hashtable","minhash","nlp","smhasher","string","string-manipulation","testing"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashvardanian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-27T13:53:55.000Z","updated_at":"2025-10-13T16:51:22.000Z","dependencies_parsed_at":"2025-09-30T23:28:12.146Z","dependency_job_id":"3c350154-0f54-44b9-949c-73c161642a5c","html_url":"https://github.com/ashvardanian/HashEvals","commit_stats":null,"previous_names":["ashvardanian/hashevals"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ashvardanian/HashEvals","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/ashvardanian%2FHashEvals","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FHashEvals/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FHashEvals/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FHashEvals/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashvardanian","download_url":"https://codeload.github.com/ashvardanian/HashEvals/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashvardanian%2FHashEvals/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":283496071,"owners_count":26845317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-09T02:00:05.828Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","bioinformatics","bloom-filter","evaluation","hash","hashing","hashmap","hashtable","minhash","nlp","smhasher","string","string-manipulation","testing"],"created_at":"2025-10-01T02:19:13.631Z","updated_at":"2025-11-09T11:03:18.516Z","avatar_url":"https://github.com/ashvardanian.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![HashEvals](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/HashEvals.jpg?raw=true)](https://github.com/ashvarda
nian/HashEvals)\n\n__HashEvals__ is a Rust program that stress-tests hash functions for avalanche quality, integral collisions, and distribution skew across variable-length n-grams.\nThe suite draws inspiration from the SMHasher family of benchmarks while keeping a lean codebase that is easy to extend with new hash primitives.\n\n## What It Tests\n\n- __Avalanche behavior__ – flips every input bit and measures how many output bits change, tracking worst-case bias and per-bit variance. Ideally, flipping any input bit changes each output bit with 50% probability.\n- __Integer collisions__ – hashes random integers with only the last `n` bytes populated and watches for birthday-paradox collisions (for `n ≤ 8`). Representative of open-addressing hash tables with integer keys.\n- __Distribution probe__ – runs large bucketed Chi² checks to highlight skewed output distributions. Representative of constructing bucketed hash tables or load balancers.\n\nEach test operates over contiguous random buffers generated with `ChaCha20Rng`, so results are deterministic for a given `--seed`.\nHashes returning 32-bit values (e.g. 
`Crc32`, `RabinKarp32`) still participate in the avalanche and distribution checks.\n\n## Results\n\n```sh\n Function    |   Avg.Bias | Worst.Bias | Integral ⨳ |     Chi² |    Throughput \n-------------+------------+------------+------------+----------+---------------\n Blake3      |  0.15142 % |  3.75977 % |   42.347 % | 2021.527 |   582.1 MiB/s \n SeaHash     |  0.17826 % |  4.44336 % |   42.055 % | 2012.333 |  3525.7 MiB/s \n SipHash     |  0.19405 % |  4.88281 % |   41.682 % | 2010.550 |  2734.0 MiB/s \n FoldHash    |  0.19693 % |  4.88281 % |   42.379 % | 2022.245 | 10712.6 MiB/s \n FarmHash    |  0.19945 % |  5.07812 % |   42.112 % | 1985.036 |  6123.6 MiB/s \n xxHash3     |  0.20226 % |  5.07812 % |   42.096 % | 2006.815 |  8122.3 MiB/s \n gxHash      |  0.21399 % |  5.41992 % |   41.964 % | 1988.415 |  1020.3 MiB/s \n StringZilla |  0.21524 % |  5.51758 % |   41.932 % | 1996.037 | 10994.1 MiB/s \n MurMur3     |  0.21968 % |  5.56641 % |   42.200 % | 1993.416 |  3914.6 MiB/s \n aHash       |  0.24295 % |  6.20117 % |   42.094 % | 1988.110 | 11371.3 MiB/s \n FxHash      |  1.86375 % |  6.92404 % |   42.130 % | 2022.595 | 11154.7 MiB/s \n Crc32       | 15.87577 % | 37.50000 % |   42.828 % | 1315.053 |  2811.4 MiB/s \n RabinKarp32 | 50.00000 % | 50.00000 % |   38.630 % | 1318.352 |   310.4 MiB/s \n```\n\n```\nConfiguration:\n  Hash functions: StringZilla, SipHash, aHash, xxHash3, gxHash, Crc32, MurMur3, FarmHash, Blake3, FxHash, FoldHash, SeaHash, RabinKarp32\n  N-gram sizes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 60, 100, 200, 300, 400, 500] bytes\n  Samples per size: 10'000'000 n-grams\n  Random seed: 42\n  Total avalanche tests: 15'040'000'000 bit flips per hash function\n  Optimization: For N \u003e 8, randomly sample 64 unique bit positions to avoid quadratic complexity\n```\n\n### Tiny N-grams (≤ 8 bytes):\n\n```sh\n Function    |   Avg.Bias | Worst.Bias | Integral ⨳ |     Chi² |   Throughput 
\n-------------+------------+------------+------------+----------+--------------\n Blake3      |  0.49842 % |  3.75977 % |   42.347 % | 4401.729 |   66.1 MiB/s \n SeaHash     |  0.59017 % |  4.44336 % |   42.055 % | 4353.642 |  791.8 MiB/s \n SipHash     |  0.64224 % |  4.88281 % |   41.682 % | 4330.098 |  479.7 MiB/s \n FoldHash    |  0.65240 % |  4.88281 % |   42.379 % | 4411.870 | 1533.8 MiB/s \n FarmHash    |  0.66107 % |  5.07812 % |   42.112 % | 4321.060 |  890.7 MiB/s \n xxHash3     |  0.67025 % |  5.07812 % |   42.096 % | 4387.052 | 1373.7 MiB/s \n gxHash      |  0.70988 % |  5.41992 % |   41.964 % | 4302.142 | 1500.2 MiB/s \n StringZilla |  0.71370 % |  5.51758 % |   41.932 % | 4305.659 | 1055.7 MiB/s \n MurMur3     |  0.72892 % |  5.56641 % |   42.200 % | 4314.708 |  614.1 MiB/s \n aHash       |  0.80697 % |  6.20117 % |   42.094 % | 4275.918 | 1320.6 MiB/s \n FxHash      |  4.33818 % |  6.92404 % |   42.130 % | 4384.974 | 1652.4 MiB/s \n Crc32       | 19.01042 % | 37.50000 % |   42.828 % | 1994.412 |  563.5 MiB/s \n RabinKarp32 | 50.00000 % | 50.00000 % |   38.630 % | 1981.562 |  574.9 MiB/s \n ```\n\n### Short N-grams (9-32 bytes):\n\n```sh\n Function    |   Avg.Bias | Worst.Bias |     Chi² |   Throughput \n-------------+------------+------------+----------+--------------\n SeaHash     |  0.00469 % |  0.00545 % | 1034.607 | 1893.7 MiB/s \n xxHash3     |  0.00500 % |  0.00627 % | 1012.462 | 4016.5 MiB/s \n FarmHash    |  0.00502 % |  0.00645 % | 1010.069 | 2688.5 MiB/s \n FoldHash    |  0.00506 % |  0.00715 % | 1013.351 | 4202.9 MiB/s \n SipHash     |  0.00520 % |  0.00638 % | 1044.576 | 1272.2 MiB/s \n gxHash      |  0.00523 % |  0.00723 % | 1011.157 | 4359.3 MiB/s \n Blake3      |  0.00536 % |  0.00818 % | 1012.602 |  214.6 MiB/s \n MurMur3     |  0.00544 % |  0.00682 % | 1017.549 | 1646.4 MiB/s \n StringZilla |  0.00556 % |  0.00730 % | 1010.320 | 3470.7 MiB/s \n aHash       |  0.00576 % |  0.00676 % | 1018.854 | 3913.9 MiB/s \n FxHash      |  1.19828 
% |  5.17577 % | 1017.994 | 4507.2 MiB/s \n Crc32       | 14.78365 % | 20.31250 % | 1038.868 |  807.9 MiB/s \n RabinKarp32 | 50.00000 % | 50.00000 % | 1033.444 |  497.3 MiB/s \n```\n\n### Long N-grams (\u003e 32 bytes):\n\n```sh\n Function    |   Avg.Bias | Worst.Bias |     Chi² |    Throughput \n-------------+------------+------------+----------+---------------\n aHash       |  0.00485 % |  0.00592 % | 1037.755 | 19606.7 MiB/s \n MurMur3     |  0.00491 % |  0.00545 % | 1012.738 |  5598.8 MiB/s \n StringZilla |  0.00495 % |  0.00589 % | 1052.259 | 21763.4 MiB/s \n gxHash      |  0.00512 % |  0.00605 % | 1020.838 |   921.4 MiB/s \n SeaHash     |  0.00513 % |  0.00626 % | 1008.996 |  4353.9 MiB/s \n FarmHash    |  0.00523 % |  0.00633 % |  982.769 |  8735.3 MiB/s \n Blake3      |  0.00523 % |  0.00584 % | 1033.929 |   969.0 MiB/s \n FoldHash    |  0.00538 % |  0.00718 % | 1022.016 | 16155.2 MiB/s \n SipHash     |  0.00563 % |  0.00687 % | 1010.762 |  3673.6 MiB/s \n xxHash3     |  0.00569 % |  0.00692 % |  987.596 | 10743.6 MiB/s \n FxHash      |  0.00635 % |  0.01340 % | 1049.392 | 16451.8 MiB/s \n Crc32       | 14.06250 % | 18.75000 % | 1007.642 |  4796.7 MiB/s \n RabinKarp32 | 50.00000 % | 50.00000 % | 1051.375 |   293.0 MiB/s \n```\n\n## Replicating the Results\n\nBuild and execute like any standard Cargo binary:\n\n```sh\ncargo run --release -- --list-hashes  # list supported hash functions\ncargo run --release -- --help         # show CLI options\n```\n\nTo run a very small sample set for a quick sanity check:\n\n```sh\nRUSTFLAGS=\"-C target-cpu=native\" cargo run --release -- --samples 100 --verbose\n```\n\nFor a proper comparison, consider running for 1 million samples:\n\n```sh\nRUSTFLAGS=\"-C target-cpu=native\" cargo run --release -- --samples 1000000\n```\n\n## Contributing\n\nAdd new hashers by implementing `HashFunction` trait in `hash_functions.rs` and pushing the boxed instance into `get_all_hash_functions()`.\nEach implementation exposes its display 
name, bit width, and a `hash(\u0026[u8]) -\u003e u64` method, which the test harness dispatches automatically across all metrics.\nIf you want to add a new stress-testing methodology, please open an issue or PR to discuss the design!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashvardanian%2Fhashevals","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashvardanian%2Fhashevals","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashvardanian%2Fhashevals/lists"}