https://github.com/ashvardanian/hashevals
Minimalistic Rust toolkit for hash function quality analysis. Tests avalanche effect, differential patterns, and statistical distribution across variable-length n-grams.
https://github.com/ashvardanian/hashevals
benchmark bioinformatics bloom-filter evaluation hash hashing hashmap hashtable minhash nlp smhasher string string-manipulation testing
Last synced: 2 months ago
JSON representation
Minimalistic Rust toolkit for hash function quality analysis. Tests avalanche effect, differential patterns, and statistical distribution across variable-length n-grams.
- Host: GitHub
- URL: https://github.com/ashvardanian/hashevals
- Owner: ashvardanian
- License: apache-2.0
- Created: 2025-09-27T13:53:55.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-10-06T16:30:52.000Z (3 months ago)
- Last Synced: 2025-10-14T20:54:42.221Z (3 months ago)
- Topics: benchmark, bioinformatics, bloom-filter, evaluation, hash, hashing, hashmap, hashtable, minhash, nlp, smhasher, string, string-manipulation, testing
- Language: Rust
- Homepage:
- Size: 63.5 KB
- Stars: 9
- Watchers: 0
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://github.com/ashvardanian/HashEvals)
__HashEvals__ is a Rust program that stress-tests hash functions for avalanche quality, integral collisions, and distribution skew across variable-length n-grams.
The suite draws inspiration from the SMHasher family of benchmarks while keeping a lean codebase that is easy to extend with new hash primitives.
What It Tests?
- __Avalanche behavior__ – flips every input bit and measures how many output bits change, tracking worst-case bias and per-bit variance. Ideally, flipping any input bit changes each output bit with 50% probability.
- __Integer collisions__ – hashes random integers with only last `n` bytes populated and watches for birthday-paradox collisions (for `n ≤ 8`). Representative of open-addressing hash tables with integer keys.
- __Distribution probe__ – runs large bucketed Chi² checks to highlight skewed output distributions. Representative of constructing bucketed hash tables or load balancers.
Each test operates over continuous random buffers generated with `ChaCha20Rng` so results are deterministic under the same input `--seed`.
Hashes returning 32-bit values (e.g. `Crc32`, `RabinKarp32`) still participate in the avalanche and distribution checks.
## Results
```sh
Function | Avg.Bias | Worst.Bias | Integral ⨳ | Chi² | Throughput
-------------+------------+------------+------------+----------+---------------
Blake3 | 0.15142 % | 3.75977 % | 42.347 % | 2021.527 | 582.1 MiB/s
SeaHash | 0.17826 % | 4.44336 % | 42.055 % | 2012.333 | 3525.7 MiB/s
SipHash | 0.19405 % | 4.88281 % | 41.682 % | 2010.550 | 2734.0 MiB/s
FoldHash | 0.19693 % | 4.88281 % | 42.379 % | 2022.245 | 10712.6 MiB/s
FarmHash | 0.19945 % | 5.07812 % | 42.112 % | 1985.036 | 6123.6 MiB/s
xxHash3 | 0.20226 % | 5.07812 % | 42.096 % | 2006.815 | 8122.3 MiB/s
gxHash | 0.21399 % | 5.41992 % | 41.964 % | 1988.415 | 1020.3 MiB/s
StringZilla | 0.21524 % | 5.51758 % | 41.932 % | 1996.037 | 10994.1 MiB/s
MurMur3 | 0.21968 % | 5.56641 % | 42.200 % | 1993.416 | 3914.6 MiB/s
aHash | 0.24295 % | 6.20117 % | 42.094 % | 1988.110 | 11371.3 MiB/s
FxHash | 1.86375 % | 6.92404 % | 42.130 % | 2022.595 | 11154.7 MiB/s
Crc32 | 15.87577 % | 37.50000 % | 42.828 % | 1315.053 | 2811.4 MiB/s
RabinKarp32 | 50.00000 % | 50.00000 % | 38.630 % | 1318.352 | 310.4 MiB/s
```
```
Configuration:
Hash functions: StringZilla, SipHash, aHash, xxHash3, gxHash, Crc32, MurMur3, FarmHash, Blake3, FxHash, FoldHash, SeaHash, RabinKarp32
N-gram sizes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 60, 100, 200, 300, 400, 500] bytes
Samples per size: 10'000'000 n-grams
Random seed: 42
Total avalanche tests: 15'040'000'000 bit flips per hash function
Optimization: For N > 8, randomly sample 64 unique bit positions to avoid quadratic complexity
```
### Tiny N-grams (≤ 8 bytes):
```sh
Function | Avg.Bias | Worst.Bias | Integral ⨳ | Chi² | Throughput
-------------+------------+------------+------------+----------+--------------
Blake3 | 0.49842 % | 3.75977 % | 42.347 % | 4401.729 | 66.1 MiB/s
SeaHash | 0.59017 % | 4.44336 % | 42.055 % | 4353.642 | 791.8 MiB/s
SipHash | 0.64224 % | 4.88281 % | 41.682 % | 4330.098 | 479.7 MiB/s
FoldHash | 0.65240 % | 4.88281 % | 42.379 % | 4411.870 | 1533.8 MiB/s
FarmHash | 0.66107 % | 5.07812 % | 42.112 % | 4321.060 | 890.7 MiB/s
xxHash3 | 0.67025 % | 5.07812 % | 42.096 % | 4387.052 | 1373.7 MiB/s
gxHash | 0.70988 % | 5.41992 % | 41.964 % | 4302.142 | 1500.2 MiB/s
StringZilla | 0.71370 % | 5.51758 % | 41.932 % | 4305.659 | 1055.7 MiB/s
MurMur3 | 0.72892 % | 5.56641 % | 42.200 % | 4314.708 | 614.1 MiB/s
aHash | 0.80697 % | 6.20117 % | 42.094 % | 4275.918 | 1320.6 MiB/s
FxHash | 4.33818 % | 6.92404 % | 42.130 % | 4384.974 | 1652.4 MiB/s
Crc32 | 19.01042 % | 37.50000 % | 42.828 % | 1994.412 | 563.5 MiB/s
RabinKarp32 | 50.00000 % | 50.00000 % | 38.630 % | 1981.562 | 574.9 MiB/s
```
### Short N-grams (9-32 bytes):
```sh
Function | Avg.Bias | Worst.Bias | Chi² | Throughput
-------------+------------+------------+----------+--------------
SeaHash | 0.00469 % | 0.00545 % | 1034.607 | 1893.7 MiB/s
xxHash3 | 0.00500 % | 0.00627 % | 1012.462 | 4016.5 MiB/s
FarmHash | 0.00502 % | 0.00645 % | 1010.069 | 2688.5 MiB/s
FoldHash | 0.00506 % | 0.00715 % | 1013.351 | 4202.9 MiB/s
SipHash | 0.00520 % | 0.00638 % | 1044.576 | 1272.2 MiB/s
gxHash | 0.00523 % | 0.00723 % | 1011.157 | 4359.3 MiB/s
Blake3 | 0.00536 % | 0.00818 % | 1012.602 | 214.6 MiB/s
MurMur3 | 0.00544 % | 0.00682 % | 1017.549 | 1646.4 MiB/s
StringZilla | 0.00556 % | 0.00730 % | 1010.320 | 3470.7 MiB/s
aHash | 0.00576 % | 0.00676 % | 1018.854 | 3913.9 MiB/s
FxHash | 1.19828 % | 5.17577 % | 1017.994 | 4507.2 MiB/s
Crc32 | 14.78365 % | 20.31250 % | 1038.868 | 807.9 MiB/s
RabinKarp32 | 50.00000 % | 50.00000 % | 1033.444 | 497.3 MiB/s
```
### Long N-grams (> 32 bytes):
```sh
Function | Avg.Bias | Worst.Bias | Chi² | Throughput
-------------+------------+------------+----------+---------------
aHash | 0.00485 % | 0.00592 % | 1037.755 | 19606.7 MiB/s
MurMur3 | 0.00491 % | 0.00545 % | 1012.738 | 5598.8 MiB/s
StringZilla | 0.00495 % | 0.00589 % | 1052.259 | 21763.4 MiB/s
gxHash | 0.00512 % | 0.00605 % | 1020.838 | 921.4 MiB/s
SeaHash | 0.00513 % | 0.00626 % | 1008.996 | 4353.9 MiB/s
FarmHash | 0.00523 % | 0.00633 % | 982.769 | 8735.3 MiB/s
Blake3 | 0.00523 % | 0.00584 % | 1033.929 | 969.0 MiB/s
FoldHash | 0.00538 % | 0.00718 % | 1022.016 | 16155.2 MiB/s
SipHash | 0.00563 % | 0.00687 % | 1010.762 | 3673.6 MiB/s
xxHash3 | 0.00569 % | 0.00692 % | 987.596 | 10743.6 MiB/s
FxHash | 0.00635 % | 0.01340 % | 1049.392 | 16451.8 MiB/s
Crc32 | 14.06250 % | 18.75000 % | 1007.642 | 4796.7 MiB/s
RabinKarp32 | 50.00000 % | 50.00000 % | 1051.375 | 293.0 MiB/s
```
## Replicating the Results
Build and execute like any standard Cargo binary:
```sh
cargo run --release -- --list-hashes # list supported hash functions
cargo run --release -- --help # show CLI options
```
To run a very small sample set for a quick sanity check:
```sh
RUSTFLAGS="-C target-cpu=native" cargo run --release -- --samples 100 --verbose
```
For a proper comparison, consider running for 1 million samples:
```sh
RUSTFLAGS="-C target-cpu=native" cargo run --release -- --samples 1000000
```
## Contributing
Add new hashers by implementing `HashFunction` trait in `hash_functions.rs` and pushing the boxed instance into `get_all_hash_functions()`.
Each implementation exposes its display name, bit width, and `hash(&[u8]) -> u64` method, which the test harness dispatches automatically across all metrics.
If you want to add a new stress-testing methodology, please open an issue or PR to discuss the design!