{"id":13731510,"url":"https://github.com/skeeto/hash-prospector","last_synced_at":"2025-04-04T08:07:52.219Z","repository":{"id":43039902,"uuid":"142380776","full_name":"skeeto/hash-prospector","owner":"skeeto","description":"Automated integer hash function discovery","archived":false,"fork":false,"pushed_at":"2024-03-01T17:54:01.000Z","size":89,"stargazers_count":707,"open_issues_count":20,"forks_count":29,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-03-28T07:06:13.835Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skeeto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-26T02:58:36.000Z","updated_at":"2025-03-27T22:34:06.000Z","dependencies_parsed_at":"2024-03-01T18:51:54.974Z","dependency_job_id":"c008edb4-3390-47bc-a68b-7c3ccd13679f","html_url":"https://github.com/skeeto/hash-prospector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skeeto%2Fhash-prospector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skeeto%2Fhash-prospector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skeeto%2Fhash-prospector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skeeto%2Fhash-prospector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skeeto","download_url":"https://codeload.github
.com/skeeto/hash-prospector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247142052,"owners_count":20890652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T02:01:31.690Z","updated_at":"2025-04-04T08:07:52.197Z","avatar_url":"https://github.com/skeeto.png","language":"C","funding_links":[],"categories":["C","Maths"],"sub_categories":[],"readme":"# Hash Function Prospector\n\nThis is a little tool for automated [integer hash function][wang]\ndiscovery. It generates billions of [integer hash functions][jenkins] at\nrandom from a selection of [nine reversible operations][rev] ([also][]).\nThe generated functions are JIT compiled and their avalanche behavior is\nevaluated. The current best function is printed out in C syntax.\n\nThe *avalanche score* is the number of output bits that remain \"fixed\"\non average when a single input bit is flipped. Lower scores are better.\nIdeally the score is 0, i.e. every output bit flips with a 50% chance\nwhen a single input bit is flipped.\n\nProspector can generate both 32-bit and 64-bit integer hash functions.\nCheck the usage (`-h`) for the full selection of options. Due to the JIT\ncompiler, only x86-64 is supported, though the functions it discovers\ncan, of course, be used anywhere.\n\nArticle: [Prospecting for Hash Functions][article]\n\n## Discovered Hash Functions\n\nThere are two useful classes of hash functions discovered by the\nprospector and the other helper utilities here. 
Both use an\n*xorshift-multiply-xorshift* construction, but with a different number\nof rounds.\n\n### Two round functions\n\n**Update**: [TheIronBorn has used combinatorial optimization][best] to\ndiscover the best known parameters for this construction:\n\n    [16 21f0aaad 15 d35a2d97 15] = 0.10760229515479501\n\n* * *\n\nThis 32-bit, two-round permutation has a particularly low bias and even\nbeats the venerable MurmurHash3 32-bit finalizer by a tiny margin. The\nhash function construction was discovered by the prospector, then the\nparameters were tuned using hill climbing and a genetic algorithm.\n\n```c\n// exact bias: 0.17353355999581582\nuint32_t\nlowbias32(uint32_t x)\n{\n    x ^= x \u003e\u003e 16;\n    x *= 0x7feb352d;\n    x ^= x \u003e\u003e 15;\n    x *= 0x846ca68b;\n    x ^= x \u003e\u003e 16;\n    return x;\n}\n\n// inverse\nuint32_t\nlowbias32_r(uint32_t x)\n{\n    x ^= x \u003e\u003e 16;\n    x *= 0x43021123;\n    x ^= x \u003e\u003e 15 ^ x \u003e\u003e 30;\n    x *= 0x1d69e2a5;\n    x ^= x \u003e\u003e 16;\n    return x;\n}\n```\n\nMore 2-round constants with low bias, some even better than `lowbias32`:\n\n    [15 d168aaad 15 af723597 15] = 0.15983776156606694\n    [17 9e485565 16 ef1d6b47 16] = 0.16143129787074881\n    [16 604baa5d 15 43d6ce97 15] = 0.16491052655811722\n    [16 a812d533 15 b278e4ad 17] = 0.16540778981744320\n    [16 9c8f2d35 15 5d1346b5 17] = 0.16835348823718840\n    [16 88c0a94b 14 9d06da59 17] = 0.16898511658356749\n    [16 a52fb2cd 15 551e4d49 16] = 0.17162579707098322\n    [16 b237694b 15 eb5b4593 15] = 0.17274184020173433\n    [16 7feb352d 15 846ca68b 16] = 0.17353355999581582\n    [16 4bdc9aa5 15 2729b469 16] = 0.17355424787865850\n    [16 dc63b4d3 15 2c32b9a9 15] = 0.17368589564800074\n    [16 e02bd533 15 0364c8ad 17] = 0.17447893149410759\n    [16 603a32a7 15 5a522677 15] = 0.17514135907753242\n    [16 ac10d4eb 15 9d51b169 16] = 0.17676510450127819\n    [15 f15f5959 14 7db29359 16] = 0.18103205436627479\n    [16 83747333 
14 aa256573 16] = 0.18105722344231542\n    [16 be8b6ca7 14 6dd624b5 16] = 0.18223928664971270\n    [17 7186cd35 15 fe6bba73 15] = 0.18312741727971640\n    [16 93f2552b 15 959b4a4d 15] = 0.18360629205797341\n    [16 df892d4b 15 3c2da6b3 16] = 0.18368195486921446\n    [15 49c34cd3 13 e7418ca7 16] = 0.18400092964673831\n    [15 4811acab 15 5591acd7 16] = 0.18522661033580071\n    [16 dc85aaa7 15 6658a5cb 15] = 0.18577280285788791\n    [16 1ec9b4db 15 3224d38d 17] = 0.18631684392389897\n    [16 8ee0d535 15 5dc6b5af 15] = 0.18664478683752250\n    [16 462daaad 15 0a36c95d 16] = 0.18674876992866513\n    [16 17cdd657 15 a426cb25 15] = 0.18995262675473334\n    [16 ab39aacb 15 a1b5d19b 15] = 0.19045785238099658\n    [17 cd8512ad 15 b95c5a73 15] = 0.19050717016846502\n    [16 aecc96b5 15 f64dcd47 15] = 0.19077817816874504\n    [15 2548acd5 15 0b39d397 16] = 0.19121161052714156\n    [15 7f19c559 15 b356358d 16] = 0.19198007174447981\n    [16 4ffcab35 15 e98db28b 16] = 0.19423994132339928\n    [15 1216ccb5 15 3abcdca9 15] = 0.19426091938816648\n    [16 97219aad 15 ab46b735 15] = 0.19536391240344408\n    [16 c845a997 15 f214db9b 17] = 0.19553179377831409\n    [15 3a7ba96b 13 5e919299 16] = 0.19563436462680908\n    [16 c3d9a965 16 362e4b47 15] = 0.19575424692659107\n    [17 179cd515 15 4c495d47 15] = 0.19608530402798924\n    [16 5dce3553 15 a655d8e9 15] = 0.19621753012889542\n    [17 88a5ad35 16 96338b27 16] = 0.19653922266398804\n    [17 0364d657 15 ac2a34c5 15] = 0.19665754791333651\n    [16 3c9aa9ab 16 051369d7 16] = 0.19687211117412906\n    [17 0ee6d967 15 9c8a4a33 16] = 0.19722490309575344\n    [16 b921a6cb 14 30b5a6d1 16] = 0.19745192295417058\n    [18 a136aaad 16 9f6d62d7 17] = 0.19768193144773874\n    [16 0ae84d3b 15 3b9d4e5b 17] = 0.19776257374279985\n    [17 24f4d2cd 15 1ba3b969 16] = 0.19789489706453650\n    [16 418fb5b3 15 8cf3539b 16] = 0.19817117175199098\n    [16 f0ae2ad7 15 8965d939 16] = 0.19881758420284917\n    [17 9bde596b 16 1c9e9647 16] = 0.19882570872036193\n 
   [16 bd10754b 14 35a29b0d 16] = 0.19885203058591913\n    [17 78d31553 15 c547ac65 15] = 0.19918133404528665\n    [15 81aab34d 15 18e746a3 15] = 0.19938572052445763\n    [16 054335ab 15 146da68b 16] = 0.19943843016872725\n    [17 a1c76a55 16 5ca46b97 16] = 0.19959562213253398\n    [15 c62f4d53 14 62b8a46b 16] = 0.19973996656987172\n    [16 6872cd2d 15 f4a0d975 17] = 0.19992260539370590\n\nThis next function was discovered using only the prospector. It has a bit more\nbias than the previous function.\n\n```c\n// exact bias: 0.34968228323361017\nuint32_t\nprospector32(uint32_t x)\n{\n    x ^= x \u003e\u003e 15;\n    x *= 0x2c1b3c6d;\n    x ^= x \u003e\u003e 12;\n    x *= 0x297a2d39;\n    x ^= x \u003e\u003e 15;\n    return x;\n}\n```\n\nTo use the prospector search randomly for alternative multiplication constants,\nrun it like so:\n\n    $ ./prospector -p xorr:15,mul,xorr:12,mul,xorr:15\n\n### Three round functions\n\nAnother round of multiply-xorshift in this construction allows functions\nwith carefully chosen parameters to reach the theoretical bias limit\n(bias = ~0.021). For example, this hash function is indistinguishable\nfrom a perfect PRF (e.g. 
a random permutation of all 32-bit integers):\n\n```c\n// exact bias: 0.020888578919738908\nuint32_t\ntriple32(uint32_t x)\n{\n    x ^= x \u003e\u003e 17;\n    x *= 0xed5ad4bb;\n    x ^= x \u003e\u003e 11;\n    x *= 0xac4c1b51;\n    x ^= x \u003e\u003e 15;\n    x *= 0x31848bab;\n    x ^= x \u003e\u003e 14;\n    return x;\n}\n\n// inverse\nuint32_t\ntriple32_r(uint32_t x)\n{\n    x ^= x \u003e\u003e 14 ^ x \u003e\u003e 28;\n    x *= 0x32b21703;\n    x ^= x \u003e\u003e 15 ^ x \u003e\u003e 30;\n    x *= 0x469e0db1;\n    x ^= x \u003e\u003e 11 ^ x \u003e\u003e 22;\n    x *= 0x79a85073;\n    x ^= x \u003e\u003e 17;\n    return x;\n}\n```\n\nMore 3-round constants with low bias:\n\n    [17 ed5ad4bb 11 ac4c1b51 15 31848bab 14] = 0.020888578919738908\n    [16 aeccedab 14 ac613e37 16 19c89935 17] = 0.021246568167078764\n    [16 236f7153 12 33cd8663 15 3e06b66b 16] = 0.021280991798512679\n    [18 4260bb47 13 27e8e1ed 15 9d48a33b 15] = 0.021576730651802156\n    [17 3f6cde45 12 51d608ef 16 6e93639d 17] = 0.021772288363808408\n    [15 5dfa224b 14 4bee7e4b 17 930ee371 15] = 0.02184521628884813\n    [17 3964f363 14 9ac3751d 16 4e8772cb 17] = 0.021883292578109576\n    [16 66046c65 14 d3f0865b 16 f9999193 16] = 0.0219446068365007\n    [16 b1a89b33 14 09136aaf 16 5f2a44a7 15] = 0.021998624107282542\n    [16 24767aad 12 daa18229 16 e9e53beb 16] = 0.022043911220395354\n    [15 42f91d8d 14 61355a85 15 dcf2a949 14] = 0.022052539152635078\n    [15 4df8395b 15 466b428b 16 b4b2868b 16] = 0.022140187420461286\n    [16 2bbed51b 14 cd09896b 16 38d4c587 15] = 0.022159936298777144\n    [16 0ab694cd 14 4c139e47 16 11a42c3b 16] = 0.02220928191220355\n    [17 7f1e072b 12 8750a507 16 ecbb5b5f 16] = 0.022283743052847804\n    [16 f1be7bad 14 73a54099 15 3b85b963 15] = 0.022316544125749647\n    [16 66e756d5 14 b5f5a9cd 16 84e56b11 16] = 0.022372957847491555\n    [15 233354bb 15 ce1247bd 16 855089bb 17] = 0.022406591070966285\n    [16 eb6805ab 15 d2c7b7a7 16 7645a32b 16] = 0.022427060650927547\n    
[16 8288ab57 14 0d1bfe57 16 131631e5 16] = 0.022431656871313443\n    [16 45109e55 14 3b94759d 16 adf31ea5 17] = 0.022436433678417977\n    [15 26cd1933 14 e3da1d59 16 5a17445d 16] = 0.022460520416491526\n    [16 7001e6eb 14 bb8e7313 16 3aa8c523 15] = 0.022491767264054854\n    [16 49ed0a13 14 83588f29 15 658f258d 15] = 0.022500668856510898\n    [16 6cdb9705 14 4d58d2ed 14 c8642b37 16] = 0.022504626537729222\n    [16 a986846b 14 bdd5372d 15 ad44de6b 17] = 0.022528238323120016\n    [16 c9575725 15 9448f4c5 16 3b7a5443 16] = 0.022586511310042686\n    [15 fc54c453 13 08213789 15 669f96eb 16] = 0.022591114646032095\n    [16 d47ef17b 14 642fa58f 16 a8b65b9b 16] = 0.022600633971701509\n    [15 00bfaa73 14 8799c69b 16 731985b1 16] = 0.022645866629596379\n    [16 953a55e9 15 8523822b 17 56e7aa63 15] = 0.022667180032713324\n    [16 a3d7345b 15 7f41c9c7 16 308bd62d 17] = 0.022688845770122031\n    [16 195565c7 14 16064d6f 16 0f9ec575 15] = 0.022697810688752193\n    [16 13566dbb 14 59369a03 15 990f9d1b 16] = 0.022712430070797596\n    [16 8430cc4b 15 a7831cbd 15 c6ccbd33 15] = 0.022734765033419774\n    [16 699f272b 14 09c01023 16 39bd48c3 15] = 0.022854175321846512\n    [15 336536c3 13 4f0e38b1 16 15d229f7 16] = 0.022884125170795171\n    [16 221f686d 12 d8948a07 16 ed8a8345 16] = 0.022902500408830236\n    [16 d7ca8cbb 13 eb4e259f 15 34ab1143 16] = 0.022905955538176669\n    [16 7cb04f65 14 9b96da73 16 83625687 15] = 0.022906573700088178\n    [15 5156196b 14 940d8869 15 0086f473 17] = 0.022984943828687553\n\nPrepending an increment to `triple32` breaks the `hash(0) = 0` issue while\nalso lowering the bias a tiny bit further:\n\n```c\n// exact bias: 0.020829410544597495\nuint32_t\ntriple32inc(uint32_t x)\n{\n    x++;\n    x ^= x \u003e\u003e 17;\n    x *= 0xed5ad4bb;\n    x ^= x \u003e\u003e 11;\n    x *= 0xac4c1b51;\n    x ^= x \u003e\u003e 15;\n    x *= 0x31848bab;\n    x ^= x \u003e\u003e 14;\n    return x;\n}\n\n// inverse\nuint32_t\ntriple32inc_r(uint32_t x)\n{\n    x ^= x 
\u003e\u003e 14 ^ x \u003e\u003e 28;\n    x *= 0x32b21703;\n    x ^= x \u003e\u003e 15 ^ x \u003e\u003e 30;\n    x *= 0x469e0db1;\n    x ^= x \u003e\u003e 11 ^ x \u003e\u003e 22;\n    x *= 0x79a85073;\n    x ^= x \u003e\u003e 17;\n    x--;\n    return x;\n}\n```\n\n## Measuring exact bias\n\nThe `-E` mode evaluates the bias of a given hash function (`-p` or `-l`). By\ndefault the prospector uses an estimate to quickly evaluate a function's bias,\nbut it's non-deterministic and there's a lot of noise in the result. To\nexhaustively measure the exact bias, use the `-e` option.\n\nThe function to be checked can be defined using `-p` and a pattern or\n`-l` and a shared library containing a function named `hash()`. For\nexample, to measure the exact bias of the best hash function above:\n\n    $ ./prospector -Eep xorr:16,mul:e2d0d4cb,xorr:15,mul:3c6ad939,xorr:15\n\nOr drop the function in a C file named hash.c, and name the function\n`hash()`. This lets you test hash functions that can't be represented\nusing the prospector's limited notion of hash functions.\n\n    $ cc -O3 -shared -fPIC -o hash.so hash.c\n    $ ./prospector -Eel ./hash.so\n\nBy default it treats its input as a 32-bit hash function. Use the `-8`\nswitch to test (by estimation) 64-bit functions. There is no exact,\nexhaustive test for 64-bit hash functions since that would take far too\nlong.\n\n## Reversible operation selection\n\n```c\nx  = ~x;\nx ^= constant;\nx *= constant | 1; // i.e. only odd constants\nx += constant;\nx ^= x \u003e\u003e constant;\nx ^= x \u003c\u003c constant;\nx += x \u003c\u003c constant;\nx -= x \u003c\u003c constant;\nx \u003c\u003c\u003c= constant; // left rotation\nx = bswap(x); // swap high and low bytes.\n```\n\nTechnically `x = ~x` is covered by `x ^= constant`. However, `~x` is\nuniquely special and particularly useful. 
The generator is very unlikely\nto generate the one correct constant for the XOR operator that achieves\nthe same effect.\n\n## 16-bit hashes\n\nBecause the constraints are different for 16-bit hashes there's a separate\ntool for generating these hashes: `hp16`. Unlike the 32-bit / 64-bit\nprospector, this implementation is fully portable and will run on just\nabout any system. It's also capable of generating and evaluating 128KiB\ns-boxes.\n\nSince 16-bit hashes are more likely to be needed on machines that, say,\nlack fast multiplication instructions, certain operations can be omitted\nduring exploration (`-m`, `-r`).\n\nSome interesting results so far:\n\n```c\n// 2-round xorshift-multiply (-Xn2)\n// bias = 0.0085905051336723701\nuint16_t hash16_xm2(uint16_t x)\n{\n    x ^= x \u003e\u003e 8; x *= 0x88b5U;\n    x ^= x \u003e\u003e 7; x *= 0xdb2dU;\n    x ^= x \u003e\u003e 9;\n    return x;\n}\n\n// 3-round xorshift-multiply (-Xn3)\n// bias = 0.0045976709018820602\nuint16_t hash16_xm3(uint16_t x)\n{\n    x ^= x \u003e\u003e  7; x *= 0x2993U;\n    x ^= x \u003e\u003e  5; x *= 0xe877U;\n    x ^= x \u003e\u003e  9; x *= 0x0235U;\n    x ^= x \u003e\u003e 10;\n    return x;\n}\n\n// No multiplication (-Imn6)\n// bias = 0.023840118344741465\nuint16_t hash16_s6(uint16_t x)\n{\n    x += x \u003c\u003c 7; x ^= x \u003e\u003e 8;\n    x += x \u003c\u003c 3; x ^= x \u003e\u003e 2;\n    x += x \u003c\u003c 4; x ^= x \u003e\u003e 8;\n    return x;\n}\n\n// Which is identical to this xorshift-multiply\nuint16_t hash16_s6(uint16_t x)\n{\n    x *= 0x0081U; x ^= x \u003e\u003e 8;\n    x *= 0x0009U; x ^= x \u003e\u003e 2;\n    x *= 0x0011U; x ^= x \u003e\u003e 8;\n    return x;\n}\n```\n\nA good 3-round xorshift hash (a short search via `hp16 -Xn3`) is a close\napproximation of a good s-box (i.e. `hp16 -S`).\n\nBe mindful of C integer promotion rules when doing 16-bit operations. 
For\ninstance, on 32-bit implementations unsigned 16-bit operands will be\npromoted to signed 32-bit integers, leading to incorrect results in\ncertain cases. The C programs printed by this program are careful to\npromote 16-bit operations to \"unsigned int\" where needed.\n\n\n[also]: https://marc-b-reynolds.github.io/math/2017/10/13/IntegerBijections.html\n[article]: https://nullprogram.com/blog/2018/07/31/\n[best]: https://github.com/skeeto/hash-prospector/issues/19\n[jenkins]: http://burtleburtle.net/bob/hash/integer.html\n[rev]: http://papa.bretmulvey.com/post/124027987928/hash-functions\n[wang]: https://gist.github.com/badboy/6267743\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskeeto%2Fhash-prospector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskeeto%2Fhash-prospector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskeeto%2Fhash-prospector/lists"}