{"id":22338198,"url":"https://github.com/mrecachinas/hexhamming","last_synced_at":"2025-07-10T06:10:54.824Z","repository":{"id":43935482,"uuid":"172738230","full_name":"mrecachinas/hexhamming","owner":"mrecachinas","description":":heavy_division_sign: SIMD-accelerated bitwise hamming distance Python module for hexadecimal strings","archived":false,"fork":false,"pushed_at":"2025-04-27T15:21:07.000Z","size":586,"stargazers_count":19,"open_issues_count":4,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-27T16:26:32.515Z","etag":null,"topics":["avx","c","edit-distance","hamming-distance","hexadecimal","python","simd","sse42"],"latest_commit_sha":null,"homepage":"https://github.com/mrecachinas/hexhamming","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrecachinas.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-02-26T15:29:04.000Z","updated_at":"2025-04-27T15:19:47.000Z","dependencies_parsed_at":"2024-12-01T23:16:24.234Z","dependency_job_id":null,"html_url":"https://github.com/mrecachinas/hexhamming","commit_stats":{"total_commits":120,"total_committers":5,"mean_commits":24.0,"dds":"0.44999999999999996","last_synced_commit":"b83edc88677e69edc71d942394b992e4f7202eaa"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrecachinas%2Fhexhamming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrecachinas%2Fhexhamming/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrecachinas%2Fhexhamming/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrecachinas%2Fhexhamming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrecachinas","download_url":"https://codeload.github.com/mrecachinas/hexhamming/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252833831,"owners_count":21811262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx","c","edit-distance","hamming-distance","hexadecimal","python","simd","sse42"],"created_at":"2024-12-04T06:13:26.791Z","updated_at":"2025-05-07T07:33:29.152Z","avatar_url":"https://github.com/mrecachinas.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"``hexhamming``\n====================\n\n|Pip|_ |Prs|_ |Github|_\n\n.. |Pip| image:: https://badge.fury.io/py/hexhamming.svg\n.. _Pip: https://badge.fury.io/py/hexhamming\n\n.. |Prs| image:: https://img.shields.io/badge/PRs-welcome-brightgreen.svg\n.. _Prs: .github/CONTRIBUTING.md#pull-requests\n\n.. |Github| image:: https://github.com/mrecachinas/hexhamming/workflows/build/badge.svg\n.. 
_Github: https://github.com/mrecachinas/hexhamming/actions\n\nWhat does it do?\n----------------\n\nThis module quickly computes the bitwise Hamming distance between two hexadecimal strings.\n\nThis looks like::\n\n    DEADBEEF = 11011110101011011011111011101111\n    00000000 = 00000000000000000000000000000000\n    XOR      = 11011110101011011011111011101111\n    Hamming  = number of ones in DEADBEEF ^ 00000000 = 24\n\nThis essentially amounts to\n\n::\n\n    \u003e\u003e\u003e import gmpy\n    \u003e\u003e\u003e gmpy.popcount(0xdeadbeef ^ 0x00000000)\n    24\n\nexcept with Python strings, so\n\n::\n\n    \u003e\u003e\u003e import gmpy\n    \u003e\u003e\u003e gmpy.popcount(int(\"deadbeef\", 16) ^ int(\"00000000\", 16))\n    24\n\nA few assumptions are made and enforced:\n\n* each input is a valid hexadecimal string (i.e., ``[a-fA-F0-9]+``)\n* the strings are the same length\n* the strings do not begin with ``\"0x\"``\n\nWhy yet another Hamming distance library?\n-----------------------------------------\n\nThere are a lot of fantastic (Python) libraries that offer methods to calculate\nvarious edit distances, including Hamming distances: Distance, textdistance,\nscipy, jellyfish, etc.\n\nIn this case, I needed a Hamming distance library that worked on hexadecimal\nstrings (i.e., a Python ``str``) and performed blazingly fast.\nFurthermore, I often did not care about hex strings greater than 256 bits.\nThat length constraint differs from all the other libraries and enabled me\nto explore vectorization techniques via ``numba``, ``numpy``, and\n``SSE/AVX`` intrinsics.\n\nLastly, I wanted to minimize dependencies, meaning you do not need to install\n``numpy``, ``gmpy``, ``cython``, ``pypy``, ``pythran``, etc.\n\nEventually, after playing around with ``gmpy.popcount``, ``numba.jit``,\n``pythran.run``, and ``numpy``, I decided to write what I wanted\nin essentially raw C. 
At this point, I'm using raw ``char*`` and\n``int*``, so exploring re-writing this in Fortran makes little sense.\n\nInstallation\n-------------\n\nTo install, ensure you have Python 3.6+. Run\n\n::\n\n    pip install hexhamming\n\nor to install from source\n\n::\n\n    git clone https://github.com/mrecachinas/hexhamming\n    cd hexhamming\n    python setup.py install # or pip install .\n\nIf you want to contribute to hexhamming, you should install the dev\ndependencies\n\n::\n\n    pip install -r requirements-dev.txt\n\nand make sure the tests pass with\n\n::\n\n    python -m pytest -vls .\n\nExample\n-------\n\nUsing ``hexhamming`` is as simple as\n\n::\n\n    \u003e\u003e\u003e from hexhamming import hamming_distance_string\n    \u003e\u003e\u003e hamming_distance_string(\"deadbeef\", \"00000000\")\n    24\n\n**New in v2.0.0** : ``hexhamming`` now supports ``bytes`` via ``hamming_distance_bytes``.\nYou use it in the exact same way as before, except you pass in a byte string.\n\n::\n\n    \u003e\u003e\u003e from hexhamming import hamming_distance_bytes\n    \u003e\u003e\u003e hamming_distance_bytes(b\"\\xde\\xad\\xbe\\xef\", b\"\\x00\\x00\\x00\\x00\")\n    24\n\nWe also provide a method for a quick boolean check of whether two hexadecimal strings\nare within a given Hamming distance.\n\n::\n\n    \u003e\u003e\u003e from hexhamming import check_hexstrings_within_dist\n    \u003e\u003e\u003e check_hexstrings_within_dist(\"ffff\", \"fffe\", 2)\n    True\n    \u003e\u003e\u003e check_hexstrings_within_dist(\"ffff\", \"0000\", 2)\n    False\n\nSimilarly, ``hexhamming`` supports byte arrays via ``check_bytes_arrays_within_dist``, which has\na similar API to ``check_hexstrings_within_dist``, except it expects a byte array. 
Additionally,\nit will check if any element of a byte array is within a specified Hamming distance of another\nbyte array.\n\nBenchmark\n---------\n\nBelow is a benchmark using ``pytest-benchmark`` with hexhamming==v1.3.2 on\nmy 2020 2.0 GHz quad-core Intel Core i5, 16 GB 3733 MHz LPDDR4, macOS Catalina (10.15.5),\nwith Python 3.7.3 and Apple clang version 11.0.3 (clang-1103.0.32.62).\n\n=======================================  ===========  ==========  =============  ========  ============\nName                                       Mean (ns)    Std (ns)    Median (ns)    Rounds    Iterations\n=======================================  ===========  ==========  =============  ========  ============\ntest_hamming_distance_bench_3                93.8        10.5          94.3         53268           200\ntest_hamming_distance_bench_3_same           94.2        15.2          94.9        102146           100\ntest_check_hexstrings_within_dist_bench      231.9      104.2         216.5        195122            22\ntest_hamming_distance_bench_256              97.5        34.1          94.0        195122            22\ntest_hamming_distance_bench_1000             489.8      159.4         477.5         94411            20\ntest_hamming_distance_bench_1000_same        497.8       87.8         496.6         18971            20\ntest_hamming_distance_bench_1024             509.9      299.5         506.7         18652            10\ntest_hamming_distance_bench_1024_same        467.4      205.9         450.4        181819            10\n=======================================  ===========  ==========  =============  ========  ============\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrecachinas%2Fhexhamming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrecachinas%2Fhexhamming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrecachinas%2Fhexhamming/lists"}