{"id":27631169,"url":"https://github.com/habedi/hsdlib","last_synced_at":"2026-04-18T03:32:35.895Z","repository":{"id":288838223,"uuid":"968241070","full_name":"habedi/hsdlib","owner":"habedi","description":"Hardware-accelerated distance metrics and similarity measures for high-dimensional data","archived":false,"fork":false,"pushed_at":"2025-11-12T08:54:14.000Z","size":195,"stargazers_count":42,"open_issues_count":8,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-12T09:06:46.433Z","etag":null,"topics":["c-library","c-programming-language","distance-metrics","hardware-acceleration","python","python-library","simd","similarity-measures","similarity-search","vector-operations","vector-search"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/habedi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-17T18:34:53.000Z","updated_at":"2025-08-20T14:23:15.000Z","dependencies_parsed_at":"2025-04-23T17:42:50.327Z","dependency_job_id":"774fb6ab-d407-4320-950a-417f0e0d9532","html_url":"https://github.com/habedi/hsdlib","commit_stats":null,"previous_names":["habedi/hsdlib"],"tags_count":6,"template":false,"template_full_name":"habedi/template-c-project","purl":"pkg:github/habedi/hsdlib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhsdlib","tags_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/habedi%2Fhsdlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhsdlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhsdlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/habedi","download_url":"https://codeload.github.com/habedi/hsdlib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhsdlib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31955712,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-library","c-programming-language","distance-metrics","hardware-acceleration","python","python-library","simd","similarity-measures","similarity-search","vector-operations","vector-search"],"created_at":"2025-04-23T17:42:40.533Z","updated_at":"2026-04-18T03:32:35.889Z","avatar_url":"https://github.com/habedi.png","language":"C","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003cimg alt=\"Hsdlib Logo\" src=\"logo.svg\" height=\"50%\" width=\"50%\"\u003e\n  
\u003c/picture\u003e\n\u003cbr\u003e\n\n\u003ch2\u003eHsdlib\u003c/h2\u003e\n\n[![Tests](https://img.shields.io/github/actions/workflow/status/habedi/hsdlib/tests_amd64.yml?label=tests\u0026style=flat\u0026labelColor=282c34\u0026logo=github)](https://github.com/habedi/hsdlib/actions/workflows/tests_amd64.yml)\n[![Benches](https://img.shields.io/github/actions/workflow/status/habedi/hsdlib/benches_amd64.yml?label=benches\u0026style=flat\u0026labelColor=282c34\u0026logo=github)](https://github.com/habedi/hsdlib/actions/workflows/benches_amd64.yml)\n[![Code Coverage](https://img.shields.io/codecov/c/github/habedi/hsdlib?label=coverage\u0026style=flat\u0026labelColor=282c34\u0026logo=codecov)](https://codecov.io/gh/habedi/hsdlib)\n[![CodeFactor](https://img.shields.io/codefactor/grade/github/habedi/hsdlib?label=code%20quality\u0026style=flat\u0026labelColor=282c34\u0026logo=codefactor)](https://www.codefactor.io/repository/github/habedi/hsdlib)\n[![License](https://img.shields.io/badge/license-MIT-007ec6?label=license\u0026style=flat\u0026labelColor=282c34\u0026logo=open-source-initiative)](https://github.com/habedi/hsdlib)\n[![Release](https://img.shields.io/github/release/habedi/hsdlib.svg?label=release\u0026style=flat\u0026labelColor=282c34\u0026logo=github)](https://github.com/habedi/hsdlib/releases/latest)\n\nHardware-accelerated distance metrics and similarity measures for high-dimensional data\n\n\u003c/div\u003e\n\n---\n\nHsdlib is a C library that provides hardware-accelerated implementations of popular distance metrics and\nsimilarity measures for high-dimensional data.\nIt automatically picks the optimal implementation based on available SIMD instruction sets (backend) like\nAVX/AVX2/AVX512 for AMD64 or NEON/SVE for AArch64 CPUs at runtime.\n\n### Features\n\n- Simple unified API (see [hsdlib.h](include/hsdlib.h))\n- Support for popular distances and similarity measures\n    - Squared Euclidean, Manhattan, Hamming distances\n    - Dot product, cosine, 
Jaccard similarities\n- Support for the AMD, Intel, and ARM CPUs\n- Support for runtime dispatch with optional manual override\n- Bindings for Python (see [HsdPy](bindings/python)) 🐍\n- Compatible with C11 and later\n\n---\n\n### Getting Started\n\nTo get started with Hsdlib, you can clone the repository and build the library using the following commands:\n\n```bash\ngit clone --depth=1 https://github.com/habedi/hsdlib\ncd hsdlib\n\n# Install dependencies (for Debian-based systems)\nmake install-deps\n\n# Build the library (shared and static)\nmake build BUILD_TYPE=release # Default is `debug`\nls lib # Check the built library files (libhsd.so, libhsd.a, etc.)\n```\n\nAfter the build is complete, you can include the [hsdlib.h](include/hsdlib.h) header file in your C (or C++) code and\nlink against the library files in the `lib` directory.\n\n#### Python Bindings\n\nTo use Hsdlib in Python, you can install the [HsdPy](bindings/python) package using pip:\n\n```bash\npip install hsdpy\n```\n\n#### Examples\n\n| File                                          | Description                          |\n|:----------------------------------------------|:-------------------------------------|\n| [hsdlib_example.c](examples/hsdlib_example.c) | Example usages of Hsdlib API (C)     |\n| [hsdpy_example.py](examples/hsdpy_example.py) | Example usages of HsdPy API (Python) |\n\nTo compile and run the examples, use the `make example` command.\n\n---\n\n### Documentation\n\nAPI documentation can be generated using [Doxygen](https://www.doxygen.nl).\nTo generate the documentation, use the `make doc` command and then open the `docs/html/index.html` file in a web browser\nto see it.\n\n#### API Summary\n\n| Distance or Similarity Function | Description                                                                                                                                
|\n|:--------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------|\n| `hsd_dist_sqeuclidean_f32(...)` | Compute squared Euclidean ($L_2^2$) distance between two float vectors.                                                                    |\n| `hsd_dist_manhattan_f32(...)`   | Compute Manhattan ($L_1$) distance between two float vectors.                                                                              |\n| `hsd_dist_hamming_u8(...)`      | Compute Hamming distance between two binary or non-binary byte (`uint8_t`) vectors.                                                        |\n| `hsd_sim_dot_f32(...)`          | Compute dot product similarity between two float vectors.                                                                                  |\n| `hsd_sim_cosine_f32(...)`       | Compute cosine similarity between two float vectors.                                                                                       |\n| `hsd_sim_jaccard_u16(...)`      | Compute Jaccard similarity between two binary vectors. If vectors are not binary (integer `uint16_t`), Tanimoto coefficient is calculated. 
|\n\nThe distance and similarity functions (functions whose names start with `hsd_dist_` or `hsd_sim_`) accept the\nfollowing parameters in order:\n\n- `a`: Pointer to the first input vector (array of `float`, `uint8_t`, or `uint16_t` values, depending on the function).\n- `b`: Pointer to the second input vector (same element type as `a`).\n- `n`: Number of elements in the input vectors.\n- `r`: Pointer to the output variable where the result will be stored.\n\nAll distance and similarity functions return a status of type `hsd_status_t`.\nThe `HSD_SUCCESS` status indicates that the result is valid and stored in the output pointer `r`.\nAnything else indicates an error.\nCheck out the **Types and Enums** section for more details.\n\n\u003e [!NOTE]\n\u003e **N1**: Euclidean distance can easily be calculated from the squared Euclidean\n\u003e distance: $\\text{euclidean}(a, b) = \\sqrt{\\text{squared euclidean}(a, b)}$\n\u003e\n\u003e **N2**: The similarity measures can be used to calculate distances (or dissimilarities) as follows:\n\u003e - Cosine distance = $1 - \\text{cosine}(a, b)$\n\u003e - Jaccard distance = $1 - \\text{jaccard}(a, b)$\n\u003e - Negative dot product = $-\\text{dot}(a, b)$\n\u003e\n\u003e **N3**: The implementation of the Hamming distance works on byte (`uint8_t`) vectors.\n\u003e It calculates the total number of differing bits between the two sequences using the formula:\n\u003e `hamming(a, b) = Σᵢ popcount(a_byte[i] ⊕ b_byte[i])`, where `popcount` counts the set bits and `⊕` is\n\u003e the bitwise XOR operation. The function returns this total count.\n\u003e\n\u003e **N4**: The Tanimoto coefficient formula is used to calculate the Jaccard similarity.\n\u003e For binary vectors, the formula gives the Jaccard similarity.\n\u003e For non-binary vectors, the result is the Tanimoto coefficient rather than the Jaccard\n\u003e similarity.\n\u003e\n\u003e **N5**: The implementation of the cosine similarity normalizes the input vectors to unit length ($L_2$ norm = 1)\n\u003e before calculating the cosine similarity.\n\n| Utility Function                   | Return Type       | Description                                                                                                                               |\n|:-----------------------------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------|\n| `hsd_get_backend()`                | `const char *`    | Return textual name of current backend (auto or forced).                                                                                  |\n| `hsd_has_avx512()`                 | `bool`            | Return true if the CPU supports AVX512F (for AMD64).                                                                                      |\n| `hsd_get_fp_mode_status()`         | `hsd_fp_status_t` | Get current floating-point flush-to-zero mode (FTZ) and denormals-are-zero mode (DAZ) status.                                             |\n| `hsd_set_manual_backend(backend)`  | `hsd_status_t`    | Override backend auto‑dispatch mechanism and force a specific backend to be used (e.g. AVX2 or NEON). `backend` is of type `HSD_Backend`. |\n| `hsd_get_current_backend_choice()` | `HSD_Backend`     | Get the current backend that is being used.                                                                                               
|\n\n#### Types and Enums\n\nThe return type of the distance and similarity functions is `hsd_status_t`, which is defined as follows:\n\n```c\ntypedef enum {\n    HSD_SUCCESS               =  0,  // Operation was successful (e.g. result in *r is valid)\n    HSD_ERR_NULL_PTR          = -1,  // NULL pointer encountered (e.g. a or b is NULL)\n    HSD_ERR_INVALID_INPUT     = -3,  // NaN or Inf value encountered (e.g. a or b contains NaN or Inf)\n    HSD_ERR_CPU_NOT_SUPPORTED = -4,  // CPU does not support the required SIMD instruction set (backend)\n    HSD_FAILURE               = -99  // A generic failure occurred (e.g. unknown error)\n} hsd_status_t;\n```\n\nThe `hsd_fp_status_t` struct is defined as follows:\n\n```c\ntypedef struct {\n    bool ftz_enabled; // True if FTZ mode is enabled, false otherwise\n    bool daz_enabled; // True if DAZ mode is enabled, false otherwise\n} hsd_fp_status_t;\n```\n\n\u003e [!NOTE]\n\u003e FTZ and DAZ modes are used to flush denormal numbers to zero in floating-point calculations.\n\u003e If enabled, they can improve performance on some CPUs, especially when dealing with small floating-point numbers.\n\u003e However, they can also lead to less accurate results.\n\nThe `HSD_Backend` enum is defined as follows:\n\n```c\ntypedef enum {\n    HSD_BACKEND_AUTO = 0, // Backend is automatically selected at runtime (default)\n    HSD_BACKEND_SCALAR, // Fallback scalar backend (no SIMD instructions)\n\n    /* AMD64 (AKA X86_64) backends */\n    HSD_BACKEND_AVX, // AVX backend\n    HSD_BACKEND_AVX2, // AVX2 backend\n    HSD_BACKEND_AVX512F, // AVX512F backend\n    HSD_BACKEND_AVX512BW, // AVX512BW backend\n    HSD_BACKEND_AVX512DQ, // AVX512DQ backend\n    HSD_BACKEND_AVX512VPOPCNTDQ, // AVX512VPOPCNTDQ backend\n\n    /* AArch64 (AKA ARM64) backends */\n    HSD_BACKEND_NEON, // NEON backend\n    HSD_BACKEND_SVE // SVE backend\n} HSD_Backend;\n```\n\n#### Backend Selection\n\nHsdlib automatically detects the best backend to use based on the CPU 
features available at runtime.\nHowever, `hsd_set_manual_backend(backend)` can be used to force a specific `backend` like `HSD_BACKEND_AVX2` or\n`HSD_BACKEND_NEON`.\nIf the CPU does not support the requested instruction set, the function returns `HSD_ERR_CPU_NOT_SUPPORTED` and\n`HSD_BACKEND_SCALAR` is used as the fallback backend.\n\n\u003e [!NOTE]\n\u003e Normally, using `HSD_BACKEND_AUTO` is recommended because it lets the library automatically select the best backend for\n\u003e the CPU at runtime in most cases.\n\n---\n\n### Tests and Benchmarks\n\n| File                  | Description               |\n|:----------------------|:--------------------------|\n| [`tests`](tests/)     | Unit tests for Hsdlib API |\n| [`benches`](benches/) | Benchmarks for Hsdlib API |\n\nTo run the tests and benchmarks, use the `make test` and `make bench` commands.\n\n### Compatibility\n\nHsdlib is compatible with the C11 standard and later.\nIt was built and tested on Linux, macOS, and Windows for CPUs with AMD64 and AArch64 architectures.\nGCC (12.2 and newer) was used for building the library, but other compilers like Clang should work as well.\n\n---\n\n### Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to make a contribution.\n\n### License\n\nThis project is licensed under the MIT License ([LICENSE](LICENSE)).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Fhsdlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhabedi%2Fhsdlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Fhsdlib/lists"}