{"id":26648957,"url":"https://github.com/habedi/hann","last_synced_at":"2025-09-20T23:04:12.902Z","repository":{"id":283743325,"uuid":"951217715","full_name":"habedi/hann","owner":"habedi","description":"A fast approximate nearest neighbor search library for Go","archived":false,"fork":false,"pushed_at":"2025-08-20T10:30:51.000Z","size":116,"stargazers_count":200,"open_issues_count":6,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-20T11:33:46.070Z","etag":null,"topics":["approximate-nearest-neighbor-search","go","golang","indexing-algorithms","nearest-neighbor-search","search-algorithms","similarity-search","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/habedi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-19T10:41:53.000Z","updated_at":"2025-07-31T09:37:09.000Z","dependencies_parsed_at":"2025-03-21T22:29:47.139Z","dependency_job_id":"e9b84691-81e0-4f3e-8e23-33f9633a787c","html_url":"https://github.com/habedi/hann","commit_stats":null,"previous_names":["habedi/hann"],"tags_count":6,"template":false,"template_full_name":"habedi/template-go-project","purl":"pkg:github/habedi/hann","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhann","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhann/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhann/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhann/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/habedi","download_url":"https://codeload.github.com/habedi/hann/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/habedi%2Fhann/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276169664,"owners_count":25596956,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-20T02:00:10.207Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-nearest-neighbor-search","go","golang","indexing-algorithms","nearest-neighbor-search","search-algorithms","similarity-search","vector-search"],"created_at":"2025-03-25T00:47:28.100Z","updated_at":"2025-09-20T23:04:12.896Z","avatar_url":"https://github.com/habedi.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cpicture\u003e\n    \u003cimg alt=\"Hiking Gopher\" src=\"logo.svg\" height=\"30%\" 
width=\"30%\"\u003e\n\u003c/picture\u003e\n\u003cbr\u003e\n\n\u003ch1\u003eHann\u003c/h1\u003e\n\n[![Tests](https://img.shields.io/github/actions/workflow/status/habedi/hann/tests.yml?label=tests\u0026style=flat\u0026labelColor=282c34\u0026logo=github)](https://github.com/habedi/hann/actions/workflows/tests.yml)\n[![Lints](https://img.shields.io/github/actions/workflow/status/habedi/hann/lints.yml?label=lints\u0026style=flat\u0026labelColor=282c34\u0026logo=github)](https://github.com/habedi/hann/actions/workflows/lints.yml)\n[![Code Coverage](https://img.shields.io/codecov/c/github/habedi/hann?label=coverage\u0026style=flat\u0026labelColor=282c34\u0026logo=codecov)](https://codecov.io/gh/habedi/hann)\n[![CodeFactor](https://img.shields.io/codefactor/grade/github/habedi/hann?label=code%20quality\u0026style=flat\u0026labelColor=282c34\u0026logo=codefactor)](https://www.codefactor.io/repository/github/habedi/hann)\n[![Go Reference](https://img.shields.io/badge/reference-docs-3776ab?style=flat\u0026labelColor=282c34\u0026logo=go)](https://pkg.go.dev/github.com/habedi/hann)\n[![License](https://img.shields.io/badge/license-MIT-00acc1?label=license\u0026style=flat\u0026labelColor=282c34\u0026logo=open-source-initiative)](LICENSE)\n[![Release](https://img.shields.io/github/release/habedi/hann.svg?label=release\u0026style=flat\u0026labelColor=282c34\u0026logo=github\u0026color=f06623)](https://github.com/habedi/hann/releases/latest)\n\nA fast approximate nearest neighbor search library for Go\n\n\u003c/div\u003e\n\n---\n\nHann is a high-performance approximate nearest neighbor search (ANN) library for Go.\nIt provides a collection of index data structures for efficient similarity search in high-dimensional spaces.\nSupported indexes include Hierarchical Navigable Small World (HNSW),\nProduct Quantization Inverted File (PQIVF), and Random Projection Tree (RPT).\n\nHann can be seen as a core component of a vector database (like Milvus, Pinecone, Weaviate, Qdrant, etc.).\nIt 
can be used to add fast in-memory similarity search capabilities to your Go applications.\n\n### Features\n\n- Unified interface for different indexes (see [core/index.go](core/index.go))\n- Support for indexing and searching vectors of arbitrary dimension\n- Fast distance computation using SIMD (AVX) instructions (see [core/simd_distance.c](core/simd_distance.c))\n- Support for bulk insertion, deletion, and updates\n- Support for saving indexes to disk and loading them back\n\n### Indexes\n\n| Index Name                                            | Space Complexity | Build Complexity | Search Complexity                             |\n|-------------------------------------------------------|------------------|------------------|-----------------------------------------------|\n| [HNSW](https://arxiv.org/abs/1603.09320)              | $O(nd + nM)$     | $O(n\\log n)$     | $O(\\log n)$ average case\u003cbr\u003e$O(n)$ worst case |\n| [PQIVF](https://ieeexplore.ieee.org/document/5432202) | $O(nk + kd)$     | $O(nki)$         | $O(\\frac{n}{k})$                              |\n| [RPT](https://dl.acm.org/doi/10.1145/1374376.1374452) | $O(nd)$          | $O(n\\log n)$     | $O(\\log n)$ average case\u003cbr\u003e$O(n)$ worst case |\n\n- $n$: number of vectors\n- $d$: number of dimensions (vector length)\n- $M$: links per node (HNSW)\n- $k$: number of clusters (PQIVF)\n- $i$: iterations for clustering (PQIVF)\n\n#### Supported Distances\n\nThe HNSW index supports the use of Euclidean, squared Euclidean, Manhattan, and cosine distances.\nIf cosine distance is used, the vectors are normalized (L2-normalization) both at insertion and at query time.\nNote that squared Euclidean distance is slightly faster to compute than Euclidean distance\nand gives the same order of closest vectors as Euclidean distance.\nIt can be used in place of Euclidean distance if only the order of closest vectors to the query vector is needed, not\nthe actual distances.\n\nThe PQIVF and RPT indexes 
support Euclidean distance only.\n\n### Installation\n\nHann can be installed as a typical Go module using the following command:\n\n```bash\ngo get github.com/habedi/hann@main\n```\n\nHann requires Go 1.21 or later, a C (or C++) compiler, and a CPU that supports AVX instructions.\n\n### Examples\n\n| Example File                                 | Description                                                               |\n|----------------------------------------------|---------------------------------------------------------------------------|\n| [simple_hnsw.go](example/cmd/simple_hnsw.go) | Create and use an HNSW index with inline data                             |\n| [hnsw.go](example/cmd/hnsw.go)               | Create and use an HNSW index                                              |\n| [hnsw_large.go](example/cmd/hnsw_large.go)   | Create and use an HNSW index (using large datasets)                       |\n| [bench_hnsw.go](example/cmd/bench_hnsw.go)   | Local benchmarks for the HNSW index                                       |\n| [pqivf.go](example/cmd/pqivf.go)             | Create and use a PQIVF index                                              |\n| [pqivf_large.go](example/cmd/pqivf_large.go) | Create and use a PQIVF index (using large datasets)                       |\n| [bench_pqivf.go](example/cmd/bench_pqivf.go) | Local benchmarks for the PQIVF index                                      |\n| [rpt.go](example/cmd/rpt.go)                 | Create and use an RPT index                                               |\n| [rpt_large.go](example/cmd/rpt_large.go)     | Create and use an RPT index (using large datasets)                        |\n| [bench_rpt.go](example/cmd/bench_rpt.go)     | Local benchmarks for the RPT index                                        |\n| [load_data.go](example/load_data.go)         | Helper functions for loading example datasets                             |\n| [utils.go](example/utils.go)                 | Extra 
helper functions for the examples                                   |\n| [run_datasets.go](example/run_datasets.go)   | The code to create different indexes and try them with different datasets |\n\n#### Datasets\n\nUse the following commands to download the datasets used in the examples:\n\n```shell\nmake download-data\n```\n\n```shell\n# Only needed to run the examples that use large datasets\nmake download-data-large\n```\n\nNote that running the examples that use large datasets may require a machine with a large amount of memory\n(for example, 32 GB of RAM or more).\n\nCheck the [data](example/data) directory for information about the datasets.\n\n---\n\n### Documentation\n\nThe detailed documentation for Hann packages is available on [pkg.go.dev](https://pkg.go.dev/github.com/habedi/hann).\n\n#### HNSW Index\n\nThe [`hnsw`](hnsw) package provides an implementation of the HNSW graph index introduced\nby [Malkov and Yashunin (2016)](https://arxiv.org/abs/1603.09320).\nHNSW organizes data into multiple layers of a proximity graph, which allows fast approximate nearest neighbor searches\nby greedily traversing the graph from top to bottom.\n\nThe index has the following configurable parameters:\n\n- **M**: Controls the maximum number of neighbor connections per node. Higher values improve accuracy but increase\n  memory and indexing time (typical range: 5–48).\n- **Ef**: Defines search breadth during insertion and searching. Higher values improve accuracy but\n  increase computational cost for indexing and searching (typical range: 10–200).\n\n#### PQIVF Index\n\nThe [`pqivf`](pqivf) package provides an implementation of the PQIVF index introduced\nby [Jegou et al. 
(2011)](https://ieeexplore.ieee.org/document/5432202).\nPQIVF first clusters data into coarse groups (inverted lists), then compresses vectors in each cluster using [product\nquantization](https://ieeexplore.ieee.org/document/5432202).\nThis allows fast approximate nearest neighbor searches by limiting queries to relevant clusters and\nefficiently comparing compressed vectors, which reduces search time and storage requirements.\n\n\u003e [!NOTE]\n\u003e Before searching, the index must be trained using the `Train()` method.\n\u003e This method should be called after adding data to the index.\n\u003e Any operation that invalidates the trained state of the index (like `BulkDelete`) will need the index to be retrained.\n\nThe index has the following configurable parameters:\n\n- **coarseK**: Controls the number of coarse clusters for initial quantization. Higher values improve search performance\n  but increase indexing time (typical range: 50–4096).\n- **numSubquantizers**: Determines the number of subspaces for product quantization. More subquantizers improve\n  compression and accuracy at the cost of increased indexing time (typical range: 4–16).\n- **pqK**: Sets the number of codewords per subquantizer. Higher values increase accuracy and storage usage (typical\n  value: 256).\n- **kMeansIters**: Number of iterations used to train the product quantization codebooks (recommended value: 25).\n\n#### RPT Index\n\nThe [`rpt`](rpt) package provides an implementation of the RPT index introduced\nby [Dasgupta and Freund (2008)](https://dl.acm.org/doi/10.1145/1374376.1374452).\nRPT recursively partitions data using randomly generated hyperplanes to build a tree structure, which allows efficient\napproximate nearest neighbor searches through a tree traversal process.\n\nThe index has the following configurable parameters:\n\n- **leafCapacity**: Controls the maximum number of vectors stored in each leaf node. 
Lower values increase tree depth,\n  improving search speed but slightly increasing indexing time (typical range: 5–50).\n- **candidateProjections**: Number of random projections considered at each tree split. Higher values improve split\n  quality at the cost of increased indexing time (typical range: 1–10).\n- **parallelThreshold**: Minimum number of vectors in a subtree to trigger parallel construction. Higher values lead to\n  better concurrency during indexing but use more memory (typical value: 100).\n- **probeMargin**: Margin used to determine additional branches probed during searches. Higher values improve recall but\n  increase search overhead because of additional distance computations (typical range: 0.1–0.5).\n\n#### Logging\n\nThe verbosity level of logs produced by Hann can be controlled using the `HANN_LOG` environment variable.\nPossible values include:\n\n- `0`, `false`, or `off` to disable logging altogether;\n- `full` or `all` to enable full logging (`DEBUG` level);\n- Use any other value (including not setting the `HANN_LOG` environment variable) to enable basic logging (`INFO` level;\n  default behavior).\n\n#### Random Seed\n\nFor more consistent indexing and search results across different runs, set the `HANN_SEED` environment variable to an\ninteger.\nThis will initialize the random number generator, but some variations are still possible (for example, due to\nmultithreading).\n\n#### Benchmarks\n\nLocal benchmarks can be run using the following command:\n\n```shell\nmake run-benches\n```\n\nTo run the benchmarks, the example datasets must be downloaded first using `make download-data`\nor manually (see [data](example/data)).\n\nSet the `HANN_BENCH_NTRD` environment variable to control how many threads are used for queries during benchmarks\n(default is 6).\n\n---\n\n### Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to make a contribution.\n\n### License\n\nHann is licensed under the MIT License 
([LICENSE](LICENSE)).\n\n### Acknowledgments\n\n* The logo is named the \"Hiking Gopher\" and was created by [Egon Elbre](https://github.com/egonelbre/gophers).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Fhann","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhabedi%2Fhann","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhabedi%2Fhann/lists"}