{"id":46667641,"url":"https://github.com/nnethercott/hannoy","last_synced_at":"2026-04-07T13:01:09.967Z","repository":{"id":299308573,"uuid":"1002366715","full_name":"nnethercott/hannoy","owner":"nnethercott","description":"Production-ready KV-backed HNSW implementation in Rust using LMDB","archived":false,"fork":false,"pushed_at":"2026-04-07T11:21:47.000Z","size":1889,"stargazers_count":77,"open_issues_count":4,"forks_count":9,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-04-07T11:32:19.923Z","etag":null,"topics":["approximate-nearest-neighbor-search","diskann","hnsw","lmdb","python","rust","vector-database"],"latest_commit_sha":null,"homepage":"https://docs.rs/hannoy","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nnethercott.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-15T10:14:34.000Z","updated_at":"2026-04-07T11:21:13.000Z","dependencies_parsed_at":"2025-06-15T22:43:49.719Z","dependency_job_id":"138a9c75-253e-4442-9879-cc1be128d5ac","html_url":"https://github.com/nnethercott/hannoy","commit_stats":null,"previous_names":["nnethercott/hannoy"],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/nnethercott/hannoy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nnethercott%2Fhannoy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nnethercott%2Fhannoy/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/nnethercott%2Fhannoy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nnethercott%2Fhannoy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nnethercott","download_url":"https://codeload.github.com/nnethercott/hannoy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nnethercott%2Fhannoy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31513382,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-nearest-neighbor-search","diskann","hnsw","lmdb","python","rust","vector-database"],"created_at":"2026-03-08T20:32:59.900Z","updated_at":"2026-04-07T13:01:09.961Z","avatar_url":"https://github.com/nnethercott.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg width=\"280px\" title=\"this is a cowboy bebop ref\" src=\"assets/hanoi_new.png\"\u003e\u003c/a\u003e\n\u003ch1 align=\"center\"\u003ehannoy 
🗼\u003c/h1\u003e\n\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![Crates.io](https://img.shields.io/crates/v/hannoy)](https://crates.io/crates/hannoy)\n[![dependency status](https://deps.rs/repo/github/nnethercott/hannoy/status.svg)](https://deps.rs/repo/github/nnethercott/hannoy)\n[![Build](https://github.com/nnethercott/hannoy/actions/workflows/rust.yml/badge.svg?event=pull_request)](https://github.com/nnethercott/hannoy/actions/workflows/rust.yml)\n[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/nnethercott/hannoy)\n\nhannoy is a key-value backed [HNSW](https://www.pinecone.io/learn/series/faiss/hnsw/) implementation based on [arroy](https://github.com/meilisearch/arroy).\n\n## Motivation\nMany popular HNSW libraries are built in memory, meaning you need enough RAM to store all the vectors you're indexing. Instead, `hannoy` uses [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) (a memory-mapped KV store) as a storage backend. This is better suited to machines running multiple programs, or cases where the dataset you're indexing won't fit in memory. 
LMDB also supports non-blocking concurrent reads by design, meaning it's safe to query the index in multi-threaded environments.\n\n## Features\n- Supported metrics: [euclidean](https://en.wikipedia.org/wiki/Euclidean_distance#:~:text=In%20mathematics%2C%20the%20Euclidean%20distance,occasionally%20called%20the%20Pythagorean%20distance.), [cosine](https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_distance), [manhattan](https://en.wikipedia.org/wiki/Taxicab_geometry), [hamming](https://en.wikipedia.org/wiki/Hamming_distance), as well as quantized counterparts.\n- Python bindings with [maturin](https://github.com/PyO3/maturin) and [pyo3](https://github.com/PyO3/pyo3) \n- Multithreaded builds using rayon\n- Disk-backed storage using LMDB, to enable indexing datasets that won't fit in RAM\n- [Compressed bitmaps](https://github.com/RoaringBitmap/roaring-rs) to store graph edges with minimal overhead, adding ~200 bytes per vector\n- Dynamic document insertions and deletions without full re-indexing\n\n## Missing Features\n- GPU-accelerated indexing\n\n## Usage\n### Rust 🦀\n```rust\nuse hannoy::{distances::Cosine, Database, Reader, Result, Writer};\nuse heed::EnvOpenOptions;\nuse rand::{rngs::StdRng, SeedableRng};\n\nfn main() -\u003e Result\u003c()\u003e {\n    let env = unsafe {\n        EnvOpenOptions::new()\n            .map_size(1024 * 1024 * 1024) // 1GiB\n            .open(\"./\")\n    }\n    .unwrap();\n\n    let mut wtxn = env.write_txn()?;\n    let db: Database\u003cCosine\u003e = env.create_database(\u0026mut wtxn, None)?;\n    let writer: Writer\u003cCosine\u003e = Writer::new(db, 0, 3);\n\n    // build\n    writer.add_item(\u0026mut wtxn, 0, \u0026[1.0, 0.0, 0.0])?;\n    writer.add_item(\u0026mut wtxn, 1, \u0026[0.0, 1.0, 0.0])?;\n\n    let mut rng = StdRng::seed_from_u64(42);\n    let mut builder = writer.builder(\u0026mut rng);\n    builder.ef_construction(100).build::\u003c16,32\u003e(\u0026mut wtxn)?;\n    wtxn.commit()?;\n\n    // search\n    let rtxn 
= env.read_txn()?;\n    let reader = Reader::\u003cCosine\u003e::open(\u0026rtxn, 0, db)?;\n\n    let query = vec![0.0, 1.0, 0.0];\n    let nns = reader.nns(1).ef_search(10).by_vector(\u0026rtxn, \u0026query)?.into_nns();\n\n    dbg!(\u0026nns);\n    Ok(())\n}\n```\n\n### Python 🐍\n```python\nimport hannoy\nfrom hannoy import Metric\nimport tempfile\n\ntmp_dir = tempfile.gettempdir()\ndb = hannoy.Database(tmp_dir, Metric.COSINE)\n\nwith db.writer(3, m=4, ef=10) as writer:\n    writer.add_item(0, [1.0, 0.0, 0.0])\n    writer.add_item(1, [0.0, 1.0, 0.0])\n\nreader = db.reader()\nnns = reader.by_vec([0.0, 1.0, 0.0], n=2)\n\n(closest, dist) = nns[0]\n```\n\n## Tips and tricks\n### Reducing cold start latencies\nSearch in an HNSW graph always traverses from the top layer down to the bottom layer, so we know a priori that some vectors will be needed. To speed up search, we can use [`madvise`](https://man7.org/linux/man-pages/man2/madvise.2.html) to hint to the kernel that these vectors (and their neighbours) should be loaded into RAM.\n\nDoing so can reduce cold-start latencies by several milliseconds, and is configured through the `HANNOY_READER_PREFETCH_MEMORY` environment variable.\n\nE.g. prefetching 10MiB of vectors into RAM:\n```bash\nexport HANNOY_READER_PREFETCH_MEMORY=10485760\n```\n\n\n\u003c!-- ## ideas for improvement --\u003e\n\u003c!-- - keep a counter of most frequently accessed nodes during build and make those entry points (e.g. 
use centroid-like) --\u003e\n\u003c!-- - merge upper layers of graph if they only have one element --\u003e\n\u003c!-- - product quantization `UnalignedVectorCodec` --\u003e\n\u003c!-- - cache layers 1-\u003eL in RAM (speeds up M*(L-1) reads) using a hash table storing raw byte offsets and lengths --\u003e\n\u003c!-- - *threadpool for `Reader` to parallelize searching neighbours --\u003e\n\u003c!----\u003e\n\u003c!-- - change Metadata.entry_points from `Vec\u003cu32\u003e` to a `RoaringBitmap` to avoid manually deduplicating entries --\u003e\n\u003c!----\u003e\n\u003c!-- - TODO: check if using \\alpha sng improves recall on incremental builds, e.g. with alpha=1.2 or something (single pass not twice over) --\u003e\n\u003c!--   - id *does* but it also increases build time (if used for entire build). also not a magic bullet. --\u003e\n\u003c!-- - ask what's wrong with a global pool for doing vector-vector ops and sending back to search thread ? --\u003e\n\u003c!-- - could we also reindex points on levels \u003e 0 during incremental build ? --\u003e\n\u003c!-- - need to try building whole index, then deleting \u0026 inserting instead of 2-phase build --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnnethercott%2Fhannoy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnnethercott%2Fhannoy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnnethercott%2Fhannoy/lists"}