{"id":48015401,"url":"https://github.com/stffns/snapvec","last_synced_at":"2026-04-07T16:01:03.518Z","repository":{"id":348329316,"uuid":"1197532233","full_name":"stffns/snapvec","owner":"stffns","description":"Fast compressed ANN search via randomized Hadamard transform + Lloyd-Max quantization. Pure NumPy.","archived":false,"fork":false,"pushed_at":"2026-03-31T17:34:41.000Z","size":23,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-04T13:48:28.541Z","etag":null,"topics":["ann","embeddings","hadamard","numpy","quantization","rag","vector-search"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/snapvec/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stffns.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-31T16:53:27.000Z","updated_at":"2026-03-31T17:34:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/stffns/snapvec","commit_stats":null,"previous_names":["stffns/snapvec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stffns/snapvec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stffns%2Fsnapvec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stffns%2Fsnapvec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stffns%2Fsnapvec/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stffns%2Fsnapvec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stffns","download_url":"https://codeload.github.com/stffns/snapvec/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stffns%2Fsnapvec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31437927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T13:13:19.330Z","status":"ssl_error","status_checked_at":"2026-04-05T13:13:17.778Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ann","embeddings","hadamard","numpy","quantization","rag","vector-search"],"created_at":"2026-04-04T13:43:24.763Z","updated_at":"2026-04-05T14:00:59.715Z","avatar_url":"https://github.com/stffns.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# snapvec\n\n**Fast compressed approximate nearest-neighbor search.  Pure NumPy.  No heavy dependencies.**\n\n`snapvec` implements the TurboQuant compression pipeline — randomized Hadamard transform followed by optimal Gaussian scalar quantization (Lloyd-Max) — as a self-contained Python library for embedding vector search.  
It achieves **8–12× compression** with **\u003e0.92 recall@10** against float32 brute-force, using only NumPy.\n\n```\npip install snapvec\n```\n\n---\n\n## Quick start\n\n```python\nimport numpy as np\nfrom snapvec import SnapIndex\n\n# Build index\nidx = SnapIndex(dim=384, bits=4)          # 4-bit, ~8x compression\nidx.add_batch(ids=list(range(N)), vectors=embeddings)\n\n# Query\nresults = idx.search(query_vector, k=10)     # [(id, score), ...]\n\n# Persist\nidx.save(\"my_index.snpv\")\nidx2 = SnapIndex.load(\"my_index.snpv\")   # atomic save, v1/v2 compatible\n```\n\n---\n\n## Technical background\n\n### The problem: embedding vectors are expensive\n\nModern embedding models produce float32 vectors of dimension `d ∈ {384, 768, 1536}`.\nStoring N vectors requires `4·N·d` bytes; brute-force search costs `O(N·d)` per query.\nFor N = 1M, d = 384: **1.5 GB RAM**, with inner products dominating inference time.\n\nProduct Quantization (PQ) splits vectors into M sub-vectors and quantizes each\nindependently. It is effective but requires training a K-means codebook per dataset.\nRandom Binary Quantization (RaBitQ, 1-bit) is fast but coarse.\n\n**TurboQuant** (Zandieh et al., ICLR 2026, [arXiv:2504.19874](https://arxiv.org/abs/2504.19874))\nachieves near-optimal distortion at b bits per coordinate **without training codebooks**,\nby first rotating the space with a randomized Hadamard transform to make coordinates\napproximately Gaussian, then quantizing each coordinate independently with the\noptimal scalar quantizer for N(0,1).\n\n---\n\n### Algorithm\n\n#### Step 1 — Normalize\n\nGiven a raw embedding `v ∈ ℝᵈ`, compute the unit vector `v̂ = v / ‖v‖` and store\n`‖v‖` separately (float32, 4 bytes per vector).\n\n#### Step 2 — Randomized Hadamard Transform (RHT)\n\nPad `v̂` to the next power of 2 (`d' = 2^⌈log₂ d⌉`), then apply:\n\n```\nx = (1/√d') · H · D · v̂\n```\n\nwhere:\n- `D = diag(σ₁, …, σ_d')` — diagonal matrix of i.i.d. 
±1 random signs (seed-deterministic)\n- `H` — unnormalized Walsh-Hadamard matrix (butterfly pattern)\n\nBy the Johnson-Lindenstrauss lemma, each coordinate `xᵢ ≈ N(0, 1/d')`.\nAfter rescaling `x̃ = x · √d'`, the coordinates are approximately `N(0,1)`\nregardless of the original distribution of `v`.\n\n**Complexity:** O(d log d) — no matrix multiplication, no codebook training.\n\n#### Step 3 — Lloyd-Max scalar quantization\n\nThe optimal scalar quantizer for N(0,1) at b bits partitions ℝ into 2^b intervals\nand assigns each the conditional mean as reconstruction value.\nThese boundaries and centroids are precomputed and hardcoded in `snapvec._codebooks`\n(no scipy required at runtime):\n\n| bits | levels | distortion (MSE) | bytes/coord (disk) |\n|------|--------|------------------|--------------------|\n| 2    | 4      | 0.1175           | 0.25               |\n| 3    | 8      | 0.0311           | 0.375              |\n| 4    | 16     | 0.0077           | 0.50               |\n\nThe quantized vector is stored as a `uint8` index matrix, bit-packed to `b/8`\nbytes per coordinate on disk.\n\n#### Step 4 — Approximate inner product\n\nAt search time the query `q` is rotated and rescaled as in Step 2 (giving `q̃`) but not quantized.\nThe approximate cosine similarity against a stored vector `v` is computed as:\n\n```\nscore(q, v) = (1/d') · Σᵢ q̃ᵢ · centroid[idx_vᵢ]\n```\n\nThis is a single float16 matrix–vector product against the cached centroid expansions.\n\n---\n\n### TurboQuant_prod: unbiased estimator with QJL correction\n\nThe MSE quantizer introduces a small systematic downward bias. The `use_prod=True` mode\ncorrects this using a **Quantized Johnson-Lindenstrauss (QJL)** residual:\n\n**Build time (per stored vector):**\n\n1. Quantize at `(b-1)` bits MSE, compute residual `r = x̃ - x̃_MSE`\n2. Store `sign(S·r)` as a 1-bit vector (int8 ±1 in practice),\n   where `S ∈ ℝ^(d'×d')` is a fixed random Gaussian matrix\n3. 
Store `‖r‖ / √d'` (one float32 per vector)\n\n**Query time (correction term):**\n\n```\ncorrectionᵢ = √(π/2) / d' · ‖rᵢ‖ · dot(S·q̂, sign(S·rᵢ))\nfinal_scoreᵢ = mse_scoreᵢ + correctionᵢ\n```\n\nThis follows from Lemma 4 of Zandieh et al. (2025):\n`E[sign(S·r)] = √(2/π) · S·r / ‖S·r‖`, giving an unbiased estimate of `⟨r, q̂⟩`.\n\n**When to use `use_prod=True`:**\n- When you need accurate inner product magnitudes (KV-cache, attention approximation)\n- **Not** recommended for pure ranking/NNS — the added QJL variance degrades recall@k\n  relative to MSE-only at equal total bits\n\n---\n\n### Compression ratios\n\nFor N vectors of dimension `d = 384` (BGE-small):\n\n| Backend        | Bytes/vector (disk) | Ratio vs float32 |\n|----------------|---------------------|------------------|\n| float32        | 1 536               | 1.0×             |\n| 4-bit snapvec  | 192 + 4             | **7.9×**         |\n| 3-bit snapvec  | 144 + 4             | **10.4×**        |\n| 2-bit snapvec  | 96 + 4              | **15.4×**        |\n| int8 (naïve)   | 384 + 4             | 3.9×             |\n\nThe 4-byte overhead is the per-vector norm. In RAM, indices are stored as uint8\n(~3× vs float32); bit-packing applies on disk (~8× vs float32 at 4-bit).\n\n---\n\n### Recall benchmarks\n\nMeasured on synthetic unit-sphere vectors (`d=384`, `N=10 000`, 100 queries).\n**Baseline: exact cosine float32 brute-force.**\n\n| bits | recall@1 | recall@10 | recall@50 |\n|------|----------|-----------|-----------|\n| 2    | 0.72     | 0.83      | 0.91      |\n| 3    | 0.81     | 0.91      | 0.96      |\n| 4    | 0.86     | 0.93      | 0.95      |\n\nRecall improves with clustered (real-world) data. 
On BGE-small-en embeddings\nfrom mixed document corpora, 4-bit achieves **recall@10 ≈ 0.95**.\n\n\u003e **Note on published results:** The TurboQuant paper (Zandieh et al., 2025) reports\n\u003e recall up to 0.99, measured against HNSW graph navigation (not brute-force float32),\n\u003e on GloVe `d=200` data, using recall@1 with large `k_probe`. These conditions differ\n\u003e from the above; both results are correct under their respective definitions.\n\n---\n\n### File format (`.snpv`)\n\n```\nOffset  Size   Field\n──────────────────────────────────────────────────\n0       4 B    magic: \"HDMX\"\n4       4 B    version: uint32 (1 or 2)\n8       4 B    dim: uint32  — original embedding dimension\n12      4 B    bits: uint32 — total bits (2, 3, or 4)\n16      4 B    seed: uint32 — rotation seed\n20      4 B    n: uint32    — number of stored vectors\n24      4 B    flags: uint32 — bit-0: use_prod  [v2 only]\n──────────────────────────────────────────────────\n28      4 B    packed_len: uint32\n32      *      indices: bit-packed uint8 MSE indices\n       n×4 B   norms: float32 per-vector original norms\n[prod only]\n       n×d' B  qjl_signs: int8 sign(S·r) per vector\n       n×4 B   rnorms: float32 ‖r‖/√d per vector\n──────────────────────────────────────────────────\n       n×(2+L) ids: uint16-length-prefixed UTF-8 strings\n```\n\nSaves are **atomic** on POSIX: writes to `.snpv.tmp` then `os.replace()`.\nBackward compatible: v1 files (mse-only) load correctly in any version.\n\n---\n\n## API reference\n\n### `SnapIndex(dim, bits=4, seed=0, use_prod=False)`\n\n| Parameter  | Type  | Default | Description |\n|------------|-------|---------|-------------|\n| `dim`      | int   | —       | Embedding dimension |\n| `bits`     | int   | 4       | Bits per coordinate: 2, 3, or 4 |\n| `seed`     | int   | 0       | Rotation seed — must be consistent across build and query |\n| `use_prod` | bool  | False   | Enable QJL unbiased estimator (requires bits ≥ 3) |\n\n### 
Methods\n\n```python\nidx.add(id, vector)                    # Add one vector\nidx.add_batch(ids, vectors)            # Add N vectors (~50x faster than loop)\nidx.delete(id) -\u003e bool                 # Remove by id, O(1) lookup\nidx.search(query, k=10) -\u003e list        # [(id, score), ...] descending\nidx.save(path)                         # Atomic binary save to .snpv\nSnapIndex.load(path)                # Load from .snpv file\nidx.stats() -\u003e dict                    # Compression / memory diagnostics\nlen(idx)                               # Number of stored vectors\nrepr(idx)                              # SnapIndex(dim=384, bits=4, mode=mse, n=1000)\n```\n\n---\n\n## Relation to TurboQuant / PolarQuant\n\n`snapvec` implements the core compression pipeline from:\n\n\u003e Zandieh, A., Daliri, M., Hadian, A., \u0026 Mirrokni, V. (2025).\n\u003e **TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate.**\n\u003e *ICLR 2026.* [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)\n\nThe same algorithm was published concurrently as \"PolarQuant\" at AISTATS 2026.\nBoth names were already taken on PyPI; `snapvec` = **Hada**mard + Lloyd-**Max**,\nnamed after its two core operations.\n\nKey contributions of this implementation over the reference:\n\n- **No scipy** — codebooks hardcoded, numpy is the only runtime dependency\n- **Batch WHT** — single O(n·d·log d) call for bulk inserts (~50x faster than loop)\n- **Float16 cache** — centroid expansions in half precision, ~2x faster matmul\n- **O(1) delete** — `_id_to_pos` dict + position compaction\n- **Atomic saves** — `.snpv.tmp` → `os.replace()` pattern\n- **Versioned format** — v1/v2 both loadable, forward-compatible flags field\n\n---\n\n## Installation\n\n```bash\npip install snapvec\n```\n\n**Requirements:** Python \u003e= 3.10, NumPy \u003e= 1.24.  
No other runtime dependencies.\n\nFor development:\n\n```bash\ngit clone https://github.com/stffns/snapvec\ncd snapvec\npip install -e .\npytest tests/ -v\n```\n\n---\n\n## License\n\nMIT © 2025 Jayson Steffens.\n\nThe TurboQuant algorithm is described in [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)\nby Zandieh et al. (Google Research / ICLR 2026). This package is an independent implementation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstffns%2Fsnapvec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstffns%2Fsnapvec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstffns%2Fsnapvec/lists"}