{"id":48911844,"url":"https://github.com/ilyajob05/turboquant-space","last_synced_at":"2026-04-19T23:10:57.773Z","repository":{"id":350755775,"uuid":"1208135140","full_name":"ilyajob05/turboquant-space","owner":"ilyajob05","description":"SIMD-accelerated 4/8-bit vector quantization for approximate nearest neighbor search, based on TurboQuant (ICLR 2026). Standalone C++17 library with Python bindings","archived":false,"fork":false,"pushed_at":"2026-04-15T22:50:47.000Z","size":259,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-17T00:39:37.244Z","etag":null,"topics":["avx","cpp17","header-only","python","quantization","simd","turbo-quant","turboquant","vector-quantization"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ilyajob05.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T21:38:56.000Z","updated_at":"2026-04-15T22:50:52.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ilyajob05/turboquant-space","commit_stats":null,"previous_names":["ilyajob05/turboquant-space"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ilyajob05/turboquant-space","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ilyajob05%2Fturboquant-space","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
ilyajob05%2Fturboquant-space/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ilyajob05%2Fturboquant-space/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ilyajob05%2Fturboquant-space/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ilyajob05","download_url":"https://codeload.github.com/ilyajob05/turboquant-space/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ilyajob05%2Fturboquant-space/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32025817,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"online","status_checked_at":"2026-04-19T02:00:07.110Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx","cpp17","header-only","python","quantization","simd","turbo-quant","turboquant","vector-quantization"],"created_at":"2026-04-17T00:04:16.240Z","updated_at":"2026-04-19T23:10:57.758Z","avatar_url":"https://github.com/ilyajob05.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
turboquant-space\n\n![License](https://img.shields.io/pypi/l/turboquant-space)\n![Build](https://img.shields.io/github/actions/workflow/status/ilyajob05/turboquant-space/publish.yml)\n![Python](https://img.shields.io/pypi/pyversions/turboquant-space)\n![PyPI](https://img.shields.io/pypi/v/turboquant-space)\n\n\n\nThis library was inspired by the article https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/. The library is optimized for efficient data allocation in memory for 3+1 and 7+1 bit quantization schemes.\n\nSIMD-accelerated 4/8-bit vector quantization for approximate nearest neighbor\nsearch, based on **TurboQuant** (ICLR 2026). Standalone C++17 library with\nPython bindings.\n```bash\npip install turboquant-space\n```\n\n```python\nfrom turboquant import TurboQuantSpace\nimport numpy as np\n\nspace = TurboQuantSpace(dim=128, bits_per_coord=8, num_threads=4)\nX = np.random.randn(100_000, 128).astype(np.float32)\nq = np.random.randn(128).astype(np.float32)\n\ncodes = space.encode_batch(X)              # (100_000, code_size) uint8\ndists = space.distance_1_to_n(q, codes)    # (100_000,) float32\n```\n\nThat is the whole mental model: `encode` once, then `distance_*` against the\ncodes. 
No index to build, no state to persist beyond `codes`.\n\n### Torch Example\n```python\nfrom turboquant import TurboQuantSpace\nimport numpy as np\nimport torch\ntorch.manual_seed(42)\nn, dim = 10000, 768\nx = torch.randn(n, dim)\nx = torch.nn.functional.normalize(x, dim=1)\n\ntq = TurboQuantSpace(dim=dim, bits_per_coord=8)\n\nx_np = x.detach().cpu().numpy().astype(np.float32, copy=False)\nx_np = np.ascontiguousarray(x_np)\ncodes = tq.encode_batch(x_np)\n\nraw_bytes = x.numel() * 4\ncomp_bytes = codes.nbytes\nprint(f\"raw   : {raw_bytes / 1e6:.2f} MB\")\nprint(f\"codes : {comp_bytes / 1e6:.2f} MB  ({raw_bytes / comp_bytes:.1f}x)\")\nprint(f\"code_size_bytes = {tq.code_size_bytes()}\")\n\nq = x_np[0]\nd = tq.distance_1_to_n(q, codes)\nprint(\"top-5 nearest:\", np.argsort(d)[:5])\n\n```\n---\n\n## What it does, briefly\n\nTurboQuant encodes each float32 vector into a compact code of\n`bits_per_coord` bits per coordinate using a randomized Walsh–Hadamard\nrotation followed by Lloyd–Max scalar quantization, plus one QJL sign bit per\ncoordinate for an unbiased residual correction. Distances between a raw query\nand a packed code (asymmetric) or between two packed codes (symmetric) are\ncomputed directly on the quantized representation with hand-written NEON /\nSSE / AVX kernels.\n\n**Concretely you get:**\n\n| bits_per_coord | layout               | bytes / vec (dim=128) | compression vs fp32 |\n|----------------|----------------------|-----------------------|---------------------|\n| 4              | nibble-packed        | 76                    | 6.7×                |\n| 8              | one byte per coord   | 140                   | 3.7×                |\n\n(Byte counts include 12 bytes of metadata per code: norm, γ, σ.)\n\n**What it is not:** not a graph index, not an IVF, not a drop-in replacement\nfor FAISS. It is the *distance* layer. 
Plug it into your own index, or use\n`distance_1_to_n` as brute-force search on batches up to a few million.\n\n---\n\n## Install\n\n```bash\npip install turboquant-space\n```\n\nPrebuilt wheels are published for CPython 3.11–3.13 on Linux (x86\\_64,\naarch64), macOS (x86\\_64, arm64), and Windows (AMD64). They target a\nconservative CPU baseline — **x86-64-v3** (AVX2 + FMA + BMI2) on x64 and\n**armv8-a** (NEON) on arm64 — so a single wheel runs on anything produced in\nthe last ~8 years. A C++ compiler is **not** required for this path.\n\n### Build from source for maximum performance\n\nThe prebuilt wheels trade a few percent for portability. If you have a C++\ncompiler and want the binary tuned to *your* CPU (AVX-512 on Zen4 / Ice Lake,\nSVE on Graviton, etc.), force pip to skip the wheel and compile from sdist:\n\n```bash\npip install turboquant-space --no-binary turboquant-space\n```\n\nThis invokes CMake with `-march=native`, so every available instruction set\non the build machine is enabled. Requires CMake ≥ 3.18 and a C++17 compiler;\non macOS also `brew install libomp` for multi-threaded batch ops.\n\n### From a git checkout\n\n```bash\ngit clone https://github.com/ilyajob05/turboquant-space\ncd turboquant-space\nuv sync                       # or: pip install -e .\n```\n\nSame story: local builds use `-march=native` by default. Pass\n`-DTURBOQUANT_PORTABLE=ON` to CMake if you need a portable baseline instead.\n\n---\n\n## API\n\nEverything lives on a single class, `TurboQuantSpace`. 
All numpy arrays are\n`float32`, C-contiguous; all codes are `uint8`.\n\n```python\nTurboQuantSpace(\n    dim: int,                    # input dimensionality (any positive integer)\n    bits_per_coord: int = 4,     # 2..9 — nibble-packed for bits\u003c=4\n    rot_seed: int = 42,          # Hadamard rotation seed\n    qjl_seed: int = 137,         # QJL sign seed\n    num_threads: int = 0,        # 0 = use OMP_NUM_THREADS / all cores\n)\n```\n\n| method                                    | shape in                     | shape out                   |\n|-------------------------------------------|------------------------------|-----------------------------|\n| `encode(x)`                               | `(dim,)`                     | `(code_size_bytes,)` uint8  |\n| `encode_batch(X)`                         | `(n, dim)`                   | `(n, code_size_bytes)` uint8|\n| `encode_into(x, out)` / `encode_batch_into` | in-place into caller buffer | —                           |\n| `distance(query, code)`                   | `(dim,)`, `(code_size,)`     | `float`                     |\n| `distance_symmetric(code_a, code_b)`      | `(code_size,)` ×2            | `float`                     |\n| `distance_1_to_n(q, codes)`               | `(dim,)`, `(n, code_size)`   | `(n,)` float32              |\n| `distance_m_to_n(Q, codes)`               | `(m, dim)`, `(n, code_size)` | `(m, n)` float32            |\n| `distance_m_to_n_symmetric(codes_a, b)`   | `(m, cs)`, `(n, cs)`         | `(m, n)` float32            |\n\nAccessors: `dim()`, `padded_dim()`, `padded()`, `num_threads()`,\n`code_size_bytes()`, `bits_per_coord()`.\n\n### Dimensionality padding\n\nInternally every operation works in a power-of-two dimension (a requirement\nof the Walsh–Hadamard transform). If you pass `dim=100`, the space rounds up\nto 128 and zero-pads on the fly; a one-time warning is printed, and\n`space.padded_dim()` reports the internal size. 
Correctness is preserved —\nzero-padding in ℝᵈ does not change L2 distances — but encode/query cost is\ndetermined by `padded_dim()`, not `dim()`.\n\n### Threading\n\nAll batch methods (`encode_batch`, `distance_1_to_n`, `distance_m_to_n`,\n`distance_m_to_n_symmetric`) parallelize the outer loop with OpenMP,\n`schedule(static)`, so each thread owns a contiguous range of codes —\nprefetcher-friendly, no false sharing on output rows. Set `num_threads` in\nthe constructor, or leave it `0` to respect `OMP_NUM_THREADS`. For small\nbatches (≤ 64) execution stays single-threaded to avoid fork/join overhead.\n\nObserved scaling on Apple M-series, dim=512, 50k codes × 128 queries, bits=8:\n**1→2 = 1.94×, 1→4 = 3.49×, 1→8 = 4.50×** — see `python/benchmarks/` for the\nfull reproduction.\n\n---\n\n## Benchmarks\n\n```bash\nuv run python python/benchmarks/run_benchmark.py\n```\n\nOn first run this downloads SIFT1M (~170 MB) to\n`~/.cache/turboquant/sift/`; subsequent runs reuse the cache. The script\nsweeps `bits_per_coord × num_threads` on SIFT1M (with recall@{1,10,100}\nagainst the shipped ground truth) and on synthetic Gaussian data across\nseveral dimensions, writes\n`python/benchmarks/results/results_\u003ctimestamp\u003e.csv`, and produces seaborn\nplots under `results/plots/`:\n\n- `threading_scaling.png` — M-to-N throughput vs `num_threads`, faceted by dim.\n- `sift_recall.png` — recall@{1,10,100} vs bits on SIFT1M.\n- `synthetic_throughput.png` — encode / 1-to-N / M-to-N vs dim.\n\nUseful flags: `--skip-sift`, `--skip-synthetic`, `--threads 1,4,8`,\n`--bits 4,8`, `--no-show` (for headless CI).\n\nMeasured numbers from real hardware (Apple M3 and more as they come in)\nlive in [`docs/benchmarks.md`](docs/benchmarks.md). 
Headline from M3,\n`dim=128, batch=10000, bits=8`: **~88M symmetric M-to-N ops/sec** and\n**~2.8M encode/sec** on a single laptop.\n\n---\n\n## Layout and build\n\n```\ninclude/turboquant/\n  turbo_quant.h          # Hadamard, Lloyd–Max, TurboQuantCode\n  space_turbo_quant.h    # TurboQuantSpace + SIMD distance kernels\npython/turboquant/\n  bindings.cpp           # pybind11 bindings\n  __init__.py\npython/tests/            # pytest suite\npython/benchmarks/       # run_benchmark.py (CSV + seaborn plots)\nCMakeLists.txt           # scikit-build-core entry point\npyproject.toml\n```\n\nThe library is header-only in spirit — all algorithmic code is in\n`include/turboquant/`. Only the Python module (`bindings.cpp`) is compiled as\na shared object. A C++ consumer can depend on the headers alone and call the\nsame API directly.\n\nBuild flags worth knowing:\n\n- `-DTURBOQUANT_HAVE_OPENMP` — set by CMake when OpenMP is detected; enables\n  all `#pragma omp` blocks. Absent → sequential fallback, same API.\n- Release build uses `-O3 -ffast-math -fno-finite-math-only`. The\n  `fno-finite-math-only` is intentional: it keeps `inf`/`nan` handling sane\n  while preserving vectorization.\n\n### Recall Benchmark\n![recall](python/benchmarks/results/plots/recall.png)\n\n### Tests\n\n```bash\nuv run pytest python/tests/ -v\n```\n\nCovers asymmetric/symmetric distances across `bits ∈ {4, 8}` and\n`dim ∈ {32..4096}`, batch variants, zero-copy torch interop, and padding\ncorrectness.\n\n---\n\n## Roadmap\n\nThe immediate priorities, in order:\n\n1. **Publish wheels to PyPI** (cibuildwheel workflow in place; awaiting first tagged release)\n\nContributions welcome. 
The codebase is small (two headers, one bindings\nfile, ~2k lines) and deliberately kept that way — if a change makes it\nharder to read, that is a reason to push back on it.\n\n\n## Citation\n\nIf you use this library in academic work, please cite the original TurboQuant\npaper (ICLR 2026) in addition to this repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filyajob05%2Fturboquant-space","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Filyajob05%2Fturboquant-space","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filyajob05%2Fturboquant-space/lists"}