{"id":49283278,"url":"https://github.com/better-with-models/tinyquant","last_synced_at":"2026-04-28T01:05:43.267Z","repository":{"id":350584848,"uuid":"1205162992","full_name":"better-with-models/TinyQuant","owner":"better-with-models","description":"TinyQuant is a CPU-only vector quantization codec that compresses high-dimensional embedding vectors to low-bit representations while preserving cosine similarity rankings.","archived":false,"fork":false,"pushed_at":"2026-04-18T17:15:24.000Z","size":3624,"stargazers_count":5,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-18T17:37:49.726Z","etag":null,"topics":["embedding-models","embedding-vectors","embeddings","embeddings-similarity","pgvector"],"latest_commit_sha":null,"homepage":"https://www.alisonaquinas.com/projects/tinyquant","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/better-with-models.png","metadata":{"files":{"readme":".github/README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-08T17:39:59.000Z","updated_at":"2026-04-13T12:07:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/better-with-models/TinyQuant","commit_stats":null,"previous_names":["better-with-models/tinyquant"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/better-with-models/TinyQuant","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/better-with-models%2FTinyQuant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/better-with-models%2FTinyQuant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/better-with-models%2FTinyQuant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/better-with-models%2FTinyQuant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/better-with-models","download_url":"https://codeload.github.com/better-with-models/TinyQuant/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/better-with-models%2FTinyQuant/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32274982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embedding-models","embedding-vectors","embeddings","embeddings-similarity","pgvector"],"created_at":"2026-04-25T20:01:07.668Z","updated_at":"2026-04-25T20:01:08.583Z","avatar_url":"https://github.com/better-with-models.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cpicture\u003e\n  \u003csource\n    media=\"(prefers-color-scheme: dark)\"\n    
srcset=\"../docs/assets/tinyquant-logo-dark-transparent.png\"\u003e\n  \u003cimg\n    src=\"../docs/assets/tinyquant-logo-light-transparent.png\"\n    alt=\"TinyQuant logo: a database with a downward arrow and theta-r annotation\"\n    width=\"420\"\u003e\n\u003c/picture\u003e\n\n# TinyQuant\n\n*Rust-native vector quantization codec for embedding compression — CPU SIMD, optional GPU acceleration, and Python/TypeScript bindings.*\n\n[![PyPI](https://img.shields.io/pypi/v/tinyquant-cpu.svg)](https://pypi.org/project/tinyquant-cpu/)\n[![CI](https://github.com/better-with-models/TinyQuant/actions/workflows/ci.yml/badge.svg)](https://github.com/better-with-models/TinyQuant/actions/workflows/ci.yml)\n[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Coverage](https://img.shields.io/badge/coverage-90.95%25-brightgreen.svg)](https://github.com/better-with-models/TinyQuant/actions/workflows/ci.yml)\n\n\u003c/div\u003e\n\n\u003e [!NOTE]\n\u003e **TinyQuant** is a Rust-native vector quantization codec that compresses\n\u003e high-dimensional embedding vectors to low-bit representations while\n\u003e preserving cosine similarity rankings. 
It combines random orthogonal\n\u003e preconditioning with two-stage scalar quantization and optional FP16\n\u003e residual correction to hit **8× compression at 4-bit** with Pearson\n\u003e ρ ≈ 0.998 and **95% top-5 recall** on real OpenAI embeddings.\n\u003e\n\u003e - **What it is:** a Rust library (with Python and TypeScript bindings) that\n\u003e   squeezes embedding vectors into 4-bit (or 2-bit) representations without\n\u003e   losing retrieval quality, with optional wgpu GPU acceleration for batch\n\u003e   workloads above 512 vectors.\n\u003e - **Who it's for:** teams running cosine-similarity search on embeddings and\n\u003e   paying for RAM or disk by the gigabyte.\n\u003e - **Headline number:** 8× compression at 95% top-5 recall on 1536-dim\n\u003e   OpenAI embeddings. 1 M vectors go from **5.7 GB to 732 MB**.\n\n---\n\n## At a glance\n\nOn a benchmark of **335 real embeddings** from OpenAI's\n`text-embedding-3-small` (1536 dimensions), TinyQuant 4-bit achieves\n**8× compression** with Pearson ρ = 0.998 and 95% top-5 recall —\nreducing a 6 KB embedding to 768 bytes while preserving the similarity\nrankings that drive retrieval quality.\n\n| Method                      |  Bytes/vec | Compression | Pearson ρ | Top-5 Recall |\n| :-------------------------- | ---------: | ----------: | --------: | -----------: |\n| FP32 (baseline)             |      6,144 |          1× |    1.0000 |         100% |\n| FP16                        |      3,072 |          2× |    1.0000 |         100% |\n| uint8 scalar                |      1,544 |          4× |    1.0000 |         100% |\n| **TinyQuant 4-bit**         |    **768** |      **8×** |**0.9981** |      **95%** |\n| **TinyQuant 2-bit**         |    **384** |     **16×** |**0.9643** |      **85%** |\n| TinyQuant 4-bit + residual  |      3,840 |        1.6× |    1.0000 |         100% |\n\nFor a corpus of 1 million 1536-dim vectors, TinyQuant 4-bit reduces\nstorage from **5.7 GB to 732 MB** with negligible loss in 
retrieval\nquality.\n\n![Compression vs. Fidelity](../experiments/quantization-benchmark/results/plots/compression_vs_fidelity.png)\n\nSee the [full benchmark report](../experiments/quantization-benchmark/REPORT.md)\nfor methodology, all 9 methods compared, throughput measurements, and\npublication-quality plots.\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eContents\u003c/b\u003e\u003c/summary\u003e\n\n- [Installation](#installation)\n- [Language bindings](#language-bindings)\n- [Quickstart](#quickstart)\n- [How it works](#how-it-works)\n- [Recipes](#recipes)\n- [Key properties](#key-properties)\n- [Research lineage](#research-lineage)\n- [Repository layout](#repository-layout)\n- [Development](#development)\n- [Reproducing the benchmark](#reproducing-the-benchmark)\n- [Contributing](#contributing)\n- [License](#license)\n- [Related documentation](#related-documentation)\n\n\u003c/details\u003e\n\n---\n\n## Installation\n\nTinyQuant is published on PyPI as `tinyquant-cpu` and imports as\n`tinyquant_cpu`. The current release is a Rust-backed fat wheel — no\npure-Python fallback.\n\n| I want to...                                   | Install command                                    |\n| :--------------------------------------------- | :------------------------------------------------- |\n| Python (Rust-backed, current)                  | `pip install tinyquant-cpu`                        |\n| Python + PostgreSQL/pgvector support            | `pip install \"tinyquant-cpu[pgvector]\"`            |\n| Rust native crate                               | `cargo add tinyquant-core`                         |\n| TypeScript / Node / Bun                         | `npm install @tinyquant/core`                      |\n| Work on this repository                         | see the [Development](#development) section below  |\n\n\u003e [!TIP]\n\u003e The `[pgvector]` extra pulls in `psycopg[binary]\u003e=3.1` for talking to a\n\u003e live PostgreSQL database. 
Python **3.12+** is required; the Rust workspace\n\u003e MSRV is **1.81**, with the optional `tinyquant-gpu-wgpu` crate carved out\n\u003e at **1.87** in its own CI lane.\n\n---\n\n## Language bindings\n\nTinyQuant ships the same codec / corpus / backend surface across three\nlanguages, versioned in lockstep via `rust/Cargo.toml`\n`workspace.package.version`. All bindings delegate math to the shared\n`tinyquant-core` Rust crate — there is no per-language reimplementation.\n\n| Language    | Package                                                 | Install                          | Since |\n| :---------- | :------------------------------------------------------ | :------------------------------- | :---- |\n| Python      | [`tinyquant-cpu`](https://pypi.org/project/tinyquant-cpu/) ([![PyPI](https://img.shields.io/pypi/v/tinyquant-cpu.svg)](https://pypi.org/project/tinyquant-cpu/)) | `pip install tinyquant-cpu`      | Phase 24 |\n| Rust        | [`tinyquant-core`](https://crates.io/crates/tinyquant-core) ([![crates.io](https://img.shields.io/crates/v/tinyquant-core.svg)](https://crates.io/crates/tinyquant-core)) | `cargo add tinyquant-core`       | Phase 22 |\n| TypeScript  | [`@tinyquant/core`](https://www.npmjs.com/package/@tinyquant/core) ([![npm](https://img.shields.io/npm/v/@tinyquant/core.svg)](https://www.npmjs.com/package/@tinyquant/core)) | `npm install @tinyquant/core`    | Phase 25 |\n\nAll three packages guarantee byte-identical output on `config_hash`,\n`Codebook::to_bytes`, and `CompressedVector::to_bytes`. See\n[`COMPATIBILITY.md`](../COMPATIBILITY.md) for the supported cross-package\nversion pairs.\n\n---\n\n## Quickstart\n\n```python\nimport numpy as np\nfrom tinyquant_cpu.codec import Codec, CodecConfig\nfrom tinyquant_cpu.corpus import Corpus, CompressionPolicy\nfrom tinyquant_cpu.backend import BruteForceBackend\n\n# 1. 
Configure the codec: 4-bit quantization for 1536-dim vectors\nconfig = CodecConfig(bit_width=4, dimension=1536, seed=42)\ncodec = Codec()\n\n# 2. Train a codebook from representative vectors\ntraining_vectors = np.random.default_rng(0).standard_normal((1000, 1536)).astype(np.float32)\ncodebook = codec.build_codebook(training_vectors, config)\n\n# 3. Create a corpus that compresses on insert\ncorpus = Corpus(\"my-vectors\", config, codebook, CompressionPolicy.COMPRESS)\nfor i, vec in enumerate(training_vectors):\n    corpus.insert(f\"vec-{i}\", vec)\n\n# 4. Decompress and search\nbackend = BruteForceBackend()\nbackend.ingest(corpus.decompress_all())\nresults = backend.search(training_vectors[42], top_k=5)\nfor r in results:\n    print(f\"{r.vector_id}: {r.score:.4f}\")\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eWhat just happened?\u003c/b\u003e\u003c/summary\u003e\n\n1. **Configure** — `CodecConfig(bit_width=4, dimension=1536, seed=42)`\n   sets the bit width (`4` → 8× compression), the vector dimension, and\n   the RNG seed that controls the random rotation matrix. The seed makes\n   the codec **deterministic** — same inputs always produce byte-identical\n   output across all language bindings.\n2. **Train** — `codec.build_codebook(training_vectors, config)` fits a\n   small codebook on a representative sample of your data.\n3. **Insert** — `Corpus(..., CompressionPolicy.COMPRESS)` creates a domain\n   aggregate that compresses every vector on insert and tracks vector IDs.\n4. **Decompress** — `corpus.decompress_all()` produces `(vector_id,\n   fp32_vector)` pairs. The Rust core runs these in parallel via Rayon.\n5. **Search** — `BruteForceBackend` performs exact cosine search and returns\n   `SearchResult` objects with IDs and scores. 
Swap for `PgvectorAdapter`\n   in production, or use the GPU path for large corpora.\n\n\u003c/details\u003e\n\n---\n\n## How it works\n\n**The problem.** Naive scalar quantization crushes real embedding data because\ncoordinate distributions are skewed: a handful of dimensions carry most of\nthe signal and get mapped to the same bucket as noise.\n\n**The trick.** Pre-multiplying each vector by a **random orthogonal matrix**\n(derived via QR decomposition of a Gaussian matrix) uniformizes the coordinate\ndistribution without changing pairwise distances. After rotation, a single\nshared scalar quantizer works well across **all** dimensions. This is the core\ninsight from [TurboQuant][] and [PolarQuant][].\n\n**Two-stage refinement.** An optional **FP16 residual** on top of the 4-bit\ncoarse codebook gives you a separate point on the rate-distortion curve:\n8× compression and ρ ≈ 0.998 without the residual; 1.6× compression and\nρ = 1.000 with it enabled — useful for reranking stages.\n\n**Rust core with CPU and GPU paths.** The codec runs through\n`tinyquant-core`, which dispatches SIMD kernels at runtime (AVX2+FMA on\nx86_64, NEON on aarch64) and parallelizes batch compression with Rayon.\nFor workloads exceeding the **512-vector threshold**, the optional\n`tinyquant-gpu-wgpu` crate offloads rotate/quantize/dequantize/residual\nand corpus cosine search to WGSL compute shaders via wgpu, with lazy\npipeline caching to avoid per-call recompilation.\n\n**Backend-agnostic.** The codec produces `CompressedVector` bytes; search\nlives in a separate `SearchBackend` layer (`BruteForceBackend` for in-memory\nexact search, `PgvectorAdapter` for PostgreSQL + pgvector, `WgpuBackend` for\nGPU-accelerated corpus search), so you can plug TinyQuant into any retrieval\nstore without coupling storage to search.\n\n[TurboQuant]: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/\n[PolarQuant]: https://arxiv.org/abs/2503.20024\n\n---\n\n## 
Recipes\n\nPick the config that matches your rate-distortion target:\n\n| Config                                            | Bytes/vec | Compression |     ρ | Top-5 | When to use                |\n| :------------------------------------------------ | --------: | ----------: | ----: | ----: | :------------------------- |\n| `CodecConfig(bit_width=4)`                        |       768 |          8× | 0.998 |   95% | **Default** balance        |\n| `CodecConfig(bit_width=2)`                        |       384 |         16× | 0.964 |   85% | Aggressive, needs rerank   |\n| `CodecConfig(bit_width=4, residual_enabled=True)` |     3,840 |        1.6× | 1.000 |  100% | Reranking / exact-match    |\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eSingle-vector compression\u003c/b\u003e\u003c/summary\u003e\n\n```python\nimport numpy as np\nfrom tinyquant_cpu.codec import Codec, CodecConfig\n\nconfig = CodecConfig(bit_width=4, dimension=768, seed=42)\ncodec = Codec()\n\ntraining_data = np.random.default_rng(0).standard_normal((1000, 768)).astype(np.float32)\ncodebook = codec.build_codebook(training_data, config)\n\nvector = training_data[0]\ncompressed = codec.compress(vector, config, codebook)\nprint(f\"Original:   {vector.nbytes} bytes\")\nprint(f\"Compressed: {compressed.size_bytes} bytes\")\nprint(f\"Ratio:      {vector.nbytes / compressed.size_bytes:.1f}x\")\n\nrestored = codec.decompress(compressed, config, codebook)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eBatch compression (Rayon-parallel)\u003c/b\u003e\u003c/summary\u003e\n\n```python\n# Parallelized via Rayon in the Rust core — byte-identical to serial output\nvectors = np.random.default_rng(0).standard_normal((10_000, 768)).astype(np.float32)\ncompressed_batch = codec.compress_batch(vectors, config, codebook)\nrestored_batch = codec.decompress_batch(compressed_batch, config, 
codebook)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eTuning the rate–distortion tradeoff\u003c/b\u003e\u003c/summary\u003e\n\n```python\n# Maximum compression: 16x at 2-bit\nconfig_2bit = CodecConfig(bit_width=2, dimension=768, seed=42, residual_enabled=False)\n\n# Practical sweet spot: 8x at 4-bit (rho \u003e= 0.998)\nconfig_4bit = CodecConfig(bit_width=4, dimension=768, seed=42, residual_enabled=False)\n\n# Near-perfect fidelity: 4-bit + FP16 residual correction (1.6x, rho = 1.000)\nconfig_4bit_res = CodecConfig(bit_width=4, dimension=768, seed=42, residual_enabled=True)\n```\n\n\u003e [!WARNING]\n\u003e 2-bit compression drops top-5 recall to ~85%. Only use it when a\n\u003e reranking stage (FP16 residual, cross-encoder, exact search) sits\n\u003e downstream to recover the missing signal.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eCompression policies\u003c/b\u003e\u003c/summary\u003e\n\nA `Corpus` can store vectors in three modes:\n\n```python\nfrom tinyquant_cpu.corpus import Corpus, CompressionPolicy\n\ncorpus_compressed = Corpus(\"c\", config, codebook, CompressionPolicy.COMPRESS)\ncorpus_full       = Corpus(\"p\", config, codebook, CompressionPolicy.PASSTHROUGH)\ncorpus_fp16       = Corpus(\"h\", config, codebook, CompressionPolicy.FP16)\n```\n\nPolicies let one corpus mix hot data (PASSTHROUGH), cold data (COMPRESS),\nand middle-tier data (FP16) without rebuilding the codec.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eBinary serialization (TQCV format)\u003c/b\u003e\u003c/summary\u003e\n\n`CompressedVector` instances serialize to the TQCV versioned binary format\n(70-byte header + LSB-first packed indices + optional FP16 residual),\nsuitable for disk, network, or database storage. 
Mmap corpus files are\navailable via the Rust `tinyquant-io` crate for zero-copy access.\n\n```python\nfrom tinyquant_cpu.codec import CompressedVector\n\nraw_bytes = compressed.to_bytes()\nrestored  = CompressedVector.from_bytes(raw_bytes)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003ePostgreSQL + pgvector backend\u003c/b\u003e\u003c/summary\u003e\n\n```python\nimport psycopg\nfrom tinyquant_cpu.backend.adapters.pgvector import PgvectorAdapter\n\nadapter = PgvectorAdapter(\n    connection_factory=lambda: psycopg.connect(\"postgresql://user:pass@localhost/mydb\"),\n    table_name=\"embeddings\",\n)\nadapter.ingest(corpus.decompress_all())\nresults = adapter.search(query_vector, top_k=10)\n```\n\n\u003e [!IMPORTANT]\n\u003e Requires PostgreSQL with the `pgvector` extension installed. CI runs these\n\u003e tests against a live `pgvector/pgvector:pg17` container via testcontainers.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eGPU acceleration (Rust only — wgpu)\u003c/b\u003e\u003c/summary\u003e\n\nThe `tinyquant-gpu-wgpu` crate provides a `WgpuBackend` that offloads\nbatch compress/decompress and corpus cosine search to WGSL compute shaders.\nIt is workspace-internal (`publish = false`) and selected automatically\nwhen a batch exceeds `GPU_BATCH_THRESHOLD` (512 vectors).\n\n```rust\nuse tinyquant_gpu_wgpu::{WgpuBackend, BackendPreference};\n\n// Default adapter (auto-select highest-performance GPU)\nlet backend = WgpuBackend::new().await?;\n\n// Or select a specific backend:\nlet backend = WgpuBackend::new_with_preference(BackendPreference::Vulkan).await?;\n\n// Warm up pipeline cache explicitly (optional — lazy otherwise)\nbackend.load_pipelines().await;\n\n// GPU corpus search\nlet state = backend.prepare_corpus_for_device(\u0026corpus_vecs).await?;\nlet results = backend.cosine_topk(\u0026state, \u0026query_vec, top_k).await?;\n```\n\nAvailable `BackendPreference` variants: `Auto`, `Vulkan`, 
`Metal`, `Dx12`,\n`HighPerformance`, `LowPower`, `Software`.\n\n\u003c/details\u003e\n\n---\n\n## Key properties\n\n- **8× compression** at 4-bit without residuals (ρ = 0.998, 95% recall)\n- **16× compression** at 2-bit (ρ = 0.964, 85% recall)\n- **Near-perfect fidelity** with optional FP16 residual correction (ρ = 1.000)\n- **Deterministic** — same inputs produce byte-identical output across all language bindings and CPU architectures\n- **Rust-native core** — `tinyquant-core`; CPU SIMD dispatch (AVX2+FMA / NEON) via `is_x86_feature_detected!` / ARMv8 base-ISA guarantee; Rayon parallel batch with determinism contract\n- **Optional GPU acceleration** — `tinyquant-gpu-wgpu`; WGSL rotate/quantize/dequantize/residual and cosine-topk kernels; lazy `CachedPipelines`; `BackendPreference` adapter selection; auto-routes at ≥ 512 vectors\n- **Multi-language** — Python fat wheel (`tinyquant-cpu`), TypeScript/Node (`@tinyquant/core`), Rust native (`tinyquant-core`), C ABI (`tinyquant-sys`)\n- **Pluggable backends** — `BruteForceBackend` for in-process exact search; `PgvectorAdapter` for PostgreSQL + pgvector; `WgpuBackend` for GPU corpus search\n- **Three compression policies** — COMPRESS, PASSTHROUGH, FP16, mixable within a corpus\n- **TQCV serialization** — versioned 70-byte header + LSB-first bit-pack + optional FP16 residual; mmap corpus files via `tinyquant-io`\n- **Calibration gates** — Pearson ρ and mean recall-at-k measured against OpenAI calibration fixtures; Criterion benchmarks with 10% regression budget\n- **Fully typed** — `py.typed` marker, `mypy --strict` clean, TypeScript strict mode\n- **Apache-2.0 licensed**\n\n---\n\n## Research lineage\n\nTinyQuant adapts ideas from published research into a clean-room\nimplementation:\n\n| Source              | Year | Key contribution                                                  |\n| :------------------ | :--: | :---------------------------------------------------------------- |\n| [**TurboQuant**][1] | 2025 | Random 
rotation + scalar quantization, no per-block norms         |\n| [**PolarQuant**][2] | 2025 | QR-derived orthogonal preconditioning for coordinate uniformity   |\n| [**QJL**][3]        | 2024 | Inner-product preservation bounds under aggressive quantization   |\n\n[1]: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/\n[2]: https://arxiv.org/abs/2503.20024\n[3]: https://arxiv.org/abs/2406.03482\n\n---\n\n## Repository layout\n\n| Path                                          | Purpose                                                                     |\n| :-------------------------------------------- | :-------------------------------------------------------------------------- |\n| `rust/crates/tinyquant-core/`                 | Codec, corpus, backend trait, SIMD dispatch, Rayon parallel batch           |\n| `rust/crates/tinyquant-io/`                   | TQCV serialization format and mmap corpus files                             |\n| `rust/crates/tinyquant-gpu-wgpu/`             | Optional wgpu/WGSL GPU accelerator (`publish = false`, workspace-internal)  |\n| `rust/crates/tinyquant-py/`                   | pyo3 Python extension — the engine behind `tinyquant-cpu`                   |\n| `rust/crates/tinyquant-sys/`                  | C ABI via cbindgen                                                          |\n| `rust/crates/tinyquant-cli/`                  | Standalone CLI binary                                                       |\n| `rust/crates/tinyquant-js/`                   | napi-rs TypeScript/Node bindings (`@tinyquant/core`)                        |\n| `rust/crates/tinyquant-bruteforce/`           | `BruteForceBackend` reference implementation                                |\n| `rust/crates/tinyquant-pgvector/`             | PostgreSQL + pgvector ACL adapter                                           |\n| `rust/crates/tinyquant-bench/`                | Criterion benchmarks + calibration quality gates           
                 |\n| `tests/reference/tinyquant_py_reference/`     | Pure-Python frozen oracle — differential test reference (not shipped)       |\n| `tests/parity/`                               | Cross-implementation parity suite (`pytest -m parity`)                     |\n| `tests/`                                      | Python unit, integration, E2E, architecture, and calibration suites         |\n| `experiments/`                                | Benchmarks and empirical evaluations                                        |\n| `docs/`                                       | Obsidian wiki: design docs, research, SDLC plans, CI/CD specs               |\n\n---\n\n## Development\n\n```bash\ngit clone https://github.com/better-with-models/TinyQuant.git\ncd TinyQuant\n\n# Python dev dependencies\npip install pytest pytest-cov hypothesis numpy ruff mypy build\n\n# Lint and format\nruff check . \u0026\u0026 ruff format --check .\n\n# Strict type check\nmypy --strict .\n\n# Run the full Python suite\npytest --cov=tinyquant_py_reference\n\n# Cross-impl parity (Python ↔ Rust)\npytest -m parity -v\n\n# Rust: lint and test\ncd rust\ncargo clippy --workspace -- -D warnings\ncargo test --workspace\n```\n\nThe Python test suite includes **289 tests** covering unit, integration,\nend-to-end, calibration, parity (cross-impl Python ↔ Rust), and\narchitecture-enforcement scenarios. Coverage is held above **90%** by CI\n(**94%** for the codec subpackage). 
Live PostgreSQL + pgvector tests run\nagainst a Docker container in CI via `testcontainers`.\n\n\u003e [!TIP]\n\u003e CI enforces three strict gates: `ruff check` / `ruff format --check`,\n\u003e `mypy --strict`, and `markdownlint-cli2` for all markdown outside `docs/`.\n\u003e The `docs/` vault uses Obsidian-flavored markdown under its own rules —\n\u003e see [`AGENTS.md`](../AGENTS.md) for the policy.\n\n---\n\n## Reproducing the benchmark\n\n```bash\nexport OPENAI_API_KEY=\"your-key-here\"\npython experiments/quantization-benchmark/generate_embeddings.py\npython experiments/quantization-benchmark/run_benchmark.py\npython experiments/quantization-benchmark/generate_plots.py\n```\n\nThis fetches 335 embeddings via the OpenAI API, benchmarks 9 quantization\nmethods, and produces plots and JSON results in\n`experiments/quantization-benchmark/results/`.\n\n---\n\n## Contributing\n\nContributions are welcome. The short version:\n\n1. **Issues and design discussions** — open a GitHub issue before starting\n   non-trivial work so we can agree on scope.\n2. **Follow the repo SDLC** — architecture decisions, coding standards, and\n   pre-commit expectations live in [`AGENTS.md`](../AGENTS.md) and the\n   `docs/design/` vault. Read [`CLAUDE.md`](../CLAUDE.md) if you're driving\n   Claude Code or another LLM agent against this repo.\n3. **Run the full gate locally** before pushing:\n   `ruff check . \u0026\u0026 ruff format --check . \u0026\u0026 mypy --strict . \u0026\u0026 pytest --cov=tinyquant_cpu`\n4. **Keep prose aligned** — edits to the project tagline, elevator pitch, or\n   headline benchmark numbers must land in `README.md`, `.github/README.md`,\n   `AGENTS.md`, and `CLAUDE.md` in the same commit.\n\n---\n\n## License\n\nApache-2.0. 
See [LICENSE](../LICENSE).\n\n---\n\n## Related documentation\n\n- [Benchmark Report](../experiments/quantization-benchmark/REPORT.md) —\n  full empirical evaluation in CS-paper format\n- [CHANGELOG](../CHANGELOG.md) — release notes\n- [Design: Storage Codec Architecture](../docs/design/storage-codec-architecture.md)\n- [Design: GPU Acceleration](../docs/design/rust/gpu-acceleration.md)\n- [Research: Vector Quantization Paper Synthesis](../docs/research/vector-quantization-paper-synthesis.md)\n- [QA: Validation Plan](../docs/qa/validation-plan/README.md)\n- [CI Plan](../docs/CI-plan/README.md) and [CD Plan](../docs/CD-plan/README.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbetter-with-models%2Ftinyquant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbetter-with-models%2Ftinyquant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbetter-with-models%2Ftinyquant/lists"}