{"id":50686388,"url":"https://github.com/back2matching/turboquant-vectors","last_synced_at":"2026-06-08T23:04:33.196Z","repository":{"id":346877648,"uuid":"1191994246","full_name":"back2matching/turboquant-vectors","owner":"back2matching","description":"Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs FAISS PQ 13.3%.","archived":false,"fork":false,"pushed_at":"2026-03-26T16:26:37.000Z","size":325,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-09T05:24:57.702Z","etag":null,"topics":["compression","embeddings","faiss","machine-learning","numpy","quantization","rag","turboquant","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/back2matching.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-25T19:38:48.000Z","updated_at":"2026-04-11T23:48:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/back2matching/turboquant-vectors","commit_stats":null,"previous_names":["back2matching/turboquant-vectors"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/back2matching/turboquant-vectors","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/back2matching%2Fturboquant-vectors","tags_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/back2matching%2Fturboquant-vectors/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/back2matching%2Fturboquant-vectors/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/back2matching%2Fturboquant-vectors/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/back2matching","download_url":"https://codeload.github.com/back2matching/turboquant-vectors/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/back2matching%2Fturboquant-vectors/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34083858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","embeddings","faiss","machine-learning","numpy","quantization","rag","turboquant","vector-search"],"created_at":"2026-06-08T23:04:32.388Z","updated_at":"2026-06-08T23:04:33.191Z","avatar_url":"https://github.com/back2matching.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# turboquant-vectors\n\nCompress and protect embeddings with TurboQuant.\n\nTwo tools in one package:\n- **PrivateEncoder** -- rotate embeddings with a secret key. Search works identically. 
Inversion attacks fail.\n- **compress/search** -- 8x compression, no training needed, instant.\n\n```python\nfrom turboquant_vectors import PrivateEncoder\n\nencoder = PrivateEncoder.generate(dim=1536)\nrotated = encoder.rotate(embeddings)       # search works identically\nencoder.save_key(\"secret.tqkey\")           # treat like an SSH key\n```\n\n## Embedding Privacy\n\nVec2Text recovers 92% of original text from unprotected embeddings (32-token inputs, GTR-base encoder). ALGEN needs only 1,000 leaked pairs. OWASP lists this as LLM08 in their 2025 Top 10.\n\nPrivateEncoder applies a secret orthogonal rotation before you send embeddings to a third-party vector DB. The math:\n\n```\n\u003cQx, Qy\u003e = x^T Q^T Q y = x^T y = \u003cx, y\u003e\n```\n\nCosine similarity, L2 distance, inner product -- all preserved exactly (up to float32 precision, ~1e-6 error).\n\n### Quick start\n\n```python\nfrom turboquant_vectors import PrivateEncoder\nimport numpy as np\n\n# Generate a secret key (uses OS entropy)\nencoder = PrivateEncoder.generate(dim=1536)\nencoder.save_key(\"secret.tqkey\")\n\n# Rotate before uploading to Pinecone/Weaviate/Qdrant\nrotated = encoder.rotate(embeddings)\n# pinecone_index.upsert(vectors=rotated.tolist(), ids=ids)\n\n# Rotate query too (same key)\nrotated_query = encoder.rotate(query)\n# results = pinecone_index.query(vector=rotated_query.tolist(), top_k=10)\n\n# Later, load the same key\nencoder = PrivateEncoder.load_key(\"secret.tqkey\")\n```\n\n### What it protects against\n\n- **Vec2Text** (92% text recovery from embeddings) -- fails completely on rotated vectors\n- **ALGEN** (few-shot inversion with 1K pairs) -- fails without the rotation key\n- **ZSinvert / Zero2Text** (zero-shot inversion) -- fails on rotated embedding space\n- **Attribute classifiers** (age, sex, medical conditions from embeddings) -- drop to random chance\n\nOur demo proves it on real sentence-transformer embeddings across 5 sensitive categories (medical, financial, legal, 
personal, neutral): a classifier achieves 88.9% accuracy on originals but drops to 11.1% on rotated vectors (below 20% random chance). See `demos/inversion_demo.py`.\n\nWe also tested the Wasserstein-Procrustes unsupervised alignment attack (the strongest known attack that doesn't require matched pairs). It fails completely: cosine recovery of 0.004, identical to a random guess. See `benchmarks/adversarial_self_test.py`.\n\n### What it does NOT protect against\n\nTo be honest about the threat model:\n\n- **Known-plaintext attack**: d original-rotated pairs, where d is the embedding dimension (e.g., 1,536 for OpenAI embeddings), fully recover the key via SVD. Don't let anyone see both the original AND rotated versions of the same content.\n- **Pairwise distances are visible**: The server can see which documents are similar to each other, cluster structure, and query patterns. It just can't read what any document says.\n- **Key compromise**: If the key file leaks, all rotated vectors are trivially recoverable.\n- **RAG output attacks**: Membership inference via LLM output is not mitigated.\n\n### What it is NOT\n\n- Not encryption in the cryptographic sense\n- Not differential privacy (no epsilon-delta guarantee)\n- Not a substitute for access control on the vector database\n\n**Threat model**: honest-but-curious vector DB provider who sees only rotated vectors and has no access to your original texts or the rotation key.\n\n### What the server CAN learn\n\nEven with rotation, the server can observe:\n- Cluster structure (how many topics exist)\n- Document similarity graph (which docs are related)\n- Query patterns (which clusters you search most)\n- Duplicate/near-duplicate documents\n- Temporal patterns (when documents are added)\n\nThe server CANNOT determine what any document says, infer PII, or run published inversion attacks.\n\n### Comparison with other approaches\n\n| Property | Rotation (ours) | Differential Privacy | Homomorphic Encryption | IronCore Cloaked AI 
|\n|----------|----------------|---------------------|----------------------|-------------------|\n| Search quality | Identical (lossless) | 5-30% recall loss | Identical | ~5% recall loss |\n| Latency overhead | \u003c0.1ms per vector | Negligible | 1000-10000x | SDK overhead |\n| Deployment | One numpy matmul | Drop-in | Custom server | SDK + license |\n| License | Apache 2.0 | N/A | N/A | AGPL / $599+/mo |\n| Known-plaintext resistant | No (d pairs breaks it) | Yes | Yes | Partially |\n\n### Key management\n\nTreat `.tqkey` files like SSH private keys:\n- Don't commit to git (add `*.tqkey` to .gitignore)\n- Back up securely -- if lost, you can't unrotate (search still works)\n- Use `from_seed()` with a 128-bit seed to share keys without large files\n- Use `rekey_vectors()` to rotate to a new key without exposing originals\n\n### Benchmarks\n\n| Dimension | Single vector | Batch 10K | Key generation | Key file |\n|-----------|--------------|-----------|---------------|---------|\n| 384 | 0.03 ms | 8.7 ms | 31 ms | 0.6 MB |\n| 768 | 0.06 ms | 25 ms | 141 ms | 2.4 MB |\n| 1536 | 0.11 ms | 88 ms | 465 ms | 9.4 MB |\n\n### Integration examples\n\nWorks with any vector DB that accepts float arrays:\n\n```python\n# Pinecone\nrotated = encoder.rotate(embeddings)\nindex.upsert(vectors=[(id, vec.tolist(), meta) for id, vec, meta in zip(ids, rotated, metadata)])\n\n# ChromaDB\ncollection.add(embeddings=encoder.rotate(embeddings).tolist(), ids=ids)\n\n# LangChain (wrap any embedding model)\nclass PrivateEmbeddings(Embeddings):\n    def __init__(self, base, encoder):\n        self.base, self.encoder = base, encoder\n    def embed_documents(self, texts):\n        return self.encoder.rotate(np.array(self.base.embed_documents(texts))).tolist()\n    def embed_query(self, text):\n        return self.encoder.rotate(np.array(self.base.embed_query(text))).tolist()\n\n# sentence-transformers\nembeddings = model.encode(texts)\nrotated = encoder.rotate(embeddings)\n```\n\n### Privacy + 
compression\n\nCombine both: rotate for privacy, then quantize for 8x compression.\n\n```python\ncompressed = encoder.rotate_and_compress(embeddings, bits=4)\nidx, scores = compressed.search(encoder.rotate(query), top_k=10)\ncompressed.save(\"private_index.npz\")\n```\n\n---\n\n## Compression\n\n8x instant compression, no training needed.\n\nFirst open-source implementation of Google's TurboQuant ([ICLR 2026](https://arxiv.org/abs/2504.19874)) for vector search.\n\n```python\nfrom turboquant_vectors import compress, search\n\ncompressed = compress(embeddings, bits=4)  # 307 MB -\u003e 38 MB\nindices, scores = search(compressed, query, top_k=10)\n```\n\n### Why\n\nFAISS Product Quantization requires k-means training per dataset. TurboQuant is instant (data-oblivious), compresses 2-2.5x faster, and gets up to +8pp better recall at the same storage budget.\n\n### Benchmarks on real OpenAI embeddings (10K vectors, 1536-dim)\n\nTested on Qdrant's `dbpedia-entities-openai3-text-embedding-3-small` dataset from HuggingFace. Real embeddings, not synthetic.\n\n| Bits | TurboQuant Recall@10 | FAISS PQ Recall@10 | Delta | TQ Compress Time |\n|------|---------------------|-------------------|-------|-----------------|\n| 2-bit | **90.6%** | 90.2% | **+0.4pp** | 1.2s (no training) |\n| 4-bit | **96.6%** | 96.1% | **+0.5pp** | 1.7s (no training) |\n| 8-bit | **99.3%** | 98.1% | **+1.2pp** | 9.5s (no training) |\n\nTurboQuant needs zero training (data-oblivious). FAISS PQ requires k-means training. Reproduce: `python benchmarks/real_data_benchmark.py`\n\n---\n\n## Install\n\n```bash\npip install turboquant-vectors\n```\n\nRequires only numpy. 
No torch, no scipy for the privacy module.\n\n## Full API\n\n### PrivateEncoder\n\n```python\nPrivateEncoder.generate(dim)           # New key from OS entropy\nPrivateEncoder.from_seed(dim, seed)    # Deterministic key (seed \u003e= 2^64)\nPrivateEncoder.load_key(path)          # Load from .tqkey file\n\nencoder.rotate(vectors)                # Apply rotation\nencoder.unrotate(vectors)              # Reverse rotation (needs key)\nencoder.save_key(path)                 # Save to .tqkey file\nencoder.fingerprint()                  # 16-char hex key ID\nencoder.rekey_vectors(vecs, old_enc)   # Switch keys without unrotating\nencoder.rotate_and_compress(vecs, 4)   # Privacy + compression\nencoder.make_canary() / verify_canary()  # Key verification without originals\n```\n\n### Compression\n\n```python\ncompress(vectors, bits=4)              # Compress vectors\ndecompress(compressed)                 # Restore to float32\nsearch(compressed, query, top_k=10)    # Search compressed vectors\ncompressed.save(path) / .load(path)    # Persistence\n```\n\n## Paper\n\n**TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate**\nZandieh, Daliri, Hadian, Mirrokni (Google Research)\nICLR 2026 | [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)\n\nIndependent implementation, not affiliated with Google Research.\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fback2matching%2Fturboquant-vectors","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fback2matching%2Fturboquant-vectors","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fback2matching%2Fturboquant-vectors/lists"}