https://github.com/back2matching/turboquant-vectors

Compress embeddings 6x instantly with TurboQuant. First pip package using Google's TurboQuant (ICLR 2026) for vector search. 71.9% recall vs FAISS PQ 13.3%.
compression embeddings faiss machine-learning numpy quantization rag turboquant vector-search

Host: GitHub
URL: https://github.com/back2matching/turboquant-vectors
Owner: back2matching
Created: 2026-03-25T19:38:48.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-26T16:26:37.000Z (3 months ago)
Last Synced: 2026-05-09T05:24:57.702Z (about 1 month ago)
Topics: compression, embeddings, faiss, machine-learning, numpy, quantization, rag, turboquant, vector-search
Language: Python
Size: 317 KB
Stars: 1
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# turboquant-vectors

Compress and protect embeddings with TurboQuant.

Two tools in one package:

- **PrivateEncoder** -- rotate embeddings with a secret key. Search works identically. Inversion attacks fail.

- **compress/search** -- 8x compression, no training needed, instant.

```python
from turboquant_vectors import PrivateEncoder

encoder = PrivateEncoder.generate(dim=1536)
rotated = encoder.rotate(embeddings)       # search works identically
encoder.save_key("secret.tqkey")           # treat like an SSH key
```

## Embedding Privacy

Vec2Text recovers 92% of original text from unprotected embeddings (32-token inputs, GTR-base encoder). ALGEN needs only 1,000 leaked pairs. OWASP lists this as LLM08 in their 2025 Top 10.

PrivateEncoder applies a secret orthogonal rotation before you send embeddings to a third-party vector DB. The math:

```
<Qx, Qy> = x^T Q^T Q y = x^T y = <x, y>
```

Cosine similarity, L2 distance, inner product -- all preserved exactly (up to float32 precision, ~1e-6 error).
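
A quick numerical check of the identity above, as a minimal sketch. It builds a random orthogonal matrix with a QR decomposition (an assumption for illustration; the package's actual key generation may differ) and compares inner product, cosine similarity, and L2 distance before and after rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Hypothetical stand-in for the secret key: the Q factor of a Gaussian
# matrix is orthogonal (Q.T @ Q == I up to rounding).
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

x = rng.standard_normal(dim)
y = rng.standard_normal(dim)
qx, qy = Q @ x, Q @ y

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

inner_gap = abs(float(x @ y) - float(qx @ qy))
cosine_gap = abs(cosine(x, y) - cosine(qx, qy))
l2_gap = abs(float(np.linalg.norm(x - y)) - float(np.linalg.norm(qx - qy)))

print(inner_gap, cosine_gap, l2_gap)  # all three differ only by rounding error
```

The sketch runs in float64, so the gaps are far smaller than the float32 figure quoted above.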

### Quick start

```python
from turboquant_vectors import PrivateEncoder
import numpy as np

# Generate a secret key (uses OS entropy)
encoder = PrivateEncoder.generate(dim=1536)
encoder.save_key("secret.tqkey")

# Rotate before uploading to Pinecone/Weaviate/Qdrant
rotated = encoder.rotate(embeddings)
# pinecone_index.upsert(vectors=rotated.tolist(), ids=ids)

# Rotate query too (same key)
rotated_query = encoder.rotate(query)
# results = pinecone_index.query(vector=rotated_query.tolist(), top_k=10)

# Later, load the same key
encoder = PrivateEncoder.load_key("secret.tqkey")
```

### What it protects against

- **Vec2Text** (92% text recovery from embeddings) -- fails completely on rotated vectors

- **ALGEN** (few-shot inversion with 1K pairs) -- fails without the rotation key

- **ZSinvert / Zero2Text** (zero-shot inversion) -- fails on rotated embedding space

- **Attribute classifiers** (age, sex, medical conditions from embeddings) -- drop to random chance

Our demo shows this on real sentence-transformer embeddings across 5 sensitive categories (medical, financial, legal, personal, neutral): a classifier achieves 88.9% accuracy on the original vectors but drops to 11.1% on rotated vectors, below the 20% expected from random guessing over 5 classes. See `demos/inversion_demo.py`.

We also tested the Wasserstein-Procrustes unsupervised alignment attack (the strongest known attack that doesn't require matched pairs). It fails completely: cosine recovery of 0.004, identical to a random guess. See `benchmarks/adversarial_self_test.py`.

### What it does NOT protect against

Be honest about the threat model:

- **Known-plaintext attack**: d original-rotated pairs (e.g., 1,536 for OpenAI embeddings) are enough to fully recover the key via SVD. Don't let anyone see both the original AND rotated versions of the same content.

- **Pairwise distances are visible**: The server can see which documents are similar to each other, cluster structure, and query patterns. It just can't read what any document says.

- **Key compromise**: If the key file leaks, all rotated vectors are trivially recoverable.

- **RAG output attacks**: Membership inference via LLM output is not mitigated.
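
To make the known-plaintext item concrete, here is a minimal sketch of key recovery from matched pairs using the orthogonal Procrustes solution (one SVD). The key here is a hypothetical random orthogonal matrix built with QR, not a real `.tqkey`, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 32

# Hypothetical secret key (random orthogonal matrix via QR).
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

# The attacker observes `dim` matched pairs: originals X and rotated Y.
X = rng.standard_normal((dim, dim))   # one original vector per row
Y = X @ Q.T                           # the rotated vectors the server stores

# Orthogonal Procrustes: the orthogonal R minimizing ||X R - Y||_F
# is U @ Vt, where U, S, Vt = svd(X.T @ Y). Here R equals Q.T.
U, _, Vt = np.linalg.svd(X.T @ Y)
Q_recovered = (U @ Vt).T

# With the key recovered, any other stored vector can be un-rotated.
secret = rng.standard_normal(dim)
leaked = Q_recovered.T @ (Q @ secret)

key_error = float(np.abs(Q_recovered - Q).max())
leak_error = float(np.abs(leaked - secret).max())
print(key_error, leak_error)  # both are at rounding-error scale
```

This is why the originals and their rotated counterparts should never be visible to the same party.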

### What it is NOT

- Not encryption in the cryptographic sense

- Not differential privacy (no epsilon-delta guarantee)

- Not a substitute for access control on the vector database

**Threat model**: honest-but-curious vector DB provider who sees only rotated vectors and has no access to your original texts or the rotation key.

### What the server CAN learn

Even with rotation, the server can observe:

- Cluster structure (how many topics exist)

- Document similarity graph (which docs are related)

- Query patterns (which clusters you search most)

- Duplicate/near-duplicate documents

- Temporal patterns (when documents are added)

The server CANNOT determine what any document says, infer PII, or run published inversion attacks.

### Comparison with other approaches

| Property | Rotation (ours) | Differential Privacy | Homomorphic Encryption | IronCore Cloaked AI |
|----------|----------------|---------------------|----------------------|-------------------|
| Search quality | Identical (lossless) | 5-30% recall loss | Identical | ~5% recall loss |
| Latency overhead | <0.1ms per vector | Negligible | 1000-10000x | SDK overhead |
| Deployment | One numpy matmul | Drop-in | Custom server | SDK + license |
| License | Apache 2.0 | N/A | N/A | AGPL / $599+/mo |
| Known-plaintext resistant | No (d pairs breaks it) | Yes | Yes | Partially |

### Key management

Treat `.tqkey` files like SSH private keys:

- Don't commit to git (add `*.tqkey` to .gitignore)

- Back up securely -- if lost, you can't unrotate (search still works)

- Use `from_seed()` with a 128-bit seed to share keys without large files

- Use `rekey_vectors()` to rotate to a new key without exposing originals
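
The re-keying item relies on a simple identity: if the stored vector is `Q_old x`, then applying `Q_new Q_old^T` to it gives `Q_new x` without the original ever being reconstructed. A minimal sketch of that algebra follows; the helper `random_orthogonal` is hypothetical and this is not the package's `rekey_vectors()` implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 48

def random_orthogonal(rng, dim):
    """Hypothetical key generator: orthogonal factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

Q_old = random_orthogonal(rng, dim)
Q_new = random_orthogonal(rng, dim)

originals = rng.standard_normal((5, dim))   # one vector per row
stored = originals @ Q_old.T                # rotated under the old key

# Combined map: undo the old rotation and apply the new one in a single
# matmul, so the original vectors are never materialized.
rekey = Q_new @ Q_old.T
rekeyed = stored @ rekey.T

# Compare against rotating the originals directly with the new key.
rekey_gap = float(np.abs(rekeyed - originals @ Q_new.T).max())
print(rekey_gap)
```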

### Benchmarks

| Dimension | Single vector | Batch 10K | Key generation | Key file |
|-----------|--------------|-----------|---------------|---------|
| 384 | 0.03 ms | 8.7 ms | 31 ms | 0.6 MB |
| 768 | 0.06 ms | 25 ms | 141 ms | 2.4 MB |
| 1536 | 0.11 ms | 88 ms | 465 ms | 9.4 MB |
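
The key file sizes in the table are consistent with storing a dense d x d float32 matrix (4 bytes per entry). That storage layout is an inference from the numbers, not something stated here; a quick check:

```python
def dense_key_megabytes(dim, bytes_per_entry=4):
    """Size in MB (10**6 bytes) of a dense dim x dim matrix."""
    return dim * dim * bytes_per_entry / 1e6

for dim in (384, 768, 1536):
    print(dim, round(dense_key_megabytes(dim), 1))
```

Rounded to one decimal, this gives 0.6, 2.4, and 9.4 MB for the three dimensions, matching the table. It also shows why sharing a short seed can be more convenient than sharing the key file itself.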

### Integration examples

Works with any vector DB that accepts float arrays:

```python
import numpy as np
from langchain_core.embeddings import Embeddings

# Assumes `encoder`, `embeddings`, `ids`, and `metadata` are already defined.

# Pinecone
rotated = encoder.rotate(embeddings)
index.upsert(vectors=[
    (doc_id, vec.tolist(), meta)
    for doc_id, vec, meta in zip(ids, rotated, metadata)
])

# ChromaDB
collection.add(embeddings=encoder.rotate(embeddings).tolist(), ids=ids)

# LangChain (wrap any embedding model)
class PrivateEmbeddings(Embeddings):
    def __init__(self, base, encoder):
        self.base, self.encoder = base, encoder

    def embed_documents(self, texts):
        return self.encoder.rotate(np.array(self.base.embed_documents(texts))).tolist()

    def embed_query(self, text):
        return self.encoder.rotate(np.array(self.base.embed_query(text))).tolist()

# sentence-transformers
embeddings = model.encode(texts)
rotated = encoder.rotate(embeddings)
```

### Privacy + compression

Combine both: rotate for privacy, then quantize for 8x compression.

```python
compressed = encoder.rotate_and_compress(embeddings, bits=4)
idx, scores = compressed.search(encoder.rotate(query), top_k=10)
compressed.save("private_index.npz")
```

---

## Compression

8x instant compression, no training needed.

First open-source implementation of Google's TurboQuant ([ICLR 2026](https://arxiv.org/abs/2504.19874)) for vector search.

```python
from turboquant_vectors import compress, search

compressed = compress(embeddings, bits=4)  # 307 MB -> 38 MB
indices, scores = search(compressed, query, top_k=10)
```
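
The `307 MB -> 38 MB` comment is consistent with roughly 50,000 vectors of 1,536 float32 dimensions packed down to 4 bits per dimension. The vector count is an assumption inferred from the sizes, and the sketch ignores any per-vector metadata such as stored norms:

```python
def storage_megabytes(num_vectors, dim, bits_per_dim):
    """Raw payload size in MB (10**6 bytes), ignoring per-vector overhead."""
    return num_vectors * dim * bits_per_dim / 8 / 1e6

num_vectors, dim = 50_000, 1536   # assumed dataset shape
full = storage_megabytes(num_vectors, dim, 32)    # float32 storage
packed = storage_megabytes(num_vectors, dim, 4)   # 4-bit codes
print(full, packed, full / packed)
```

Going from 32 bits to 4 bits per dimension is where the 8x figure comes from.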

### Why

FAISS Product Quantization requires k-means training per dataset. TurboQuant is instant (data-oblivious), compresses 2-2.5x faster, and gets up to +8pp better recall at the same storage budget.
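
To illustrate what "data-oblivious" means here, the following is a deliberately simplified sketch: a fixed random rotation followed by uniform scalar quantization on a grid chosen without looking at the data. It is not the TurboQuant algorithm and not this package's implementation; the function names, the uniform grid, and the 3-sigma clipping range are all assumptions for illustration.

```python
import numpy as np

def make_rotation(dim, seed=0):
    """Fixed random orthogonal matrix; depends only on dim and seed, not on data."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def quantize(vectors, rotation, bits=4):
    """Rotate unit vectors, then snap each coordinate to a fixed uniform grid."""
    dim = vectors.shape[1]
    levels = 2 ** bits
    # After a random rotation, coordinates of a unit vector have standard
    # deviation about 1/sqrt(dim); clip at 3 standard deviations (assumed range).
    limit = 3.0 / np.sqrt(dim)
    rotated = np.clip(vectors @ rotation.T, -limit, limit)
    step = 2 * limit / (levels - 1)
    codes = np.round((rotated + limit) / step).astype(np.uint8)
    return codes, limit, step

def dequantize(codes, rotation, limit, step):
    """Map codes back to grid values and undo the rotation."""
    return (codes * step - limit) @ rotation

rng = np.random.default_rng(3)
data = rng.standard_normal((200, 128))
data /= np.linalg.norm(data, axis=1, keepdims=True)   # unit-normalize rows

rotation = make_rotation(dim=128)
codes, limit, step = quantize(data, rotation, bits=4)
approx = dequantize(codes, rotation, limit, step)

# Mean cosine similarity between originals and reconstructions.
cos = np.sum(data * approx, axis=1) / np.linalg.norm(approx, axis=1)
print(int(codes.max()), float(cos.mean()))
```

Nothing in `quantize` is fitted to `data`, which is the property that removes the training step; a k-means codebook, by contrast, has to be learned per dataset.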

### Benchmarks on real OpenAI embeddings (10K vectors, 1536-dim)

Tested on Qdrant's `dbpedia-entities-openai3-text-embedding-3-small` dataset from HuggingFace. Real embeddings, not synthetic.

| Bits | TurboQuant Recall@10 | FAISS PQ Recall@10 | Delta | TQ Compress Time |
|------|---------------------|-------------------|-------|-----------------|
| 2-bit | **90.6%** | 90.2% | **+0.4pp** | 1.2s (no training) |
| 4-bit | **96.6%** | 96.1% | **+0.5pp** | 1.7s (no training) |
| 8-bit | **99.3%** | 98.1% | **+1.2pp** | 9.5s (no training) |

TurboQuant needs zero training (data-oblivious). FAISS PQ requires k-means training. Reproduce: `python benchmarks/real_data_benchmark.py`
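
For readers unfamiliar with the metric, Recall@10 is commonly computed as the fraction of the true top-10 neighbors (from exact search) that also appear in the approximate top-10. A minimal sketch, where the `top_k` and `recall_at_k` helpers, the inner-product scoring, and the noisy stand-in for a compressed index are hypothetical and not taken from `benchmarks/real_data_benchmark.py`:

```python
import numpy as np

def top_k(database, queries, k):
    """Indices of the k highest inner-product matches for each query."""
    scores = queries @ database.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_k(exact_ids, approx_ids):
    """Average fraction of exact top-k ids that appear in the approximate top-k."""
    hits = [len(set(e) & set(a)) for e, a in zip(exact_ids, approx_ids)]
    return sum(hits) / (len(hits) * exact_ids.shape[1])

rng = np.random.default_rng(4)
database = rng.standard_normal((1000, 64))
queries = rng.standard_normal((20, 64))

# Stand-in for a lossy compressed index: the database plus small noise.
lossy = database + 0.05 * rng.standard_normal(database.shape)

exact = top_k(database, queries, k=10)
approx = top_k(lossy, queries, k=10)
print(recall_at_k(exact, exact), recall_at_k(exact, approx))
```

Comparing exact results against themselves gives a recall of 1.0 by construction; the second number shows how much the lossy index disturbs the top-10.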

---

## Install

```bash
pip install turboquant-vectors
```

Requires only numpy. No torch, no scipy for the privacy module.

## Full API

### PrivateEncoder

```python
PrivateEncoder.generate(dim)           # New key from OS entropy
PrivateEncoder.from_seed(dim, seed)    # Deterministic key (seed >= 2^64)
PrivateEncoder.load_key(path)          # Load from .tqkey file
encoder.rotate(vectors)                # Apply rotation
encoder.unrotate(vectors)              # Reverse rotation (needs key)
encoder.save_key(path)                 # Save to .tqkey file
encoder.fingerprint()                  # 16-char hex key ID
encoder.rekey_vectors(vecs, old_enc)   # Switch keys without unrotating
encoder.rotate_and_compress(vecs, 4)   # Privacy + compression
encoder.make_canary() / verify_canary()  # Key verification without originals
```

### Compression

```python
compress(vectors, bits=4)              # Compress vectors
decompress(compressed)                 # Restore to float32
search(compressed, query, top_k=10)    # Search compressed vectors
compressed.save(path) / .load(path)    # Persistence
```

## Paper

**TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate**

Zandieh, Daliri, Hadian, Mirrokni (Google Research)

ICLR 2026 | [arXiv:2504.19874](https://arxiv.org/abs/2504.19874)

Independent implementation, not affiliated with Google Research.

## License

Apache 2.0
