https://github.com/codepawl/turboquant-torch
Unofficial PyTorch implementation of TurboQuant (Google Research, ICLR 2026). Near-optimal vector quantization for KV cache compression and vector search. 3-bit with zero accuracy loss.
- Host: GitHub
- URL: https://github.com/codepawl/turboquant-torch
- Owner: codepawl
- License: MIT
- Created: 2026-03-25T06:06:52.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-25T20:20:37.000Z (3 months ago)
- Last Synced: 2026-03-26T10:39:23.778Z (3 months ago)
- Topics: compression, inference, kv-cache, llm, pytorch, quantization
- Language: Python
- Homepage:
- Size: 381 KB
- Stars: 9
- Watchers: 0
- Forks: 0
- Open Issues: 0