https://github.com/codepawl/turboquant-torch

Unofficial PyTorch implementation of TurboQuant (Google Research, ICLR 2026). Near-optimal vector quantization for KV cache compression and vector search. 3-bit with zero accuracy loss.
compression inference kv-cache llm pytorch quantization

Host: GitHub
URL: https://github.com/codepawl/turboquant-torch
Owner: codepawl
License: MIT
Created: 2026-03-25T06:06:52.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-25T20:20:37.000Z (4 months ago)
Last Synced: 2026-03-26T10:39:23.778Z (4 months ago)
Topics: compression, inference, kv-cache, llm, pytorch, quantization
Language: Python
Homepage:
Size: 381 KB
Stars: 9
Watchers: 0
Forks: 0
Open Issues: 0

Awesome Lists containing this project