https://github.com/iamwavecut/llama-cpp-triattention
llama.cpp fork with TurboQuant quantization (turbo2/3/4) and TriAttention GPU-accelerated KV cache pruning. 75 tok/s on Qwen3-8B / RTX 3080.
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/iamwavecut/llama-cpp-triattention
- Owner: iamwavecut
- License: MIT
- Created: 2026-04-09T19:55:24.000Z (2 months ago)
- Default Branch: triattention
- Last Pushed: 2026-04-09T20:26:30.000Z (2 months ago)
- Last Synced: 2026-04-09T22:17:23.458Z (2 months ago)
- Language: C++
- Size: 290 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: CODEOWNERS
- Security: SECURITY.md
- Authors: AUTHORS
- Agents: AGENTS.md