https://github.com/iamwavecut/llama-cpp-triattention

llama.cpp fork with TurboQuant quantization (turbo2/3/4) and TriAttention GPU-accelerated KV cache pruning. 75 tok/s on Qwen3-8B / RTX 3080.
Last synced: 3 months ago

Host: GitHub
URL: https://github.com/iamwavecut/llama-cpp-triattention
Owner: iamwavecut
License: MIT
Created: 2026-04-09T19:55:24.000Z (4 months ago)
Default Branch: triattention
Last Pushed: 2026-04-09T20:26:30.000Z (4 months ago)
Last Synced: 2026-04-09T22:17:23.458Z (4 months ago)
Language: C++
Size: 290 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: CODEOWNERS
- Security: SECURITY.md
- Authors: AUTHORS
- Agents: AGENTS.md

Awesome Lists containing this project