https://github.com/iamwavecut/llama-cpp-triattention

A llama.cpp fork with TurboQuant quantization (turbo2/3/4) and TriAttention GPU-accelerated KV cache pruning. Reported throughput: 75 tok/s with Qwen3-8B on an RTX 3080.


