Projects in Awesome Lists tagged with flashmla
A curated list of projects in awesome lists tagged with flashmla.
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is optimized specifically for the decoding stage of LLM inference, supporting MHA, MQA, GQA and MLA on CUDA cores.
cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia
Last synced: 19 Aug 2025
https://github.com/cat-gawr/deepseek-flashmla
DeepSeek Flash MLA - DeepSeek - copy manual
deepseek flashmla nvidia-cuda windows
Last synced: 02 May 2026