Projects in Awesome Lists tagged with flashinfer
A curated list of projects in awesome lists tagged with flashinfer.
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (see the sketch after this entry).
cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia
Last synced: 05 May 2025
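Decode-stage attention processes a single new query token per step against the accumulated KV cache, which is the workload these kernels target. As a rough illustration of the underlying computation only — this is not the repository's API; the function name and shapes below are assumptions for the sketch — here is a minimal NumPy version for standard MHA. MQA and GQA follow the same pattern but share KV heads across query heads.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Illustrative decode-step attention (not the library's API).

    One query token attends to the full KV cache.
    Shapes: q (num_heads, head_dim),
            k_cache / v_cache (seq_len, num_heads, head_dim).
    """
    head_dim = q.shape[-1]
    # Scaled dot-product scores per head: (num_heads, seq_len)
    scores = np.einsum("hd,shd->hs", q, k_cache) / np.sqrt(head_dim)
    # Numerically stable softmax over the sequence axis
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Weighted sum of cached values: (num_heads, head_dim)
    return np.einsum("hs,shd->hd", probs, v_cache)

# Toy usage: 4 heads, head_dim 64, 128 cached tokens
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64)).astype(np.float32)
k = rng.standard_normal((128, 4, 64)).astype(np.float32)
v = rng.standard_normal((128, 4, 64)).astype(np.float32)
out = decode_attention(q, k, v)
print(out.shape)  # (4, 64)
```

Because only one query token is involved per step, the computation is memory-bound rather than compute-bound, which is why decode-oriented kernels like this project run it on CUDA cores instead of tensor cores.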
https://github.com/sgl-project/whl
Kernel library wheels for SGLang
cu118 cuda cutlass flashinfer sglang
Last synced: 02 Feb 2025