An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with flashinfer

A curated list of projects in awesome lists tagged with flashinfer.

https://github.com/bruce-lee-ly/decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference.

cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia

Last synced: 05 May 2025
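The decode stage differs from prefill in that only a single new query token attends over the full KV cache, which is why decode-specific kernels like this one exist. As a rough illustration of the shape of that computation (a NumPy sketch, not the decoding_attention library's actual API; all names here are hypothetical):

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Single-query (decode-stage) attention: one new token's query
    attends over all cached keys/values. Illustrative sketch only."""
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)   # (seq_len,) one score per cached token
    scores -= scores.max()              # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()            # softmax over the KV cache
    return weights @ v_cache            # (head_dim,) attention output

rng = np.random.default_rng(0)
q = rng.standard_normal(64)            # query for the single new token
k_cache = rng.standard_normal((128, 64))  # keys for 128 prior tokens
v_cache = rng.standard_normal((128, 64))  # values for 128 prior tokens
out = decode_attention(q, k_cache, v_cache)
print(out.shape)  # (64,)
```

Because the query side is a single vector, decode-stage kernels are bandwidth-bound on reading the KV cache rather than compute-bound, which motivates CUDA-core (rather than tensor-core) implementations.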

https://github.com/sgl-project/whl

Kernel Library Wheel for SGLang

cu118 cuda cutlass flashinfer sglang

Last synced: 02 Feb 2025