Projects in Awesome Lists tagged with cutlass
A curated list of projects in awesome lists tagged with cutlass.
https://github.com/xlite-dev/cuda-learn-notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 15 Apr 2025
https://github.com/xlite-dev/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 26 Mar 2025
https://github.com/DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 20 Mar 2025
https://github.com/bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Last synced: 15 May 2025
https://github.com/bruce-lee-ly/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 13 Apr 2025
https://github.com/bruce-lee-ly/cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
cublas cublaslt cutlass gemm gpu llm matrix-multiply nvidia tensor-core
Last synced: 13 Apr 2025
https://github.com/yashassamaga/convolutionbuildingblocks
GEMM- and Winograd-based convolutions using CUTLASS
convolution cuda cutlass deep-learning
Last synced: 03 Dec 2024
https://github.com/sgl-project/whl
Kernel Library Wheel for SGLang
cu118 cuda cutlass flashinfer sglang
Last synced: 02 Feb 2025