An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cutlass

A curated list of projects in awesome lists tagged with cutlass.

https://github.com/xlite-dev/cuda-learn-notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores kernels🎉, HGEMM and FA2 via MMA and CuTe, reaching 98–100% of cuBLAS/FA2 TFLOPS.

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 15 Apr 2025
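
The HGEMM kernels this repo describes are built on the Tensor Core MMA primitives. As a hedged illustration (not code from the repo), this is roughly what one warp computing a single 16x16x16 half-precision tile looks like with the CUDA WMMA API; it assumes sm_70+ hardware and device pointers `A`, `B`, `C` to 16x16 tiles:

```cuda
// Minimal WMMA sketch: one warp computes C = A * B for a single
// 16x16x16 tile in fp16 with fp32 accumulation. Illustrative only;
// real HGEMM kernels tile over shared memory and pipeline loads.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void hgemm_tile(const half *A, const half *B, float *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);    // zero the accumulator tile
    wmma::load_matrix_sync(a, A, 16);  // load 16x16 tile of A (leading dim 16)
    wmma::load_matrix_sync(b, B, 16);  // load 16x16 tile of B
    wmma::mma_sync(acc, a, b, acc);    // one Tensor Core MMA: acc += A * B
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
```

Approaching cuBLAS-level TFLOPS, as the repo claims, additionally requires shared-memory tiling, swizzling, and multi-stage pipelining on top of this primitive.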

https://github.com/bytedance/flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

cuda cutlass gpu pytorch

Last synced: 15 May 2025

https://github.com/bruce-lee-ly/flash_attention_inference

Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.

cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core

Last synced: 13 Apr 2025

https://github.com/bruce-lee-ly/cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

cublas cublaslt cutlass gemm gpu llm matrix-multiply nvidia tensor-core

Last synced: 13 Apr 2025
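
For context, constructing a GEMM operator with CUTLASS typically means instantiating the device-level `Gemm` template. The following is a hedged sketch of that pattern (not code from this repo), assuming CUTLASS 2.x headers on the include path and a CUDA-capable GPU; element types, layouts, and the `run` wrapper are illustrative choices:

```cuda
// Sketch: an fp16 GEMM with fp32 accumulation via CUTLASS's
// device-level API. Real operators also pick the operator class,
// architecture tag, and tile shapes for the target GPU.
#include <cutlass/gemm/device/gemm.h>

using Gemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,     // A: fp16, row-major
    cutlass::half_t, cutlass::layout::ColumnMajor,  // B: fp16, col-major
    cutlass::half_t, cutlass::layout::RowMajor,     // C: fp16, row-major
    float>;                                         // accumulate in fp32

cutlass::Status run(int M, int N, int K,
                    cutlass::half_t const *A,  // device pointer, M x K
                    cutlass::half_t const *B,  // device pointer, K x N
                    cutlass::half_t *C) {      // device pointer, M x N
    Gemm gemm;
    // Computes D = alpha * A@B + beta * C, with D aliased to C here;
    // {ptr, leading_dim} pairs describe each operand.
    Gemm::Arguments args({M, N, K},
                         {A, K}, {B, K}, {C, N}, {C, N},
                         {1.0f, 0.0f});        // alpha = 1, beta = 0
    return gemm(args);  // launches on the default stream
}
```

Wrapping several such instantiations (different layouts, epilogues, or precisions) behind one dispatch layer is the usual way such libraries cover the shapes that arise in LLM inference.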

https://github.com/yashassamaga/convolutionbuildingblocks

GEMM- and Winograd-based convolutions implemented with CUTLASS.

convolution cuda cutlass deep-learning

Last synced: 03 Dec 2024

https://github.com/sgl-project/whl

Kernel library wheels for SGLang.

cu118 cuda cutlass flashinfer sglang

Last synced: 02 Feb 2025