https://github.com/kuangjux/cudakernels
🎉My Collections of CUDA Kernels~
- Host: GitHub
- URL: https://github.com/kuangjux/cudakernels
- Owner: KuangjuX
- Created: 2024-06-10T08:56:57.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-25T07:44:46.000Z (6 months ago)
- Last Synced: 2024-06-25T09:22:27.875Z (6 months ago)
- Language: C++
- Homepage:
- Size: 142 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# CUDAKernels
## Introduction
This project collects CUDA kernels that I have written by hand, both to learn various CUDA techniques and to evaluate their performance.
## Abstractions
- [CUDA Vector Registers](include/memory/types/register.hpp): Vectorized register abstractions.
- [CUDA Tile Registers](include/memory/types/register.hpp): Tile register abstractions.
## Kernels
- [Reduce Sum](src/kernels/reduce.cu): Reduce sum implemented with **Warp Reduce**.
- [Reduce Max](src/kernels/reduce.cu): Reduce max implemented with **Warp Reduce**.
- [Softmax](src/kernels/softmax.cu): Non-tiled softmax implemented with **Warp Reduce**.
- [Vectorized Load/Store](src/kernels/memory/vec.cu): Vectorized load/store optimization, covering global memory to shared memory, shared memory to the register file (RF), and global memory to RF.
- [Tile Load/Store](src/kernels/memory/tile.cu): 2D tile load/store optimization, covering global memory to shared memory, shared memory to RF, and global memory to RF.
- [FlashAttention](src/kernels/flash_attn/flash_attn_f32.cu): FlashAttention implemented in CUDA.
## Notes
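As a quick orientation for the warp reduce used by the Reduce Sum, Reduce Max, and Softmax kernels listed above, here is a minimal host-side sketch of the shuffle-down tree reduction. It is not the code in `src/kernels/reduce.cu`: it is plain C++ that models the 32 lanes of a warp as an array so it can run without a GPU. The names `kWarpSize`, `warp_reduce_model`, `warp_sum_model`, and `warp_max_model` are hypothetical and chosen only for this illustration. On the device, the read of the neighbouring lane would typically be a `__shfl_down_sync` call.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // lanes per warp on NVIDIA GPUs

// Host-side model (assumption: illustration only, not the repository's kernel).
// lanes[i] stands in for the register value held by lane i of one warp.
template <typename T, typename Op>
T warp_reduce_model(std::array<T, kWarpSize> lanes, Op op) {
    // Halve the offset each step: 16, 8, 4, 2, 1.
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // All lanes read the values from before this step, as a shuffle does.
        const std::array<T, kWarpSize> prev = lanes;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            const std::size_t src = lane + offset;
            // A shuffle-down from an out-of-range lane yields the caller's own value.
            const T other = (src < kWarpSize) ? prev[src] : prev[lane];
            lanes[lane] = op(prev[lane], other);
        }
    }
    return lanes[0];  // only lane 0 is guaranteed to hold the full result
}

// Reduce sum over one modelled warp.
inline float warp_sum_model(const std::array<float, kWarpSize>& lanes) {
    return warp_reduce_model(lanes, [](float a, float b) { return a + b; });
}

// Reduce max over one modelled warp.
inline float warp_max_model(const std::array<float, kWarpSize>& lanes) {
    return warp_reduce_model(lanes, [](float a, float b) { return std::max(a, b); });
}
```

For lanes holding the values 1 through 32, `warp_sum_model` returns 528 and `warp_max_model` returns 32. The same five-step pattern serves both operations, so a softmax along one row can reuse it twice: once for the row maximum and once for the sum of exponentials.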
- [Vectorized Memory Access](notes/memory/vec.md): Notes on vectorized memory access.
- [Memory Coalescing](notes/memory/coalescing.md): Notes on memory coalescing.
- [Warp-Level Primitives](notes/warp.md): Notes on warp-level primitives.
- [FlashAttention](notes/flash_attn.md): Notes on FlashAttention.
### ThunderKittens
- [Vectorized Memory Access in TK](notes/TK/memory/vec.md): Notes on how TK implements vectorized memory access.
- [2D Tile Memory Access in TK](notes/TK/memory/tile.md): Notes on how TK implements 2D tile memory access.
## References
- [CUDA-Learn-Notes: 🎉CUDA notes / hand-written CUDA for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.](https://github.com/DefTruth/CUDA-Learn-Notes)
- [ThunderKittens: Tile primitives for speedy kernels](https://github.com/HazyResearch/ThunderKittens)