https://github.com/jiaau/kernels
This repository showcases common optimization techniques for kernels.
cpp cuda cute cutlass hpc kernel
- Host: GitHub
- URL: https://github.com/jiaau/kernels
- Owner: jiaau
- License: gpl-3.0
- Created: 2025-05-22T16:22:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-09T12:53:37.000Z (about 1 year ago)
- Last Synced: 2025-06-15T06:01:51.430Z (12 months ago)
- Topics: cpp, cuda, cute, cutlass, hpc, kernel
- Language: Cuda
- Homepage:
- Size: 51.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Kernels
## Focus Areas
- [reduce](./src/reduce)
  - CUDA Warp-Level Primitives
  - Parallel Reduction
- [transpose](./src/transpose)
  - Memory Coalescing
  - Shared Memory
  - Bank Conflict
  - Swizzling
  - CuTe
- [sgemm](./src/sgemm)
  - Tile Size Tuning
  - Shared Memory
  - Bank Conflict
  - Double Buffer
  - Warp Divergence
  - Vectorized Memory Access
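
As an illustration of the transpose topics listed above (memory coalescing, shared memory, bank conflicts, swizzling), the following is a minimal sketch of a tiled transpose that uses an XOR swizzle instead of padding. It is not taken from the repository: `swizzle_col`, `transpose_swizzled`, and `transpose_on_gpu` are hypothetical names, and the sketch assumes 32 shared-memory banks of 4-byte words and a 32 x 32 thread block.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 32;  // tile edge; assumed equal to the number of shared-memory banks

// Hypothetical helper: XOR-swizzled physical column for a TILE x TILE float tile.
// For a fixed logical column, the 32 rows map to 32 distinct physical columns,
// so a column-wise access touches 32 different banks.
__host__ __device__ inline int swizzle_col(int row, int col) {
    return col ^ row;  // valid because TILE is a power of two and row, col < TILE
}

// Transposes a rows x cols row-major matrix `in` into a cols x rows matrix `out`.
__global__ void transpose_swizzled(const float* in, float* out, int rows, int cols) {
    __shared__ float tile[TILE][TILE];

    int x = blockIdx.x * TILE + threadIdx.x;  // input column
    int y = blockIdx.y * TILE + threadIdx.y;  // input row
    if (x < cols && y < rows) {
        // Coalesced read: consecutive threadIdx.x values touch consecutive addresses.
        tile[threadIdx.y][swizzle_col(threadIdx.y, threadIdx.x)] = in[y * cols + x];
    }
    __syncthreads();

    // Swap the block coordinates so the write to `out` is coalesced as well.
    x = blockIdx.y * TILE + threadIdx.x;  // output column (an input row index)
    y = blockIdx.x * TILE + threadIdx.y;  // output row (an input column index)
    if (x < rows && y < cols) {
        // Logical element tile[threadIdx.x][threadIdx.y], located via the same swizzle.
        out[y * rows + x] = tile[threadIdx.x][swizzle_col(threadIdx.x, threadIdx.y)];
    }
}

// Host-side convenience wrapper (illustrative only; error checking omitted).
std::vector<float> transpose_on_gpu(const std::vector<float>& h_in, int rows, int cols) {
    std::vector<float> h_out(h_in.size());
    float* d_in = nullptr;
    float* d_out = nullptr;
    size_t bytes = h_in.size() * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transpose_swizzled<<<grid, block>>>(d_in, d_out, rows, cols);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Compared with padding the tile to `TILE + 1` columns, the swizzle keeps the shared-memory footprint at exactly `TILE * TILE` floats. The trade-off is that every access has to go through the same index mapping.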
## Build and Run
### Build the Project
```bash
make build
make install
```
### Run the Tests
```bash
make run
```
### Profile with NVIDIA Nsight Compute (`ncu`)
```bash
make ncu
```
### Clean Build Files
```bash
make clean
```
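## Command-Line Options
The SGEMM test supports the following options:
- `--bench`: enable benchmark mode
- `--times N`: set the number of benchmark iterations (default: 3)
- `--help`: show the help message

For example: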
```bash
make run -- --bench --times 10
```
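
To make the options above concrete, here is a rough sketch of how flags like these could be parsed and how a `--times N` benchmark loop might be timed with CUDA events. It does not reflect the repository's actual implementation: `Options`, `parse_options`, `dummy_kernel`, and `benchmark_ms` are all hypothetical, and the placeholder kernel merely stands in for an SGEMM variant.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <cstring>

// Hypothetical options struct mirroring the flags listed above.
struct Options {
    bool bench = false;
    int times = 3;  // default taken from the option list
    bool help = false;
};

// Parses argv; unknown arguments are silently ignored in this sketch.
Options parse_options(int argc, char** argv) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--bench") == 0) {
            opt.bench = true;
        } else if (std::strcmp(argv[i], "--times") == 0 && i + 1 < argc) {
            opt.times = std::atoi(argv[++i]);  // consume the value after --times
        } else if (std::strcmp(argv[i], "--help") == 0) {
            opt.help = true;
        }
    }
    return opt;
}

// Placeholder kernel standing in for the kernel under test.
__global__ void dummy_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns the mean kernel time in milliseconds over `times` launches.
// `d_data` must point to at least `n` floats of device memory.
float benchmark_ms(float* d_data, int n, int times) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);  // warm-up launch, not timed

    cudaEventRecord(start);
    for (int t = 0; t < times; ++t) {
        dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total_ms / times;
}
```

Timing with events rather than host clocks measures elapsed time on the GPU stream, and the untimed warm-up launch keeps one-time initialization costs out of the average.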
## Acknowledgments
- This project uses some utility functions from [Chtholly-Boss/swizzle](https://github.com/Chtholly-Boss/swizzle).