https://github.com/skailasa/cutlass-gemm
- Host: GitHub
- URL: https://github.com/skailasa/cutlass-gemm
- Owner: skailasa
- Created: 2025-06-03T13:39:01.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-03T16:06:40.000Z (4 months ago)
- Last Synced: 2025-06-04T00:55:53.568Z (4 months ago)
- Language: CMake
- Size: 42 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# CUTLASS Experiments
CUTLASS provides C++ template abstractions for high-performance GEMM (general matrix multiply) and CONV (convolution) on NVIDIA GPUs.
I try out a number of different techniques, and document as much as I can along the way.
The API is organised into levels that mirror the memory hierarchy of NVIDIA GPUs:
- Threadblock-level abstractions, for matrix multiply-accumulate operations.
- Warp-level abstractions, for matrix multiply-accumulate operations.
- Epilogue components, for element-wise tensor ops on the accumulators and for saving results.
- Loading/saving utilities.

### CUTE
- A template library for tensors and their layouts that ships as part of CUTLASS. It offers more flexibility for specifying how a tensor is laid out with respect to the GPU memory hierarchy.
- Introduced in CUTLASS 3.0.

### Run Example Scripts
Dependencies:
- CUDA
- clang-format (optional)

```bash
# configure
cmake --preset nvidia-release

# Examples:
# 1. Simple CUTLASS Based GEMM based on the tutorial
# build
cmake --build --preset nvidia-release --target simple_cutlass_gemm
# build and run the example in one step
cmake --build --preset nvidia-release --target run_simple_cutlass_gemm

# Other Commands:
# 1. Format
cmake --build --preset nvidia-release --target format
```
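
For orientation, here is a rough CPU-only sketch of the ideas this README touches on. It is plain C++ and does not use CUTLASS or CuTe. It assumes the GEMM has the usual form `D = alpha * A * B + beta * C`, which may not match exactly what `simple_cutlass_gemm` computes. The names `Layout2D`, `row_major`, `col_major`, `tiled_gemm`, `kBlockTile` and `kWarpTile` are hypothetical and chosen for illustration only. The layout struct mimics the shape-plus-stride idea behind CuTe layouts, and the nested loops stand in for the threadblock tile, warp tile and epilogue levels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D layout: maps a logical (row, col) coordinate to a linear
// offset through strides. It imitates the shape/stride idea only; it is not
// CuTe's actual API.
struct Layout2D {
    std::size_t rows, cols;
    std::size_t row_stride, col_stride;

    std::size_t operator()(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return r * row_stride + c * col_stride;
    }
};

inline Layout2D row_major(std::size_t rows, std::size_t cols) {
    return {rows, cols, cols, 1};
}

inline Layout2D col_major(std::size_t rows, std::size_t cols) {
    return {rows, cols, 1, rows};
}

// Illustrative tile sizes (assumptions, not tuned values).
constexpr std::size_t kBlockTile = 4;  // stands in for a threadblock tile
constexpr std::size_t kWarpTile = 2;   // stands in for a warp tile

// Computes D = alpha * A * B + beta * C tile by tile on the CPU.
void tiled_gemm(float alpha,
                const std::vector<float>& A, Layout2D la,
                const std::vector<float>& B, Layout2D lb,
                float beta,
                const std::vector<float>& C, Layout2D lc,
                std::vector<float>& D, Layout2D ld) {
    assert(la.cols == lb.rows);
    assert(lc.rows == la.rows && lc.cols == lb.cols);
    assert(ld.rows == lc.rows && ld.cols == lc.cols);
    const std::size_t M = la.rows, N = lb.cols, K = la.cols;

    // "Threadblock" level: each (bm, bn) pair owns one output tile.
    for (std::size_t bm = 0; bm < M; bm += kBlockTile) {
        for (std::size_t bn = 0; bn < N; bn += kBlockTile) {
            const std::size_t m_end = std::min(bm + kBlockTile, M);
            const std::size_t n_end = std::min(bn + kBlockTile, N);
            // "Warp" level: the block tile is split into smaller sub-tiles.
            for (std::size_t wm = bm; wm < m_end; wm += kWarpTile) {
                for (std::size_t wn = bn; wn < n_end; wn += kWarpTile) {
                    const std::size_t wm_end = std::min(wm + kWarpTile, m_end);
                    const std::size_t wn_end = std::min(wn + kWarpTile, n_end);
                    for (std::size_t m = wm; m < wm_end; ++m) {
                        for (std::size_t n = wn; n < wn_end; ++n) {
                            // Multiply-accumulate over the K dimension.
                            float acc = 0.0f;
                            for (std::size_t k = 0; k < K; ++k) {
                                acc += A[la(m, k)] * B[lb(k, n)];
                            }
                            // "Epilogue": scale, add C, and store the result.
                            D[ld(m, n)] = alpha * acc + beta * C[lc(m, n)];
                        }
                    }
                }
            }
        }
    }
}
```

Because each operand carries its own layout, the same routine accepts a row-major `A` and a column-major `B` without copying or transposing either one. A result computed this way can serve as a small reference when checking GPU output, provided the GPU kernel really uses the same `alpha`/`beta` convention.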