https://github.com/skailasa/cutlass-gemm

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/skailasa/cutlass-gemm
Owner: skailasa
Created: 2025-06-03T13:39:01.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-06-03T16:06:40.000Z (4 months ago)
Last Synced: 2025-06-04T00:55:53.568Z (4 months ago)
Language: CMake
Size: 42 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# CUTLASS Experiments

Cutlass provides templates/abstractions for high-performance GEMM and CONV.

I try out a number of different techniques, and document as much as I can on the way.

There are different levels of API, related to the GPU memory hierarchy of NVIDIA GPUs.

- Threadblocklevel abstractions, for matrix multiply-accumulate operations.
- Warp level, for matrix multiply-accumulate operations.
- Epilogue components, for tensor ops/saving
- Loading/saving utilities.

### CUTE

- A template library for tensors, built on top of cutlass, and provides more flexibility for specifying the layout of tensors with reference to GPU memory hierarchies.
- Introduced in latest major release of CUTLASS (3.0)

### Run Example Scipts

Dependencies:
- CUDA
- clang format (optional)

```bash
# configure
cmake --preset nvidia-release

# Examples:

# 1. Simple CUTLASS Based GEMM based on the tutorial

# build
cmake --build --preset nvidia-release --target simple_cutlass_gemm
# run the example directly and build
cmake --build --preset nvidia-release --target run_simple_cutlass_gemm

# Other Commands:
# 1. Format
cmake --build --preset nvidia-release --target format
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/skailasa/cutlass-gemm

Awesome Lists containing this project

README