https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton


Host: GitHub
URL: https://github.com/alexkranias/triton_vs_cuda
Owner: alexkranias
License: mit
Created: 2024-08-25T21:42:18.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-09-07T18:21:05.000Z (5 months ago)
Last Synced: 2024-12-10T22:41:19.669Z (2 months ago)
Topics: cuda, cuda-kernels, gpu, gpu-programming, parallel-programming, python, triton
Language: Cuda
Homepage:
Size: 25.4 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# triton_vs_cuda
**Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.**

Lately I've been learning Triton, its strengths, and its weaknesses. Inspired by [siboehm's blog](https://siboehm.com/articles/22/CUDA-MMM), I thought I would show how we can attempt to build a Triton kernel that performs as well as a CUDA kernel with near-cuBLAS performance. In this endeavor I hope to highlight a few things about Triton:
- what are the limitations of Triton's block-level programming paradigm?
- as a kernel engineer, how much control do we retain in Triton to squeeze more performance out?
- where does the Triton compiler take over and attempt to fill in? How successful is it at this task? Where is work still needed at the compiler level?
- when should you _actually_ use Triton vs. CUDA?
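
To make "block-level" concrete, here is a rough sketch in plain Python that emulates how a tiled GEMM is split into independent "programs", each of which owns one output tile of `C`. This is not Triton code and not the kernel from this repo. The names `gemm_tile`, `blocked_gemm`, `pid_m`, `pid_n`, and the `BLOCK_*` sizes are assumptions chosen for illustration, and the nested Python loops stand in for work a GPU would run in parallel.

```python
# Plain-Python emulation of the block-level view of GEMM (C = A @ B).
# All names are hypothetical; this only illustrates the tile decomposition.

def gemm_tile(A, B, C, pid_m, pid_n, BLOCK_M, BLOCK_N, BLOCK_K):
    """Compute one BLOCK_M x BLOCK_N tile of C, as a single 'program' would."""
    M, N, K = len(C), len(C[0]), len(B)
    # Row/column ranges owned by this program, clipped at the matrix edge
    rows = range(pid_m * BLOCK_M, min((pid_m + 1) * BLOCK_M, M))
    cols = range(pid_n * BLOCK_N, min((pid_n + 1) * BLOCK_N, N))
    # Per-tile accumulator, kept separate from C until the K loop finishes
    acc = {(i, j): 0.0 for i in rows for j in cols}
    # Walk the shared K dimension one BLOCK_K slice at a time
    for k0 in range(0, K, BLOCK_K):
        ks = range(k0, min(k0 + BLOCK_K, K))
        for i in rows:
            for j in cols:
                acc[(i, j)] += sum(A[i][k] * B[k][j] for k in ks)
    # Write the finished tile back to C
    for (i, j), value in acc.items():
        C[i][j] = value


def blocked_gemm(A, B, BLOCK_M=2, BLOCK_N=2, BLOCK_K=2):
    """Launch one gemm_tile per output tile (sequentially here)."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    grid_m = -(-M // BLOCK_M)  # ceiling division
    grid_n = -(-N // BLOCK_N)
    for pid_m in range(grid_m):
        for pid_n in range(grid_n):
            gemm_tile(A, B, C, pid_m, pid_n, BLOCK_M, BLOCK_N, BLOCK_K)
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(blocked_gemm(A, B))
```

In this picture the kernel author picks the tile sizes and the order of the K loop. How the work inside a tile maps onto individual threads is the part the questions above are probing.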

## Getting Started
I've divided this project into two branches:
- `main`: template kernel files
- `solutions`: solution kernel files

I've included Dockerfiles in both the `/triton` and `/cuda` directories to make environment setup quick and easy. Open those directories and you'll find `README.md`s explaining how to get going.

### In Progress
I'll have a blog on the subject posted at some point on my personal website: [**alexkranias.com**](https://alexkranias.com)

_I'm actively working on that piece._

In the meantime, you can `clone` this repo, work through the kernels on your own, and follow along with siboehm's blog.