Projects in Awesome Lists tagged with cuda-kernel
A curated list of projects in awesome lists tagged with cuda-kernel.
https://github.com/deftruth/cuda-learn-notes
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
cuda cuda-12 cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-toolkit flash-attention hgemm learn-cuda leet-cuda
Last synced: 14 May 2025
https://github.com/ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
cuda cuda-kernel pytorch transformer triton
Last synced: 01 May 2025
https://github.com/teddykoker/torchsort
Fast, differentiable sorting and ranking in PyTorch
cuda-kernel pytorch ranking sort
Last synced: 08 May 2025
https://github.com/tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
arm64 armv7 cuda cuda-kernel gemm-optimization int4 ptx vulkan
Last synced: 04 Apr 2025
https://github.com/webis-de/pytorch-window-matmul
A custom CUDA kernel for windowed matrix multiplication
Last synced: 16 Feb 2025
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 23 Mar 2025
https://github.com/shreyansh26/mlsys-experiments
A collection of scripts for experimenting with and implementing MLSys-related ideas
cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton
Last synced: 03 Mar 2025
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 15 Mar 2025