https://github.com/gpu-mode/lectures

Material for gpu-mode lectures

# Supplementary Material for Lectures
[![](https://dcbadge.vercel.app/api/server/gpumode?style=flat)](https://discord.gg/gpumode)

[YouTube Channel](https://www.youtube.com/@GPUMODE)

The PMPP Book: [Programming Massively Parallel Processors: A Hands-on Approach](https://a.co/d/2S2fVzt) (Amazon link)

## Lecture 1: Profiling and Integrating CUDA kernels in PyTorch
- Speaker: [Mark Saroufim](https://twitter.com/marksaroufim)
- Notebook and slides in the [lecture_001](./lecture_001/) folder

## Lecture 2: Recap Ch. 1-3 from the PMPP book
- Speaker: [Andreas Koepf](https://twitter.com/neurosp1ke)
- Slides: The PowerPoint file [lecture_002/cuda_mode_lecture2.pptx](./lecture_002/cuda_mode_lecture2.pptx) is in the lecture_002 folder of this repository. It is also available [here](https://docs.google.com/presentation/d/1deqvEHdqEC4LHUpStO6z3TT77Dt84fNAvTIAxBJgDck/edit#slide=id.g2b1444253e5_1_75) as a Google Slides presentation.

## Lecture 3: Getting Started With CUDA
- Speaker: [Jeremy Howard](https://twitter.com/jeremyphoward)
- Notebook: See the [lecture_003](./lecture_003/) folder, or run the [Colab version](https://colab.research.google.com/drive/180uk6frvMBeT4tywhhYXmz3PJaCIA_uk?usp=sharing)

## Lecture 4: Intro to Compute and Memory Architecture
- Speaker: [Thomas Viehmann](https://lernapparat.de/)
- Notebook and slides in the [lecture_004](./lecture_004/) folder.

## Lecture 5: Going Further with CUDA for Python Programmers
- Speaker: [Jeremy Howard](https://twitter.com/jeremyphoward)
- Notebook in the [lecture_005](./lecture_005/) folder.

## Lecture 6: Optimizing PyTorch Optimizers
- Speaker: [Jane Xu](https://github.com/janeyx99)
- [Slides](https://docs.google.com/presentation/d/13WLCuxXzwu5JRZo0tAfW0hbKHQMvFw4O/edit#slide=id.p1)

## Lecture 7: Advanced Quantization
- Speaker: [Charles Hernandez](https://github.com/HDCharles)
- [Slides](https://www.dropbox.com/scl/fi/hzfx1l267m8gwyhcjvfk4/Quantization-Cuda-vs-Triton.pdf?rlkey=s4j64ivi2kpp2l0uq8xjdwbab&dl=0)

## Lecture 8: CUDA Performance Checklist
- Speaker: [Mark Saroufim](https://github.com/msaroufim)
- Code in the [lecture_008](./lecture_008/) folder
- [Slides](https://docs.google.com/presentation/d/1cvVpf3ChFFiY4Kf25S4e4sPY6Y5uRUO-X-A4nJ7IhFE/edit?usp=sharing)

## Lecture 9: Reductions
- Speaker: [Mark Saroufim](https://github.com/msaroufim)
- Code in the [lecture_009](./lecture_009/) folder
- [Slides](https://docs.google.com/presentation/d/1s8lRU8xuDn-R05p1aSP6P7T5kk9VYnDOCyN5bWKeg3U/edit?usp=drive_link)

## Lecture 10: Build a Prod Ready CUDA Library
- Speaker: [Oscar Amoros Huguet](https://github.com/morousg)
- [Slides](https://drive.google.com/drive/folders/158V8BzGj-IkdXXDAdHPNwUzDLNmr971_?usp=drive_link)

## Lecture 11: Sparsity
- Speaker: [Jesse Cai](https://github.com/jcaip)
- [Slides](./lecture_011/sparsity.pptx)

## Lecture 12: Flash Attention
- Speaker: [Thomas Viehmann](https://lernapparat.de/)

## Lecture 13: Ring Attention
- Speaker: [Andreas Koepf](https://twitter.com/neurosp1ke)
- [Slides](./lecture_013/ring_attention.pptx)

## Lecture 14: Practitioner's Guide to Triton
- Date: 2024-04-13, Speaker: [Umer Adil](https://twitter.com/UmerHAdil)
- [Notebook](./lecture_014/A_Practitioners_Guide_to_Triton.ipynb)

## Lecture 15: CUTLASS
- Speaker: [Eric Auld](https://github.com/ericauld)

## Lecture 16: Hands-On Profiling
- Speaker: [Taylor Robie](https://www.linkedin.com/in/taylor-robie/)

## Bonus Lecture: CUDA C++ llm.cpp
- Speaker: Jake Hemstad & Georgii Evtushenko
- [Slides](https://drive.google.com/drive/folders/1T-t0d_u0Xu8w_-1E5kAwmXNfF72x-HTA)

## Lecture 17: GPU Collective Communication (NCCL)
- Speaker: [Dan Johnson](https://physbam.stanford.edu/~dansj/)
- Code in the [lecture_017](./lecture_017/) folder

## Lecture 18: Fused Kernels
- Speaker: [Kapil Sharma](https://www.kapilsharma.dev/)
- Code in the [lecture_018](./lecture_018/) folder

## Lecture 19: Data Processing on GPUs
- Speaker: [Devavret Makkar](https://github.com/devavret)

## Lecture 20: Scan Algorithm
- Speaker: [Izzat El Hajj](https://ielhajj.github.io/)
- [Slides](https://docs.google.com/presentation/d/1MEMsE5LKi6ush_60hlYu3-cz4DUCFzSL/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 21: Scan Algorithm Part 2
- Speaker: [Izzat El Hajj](https://ielhajj.github.io/)
- [Slides](https://docs.google.com/presentation/d/1MEMsE5LKi6ush_60hlYu3-cz4DUCFzSL/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 22: Hacker's Guide to Speculative Decoding in vLLM
- Speaker: [Cade Daniel](https://x.com/cdnamz)
- [Slides](https://docs.google.com/presentation/d/1p1xE-EbSAnXpTSiSI0gmy_wdwxN5XaULO3AnCWWoRe4/edit#slide=id.p)

## Lecture 23: Tensor Cores
- Speaker: Vijay Thakkar & Pradeep Ramani
- [Slides](https://drive.google.com/file/d/18sthk6IUOKbdtFphpm_jZNXoJenbWR8m/view)

## Lecture 24: Scan at the Speed of Light
- Speaker: Jake Hemstad & Georgii Evtushenko

## Lecture 25: Speaking Composable Kernel
- Speaker: Haocong Wang
- [Slides](./lecture_025/AMD_ROCm_Speaking_Composable_Kernel_July_20_2024.pdf)

## Lecture 26: SYCL MODE (Intel GPU)
- Speaker: Patric Zhao
- [Slides](https://docs.google.com/presentation/d/1SW4XKomAJhhJSH5-jpZI9Qlwp7TEunbV/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 27: gpu.cpp
- Speaker: [Austin Huang](https://x.com/austinvhuang)
- [Slides](https://gpucpp-presentation.answer.ai/)

## Lecture 28: Liger Kernel
- Speaker: [Byron Hsu](https://x.com/hsu_byron)
- [Slides](https://docs.google.com/presentation/d/1CGTV-uKw9crrBo13q1jAzAFCFzlpZFjeL4bnK67pTd8/edit?usp=sharing)
- Hands-on notebooks:
  1. [RMSNorm: Verifying Correctness and Performance](https://colab.research.google.com/drive/1CQYhul7MVG5F0gmqTBbx1O1HgolPgF0M?usp=sharing)
  2. [FusedLinearCrossEntropy: Verifying Memory Reduction](https://colab.research.google.com/drive/1Z2QtvaIiLm5MWOs7X6ZPS1MN3hcIJFbj?usp=sharing)
  3. [Convergence Comparison: Triton Kernel Patched vs. Original Model Layer-by-Layer](https://colab.research.google.com/drive/1e52FH0BcE739GZaVp-3_Dv7mc4jF1aif?usp=sharing)
  4. [Contiguity is the hidden killer](https://colab.research.google.com/drive/1llnAdo0hc9FpxYRRnjih0l066NCp7Ylu?usp=sharing)
  5. [Address int32 overflow](https://colab.research.google.com/drive/1WgaU_cmaxVzx8PcdKB5P9yHB6_WyGd4T?usp=sharing)

## Lecture 29: Triton Internals
- Speaker: [Kapil Sharma](https://www.kapilsharma.dev/)
- Code/presentation in the [lecture_029](./lecture_029/) folder

## Lecture 30: Quantized Training
- Speaker: [Thien Tran](https://github.com/gau-nernst)
- Code/presentation in the [lecture_030](./lecture_030/) folder

## Lecture 31: Beginners Guide to Metal Kernels
- Speaker: [Nikita Shulga](https://github.com/malfet)
- Code/presentation in the [lecture_031](./lecture_031/) folder

## Lecture 32: Unsloth - LLM Systems Engineering
- Speaker: [Daniel Han](https://x.com/danielhanchen)
- [Slides](https://docs.google.com/presentation/d/1BvgbDwvOY6Uy6jMuNXrmrz_6Km_CBW0f2espqeQaWfc/edit?usp=sharing)

## Lecture 33: BitBLAS
- Speaker: [Wang Lei](https://github.com/LeiWang1999)
- Code/presentation in the [lecture_033](./lecture_033/) folder

## Lecture 34: Low Bit Triton Kernels
- Speaker: [Hicham Badri](https://github.com/mobicham)
- [Slides](https://docs.google.com/presentation/d/1R9B6RLOlAblyVVFPk9FtAq6MXR1ufj1NaT0bjjib7Vc/edit)

## Lecture 35: SGLang Performance Optimization
- Speaker: [Yineng Zhang](https://linkedin.com/in/zhyncs)
- [Slides](https://github.com/zhyncs/lectures/blob/main/lecture_035/SGLang-Performance-Optimization-YinengZhang.pdf)

## Lecture 36: CUTLASS and Flash Attention 3
- Speaker: [Jay Shah](https://research.colfax-intl.com/blog/)
- [Slides](lecture_036/)

## Lecture 37: Introduction to SASS & GPU Microarchitecture
- Speaker: [Arun Demeure](https://github.com/ademeure)
- [Slides](lecture_037/)

## Lecture 38: Lowbit Kernels for ARM CPU
- Speaker: [Scott Roy](https://github.com/metascroy)
- [Slides](lecture_038/)