https://github.com/gpu-mode/lectures

Material for gpu-mode lectures

Host: GitHub
URL: https://github.com/gpu-mode/lectures
Owner: gpu-mode
License: apache-2.0
Created: 2024-01-20T19:28:02.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-09T17:36:02.000Z (3 months ago)
Last Synced: 2025-05-06T16:16:47.174Z (18 days ago)
Language: Jupyter Notebook
Homepage: https://www.youtube.com/@GPUMODE
Size: 102 MB
Stars: 4,359
Watchers: 64
Forks: 442
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-cuda-and-hpc - gpu-mode/lectures: Material for gpu-mode lectures. [www.youtube.com/@GPUMODE](https://www.youtube.com/@GPUMODE) (Learning Resources)

README

# Supplementary Material for Lectures

[![](https://dcbadge.vercel.app/api/server/gpumode?style=flat)](https://discord.gg/gpumode)

[YouTube Channel](https://www.youtube.com/@GPUMODE)

The PMPP Book: [Programming Massively Parallel Processors: A Hands-on Approach](https://a.co/d/2S2fVzt) (Amazon link)

## Lecture 1: Profiling and Integrating CUDA kernels in PyTorch

- Speaker: [Mark Saroufim](https://twitter.com/marksaroufim)

- Notebook and slides in [lecture_001](./lecture_001/) folder

## Lecture 2: Recap Ch. 1-3 from the PMPP book

- Speaker: [Andreas Koepf](https://twitter.com/neurosp1ke)

- Slides: The PowerPoint file [lecture_002/cuda_mode_lecture2.pptx](./lecture_002/cuda_mode_lecture2.pptx) is in the lecture_002 folder of this repository. Alternatively, it is available [here](https://docs.google.com/presentation/d/1deqvEHdqEC4LHUpStO6z3TT77Dt84fNAvTIAxBJgDck/edit#slide=id.g2b1444253e5_1_75) as a Google Slides presentation.

## Lecture 3: Getting Started With CUDA

- Speaker: [Jeremy Howard](https://twitter.com/jeremyphoward)

- Notebook: See the [lecture_003](./lecture_003/) folder, or run the [Colab version](https://colab.research.google.com/drive/180uk6frvMBeT4tywhhYXmz3PJaCIA_uk?usp=sharing)

## Lecture 4: Intro to Compute and Memory Architecture

- Speaker: [Thomas Viehmann](https://lernapparat.de/)

- Notebook and slides in the [lecture_004](./lecture_004/) folder.

## Lecture 5: Going Further with CUDA for Python Programmers

- Speaker: [Jeremy Howard](https://twitter.com/jeremyphoward)

- Notebook in the [lecture_005](./lecture_005/) folder.

## Lecture 6: Optimizing PyTorch Optimizers

- Speaker: [Jane Xu](https://github.com/janeyx99)

- [Slides](https://docs.google.com/presentation/d/13WLCuxXzwu5JRZo0tAfW0hbKHQMvFw4O/edit#slide=id.p1)

## Lecture 7: Advanced Quantization

- Speaker: [Charles Hernandez](https://github.com/HDCharles)

- [Slides](https://www.dropbox.com/scl/fi/hzfx1l267m8gwyhcjvfk4/Quantization-Cuda-vs-Triton.pdf?rlkey=s4j64ivi2kpp2l0uq8xjdwbab&dl=0)

## Lecture 8: CUDA Performance Checklist

- Speaker: [Mark Saroufim](https://github.com/msaroufim)

- Code in the [lecture_008](./lecture_008/) folder

- [Slides](https://docs.google.com/presentation/d/1cvVpf3ChFFiY4Kf25S4e4sPY6Y5uRUO-X-A4nJ7IhFE/edit?usp=sharing)

## Lecture 9: Reductions

- Speaker: [Mark Saroufim](https://github.com/msaroufim)

- Code in the [lecture_009](./lecture_009/) folder

- [Slides](https://docs.google.com/presentation/d/1s8lRU8xuDn-R05p1aSP6P7T5kk9VYnDOCyN5bWKeg3U/edit?usp=drive_link)

## Lecture 10: Build a Prod Ready CUDA Library

- Speaker: [Oscar Amoros Huguet](https://github.com/morousg)

- [Slides](https://drive.google.com/drive/folders/158V8BzGj-IkdXXDAdHPNwUzDLNmr971_?usp=drive_link)

## Lecture 11: Sparsity

- Speaker: [Jesse Cai](https://github.com/jcaip)

- [Slides](./lecture_011/sparsity.pptx)

## Lecture 12: Flash Attention

- Speaker: [Thomas Viehmann](https://lernapparat.de/)

## Lecture 13: Ring Attention

- Speaker: [Andreas Koepf](https://twitter.com/neurosp1ke)

- [Slides](./lecture_013/ring_attention.pptx)

## Lecture 14: Practitioner's Guide to Triton

- Date: 2024-04-13, Speaker: [Umer Adil](https://twitter.com/UmerHAdil)

- [Notebook](./lecture_014/A_Practitioners_Guide_to_Triton.ipynb)

## Lecture 15: CUTLASS

- Speaker: [Eric Auld](https://github.com/ericauld)

## Lecture 16: Hands-On Profiling

- Speaker: [Taylor Robie](https://www.linkedin.com/in/taylor-robie/)

## Bonus Lecture: CUDA C++ llm.cpp

- Speaker: Jake Hemstad & Georgii Evtushenko

- [Slides](https://drive.google.com/drive/folders/1T-t0d_u0Xu8w_-1E5kAwmXNfF72x-HTA)

## Lecture 17: GPU Collective Communication (NCCL)

- Speaker: [Dan Johnson](https://physbam.stanford.edu/~dansj/)

- Code in the [lecture_017](./lecture_017/) folder

## Lecture 18: Fused Kernels

- Speaker: [Kapil Sharma](https://www.kapilsharma.dev/)

- Code in the [lecture_018](./lecture_018/) folder

## Lecture 19: Data Processing on GPUs

- Speaker: [Devavret Makkar](https://github.com/devavret)

## Lecture 20: Scan Algorithm

- Speaker: [Izzat El Hajj](https://ielhajj.github.io/)

- [Slides](https://docs.google.com/presentation/d/1MEMsE5LKi6ush_60hlYu3-cz4DUCFzSL/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 21: Scan Algorithm Part 2

- Speaker: [Izzat El Hajj](https://ielhajj.github.io/)

- [Slides](https://docs.google.com/presentation/d/1MEMsE5LKi6ush_60hlYu3-cz4DUCFzSL/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 22: Hacker's Guide to Speculative Decoding in vLLM

- Speaker: [Cade Daniel](https://x.com/cdnamz)

- [Slides](https://docs.google.com/presentation/d/1p1xE-EbSAnXpTSiSI0gmy_wdwxN5XaULO3AnCWWoRe4/edit#slide=id.p)

## Lecture 23: Tensor Cores

- Speaker: Vijay Thakkar & Pradeep Ramani

- [Slides](https://drive.google.com/file/d/18sthk6IUOKbdtFphpm_jZNXoJenbWR8m/view)

## Lecture 24: Scan at the Speed of Light

- Speaker: Jake Hemstad & Georgii Evtushenko

## Lecture 25: Speaking Composable Kernel

- Speaker: Haocong Wang

- [Slides](./lecture_025/AMD_ROCm_Speaking_Composable_Kernel_July_20_2024.pdf)

## Lecture 26: SYCL MODE (Intel GPU)

- Speaker: Patric Zhao

- [Slides](https://docs.google.com/presentation/d/1SW4XKomAJhhJSH5-jpZI9Qlwp7TEunbV/edit?usp=sharing&ouid=106222972308395582904&rtpof=true&sd=true)

## Lecture 27: gpu.cpp

- Speaker: [Austin Huang](https://x.com/austinvhuang)

- [Slides](https://gpucpp-presentation.answer.ai/)

## Lecture 28: Liger Kernel

- Speaker: [Byron Hsu](https://x.com/hsu_byron)

- [Slides](https://docs.google.com/presentation/d/1CGTV-uKw9crrBo13q1jAzAFCFzlpZFjeL4bnK67pTd8/edit?usp=sharing)

- Hands-on Notebooks

  1. [RMSNorm: Verifying Correctness and Performance](https://colab.research.google.com/drive/1CQYhul7MVG5F0gmqTBbx1O1HgolPgF0M?usp=sharing)

  2. [FusedLinearCrossEntropy: Verifying Memory Reduction](https://colab.research.google.com/drive/1Z2QtvaIiLm5MWOs7X6ZPS1MN3hcIJFbj?usp=sharing)

  3. [Convergence Comparison: Triton Kernel Patched vs. Original Model Layer-by-Layer](https://colab.research.google.com/drive/1e52FH0BcE739GZaVp-3_Dv7mc4jF1aif?usp=sharing)

  4. [Contiguity is the hidden killer](https://colab.research.google.com/drive/1llnAdo0hc9FpxYRRnjih0l066NCp7Ylu?usp=sharing)

  5. [Address int32 overflow](https://colab.research.google.com/drive/1WgaU_cmaxVzx8PcdKB5P9yHB6_WyGd4T?usp=sharing)

## Lecture 29: Triton Internals

- Speaker: [Kapil Sharma](https://www.kapilsharma.dev/)

- Code/presentation in the [lecture_029](./lecture_029/) folder

## Lecture 30: Quantized training

- Speaker: [Thien Tran](https://github.com/gau-nernst)

- Code/presentation in the [lecture_030](./lecture_030/) folder

## Lecture 31: Beginners Guide to Metal Kernels

- Speaker: [Nikita Shulga](https://github.com/malfet)

- Code/presentation in the [lecture_031](./lecture_031/) folder

## Lecture 32: Unsloth - LLM Systems Engineering

- Speaker: [Daniel Han](https://x.com/danielhanchen)

- [Slides](https://docs.google.com/presentation/d/1BvgbDwvOY6Uy6jMuNXrmrz_6Km_CBW0f2espqeQaWfc/edit?usp=sharing)

## Lecture 33: BitBLAS

- Speaker: [Wang Lei](https://github.com/LeiWang1999)

- Code/presentation in the [lecture_033](./lecture_033/) folder

## Lecture 34: Low Bit Triton Kernels

- Speaker: [Hicham Badri](https://github.com/mobicham)

- [Slides](https://docs.google.com/presentation/d/1R9B6RLOlAblyVVFPk9FtAq6MXR1ufj1NaT0bjjib7Vc/edit)

## Lecture 35: SGLang Performance Optimization

- Speaker: [Yineng Zhang](https://linkedin.com/in/zhyncs)

- [Slides](https://github.com/zhyncs/lectures/blob/main/lecture_035/SGLang-Performance-Optimization-YinengZhang.pdf)

## Lecture 36: CUTLASS and Flash Attention 3

- Speaker: [Jay Shah](https://research.colfax-intl.com/blog/)

- [Slides](lecture_036/)

## Lecture 37: Introduction to SASS & GPU Microarchitecture

- Speaker: [Arun Demeure](https://github.com/ademeure)

- [Slides](lecture_037/)

## Lecture 38: Lowbit kernels for ARM CPU

- Speaker: [Scott Roy](https://github.com/metascroy)

- [Slides](lecture_038/)
