https://github.com/Infatoshi/cuda-course
- Host: GitHub
- URL: https://github.com/Infatoshi/cuda-course
- Owner: Infatoshi
- Created: 2024-07-08T14:20:35.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2025-01-04T06:11:10.000Z (over 1 year ago)
- Last Synced: 2025-01-04T07:26:02.917Z (over 1 year ago)
- Language: Cuda
- Size: 31.2 MB
- Stars: 726
- Watchers: 16
- Forks: 114
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# CUDA Course
GitHub repo for the CUDA course on freeCodeCamp
> Note: This course is designed for Ubuntu Linux. Windows users can use Windows Subsystem for Linux (WSL) or Docker containers to get an Ubuntu Linux environment.
## Table of Contents
1. [The Deep Learning Ecosystem](01_Deep_Learning_Ecosystem/README.md)
2. [Setup/Installation](02_Setup/README.md)
3. [C/C++ Review](03_C_and_C++_Review/README.md)
4. [Gentle Intro to GPUs](04_Gentle_Intro_to_GPUs/README.md)
5. [Writing Your First Kernels](05_Writing_your_First_Kernels/README.md)
6. [CUDA APIs (cuBLAS, cuDNN, etc)](06_CUDA_APIs/README.md)
7. [Optimizing Matrix Multiplication](07_Faster_Matmul/README.md)
8. [Triton](08_Triton/README.md)
9. [PyTorch Extensions (CUDA)](09_PyTorch_Extensions/README.md)
10. [Final Project](10_Final_Project/README.md)
11. [Extras](11_Extras/README.md)
## Course Philosophy
This course aims to:
- Lower the barrier to entry for HPC jobs
- Provide a foundation for understanding projects like Karpathy's [llm.c](https://github.com/karpathy/llm.c)
- Consolidate scattered CUDA programming resources into a comprehensive, organized course
## Overview
- Focuses on optimizing GPU kernels for performance
- Covers CUDA, PyTorch, and Triton
- Emphasizes the technical details of writing faster kernels
- Is tailored for NVIDIA GPUs
- Culminates in a simple MLP trained on MNIST, written in CUDA
## Prerequisites
- Python programming (required)
- Basic differentiation and vector calculus for backprop (recommended)
- Linear algebra fundamentals (recommended)
## Key Takeaways
- Optimizing existing implementations
- Building CUDA kernels for cutting-edge research
- Understanding GPU performance bottlenecks, especially memory bandwidth
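To make the memory bandwidth point concrete, here is a minimal sketch that compares a kernel's arithmetic intensity (FLOPs per byte moved) against a GPU's machine balance (peak FLOP/s divided by peak bandwidth). The peak numbers are hypothetical and chosen only for illustration, the helper names are not part of the course code, and the matmul traffic figure assumes ideal data reuse.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of global-memory traffic."""
    return flops / bytes_moved


def is_memory_bound(flops: float, bytes_moved: float,
                    peak_flops: float, peak_bandwidth: float) -> bool:
    """A kernel is memory-bound when its intensity is below the machine balance."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte the GPU can sustain
    return arithmetic_intensity(flops, bytes_moved) < machine_balance


# Hypothetical GPU (assumption, not a real spec sheet):
PEAK_FLOPS = 10e12       # 10 TFLOP/s in FP32
PEAK_BANDWIDTH = 500e9   # 500 GB/s

# Vector add c[i] = a[i] + b[i] on FP32 data:
# one add per element, two 4-byte loads and one 4-byte store.
n = 1 << 20
vec_add_flops = n
vec_add_bytes = 3 * 4 * n

# Square FP32 matmul of size m x m:
# 2*m^3 FLOPs; idealized traffic reads A and B once and writes C once.
m = 4096
matmul_flops = 2 * m**3
matmul_bytes = 3 * 4 * m**2

print("vector add memory-bound:",
      is_memory_bound(vec_add_flops, vec_add_bytes, PEAK_FLOPS, PEAK_BANDWIDTH))
print("matmul memory-bound:",
      is_memory_bound(matmul_flops, matmul_bytes, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these assumptions, vector addition does 1/12 FLOP per byte and is limited by memory bandwidth, while a large matmul can be compute-bound only if the kernel actually achieves that data reuse.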
## Hardware Requirements
- Any NVIDIA GTX, RTX, or datacenter-class GPU
- Cloud GPU options available for those without local hardware
## Use Cases for CUDA/GPU Programming
- Deep Learning (primary focus of this course)
- Graphics and Ray-tracing
- Fluid Simulation
- Video Editing
- Crypto Mining
- 3D modeling
- Anything that requires parallel processing with large arrays
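As a rough illustration of "parallel processing with large arrays", the sketch below emulates in plain Python how a CUDA launch assigns one array element to each thread. The function name `launch_vector_add` is hypothetical, and the nested loops run sequentially here, whereas a GPU would run the threads in parallel. Only the index arithmetic is meant to carry over.

```python
def launch_vector_add(a, b, threads_per_block=256):
    """Emulate the block/thread index arithmetic of an element-wise add kernel."""
    n = len(a)
    out = [0.0] * n
    # Ceiling division: enough blocks to cover all n elements.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(num_blocks):              # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):  # plays the role of threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # global element index
            if i < n:  # bounds guard: the last block may have spare threads
                out[i] = a[i] + b[i]
    return out


result = launch_vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], threads_per_block=2)
print(result)
```

The bounds guard matters whenever the array length is not a multiple of the block size, which is the common case.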
## Resources
- GitHub repo (this repository)
- Stack Overflow
- NVIDIA Developer Forums
- NVIDIA and PyTorch documentation
- LLMs for navigating the space
- [Cheatsheet](/11_Extras/assets/cheatsheet.md)
## Other Learning Material
- https://github.com/CoffeeBeforeArch/cuda_programming
- https://www.youtube.com/@GPUMODE
- https://discord.com/invite/gpumode
## Fun YouTube Videos
- [How do GPUs work? Exploring GPU Architecture](https://www.youtube.com/watch?v=h9Z4oGN89MU)
- [But how do GPUs actually work?](https://www.youtube.com/watch?v=58jtf24uijw&ab_channel=Graphicode)
- [Getting Started With CUDA for Python Programmers](https://www.youtube.com/watch?v=nOxKexn3iBo&ab_channel=JeremyHoward)
- [Transformers Explained From The Atom Up](https://www.youtube.com/watch?v=7lJZHbg0EQ4&ab_channel=JacobRintamaki)
- [How CUDA Programming Works - Stephen Jones, CUDA Architect, NVIDIA](https://www.youtube.com/watch?v=QQceTDjA4f4&ab_channel=ChristopherHollinworth)
- [Parallel Computing with Nvidia CUDA - NeuralNine](https://www.youtube.com/watch?v=zSCdTOKrnII&ab_channel=NeuralNine)
- [CPU vs GPU vs TPU vs DPU vs QPU](https://www.youtube.com/watch?v=r5NQecwZs1A&ab_channel=Fireship)
- [Nvidia CUDA in 100 Seconds](https://www.youtube.com/watch?v=pPStdjuYzSI&ab_channel=Fireship)
- [How AI Discovered a Faster Matrix Multiplication Algorithm](https://www.youtube.com/watch?v=fDAPJ7rvcUw&t=1s&ab_channel=QuantaMagazine)
- [The fastest matrix multiplication algorithm](https://www.youtube.com/watch?v=sZxjuT1kUd0&ab_channel=Dr.TreforBazett)
- [From Scratch: Cache Tiled Matrix Multiplication in CUDA](https://www.youtube.com/watch?v=ga2ML1uGr5o&ab_channel=CoffeeBeforeArch)
- [From Scratch: Matrix Multiplication in CUDA](https://www.youtube.com/watch?v=DpEgZe2bbU0&ab_channel=CoffeeBeforeArch)
- [Intro to GPU Programming](https://www.youtube.com/watch?v=G-EimI4q-TQ&ab_channel=TomNurkkala)
- [CUDA Programming](https://www.youtube.com/watch?v=xwbD6fL5qC8&ab_channel=TomNurkkala)
- [Intro to CUDA (part 1): High Level Concepts](https://www.youtube.com/watch?v=4APkMJdiudU&ab_channel=JoshHolloway)
- [Intro to GPU Hardware](https://www.youtube.com/watch?v=kUqkOAU84bA&ab_channel=TomNurkkala)
## Find me
- [Twitter/X](https://x.com/elliotarledge)
- [LinkedIn](https://www.linkedin.com/in/elliot-arledge-a392b7243/)
- [YouTube](https://www.youtube.com/channel/UCjlt_l6MIdxi4KoxuMjhYxg)
- [Discord](https://discord.gg/JTTcFe7Pw2)