https://github.com/mu7annad0/100gpu
100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥
- Host: GitHub
- URL: https://github.com/mu7annad0/100gpu
- Owner: Mu7annad0
- Created: 2025-03-13T01:15:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-05T21:34:46.000Z (about 1 year ago)
- Last Synced: 2025-04-05T22:24:39.073Z (about 1 year ago)
- Topics: cuda, gpu
- Language: Cuda
- Homepage:
- Size: 35.2 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 100 Days of GPU Challenge
This repository is part of the 100 Days of GPU Challenge, a 100-day challenge to learn GPU programming.
| Day | Kernel | Description |
| :---: | :------: | :---------------------- |
| 1 | Vector Addition | Implemented a basic element-wise addition kernel using CUDA to add two vectors.<br>Read the first two chapters of the PMPP Book. |
| 2 | Matrix Addition | Implemented a basic matrix addition kernel using CUDA to add two matrices. |
| 3 | RGB to Grayscale Conversion | Implemented an RGB to grayscale conversion kernel using CUDA.<br>Read the first two sections of the third chapter of the PMPP Book. |
| 4 | Blur an RGB Image | Implemented an RGB image blur kernel using CUDA.<br>Read section 3 of the PMPP Book, and also this [blog](https://michalpitr.substack.com/p/gpu-programming). |
| 5 | Matrix Multiplication | Implemented a matrix multiplication kernel using CUDA.<br>Finished Chapter 3 of the PMPP Book. |
| 6 | Matrix Transpose | Implemented a matrix transpose kernel using CUDA.<br>Started reading Chapter 4 and gained a comprehensive understanding of the architecture of modern CUDA-capable GPUs, including block scheduling, synchronization, and transparent scalability. |
| 7 | Softmax | Implemented the softmax function using CUDA. |
| 8 | ReLU | Implemented a ReLU kernel using CUDA.<br>Finished Chapter 4. Gained an understanding of warp scheduling, latency tolerance, and control divergence. |
| 9 | Tiled Matrix Multiplication | Implemented a tiled matrix multiplication kernel using shared memory. |
| 10 | GeLU | Implemented a GeLU kernel using CUDA.<br>Finished Chapter 5 and got to know the different types of CUDA memory and how tiling helps reduce memory traffic. |
| 11 | Conv1D | Implemented 1D Convolution with shared memory. |
| 12 | Online Softmax | Implemented Online Softmax. |
| 13 | Softmax (Shared Memory) | Implemented softmax with shared memory using CUDA. |
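
The Day 12 entry names online softmax without showing what makes it "online". As a rough illustration only, the sketch below keeps a running maximum and a running normalizer in a single pass over each row, rescaling the normalizer whenever the maximum grows. It is not the code from this repository: the kernel name `onlineSoftmaxRows`, the one-thread-per-row mapping, and the test sizes in `main` are all assumptions chosen for brevity, and error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <cmath>
#include <cfloat>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread handles one row (simple, not tuned).
__global__ void onlineSoftmaxRows(const float* in, float* out, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    const float* x = in + (size_t)row * cols;
    float* y = out + (size_t)row * cols;

    float m = -FLT_MAX;  // running maximum seen so far
    float d = 0.0f;      // running sum of exp(x[i] - m)

    // Single pass: when the maximum changes, rescale the old sum
    // so that it stays expressed relative to the new maximum.
    for (int i = 0; i < cols; ++i) {
        float mNew = fmaxf(m, x[i]);
        d = d * expf(m - mNew) + expf(x[i] - mNew);
        m = mNew;
    }

    // Second pass: write the normalized probabilities.
    for (int i = 0; i < cols; ++i) {
        y[i] = expf(x[i] - m) / d;
    }
}

int main() {
    const int rows = 4, cols = 8;  // assumed toy sizes
    const size_t bytes = (size_t)rows * cols * sizeof(float);

    float hIn[rows * cols], hOut[rows * cols];
    for (int i = 0; i < rows * cols; ++i) hIn[i] = 0.1f * (float)(i % 13);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (rows + threads - 1) / threads;
    onlineSoftmaxRows<<<blocks, threads>>>(dIn, dOut, rows, cols);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    // Each row of a softmax output should sum to roughly 1.
    for (int r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (int c = 0; c < cols; ++c) sum += hOut[r * cols + c];
        printf("row %d sum = %f\n", r, sum);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Assuming the file is saved as `online_softmax.cu`, it should build with `nvcc online_softmax.cu -o online_softmax`. A faster version would typically assign a block or warp per row and combine partial `(m, d)` pairs with a reduction; the one-thread-per-row form is used here only because it keeps the recurrence easy to read.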