https://github.com/mu7annad0/100gpu

100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥

Host: GitHub
URL: https://github.com/mu7annad0/100gpu
Owner: Mu7annad0
Created: 2025-03-13T01:15:37.000Z
Default Branch: main
Last Pushed: 2025-04-05T21:34:46.000Z
Last Synced: 2025-04-05T22:24:39.073Z
Topics: cuda, gpu
Language: Cuda
Homepage:
Size: 35.2 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# 100 Days of GPU Challenge

This repository is part of the 100 Days of GPU Challenge, a 100-day challenge to learn GPU programming.

| Day | Kernel | Description |
| :---: | :------: | :---------------------- |
| 1 | Vector Addition | Implemented a basic element-wise addition kernel using CUDA to add two vectors. Read the first two chapters of the PMPP book. |
| 2 | Matrix Addition | Implemented a basic matrix addition kernel using CUDA to add two matrices. |
| 3 | RGB to Grayscale Conversion | Implemented an RGB-to-grayscale conversion kernel using CUDA. Read the first two sections of Chapter 3 of the PMPP book. |
| 4 | Blur an RGB Image | Implemented a kernel using CUDA that blurs an RGB image. Read Section 3 of Chapter 3 of the PMPP book, as well as this [blog](https://michalpitr.substack.com/p/gpu-programming). |
| 5 | Matrix Multiplication | Implemented a matrix multiplication kernel using CUDA. Finished Chapter 3 of the PMPP book. |
| 6 | Matrix Transpose | Implemented a matrix transpose kernel using CUDA. Started Chapter 4 and gained a comprehensive understanding of the architecture of modern CUDA-capable GPUs, including block scheduling, synchronization, and transparent scalability. |
| 7 | Softmax | Implemented the softmax function using CUDA. |
| 8 | ReLU | Implemented a ReLU kernel using CUDA. Finished Chapter 4 and gained an understanding of warp scheduling, latency tolerance, and control divergence. |
| 9 | Tiled Matrix Multiplication | Implemented a matrix multiplication kernel using shared memory (tiling). |
| 10 | GeLU | Implemented a GeLU kernel using CUDA. Finished Chapter 5 and learned about the different types of CUDA memory and how tiling helps reduce global memory traffic. |
| 11 | Conv1D | Implemented a 1D convolution kernel using shared memory. |
| 12 | Online Softmax | Implemented online softmax. |
| 13 | Softmax (Shared Memory) | Implemented softmax with shared memory using CUDA. |
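
As a rough illustration of the shared-memory tiling listed for Days 9 and 10, here is a minimal sketch that computes C = A × B for square row-major matrices. It is not the repository's code: the tile width of 16, the names `tiledMatMulKernel` and `tiledMatMul`, and the host wrapper are assumptions made for illustration, and CUDA error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Assumed tile width (hypothetical choice): 16 x 16 = 256 threads per block.
#define TILE_WIDTH 16

// Sketch only, hypothetical names. C = A * B for n x n row-major matrices.
// Each block computes one TILE_WIDTH x TILE_WIDTH tile of C, staging the
// matching tiles of A and B through shared memory phase by phase.
__global__ void tiledMatMulKernel(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Bs[TILE_WIDTH][TILE_WIDTH];

    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    float acc = 0.0f;

    int numPhases = (n + TILE_WIDTH - 1) / TILE_WIDTH;
    for (int ph = 0; ph < numPhases; ++ph) {
        int aCol = ph * TILE_WIDTH + threadIdx.x;
        int bRow = ph * TILE_WIDTH + threadIdx.y;
        // Out-of-range elements are loaded as zeros so partial tiles are safe.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // the whole tile must be loaded before anyone reads it

        for (int k = 0; k < TILE_WIDTH; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone must finish reading before the next phase overwrites
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host wrapper: copy inputs to the device, launch, copy the result back.
void tiledMatMul(const float* hA, const float* hB, float* hC, int n) {
    assert(n > 0);
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid((n + TILE_WIDTH - 1) / TILE_WIDTH, (n + TILE_WIDTH - 1) / TILE_WIDTH);
    tiledMatMulKernel<<<grid, block>>>(dA, dB, dC, n);

    // This copy on the default stream waits for the kernel to finish.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

In this layout each element of A and B is read from global memory once per tile phase instead of once per multiply-add, which is the memory-traffic reduction tiling aims for. The two `__syncthreads()` calls keep threads from reading a tile before it is fully loaded, or overwriting it while other threads are still using it.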
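
Online softmax (Day 12) computes the maximum and the normalizer in a single pass. It keeps a running maximum m and a running sum d of exp(x - m), and whenever m grows it rescales d by exp(m_old - m_new). Two partial (m, d) states can be merged the same way, which lets a block split one row across its threads. The sketch below assumes one 256-thread block per row of a row-major matrix and is not the repository's implementation; all names are hypothetical.

```cuda
#include <cassert>
#include <cfloat>
#include <cuda_runtime.h>

// Assumed block size; must be a power of two for the tree reduction below.
#define BLOCK_SIZE 256

// Merge partial state (m2, d2) into (m, d): rescale both sums to the larger max.
__device__ void mergeState(float& m, float& d, float m2, float d2) {
    float mNew = fmaxf(m, m2);
    d = d * expf(m - mNew) + d2 * expf(m2 - mNew);
    m = mNew;
}

// Sketch only, hypothetical names. One block per row of length `cols`.
__global__ void onlineSoftmaxKernel(const float* x, float* y, int cols) {
    __shared__ float sm[BLOCK_SIZE];
    __shared__ float sd[BLOCK_SIZE];
    const float* rowIn = x + (size_t)blockIdx.x * cols;
    float* rowOut = y + (size_t)blockIdx.x * cols;

    // Pass 1: each thread scans a strided slice, keeping a running max m and
    // a running sum d of exp(v - m). -FLT_MAX (not -inf) avoids inf - inf = NaN
    // when two empty states are merged.
    float m = -FLT_MAX, d = 0.0f;
    for (int i = threadIdx.x; i < cols; i += blockDim.x) {
        float v = rowIn[i];
        float mNew = fmaxf(m, v);
        d = d * expf(m - mNew) + expf(v - mNew);
        m = mNew;
    }
    sm[threadIdx.x] = m;
    sd[threadIdx.x] = d;
    __syncthreads();

    // Shared-memory tree reduction that merges (m, d) pairs across the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            float mi = sm[threadIdx.x], di = sd[threadIdx.x];
            mergeState(mi, di, sm[threadIdx.x + stride], sd[threadIdx.x + stride]);
            sm[threadIdx.x] = mi;
            sd[threadIdx.x] = di;
        }
        __syncthreads();
    }
    float rowMax = sm[0], rowSum = sd[0];

    // Pass 2: normalize every element of the row.
    for (int i = threadIdx.x; i < cols; i += blockDim.x)
        rowOut[i] = expf(rowIn[i] - rowMax) / rowSum;
}

// Hypothetical host wrapper for a rows x cols matrix.
void onlineSoftmax(const float* hx, float* hy, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    onlineSoftmaxKernel<<<rows, BLOCK_SIZE>>>(dx, dy, cols);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}
```

Because every exponent is taken relative to the running maximum, a row of inputs such as 1000.0f does not overflow `expf`, which a naive exp-then-sum softmax would.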
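
ReLU (Day 8) and GeLU (Day 10) are both element-wise, so one thread per element is enough. The README does not say which GeLU formula the repository uses. This sketch assumes the exact form `0.5 * x * (1 + erf(x / sqrt(2)))`, while many frameworks use a tanh-based approximation instead. The kernel and wrapper names are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// ReLU: max(x, 0).
__device__ __forceinline__ float relu(float x) { return fmaxf(x, 0.0f); }

// Assumed exact GeLU: x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
__device__ __forceinline__ float gelu(float x) {
    return 0.5f * x * (1.0f + erff(x * 0.70710678f));  // 0.70710678 ~= 1 / sqrt(2)
}

// Sketch only, hypothetical names: one thread per element, guarded for the tail.
__global__ void reluKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = relu(in[i]);
}

__global__ void geluKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = gelu(in[i]);
}

// Hypothetical host wrapper; useGelu selects GeLU, otherwise ReLU.
void applyActivation(const float* hIn, float* hOut, int n, bool useGelu) {
    assert(n > 0);
    size_t bytes = (size_t)n * sizeof(float);
    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (useGelu) geluKernel<<<blocks, threads>>>(dIn, dOut, n);
    else         reluKernel<<<blocks, threads>>>(dIn, dOut, n);

    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

Unlike ReLU, GeLU lets small negative inputs through with a small negative output (about -0.0455 for x = -2), which is the smooth gating behavior the function is used for.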
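
For the Day 11 1D convolution, one common shared-memory pattern has each block load its slice of the input plus r halo elements on each side, so neighboring outputs reuse the same loaded values. The sketch below is not the repository's kernel. It assumes an odd mask of width 2r + 1 with r ≤ 8, zero padding at the boundaries, and a mask stored in `__constant__` memory. Like most deep-learning "convolutions", it does not flip the mask, so it strictly computes a cross-correlation. All names are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed threads per block
#define MAX_RADIUS 8    // assumed upper bound on the mask radius r

// Assumption: the mask lives in constant memory (read-only, same value for all threads).
__constant__ float c_mask[2 * MAX_RADIUS + 1];

// Sketch only, hypothetical names. y[i] = sum_j x[i + j - r] * mask[j], j = 0..2r,
// with x treated as zero outside [0, n).
__global__ void conv1dKernel(const float* x, float* y, int n, int r) {
    __shared__ float tile[BLOCK_SIZE + 2 * MAX_RADIUS];
    int blockStart = blockIdx.x * blockDim.x;
    int i = blockStart + threadIdx.x;

    // Cooperative load: tile[t] holds x[blockStart + t - r], including r halo
    // cells on each side of the block's own elements.
    for (int t = threadIdx.x; t < blockDim.x + 2 * r; t += blockDim.x) {
        int g = blockStart + t - r;
        tile[t] = (g >= 0 && g < n) ? x[g] : 0.0f;  // zero padding at the edges
    }
    __syncthreads();  // every thread reaches this before any bounds check

    if (i < n) {
        float acc = 0.0f;
        for (int j = 0; j <= 2 * r; ++j)
            acc += tile[threadIdx.x + j] * c_mask[j];
        y[i] = acc;
    }
}

// Hypothetical host wrapper.
void conv1d(const float* hx, const float* hMask, float* hy, int n, int r) {
    assert(n > 0 && r >= 0 && r <= MAX_RADIUS);
    cudaMemcpyToSymbol(c_mask, hMask, (2 * r + 1) * sizeof(float));
    size_t bytes = (size_t)n * sizeof(float);
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    conv1dKernel<<<blocks, BLOCK_SIZE>>>(dx, dy, n, r);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}
```

With a mask of `{1, 1, 1}` and r = 1, the input `{1, 2, 3, 4, 5}` produces `{3, 6, 9, 12, 9}`, where the first and last outputs include the zero padding.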
