https://github.com/mu7annad0/100gpu

100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥

# 100 Days of GPU Challenge
This repository is part of the 100 Days of GPU Challenge, a 100-day challenge to learn GPU programming.

| Day | Kernel | Description |
| :---: | :------: | :---------------------- |
| 1 | Vector Addition | Implemented a basic element-wise addition kernel using CUDA to add two vectors.<br>Read the first two chapters of the PMPP book (*Programming Massively Parallel Processors*). |
| 2 | Matrix Addition | Implemented a basic matrix addition kernel using CUDA to add two matrices. |
| 3 | RGB to Grayscale Conversion | Implemented an RGB-to-grayscale conversion kernel using CUDA.<br>Read the first two sections of Chapter 3 of the PMPP book. |
| 4 | Blur an RGB Image | Implemented a kernel that blurs an RGB image using CUDA.<br>Read Section 3 of Chapter 3 of the PMPP book, and also this [blog](https://michalpitr.substack.com/p/gpu-programming). |
| 5 | Matrix Multiplication | Implemented a matrix multiplication kernel using CUDA.<br>Finished Chapter 3 of the PMPP book. |
| 6 | Matrix Transpose | Implemented a matrix transpose kernel using CUDA.<br>Started reading Chapter 4, which covers the architecture of modern CUDA-capable GPUs, including block scheduling, synchronization, and transparent scalability. |
| 7 | Softmax | Implemented the softmax function using CUDA. |
| 8 | ReLU | Implemented a ReLU kernel using CUDA.<br>Finished Chapter 4. Gained an understanding of warp scheduling, latency tolerance, and control divergence. |
| 9 | Tiled Matrix Multiplication | Implemented a tiled matrix multiplication kernel using shared memory. |
| 10 | GeLU | Implemented a GeLU kernel using CUDA.<br>Finished Chapter 5. Learned about the different types of CUDA memory and how tiling helps reduce memory traffic. |
| 11 | Conv1D | Implemented 1D Convolution with shared memory. |
| 12 | Online Softmax | Implemented Online Softmax. |
| 13 | Softmax (Shared Memory) | Implemented softmax with shared memory using CUDA. |
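
To illustrate the online softmax idea from Day 12, here is a minimal sketch. It is not the code from this repository: the kernel name `online_softmax_rows`, the host helper `softmax_rows_host`, and the one-thread-per-row mapping are all assumptions chosen for brevity. The point it shows is that the row maximum and the normalizing sum can be computed in a single pass by rescaling the running sum whenever a larger maximum is found.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical kernel: one thread handles one row of a row-major matrix.
__global__ void online_softmax_rows(const float* in, float* out, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    const float* x = in + (size_t)row * cols;
    float* y = out + (size_t)row * cols;

    // Single pass: track the running max and a sum of exp(x - running_max).
    // When the max grows, the old sum is rescaled to the new reference point.
    float running_max = -FLT_MAX;
    float running_sum = 0.0f;
    for (int i = 0; i < cols; ++i) {
        float new_max = fmaxf(running_max, x[i]);
        running_sum = running_sum * expf(running_max - new_max) + expf(x[i] - new_max);
        running_max = new_max;
    }

    // Second pass: write the normalized probabilities.
    for (int i = 0; i < cols; ++i) {
        y[i] = expf(x[i] - running_max) / running_sum;
    }
}

// Hypothetical host helper: copies data to the GPU, launches the kernel,
// and copies the result back. Error checking is omitted to keep it short.
std::vector<float> softmax_rows_host(const std::vector<float>& h_in, int rows, int cols) {
    size_t bytes = h_in.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;  // round up to cover every row
    online_softmax_rows<<<blocks, threads>>>(d_in, d_out, rows, cols);

    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `softmax_rows_host(data, rows, cols)` on a row-major matrix should return rows that each sum to approximately 1. One thread per row keeps the sketch easy to follow, but it leaves most of the GPU idle for small row counts. A shared-memory version like the one on Day 13 would typically split each row across the threads of a block instead.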