{"id":26779165,"url":"https://github.com/mu7annad0/100gpu","last_synced_at":"2026-03-08T11:36:06.110Z","repository":{"id":282139473,"uuid":"947617051","full_name":"Mu7annad0/100GPU","owner":"Mu7annad0","description":"100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥","archived":false,"fork":false,"pushed_at":"2025-04-05T21:34:46.000Z","size":36,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T22:24:39.073Z","etag":null,"topics":["cuda","gpu"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mu7annad0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-13T01:15:37.000Z","updated_at":"2025-04-05T21:34:49.000Z","dependencies_parsed_at":"2025-03-27T22:26:58.234Z","dependency_job_id":"1498660e-1737-4c4c-86f3-ae99e556ef77","html_url":"https://github.com/Mu7annad0/100GPU","commit_stats":null,"previous_names":["mu7annad0/100gpu"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mu7annad0%2F100GPU","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mu7annad0%2F100GPU/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mu7annad0%2F100GPU/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mu7annad0%2F100GPU/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mu7annad0","download_url":"https://codeload.
github.com/Mu7annad0/100GPU/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249418625,"owners_count":21268469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","gpu"],"created_at":"2025-03-29T06:15:02.477Z","updated_at":"2026-03-08T11:36:06.070Z","avatar_url":"https://github.com/Mu7annad0.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 100 Days of GPU Challenge\nThis repository is part of the 100 Days of GPU Challenge, a 100-day challenge to learn GPU programming.\n\n| Day | Kernel | Description |\n| :---: | :------: | :---------------------- |\n| 1 | Vector Addition | Implemented a basic element-wise addition kernel using CUDA to add two vectors. \u003cbr /\u003e Read the first two chapters of the PMPP Book. |\n| 2 | Matrix Addition | Implemented a basic matrix addition kernel using CUDA to add two matrices. |\n| 3 | RGB to Grayscale Conversion | Implemented an RGB to grayscale conversion kernel using CUDA. \u003cbr /\u003e Read the first two sections of the third chapter of the PMPP Book. |\n| 4 | Blur an RGB Image | Implemented an RGB image blur kernel using CUDA. \u003cbr /\u003e Read section 3 from the PMPP Book and this [blog](https://michalpitr.substack.com/p/gpu-programming). |\n| 5 | Matrix Multiplication | Implemented a matrix multiplication kernel using CUDA. \u003cbr /\u003e Finished Chapter 3 of the PMPP Book. |\n| 6 | Matrix Transpose | Implemented a matrix transpose kernel using CUDA. 
\u003cbr /\u003e Started reading Chapter 4 and gained an understanding of the architecture of modern CUDA-capable GPUs, including block scheduling, synchronization, and transparent scalability. |\n| 7 | Softmax | Implemented a softmax kernel using CUDA. |\n| 8 | ReLU | Implemented a ReLU kernel using CUDA. \u003cbr /\u003e Finished Chapter 4. Gained an understanding of warp scheduling, latency tolerance, and control divergence. |\n| 9 | Tiled Matrix Multiplication | Implemented a tiled matrix multiplication kernel using shared memory. |\n| 10 | GeLU | Implemented a GeLU kernel using CUDA. \u003cbr /\u003e Finished Chapter 5 and learned about the different types of CUDA memory and how tiling helps reduce memory traffic. |\n| 11 | Conv1D | Implemented 1D convolution with shared memory. |\n| 12 | Online Softmax | Implemented online softmax. |\n| 13 | Softmax (Shared Memory) | Implemented softmax with shared memory using CUDA. |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmu7annad0%2F100gpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmu7annad0%2F100gpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmu7annad0%2F100gpu/lists"}