# xGeMM
Accelerated General (FP32) Matrix Multiplication (SGEMM), written from scratch in CUDA. Tested on an NVIDIA RTX 3090 running Ubuntu 24.04.1 LTS with nvidia-driver-550 and CUDA 12.4.

**Watch the YouTube video (click the image below):**

[![VideoThumbnail](https://raw.githubusercontent.com/tgautam03/xGeMM/refs/heads/master/Thumbnail.png)](https://youtu.be/GetaI7KhbzM?si=i9sMAfGqO4zyJZhq)

## Dependencies
- [Eigen 3.4.0](https://gitlab.com/libeigen/eigen/-/releases/3.4.0): place it in the `lib` directory.

## Running Benchmarks
### 1. Eigen (CPU) matrix multiplication

**Compile**: `make 00a_benchmark_cpu.out`

**Execute**: `./00a_benchmark_cpu.out`

### 2. cuBLAS (GPU) matrix multiplication

**Compile**: `make 00b_benchmark_cuBLAS.out`

**Execute**: `./00b_benchmark_cuBLAS.out`

### 3. Naive (GPU) matrix multiplication

**Compile**: `make 01_benchmark_naive.out`

**Execute**: `./01_benchmark_naive.out`

### 4. Coalesced (GPU) matrix multiplication

**Compile**: `make 02_benchmark_coalesced.out`

**Execute**: `./02_benchmark_coalesced.out`

### 5. Tiled (GPU) matrix multiplication

**Compile**: `make 03_benchmark_tiled.out`

**Execute**: `./03_benchmark_tiled.out`

### 6. 1D thread coarsening (GPU) matrix multiplication

**Compile**: `make 04_benchmark_coarse_1d.out`

**Execute**: `./04_benchmark_coarse_1d.out`

### 7. 2D thread coarsening (GPU) matrix multiplication

**Compile**: `make 05_benchmark_coarse_2d.out`

**Execute**: `./05_benchmark_coarse_2d.out`

### 8. Vectorized memory accesses (GPU) matrix multiplication

**Compile**: `make 06_benchmark_coarse_2d_vec.out`

**Execute**: `./06_benchmark_coarse_2d_vec.out`
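Vectorized memory access replaces scalar global-memory loads and stores with 128-bit `float4` accesses, so each instruction moves four consecutive floats. The program below is a minimal sketch of that idea. It assumes row-major FP32 matrices whose dimensions are all multiples of 32, and device pointers from `cudaMalloc`, which are aligned well beyond the 16 bytes `float4` needs. The names `sgemm_tiled_vec4` and `verify_sgemm_vec4`, the 32×32 tile, and the 8×32 thread block are hypothetical illustration choices. This is not the repository's `06_benchmark_coarse_2d_vec` code, which combines vectorization with 2D thread coarsening. This sketch instead pairs it with plain shared-memory tiling and four outputs per thread.

```cuda
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // tile edge length (illustrative choice)
constexpr int VEC = 4;    // floats moved per float4 (128-bit) access

// Hypothetical kernel: C = A * B, row-major A (MxK), B (KxN), C (MxN).
// Assumes M, N, K are multiples of TILE and blockDim = (TILE / VEC, TILE).
// Each thread owns one tile row and four adjacent output columns.
__global__ void sgemm_tiled_vec4(const float* __restrict__ A, const float* __restrict__ B,
                                 float* __restrict__ C, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    const int tx = threadIdx.x;                    // 0..7: group of 4 columns
    const int ty = threadIdx.y;                    // 0..31: row within the tile
    const int row = blockIdx.y * TILE + ty;
    const int col = blockIdx.x * TILE + tx * VEC;  // first of 4 output columns
    float acc[VEC] = {0.f, 0.f, 0.f, 0.f};

    for (int t = 0; t < K; t += TILE) {
        // One 128-bit load per thread per input tile; every offset is a
        // multiple of 4 floats, so each address is 16-byte aligned.
        const float4 a = *reinterpret_cast<const float4*>(&A[row * K + t + tx * VEC]);
        const float4 b = *reinterpret_cast<const float4*>(&B[(t + ty) * N + col]);
        As[ty][tx * VEC + 0] = a.x; As[ty][tx * VEC + 1] = a.y;
        As[ty][tx * VEC + 2] = a.z; As[ty][tx * VEC + 3] = a.w;
        Bs[ty][tx * VEC + 0] = b.x; Bs[ty][tx * VEC + 1] = b.y;
        Bs[ty][tx * VEC + 2] = b.z; Bs[ty][tx * VEC + 3] = b.w;
        __syncthreads();
        for (int k = 0; k < TILE; ++k) {
            const float a_val = As[ty][k];  // reused for all 4 output columns
            for (int j = 0; j < VEC; ++j) acc[j] += a_val * Bs[k][tx * VEC + j];
        }
        __syncthreads();
    }
    // One 128-bit store writes this thread's four results.
    *reinterpret_cast<float4*>(&C[row * N + col]) = make_float4(acc[0], acc[1], acc[2], acc[3]);
}

// Runs the kernel on random data and compares it with a CPU reference.
// Returns false for unsupported sizes, CUDA errors, or numerical mismatches.
bool verify_sgemm_vec4(int M, int N, int K) {
    if (M % TILE != 0 || N % TILE != 0 || K % TILE != 0) return false;  // kernel has no bounds checks
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> hA(size_t(M) * K), hB(size_t(K) * N), hC(size_t(M) * N);
    for (float& x : hA) x = dist(gen);
    for (float& x : hB) x = dist(gen);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, hA.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dB, hB.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dC, hC.size() * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE / VEC, TILE);  // 8 x 32 = 256 threads
        dim3 grid(N / TILE, M / TILE);
        sgemm_tiled_vec4<<<grid, block>>>(dA, dB, dC, N, K);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            double ref = 0.0;  // double-precision reference dot product
            for (int k = 0; k < K; ++k) ref += double(hA[size_t(i) * K + k]) * hB[size_t(k) * N + j];
            if (std::fabs(ref - hC[size_t(i) * N + j]) > 1e-3 * (1.0 + std::fabs(ref))) return false;
        }
    return true;
}

int main() {
    const bool ok = verify_sgemm_vec4(256, 256, 256);
    std::printf("%s\n", ok ? "PASS" : "FAIL");
    return ok ? 0 : 1;
}
```

Build and run it with something like `nvcc -O3 vec4_sgemm.cu -o vec4_sgemm && ./vec4_sgemm` (the file name is hypothetical). It prints `PASS` when the GPU result matches the double-precision CPU reference. The kernel performs no bounds checks, so sizes that are not multiples of 32 would need padding or a scalar fallback path.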