{"id":21125592,"url":"https://github.com/versi379/optimized-matrix-multiplication","last_synced_at":"2026-05-17T13:37:09.946Z","repository":{"id":263691509,"uuid":"891185715","full_name":"versi379/Optimized-Matrix-Multiplication","owner":"versi379","description":"This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.","archived":false,"fork":false,"pushed_at":"2024-11-19T22:32:24.000Z","size":5,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T05:41:42.230Z","etag":null,"topics":["cublas","cuda","cuda-programming","hpc","matrix-multiplication","parallel-computing","parallel-programming"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/versi379.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-19T22:02:24.000Z","updated_at":"2024-11-19T22:32:27.000Z","dependencies_parsed_at":"2024-11-19T23:39:14.852Z","dependency_job_id":null,"html_url":"https://github.com/versi379/Optimized-Matrix-Multiplication","commit_stats":null,"previous_names":["versi379/optimized-matrix-multiplication"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/versi379%2FOptimized-Matrix-Multiplication","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/versi379%2FOptimized-Matrix-Multiplication/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/versi379%2FOptimized-Matrix-Multiplication/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/versi379%2FOptimized-Matrix-Multiplication/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/versi379","download_url":"https://codeload.github.com/versi379/Optimized-Matrix-Multiplication/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243573163,"owners_count":20312879,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cublas","cuda","cuda-programming","hpc","matrix-multiplication","parallel-computing","parallel-programming"],"created_at":"2024-11-20T04:35:17.090Z","updated_at":"2026-05-17T13:37:09.886Z","avatar_url":"https://github.com/versi379.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optimized-Matrix-Multiplication\n\nThis implementation leverages the NVIDIA CUDA framework and the cuBLAS library to optimize matrix multiplication using the cublasGemmEx function. 
By using GPU acceleration and advanced features like Tensor Cores (if supported), the computation runs significantly faster than traditional CPU-based methods.\n\n# Overview\n\nThis application performs basic matrix multiplication: \\( A \\times B = C \\).\n\n- **Matrix dimensions:**\n  - \\( A \\): `rowsA x rank`  \n  - \\( B \\): `rank x colsB`  \n  - \\( C \\): `rowsA x colsB`\n\n- **Array representation:**  \n  Each matrix is stored as a flat array in column-major order and addressed through a single raw pointer, e.g.:\n\n  ```c++\n  float* A = new float[sizeA];\n  ```\n\n- **Accessing elements:**  \n  The element at row `i`, column `j` is at index `j * rows + i`, as in the following loop:\n\n  ```c++\n  for (size_t i = 0; i \u003c rows; ++i)\n  {\n      for (size_t j = 0; j \u003c cols; ++j)\n      {\n          cout \u003c\u003c A[j * rows + i] \u003c\u003c \" \";\n      }\n      cout \u003c\u003c endl;\n  }\n  ```\n\n- **Data types:**  \n  Matrices \\( A \\) and \\( B \\) can use either 16-bit or 32-bit floats, but the result matrix \\( C \\) is always a 32-bit float.\n\n- **Performance:**  \n  GPU (device) execution significantly outperforms CPU (host, single-threaded) execution.  \n  Starting with cuBLAS version 11, Tensor Cores are used automatically whenever possible. More details are available in the [NVIDIA cuBLAS documentation](https://docs.nvidia.com/cuda/cublas/#tensor-core-usage).\n\n# Build Instructions\n\n## Linux\n\n1. **Install the CUDA toolkit:**\n\n   ```bash\n   sudo apt install nvidia-cuda-toolkit\n   ```\n\n2. **Build the application:**\n\n   ```bash\n   make all\n   ```\n\n3. **Run the executable:**\n\n   ```bash\n   ./main.out\n   ```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fversi379%2Foptimized-matrix-multiplication","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fversi379%2Foptimized-matrix-multiplication","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fversi379%2Foptimized-matrix-multiplication/lists"}