# cuMpSGEMM - CUDA Mutable-precision SGEMM

A library that emulates SGEMM on Tensor Cores by intercepting cuBLAS function calls, targeting the NVIDIA A100 GPU.

## Supported functions
- `cublasSgemm`
- `cublasCgemm`
- `cublasGemmEx` (single precision only)

## Throughput
<img alt='cumpsgemm throughput' src='./docs/sgemm-throughput.svg'>

## Build

```
git clone https://github.com/enp1s0/cuMpSGEMM.git --recursive
cd cuMpSGEMM
mkdir build
cd build
cmake ..
# This may take about 15 minutes
make -j4
```

## Usage

### 1. Hijack the cuBLAS library
Before running the target application, preload this library by setting the following environment variable:
```bash
export LD_PRELOAD=/path/to/cumpsgemm/build/libcumpsgemm.so:$LD_PRELOAD
```

### 2. Control the SGEMM compute mode
Under the default rule, the SGEMM compute mode is selected via an environment variable:

```bash
export CUMPSGEMM_COMPUTE_MODE=FP16TCEC
```

#### Performance modes
| mode name            | Tensor Core Type               | Error Correction |
|:---------------------|:-------------------------------|:-----------------|
|`FP16TCEC`            | FP16                           | Yes              |
|`TF32TCEC`            | TF32                           | Yes              |
|`CUBLAS_SIMT`         | (FP32 SIMT Core)               | No               |
|`CUBLAS_FP16TC`       | FP16                           | No               |
|`CUBLAS_TF32TC`       | TF32                           | No               |
|`FP16TCEC_SCALING`    | FP16                           | Yes              |

#### Debugging modes
| mode name            | Tensor Core Type               | Error Correction |
|:---------------------|:-------------------------------|:-----------------|
|`FP16TC`              | FP16                           | No               |
|`TF32TC`              | TF32                           | No               |
|`CUBLAS`              | Depends on the cuBLAS math mode| No               |
|`AUTO`                | AUTO                           | Yes              |
|`DRY_RUN`             | Nothing is computed            | No               |

#### Custom rule
To select the SGEMM mode with your own logic, define a custom `cuMpSGEMM_get_compute_mode` function and build it into a shared library named `libcumpsgemm_rule.so`.
The default definition is in [default_cumpsgemm_rule.cu](src/default_cumpsgemm_rule.cu).
Before running the target application, add the directory containing this library to the library search path:
```bash
export LD_LIBRARY_PATH=/path/to/libcumpsgemm_rule.so/dir:$LD_LIBRARY_PATH
```

## How this library works

![cuMpSGEMM flow](./docs/cumpsgemm.svg)

When a supported cuBLAS function (e.g. `cublasSgemm`) is called, a function selector inside this library calls `cuMpSGEMM_get_compute_mode` (1) to determine the backend SGEMM function (2), and then calls that function (3).

## Important note
To hijack the cuBLAS static library, a library with the same name is created.
During this process, the build script extracts the object modules from the cuBLAS static library and combines the TCEC SGEMM implementation with the extracted modules, excluding `sgemm.o` and similar modules.
This is not the reverse engineering, decompiling, or disassembling prohibited by the [NVIDIA EULA](https://docs.nvidia.com/cuda/eula/index.html).

## Test
```
Usage : ./build/cumpsgemm_test sgemm [exp2|seq] [min_N] [max_N] [interval]
      : ./build/cumpsgemm_test cgemm [exp2|seq] [min_N] [max_N] [interval]
      : ./build/cumpsgemm_test sgemm_strided_batch [exp2|seq] [min_N] [max_N] [interval] [batch_count]
      : ./build/cumpsgemm_test cgemm_strided_batch [exp2|seq] [min_N] [max_N] [interval] [batch_count]
      : ./build/cumpsgemm_test cublas_sgemm [exp2|seq] [min_N] [max_N] [interval]
      : ./build/cumpsgemm_test cublas_cgemm [exp2|seq] [min_N] [max_N] [interval]
      : ./build/cumpsgemm_test cublas_sgemm_strided_batch [exp2|seq] [min_N] [max_N] [interval] [batch_count]
      : ./build/cumpsgemm_test cublas_cgemm_strided_batch [exp2|seq] [min_N] [max_N] [interval] [batch_count]
      : ./build/cumpsgemm_test log [/path/to/log]
```

## Controlling environment variables
```bash
# Select the GEMM implementation to execute (see the tables above)
export CUMPSGEMM_COMPUTE_MODE=FP16TCEC

# Output debug information (default: 0)
export CUMPSGEMM_INFO=1

# Output error messages (default: 1)
export CUMPSGEMM_ERROR_LOG=0

# Enable custom gemm_Mx2x2 (https://github.com/enp1s0/cuGEMM-Mx2x2)
export CUMPSGEMM_CUSTOM_GEMM_MX2X2=1
```

### CULiP integration
To output [CULiP](https://github.com/enp1s0/CULiP) logs, set the following environment variable:
```bash
export CUMPSGEMM_ENABLE_CULIP_PROFILING=1
```

## Citation
Preprint: [arXiv:2303.08989](https://arxiv.org/abs/2303.08989)

```bibtex
@InProceedings{10.1007/978-3-031-32041-5_14,
	author="Ootomo, Hiroyuki
	and Manabe, Hidetaka
	and Harada, Kenji
	and Yokota, Rio",
	title="Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection",
	booktitle="High Performance Computing",
	year="2023",
	publisher="Springer Nature Switzerland",
	address="Cham",
	pages="259--276",
	isbn="978-3-031-32041-5"
}
```

## License
MIT
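## Example: running an application under a chosen mode

The preload and mode-selection steps from the Usage section can be wrapped in a small helper when comparing several modes on the same application. The following is a minimal bash sketch and is not part of cuMpSGEMM: the function names `cumpsgemm_run` and `cumpsgemm_valid_mode` and the `CUMPSGEMM_LIB` variable are hypothetical, and the library path is the same placeholder used above. It only sets the environment variables this README documents (`LD_PRELOAD`, `CUMPSGEMM_COMPUTE_MODE`) for a single command, so the calling shell is left unchanged.

```bash
# Hypothetical helper (not shipped with cuMpSGEMM); source it into a shell or job script.
# CUMPSGEMM_LIB is a made-up variable used only by this sketch; the library does not read it.
CUMPSGEMM_LIB="${CUMPSGEMM_LIB:-/path/to/cumpsgemm/build/libcumpsgemm.so}"

# Accept only the mode names listed in the performance and debugging tables.
cumpsgemm_valid_mode() {
  case "$1" in
    FP16TCEC|TF32TCEC|CUBLAS_SIMT|CUBLAS_FP16TC|CUBLAS_TF32TC|FP16TCEC_SCALING) return 0 ;;
    FP16TC|TF32TC|CUBLAS|AUTO|DRY_RUN) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: cumpsgemm_run MODE COMMAND [ARGS...]
cumpsgemm_run() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cumpsgemm_run MODE COMMAND [ARGS...]" >&2
    return 2
  fi
  local mode="$1"
  shift
  if ! cumpsgemm_valid_mode "$mode"; then
    echo "cumpsgemm_run: unknown compute mode '$mode'" >&2
    return 2
  fi
  # Prefix assignments apply only to this command, not to the calling shell.
  # ${LD_PRELOAD:+:$LD_PRELOAD} avoids a trailing ':' when LD_PRELOAD is empty.
  LD_PRELOAD="${CUMPSGEMM_LIB}${LD_PRELOAD:+:$LD_PRELOAD}" \
    CUMPSGEMM_COMPUTE_MODE="$mode" \
    "$@"
}
```

For example, `cumpsgemm_run FP16TCEC ./my_app` and `cumpsgemm_run CUBLAS_SIMT ./my_app` would run a (hypothetical) `./my_app` with FP16 Tensor Core emulation plus error correction and with the FP32 SIMT cuBLAS path, respectively. The explicit mode check is there to catch typos early, since this README does not state how the library treats an unrecognized `CUMPSGEMM_COMPUTE_MODE` value.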