{"id":18421961,"url":"https://github.com/spcl/smat","last_synced_at":"2026-03-10T20:36:50.456Z","repository":{"id":260889882,"uuid":"845462074","full_name":"spcl/smat","owner":"spcl","description":"Code for High Performance Unstructured SpMM Computation Using Tensor Cores","archived":false,"fork":false,"pushed_at":"2024-11-03T11:06:32.000Z","size":62746,"stargazers_count":18,"open_issues_count":1,"forks_count":3,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-07T14:39:24.373Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-21T09:53:14.000Z","updated_at":"2025-03-28T04:12:23.000Z","dependencies_parsed_at":"2024-11-03T12:17:55.461Z","dependency_job_id":"3c8ce1ec-2f62-4b4c-89b9-0d245552a803","html_url":"https://github.com/spcl/smat","commit_stats":null,"previous_names":["spcl/smat"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/spcl/smat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fsmat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fsmat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fsmat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fsmat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spcl","download_url":"https://codeload.github.com/spcl/smat/tar.gz/refs/hea
ds/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fsmat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30352942,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T15:55:29.454Z","status":"ssl_error","status_checked_at":"2026-03-10T15:54:58.440Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:27:27.991Z","updated_at":"2026-03-10T20:36:50.434Z","avatar_url":"https://github.com/spcl.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SMaT: (S)parse (Ma)trix Matrix (T)ensor Core-accelerated library ([PDF](https://arxiv.org/pdf/2408.11551))\n\n## Abstract\n\nHigh-performance sparse matrix–matrix (SpMM)\nmultiplication is paramount for science and industry, as the ever-\nincreasing sizes of data prohibit using dense data structures.\nYet, existing hardware, such as Tensor Cores (TC), is ill-suited for SpMM, as it imposes strict constraints on data structures\nthat cannot be met by unstructured sparsity found in many\napplications. To address this, we introduce (S)parse (Ma)trix\nMatrix (T)ensor Core-accelerated (SMaT): a novel SpMM library\nthat utilizes TCs for unstructured sparse matrices. 
Our block-\nsparse library leverages the low-level CUDA MMA (matrix-\nmatrix-accumulate) API, maximizing the performance offered by\nmodern GPUs. Algorithmic optimizations, such as sparse matrix\npermutation, further improve performance by minimizing the\nnumber of non-zero blocks. The evaluation on NVIDIA A100\nshows that SMaT outperforms SotA libraries (DASP, cuSPARSE,\nand Magicube) by up to 125x (on average 2.6x). SMaT can be\nused to accelerate many workloads in scientific computing, large\nmodel training, inference, and others.\n\n## Requirements\n\n### Hardware\nWe ran our experiments on the Swiss National Computing Center’s Ault compute cluster. Each node\nis equipped with a single NVIDIA A100-SXM4-40GB GPU\nand an AMD EPYC 7742 @ 2.25GHz CPU. The A100 driver\nversion is 530.30.02.\n\n### Software\nAll experiments were executed using the GCC\n12.3.0 compiler, NVIDIA nvcc v12.0, NVIDIA cuSPARSE\nv12.0, NVIDIA CUDA Toolkit v12.0, Python 3.9, and the\nfollowing Python libraries: Pandas, Matplotlib, NumPy, SciPy,\nand Seaborn.\n\nTo create the environment:\n```bash\nconda env create -f smat_env.yml\nconda activate smat\nsudo apt-get install libgflags-dev\n```\n\n## Datasets\nTo prepare the matrices, run the following:\n\n- SuiteSparse Collection:\n```bash\npython download_suitesparse.py\n```\n- Synthetic band matrices:\n```bash\npython generate_matrices.py\n```\n\n## Compiling\nTo compile the library:\n```bash\ncd src/cuda_hgemm\nsource compile.sh\n```\n\n## Running The Code\nSet `\u003cpath\u003e` inside `src/run_smat.sh` to the location of the input matrices, then run:\n```bash\ncd src/\nsource run_smat.sh\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fsmat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspcl%2Fsmat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fsmat/lists"}