https://github.com/stckvrflw/pem-spgemm

pemSpGEMM - An Improved SpGEMM Algorithm

Host: GitHub
URL: https://github.com/stckvrflw/pem-spgemm
Owner: stckvrflw
License: mit
Created: 2025-02-14T04:27:23.000Z
Default Branch: main
Last Pushed: 2025-07-17T15:48:55.000Z
Last Synced: 2025-07-17T18:04:10.044Z
Topics: cpp, cuda
Language: C
Homepage:
Size: 1.56 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# pem-spgemm

#### **BETA**

Final Assignment Project - SpGEMM algorithm in CUDA  

By Petrus E. Manurung  

2025

An Improved Sparse General Matrix-Matrix Multiplication (SpGEMM) algorithm.  
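SpGEMM computes C = A·B where A, B and C are all sparse. As background, here is a minimal CPU reference sketch, assuming plain CSR inputs and using the classic row-by-row (Gustavson) formulation, *not* the tiled scheme of TileSpGEMM or pemSpGEMM. The `Csr` struct and function names are hypothetical. It also counts intermediate products a_ik·b_kj; whether the `flop` column of the benchmark CSV counts one or two operations per product is an assumption left open here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal CSR container (hypothetical; not the repository's data structure).
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int> rowPtr;   // size rows + 1
    std::vector<int> colIdx;   // size nnz
    std::vector<double> val;   // size nnz
};

// Row-by-row (Gustavson) SpGEMM: C = A * B, using a dense accumulator
// of length B.cols and a marker array to detect first touches per row.
Csr spgemm(const Csr& A, const Csr& B) {
    assert(A.cols == B.rows);
    Csr C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.rowPtr.assign(A.rows + 1, 0);
    std::vector<double> acc(B.cols, 0.0);
    std::vector<int> mark(B.cols, -1);  // mark[j] == i: column j already seen in row i
    std::vector<int> touched;           // columns of C touched in the current row
    for (int i = 0; i < A.rows; ++i) {
        touched.clear();
        for (int p = A.rowPtr[i]; p < A.rowPtr[i + 1]; ++p) {
            int k = A.colIdx[p];
            double a = A.val[p];
            for (int q = B.rowPtr[k]; q < B.rowPtr[k + 1]; ++q) {
                int j = B.colIdx[q];
                if (mark[j] != i) {  // first contribution to C(i, j)
                    mark[j] = i;
                    acc[j] = 0.0;
                    touched.push_back(j);
                }
                acc[j] += a * B.val[q];
            }
        }
        std::sort(touched.begin(), touched.end());  // keep column indices sorted
        for (int j : touched) {
            C.colIdx.push_back(j);
            C.val.push_back(acc[j]);
        }
        C.rowPtr[i + 1] = static_cast<int>(C.colIdx.size());
    }
    return C;
}

// Number of intermediate products a_ik * b_kj (each nonzero of A row-expands one row of B).
long long intermediateProducts(const Csr& A, const Csr& B) {
    long long n = 0;
    for (int k : A.colIdx) n += B.rowPtr[k + 1] - B.rowPtr[k];
    return n;
}
```

The ratio of intermediate products to nnz(C) is one common way to express how much merging an SpGEMM kernel must do.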

It improves upon TileSpGEMM by eliminating atomic operations and improving cache utilization in steps 2 and 3.
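Atomics typically appear in SpGEMM when many threads append results to a shared output buffer through a global counter. One common atomic-free alternative is a two-pass count-then-scan: first count how many outputs each work item (a row or a tile) produces, then turn the counts into disjoint offsets with an exclusive scan, so every item writes into its own slice. The sketch below shows this generic pattern on the CPU; it is an illustration, not necessarily what pemSpGEMM does in steps 2 and 3, and the function names are hypothetical. On the GPU the scan would usually be a library call such as `thrust::exclusive_scan`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Exclusive scan with the total appended: offsets[i] is where item i starts,
// offsets.back() is the total output size.
std::vector<int> exclusiveOffsets(const std::vector<int>& counts) {
    for (int c : counts) assert(c >= 0);
    std::vector<int> offsets(counts.size() + 1, 0);
    std::partial_sum(counts.begin(), counts.end(), offsets.begin() + 1);
    return offsets;
}

// Pass 2: item i writes to out[offsets[i] .. offsets[i + 1]).
// Each item "produces" counts[i] copies of its own index, standing in for real results.
// The outer iterations touch disjoint ranges, so they could run in parallel without atomics.
std::vector<int> scatterByOffsets(const std::vector<int>& counts) {
    std::vector<int> offsets = exclusiveOffsets(counts);
    std::vector<int> out(offsets.back());
    for (std::size_t i = 0; i < counts.size(); ++i)
        for (int t = 0; t < counts[i]; ++t)
            out[offsets[i] + t] = static_cast<int>(i);
    return out;
}
```

The price is an extra counting pass, which is usually cheaper than contended atomic updates on a GPU.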

Another improvement is a native GPU implementation of the conversion from the .mtx (Matrix Market) format to the intermediate tiled CSR format.
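Matrix Market files store coordinate (COO) triplets, so the conversion essentially groups nonzeros by tile. A simplified CPU sketch, assuming 16×16 tiles and a hypothetical two-level layout (CSR over non-empty tiles, with 8-bit local coordinates inside each tile); TileSpGEMM's actual format keeps more per-tile metadata (such as row pointers and bitmasks), which is omitted here. On the GPU the sort step would typically map to `thrust::sort` or `thrust::sort_by_key` on a packed key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// One 0-based COO entry (Matrix Market indices are 1-based: subtract 1 when reading).
struct Triplet { int row, col; double val; };

// Simplified two-level layout (hypothetical, for illustration only).
struct TiledMatrix {
    int tileRows = 0;                              // number of tile rows
    std::vector<int> tileRowPtr;                   // CSR over tiles: size tileRows + 1
    std::vector<int> tileColIdx;                   // tile column of each non-empty tile
    std::vector<int> tileNnzPtr;                   // nonzero range per tile: size numTiles + 1
    std::vector<std::uint8_t> localRow, localCol;  // position inside the tile
    std::vector<double> val;
};

constexpr int kTile = 16;  // assumed tile edge length

TiledMatrix cooToTiled(std::vector<Triplet> coo, int rows) {
    // Sort by (tile row, tile column, local row, local column).
    auto key = [](const Triplet& t) {
        return std::make_tuple(t.row / kTile, t.col / kTile, t.row % kTile, t.col % kTile);
    };
    std::sort(coo.begin(), coo.end(),
              [&](const Triplet& a, const Triplet& b) { return key(a) < key(b); });

    TiledMatrix M;
    M.tileRows = (rows + kTile - 1) / kTile;
    M.tileRowPtr.assign(M.tileRows + 1, 0);
    M.tileNnzPtr.push_back(0);
    for (std::size_t p = 0; p < coo.size(); ++p) {
        const Triplet& t = coo[p];
        assert(t.row >= 0 && t.row < rows && t.col >= 0);
        int tr = t.row / kTile, tc = t.col / kTile;
        bool newTile = p == 0 || tr != coo[p - 1].row / kTile || tc != coo[p - 1].col / kTile;
        if (newTile) {
            if (p != 0) M.tileNnzPtr.push_back(static_cast<int>(p));  // close previous tile
            M.tileColIdx.push_back(tc);
            ++M.tileRowPtr[tr + 1];  // count non-empty tiles per tile row
        }
        M.localRow.push_back(static_cast<std::uint8_t>(t.row % kTile));
        M.localCol.push_back(static_cast<std::uint8_t>(t.col % kTile));
        M.val.push_back(t.val);
    }
    if (!coo.empty()) M.tileNnzPtr.push_back(static_cast<int>(coo.size()));
    for (int r = 0; r < M.tileRows; ++r) M.tileRowPtr[r + 1] += M.tileRowPtr[r];  // prefix sum
    return M;
}
```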

Libraries used:

* [thrust][thrust]

* [rmm][rapidsrmm]

* [fast_matrix_market][fmm]

* [nsparse][nsparse]

Other resources:

* Sparse matrices from [SuiteSparse][suitesparse]

Reference:

1. [TileSpGEMM -- **Niu et al.**](https://doi.org/10.1145/3503221.3508431)

Environment:

* CPU       : 11th Gen Intel(R) Core(TM) i7-11800H

* GPU       : NVIDIA Corporation GA104M [GeForce RTX 3080 Mobile / Max-Q 8GB/16GB]

* OS        : Gentoo Linux

* Kernel    : 6.13.8-zen1

* CUDA      : 12.8

* driver    : 570.144

* gcc       : 14.2.1 20241221

How to compile:  

1. Clone this repository.

2. Download rapidsrmm v24.12.00 from [rapidsrmm] and extract it into pem-spgemm (the cloned repository).

3. Download fast_matrix_market v1.7.6 from [fmm] and extract it into pem-spgemm (the cloned repository).

4. Run `make`.

How to use:

* A^2   : `./pemspgemm path-to-.mtx-file [0/1]`

* A*At  : `./pemspgemm path-to-.mtx-file [0/1] 1`

*** The second argument controls saving the result (in COO format): 0 skips saving, 1 saves it to /tmp.

*** Since /tmp resides in RAM, make sure there is enough free memory
(e.g., the result of A^2 for webbase-1M can take more than 1.5 GiB).

*** Do not put quotes around the path to the .mtx file.
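In the A*At mode the second operand is the transpose of A. For illustration, a minimal sketch of transposing a CSR matrix with a counting sort over column indices; the `Csr` struct and `transpose` function are hypothetical, and how pemspgemm actually forms A^T is not described here.

```cpp
#include <cassert>
#include <vector>

// Minimal CSR container (hypothetical).
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int> rowPtr, colIdx;
    std::vector<double> val;
};

// Transpose in O(nnz + rows + cols) via a counting sort on A's column indices.
Csr transpose(const Csr& A) {
    assert(static_cast<int>(A.rowPtr.size()) == A.rows + 1);
    Csr T;
    T.rows = A.cols;
    T.cols = A.rows;
    int nnz = A.rowPtr[A.rows];
    T.rowPtr.assign(A.cols + 1, 0);
    T.colIdx.resize(nnz);
    T.val.resize(nnz);
    for (int p = 0; p < nnz; ++p) ++T.rowPtr[A.colIdx[p] + 1];        // entries per column of A
    for (int j = 0; j < A.cols; ++j) T.rowPtr[j + 1] += T.rowPtr[j];  // row starts of A^T
    std::vector<int> next(T.rowPtr.begin(), T.rowPtr.end() - 1);      // write cursor per row of A^T
    for (int i = 0; i < A.rows; ++i)
        for (int p = A.rowPtr[i]; p < A.rowPtr[i + 1]; ++p) {
            int dst = next[A.colIdx[p]]++;
            T.colIdx[dst] = i;  // rows of A are visited in order, so columns of A^T stay sorted
            T.val[dst] = A.val[p];
        }
    return T;
}
```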

To reproduce the results, use a GPU with compute capability 8.6 (sm_86).

If you use a different GPU, change the "code" part of NVCC_FLAGS in the Makefile to match your GPU's architecture.

Keep "compute_61" unchanged.

The benchmark results are saved to the file 'pemspgemm_benchmark_result.csv'.

Header of the CSV:

`matrix,flop,C_nnz,compression_ratio,A_conversion_kernel_time,B_conversion_kernel_time,total_conversion_overhead_time,step1_time,step2_time,step3_time,pem_spgemm_time,pem_spgemm_kernel_time,pem_spgemm_malloc_time,Gflops`
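To post-process the benchmark file, a small C++ sketch that loads each row into a column-name → value map, assuming plain comma-separated fields without quoting (the header above contains none). The function names are hypothetical, and since the units of the timing columns are not documented here, values are kept as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Split one CSV line on commas (no quoted fields).
std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string f;
    while (std::getline(ss, f, ',')) fields.push_back(f);
    return fields;
}

// Read the header, then every non-empty data row, into column-name -> value maps.
std::vector<std::map<std::string, std::string>> readBenchmark(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) return {};
    const std::vector<std::string> header = splitCsv(line);
    std::vector<std::map<std::string, std::string>> rows;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::vector<std::string> f = splitCsv(line);
        if (f.size() != header.size()) throw std::runtime_error("column count mismatch");
        std::map<std::string, std::string> row;
        for (std::size_t i = 0; i < f.size(); ++i) row[header[i]] = f[i];
        assert(row.size() == header.size() && "duplicate column names in header");
        rows.push_back(std::move(row));
    }
    return rows;
}
```

Values can then be converted with `std::stod` once their units are known.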

[ansorge]: https://github.com/RichardAns/CUDA-Programs

[thrust]: https://developer.nvidia.com/thrust

[rapidsrmm]: https://github.com/rapidsai/rmm

[cusparse]: https://developer.nvidia.com/cusparse

[fmm]: https://github.com/alugowski/fast_matrix_market

[suitesparse]: https://sparse.tamu.edu

[nsparse]: https://github.com/EBD-CREST/nsparse
