https://github.com/enp1s0/shgemm
Fast multiplication of single-precision and half-precision matrices on Tensor Cores
- Host: GitHub
- URL: https://github.com/enp1s0/shgemm
- Owner: enp1s0
- Created: 2022-01-17T11:54:08.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-04-13T10:53:37.000Z (almost 2 years ago)
- Last Synced: 2024-12-20T00:53:12.925Z (about 2 months ago)
- Topics: cuda
- Language: Cuda
- Homepage: https://arxiv.org/abs/2304.04612
- Size: 187 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SHGEMM - Single and Half precision GEMM on Tensor Cores
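SHGEMM multiplies a single-precision (FP32) matrix by a half-precision (FP16) matrix and writes an FP32 result. Its cuBLAS-like arguments (`alpha`, `beta`, `op_n`, leading dimensions) suggest the usual GEMM semantics, C ← α·op(A)·op(B) + β·C. The host-side C++ sketch below is a hypothetical reference for the `op_n`/`op_n` case that can be used to sanity-check results. It *assumes* column-major storage, as in cuBLAS. The names `reference_shgemm_nn` and `relative_error` are illustrative and are not part of the library.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical host-side reference for the op_n/op_n case:
//   C <- alpha * A * B + beta * C
// ASSUMPTION: column-major storage with leading dimensions lda, ldb, ldc
// (cuBLAS convention). A is m x k, B is k x n, C is m x n.
// B is passed as float: every FP16 value is exactly representable in FP32,
// so a half-precision B can be widened to float before calling this.
void reference_shgemm_nn(std::size_t m, std::size_t n, std::size_t k,
                         float alpha,
                         const float* a, std::size_t lda,
                         const float* b, std::size_t ldb,
                         float beta,
                         float* c, std::size_t ldc) {
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      // Accumulate in double so the reference is more accurate than FP32.
      double acc = 0.0;
      for (std::size_t p = 0; p < k; ++p) {
        acc += static_cast<double>(a[i + p * lda]) *
               static_cast<double>(b[p + j * ldb]);
      }
      // BLAS convention: C is not read when beta == 0 (avoids propagating NaN).
      const double c_old =
          (beta == 0.0f) ? 0.0 : static_cast<double>(c[i + j * ldc]);
      c[i + j * ldc] = static_cast<float>(alpha * acc + beta * c_old);
    }
  }
}

// Relative Frobenius-norm error ||C_test - C_ref|| / ||C_ref|| of an m x n
// column-major matrix; returns the absolute error if C_ref is all zeros.
double relative_error(std::size_t m, std::size_t n,
                      const float* c_test, const float* c_ref,
                      std::size_t ldc) {
  double diff2 = 0.0;
  double ref2 = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      const double r = c_ref[i + j * ldc];
      const double d = static_cast<double>(c_test[i + j * ldc]) - r;
      diff2 += d * d;
      ref2 += r * r;
    }
  }
  return ref2 == 0.0 ? std::sqrt(diff2) : std::sqrt(diff2 / ref2);
}
```

To use it, copy the output of `mtk::shgemm::shgemm` back to the host, run the reference on the same inputs, and compare the two with `relative_error`. A suitable tolerance depends on the chosen `compute_type` and is not specified here.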
## Build
```
git clone https://github.com/enp1s0/shgemm
cd shgemm
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j4
```

## Usage
```cuda
// sample.cu
// nvcc sample.cu ... -lshgemm ...
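#include

// m, n, k, lda, ldb, ldc, alpha_fp32, beta_fp32, the device pointers
// (a_fp32_ptr, b_fp16_ptr, c_fp32_ptr, ...) and cuda_stream are assumed
// to be defined by the caller.
mtk::shgemm::shgemmHandle_t shgemm_handle;
mtk::shgemm::create(shgemm_handle);

// Optional: run subsequent calls on a specific CUDA stream
mtk::shgemm::set_cuda_stream(shgemm_handle, cuda_stream);

const auto compute_type = mtk::shgemm::tf32;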
// SHGEMM (A=float, B=half)
mtk::shgemm::shgemm(
shgemm_handle,
mtk::shgemm::op_n, mtk::shgemm::op_n,
m, n, k,
&alpha_fp32,
a_fp32_ptr, lda,
b_fp16_ptr, ldb,
&beta_fp32,
c_fp32_ptr, ldc,
compute_type
);

// HSGEMM (A=half, B=float) is also available
mtk::shgemm::hsgemm(
shgemm_handle,
mtk::shgemm::op_n, mtk::shgemm::op_n,
m, n, k,
&alpha_fp32,
a_fp16_ptr, lda,
b_fp32_ptr, ldb,
&beta_fp32,
c_fp32_ptr, ldc,
compute_type
);

mtk::shgemm::destroy(shgemm_handle);
```

## Test
To build the test, set `BUILD_SHGEMM_TEST` to `ON` in CMakeLists.txt before building the library, then run the build commands again. The test can then be run with:
```
./build/shgemm.test
```

## Publication
```bibtex
@inproceedings{ootomo_shgemm_2023,
author = {Ootomo, Hiroyuki and Yokota, Rio},
title = {Mixed-Precision Random Projection for RandNLA on Tensor Cores},
year = {2023},
series = {PASC '23}
}
```

## License
MIT