https://github.com/enp1s0/ozIMMU
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
- Host: GitHub
- URL: https://github.com/enp1s0/ozIMMU
- Owner: enp1s0
- License: mit
- Created: 2023-06-14T01:37:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-07T03:08:21.000Z (2 months ago)
- Last Synced: 2024-09-07T04:58:59.541Z (2 months ago)
- Topics: cuda, gemm, mixed-precision, tensorcore, tensorcores
- Language: Cuda
- Homepage: https://arxiv.org/abs/2306.11975
- Size: 176 KB
- Stars: 44
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-gemm - DGEMM on Int8 Tensor Core
- awesome-cuda-and-hpc - enp1s0/ozIMMU
README
# ozIMMU - DGEMM on Int8 Tensor Core
This library intercepts calls to the cuBLAS DGEMM functions and executes ozIMMU instead.
## Build
```bash
git clone https://github.com/enp1s0/ozIMMU --recursive
cd ozIMMU
mkdir build
cd build
cmake ..
make -j4
```

## Usage
1. Set an environment variable to hijack the function calls
```bash
export LD_PRELOAD=/path/to/ozIMMU/build/libozimmu.so
```
2. Set an environment variable to choose the compute mode
```bash
export OZIMMU_COMPUTE_MODE=fp64_int8_9
```
The supported compute modes are listed [here](#supported-compute-mode).

3. Execute the application
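
The three steps above can be combined into a small wrapper that sets both variables for a single command only. This is a sketch, not part of ozIMMU: `is_supported_compute_mode` and `run_with_ozimmu` are hypothetical helper names, and the mode check simply mirrors the names in the supported compute mode table. It is only a guard against typos; how the library itself reacts to an unknown mode is not described here.

```bash
# Hypothetical helper: succeed only for mode names that appear in the
# "Supported compute mode" table.
is_supported_compute_mode() {
  case "$1" in
    dgemm|sgemm|fp64_int8_auto) return 0 ;;
    fp64_int8_[3-9]|fp64_int8_1[0-8]) return 0 ;;  # fp64_int8_3 .. fp64_int8_18
    *) return 1 ;;
  esac
}

# Hypothetical helper: run one command with ozIMMU preloaded.
# Usage: run_with_ozimmu <path/to/libozimmu.so> <compute_mode> <command> [args...]
run_with_ozimmu() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_with_ozimmu <libozimmu.so> <mode> <command> [args...]" >&2
    return 2
  fi
  local lib="$1" mode="$2"
  shift 2
  if [ ! -f "$lib" ]; then
    echo "library not found: $lib" >&2
    return 1
  fi
  if ! is_supported_compute_mode "$mode"; then
    echo "unknown compute mode: $mode" >&2
    return 1
  fi
  # Both variables apply to this command only; nothing is exported to the shell.
  LD_PRELOAD="$lib" OZIMMU_COMPUTE_MODE="$mode" "$@"
}
```

For example, `run_with_ozimmu /path/to/ozIMMU/build/libozimmu.so fp64_int8_9 ./your_app`, where `./your_app` stands in for any application that calls cuBLAS DGEMM. Scoping the variables to one command avoids preloading the library into every other process started from the same shell.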
### Supported compute mode
| Mode | Tensor Core type | Num splits | Note |
|:--------------|:-----------------|:-----------|:------------------------|
|dgemm | -- | -- | Disable hijacking |
|sgemm | -- | -- | Use SGEMM internally |
|fp64_int8_3 | Int8 TC | 3 | |
|fp64_int8_4 | Int8 TC | 4 | |
|fp64_int8_5 | Int8 TC | 5 | |
|fp64_int8_6 | Int8 TC | 6 | |
|fp64_int8_7 | Int8 TC | 7 | |
|fp64_int8_8 | Int8 TC | 8 | |
|fp64_int8_9 | Int8 TC | 9 | |
|fp64_int8_10 | Int8 TC | 10 | |
|fp64_int8_11 | Int8 TC | 11 | |
|fp64_int8_12 | Int8 TC | 12 | |
|fp64_int8_13 | Int8 TC | 13 | |
|fp64_int8_14 | Int8 TC | 14 | |
|fp64_int8_15 | Int8 TC | 15 | |
|fp64_int8_16 | Int8 TC | 16 | |
|fp64_int8_17 | Int8 TC | 17 | |
|fp64_int8_18 | Int8 TC | 18 | |
|fp64_int8_auto | Int8 TC | AUTO | fp64_int8_3..18 / dgemm |

### Optional environment variables
```bash
# Show info log
export OZIMMU_INFO=1

# Show error and warning log
export OZIMMU_ERROR=1

# Show CULiP ( https://github.com/enp1s0/CULiP ) log
export OZIMMU_ENABLE_CULIP_PROFILING=1

# Choose malloc mode
export OZIMMU_MALLOC_ASYNC=1

# Set AUTO mode mantissa loss threshold
export OZIMMU_AUTO_AVG_MANTISSA_LOSS_THRESHOLD=1.5

# Set ozIMMU intercept threshold.
export OZIMMU_INTERCEPT_THRESHOLD_M=128
export OZIMMU_INTERCEPT_THRESHOLD_N=128
export OZIMMU_INTERCEPT_THRESHOLD_K=128
# The ozIMMU GEMM function is executed if `m`, `n`, and `k` are all greater than or equal to
# `OZIMMU_INTERCEPT_THRESHOLD_M`, `OZIMMU_INTERCEPT_THRESHOLD_N`, and `OZIMMU_INTERCEPT_THRESHOLD_K`, respectively.
# Otherwise, the original cuBLAS function is executed.
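
# A minimal sketch of the rule above, for predicting which path a GEMM of
# size m x n x k would take. `would_intercept` is a hypothetical helper, not
# part of ozIMMU. The 128 fallback mirrors the example values above; it is an
# assumption and not necessarily the library's built-in default.
would_intercept() {
  local m="$1" n="$2" k="$3"
  if [ "$m" -ge "${OZIMMU_INTERCEPT_THRESHOLD_M:-128}" ] &&
     [ "$n" -ge "${OZIMMU_INTERCEPT_THRESHOLD_N:-128}" ] &&
     [ "$k" -ge "${OZIMMU_INTERCEPT_THRESHOLD_K:-128}" ]; then
    echo "ozIMMU"   # all three dimensions meet their thresholds
  else
    echo "cuBLAS"   # at least one dimension is below its threshold
  fi
}
# Example calls: would_intercept 4096 4096 4096 ; would_intercept 64 4096 4096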
```

## Citation
```bibtex
@article{ootomo2024dgemm,
author = {Hiroyuki Ootomo and Katsuhisa Ozaki and Rio Yokota},
title = {DGEMM on integer matrix multiplication unit},
journal = {The International Journal of High Performance Computing Applications},
year = {2024},
doi = {10.1177/10943420241239588},
URL = {https://doi.org/10.1177/10943420241239588}
}
```

## License
MIT