https://github.com/williamfgc/simple-gemm
Collection of simple General Matrix Multiplication - GEMM implementations
- Host: GitHub
- URL: https://github.com/williamfgc/simple-gemm
- Owner: williamfgc
- License: mit
- Created: 2022-04-08T15:55:14.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-26T12:48:43.000Z (11 months ago)
- Last Synced: 2024-11-05T00:36:00.243Z (2 months ago)
- Language: Julia
- Size: 148 KB
- Stars: 13
- Watchers: 2
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-gemm - simple-gemm
README
# simple-gemm
Collection of simple General Matrix Multiplication (GEMM) implementations

```
C = a . A x B + C
if a = 1 and C = zeros
C = A x B
```

A and B are initialized with random numbers.
C is initialized with zeros.

Arguments are always 3 matrix dimensions: `args = [A_rows, A_cols (= B_rows), B_cols]`,
*e.g.* 5 5 5 or 10 10 10
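
As a reference for what every variant computes, the update above can be sketched in plain Python. This is a minimal illustration, not code from the repository: the helper names `parse_dims`, `gemm`, `random_matrix`, and `zeros` are hypothetical, and matrices are stored as lists of rows for simplicity.

```python
import random


def parse_dims(args):
    """Parse [A_rows, A_cols (= B_rows), B_cols] from three strings."""
    if len(args) != 3:
        raise ValueError("expected 3 dimensions: A_rows A_cols B_cols")
    a_rows, a_cols, b_cols = (int(s) for s in args)
    if min(a_rows, a_cols, b_cols) <= 0:
        raise ValueError("dimensions must be positive")
    return a_rows, a_cols, b_cols


def random_matrix(rows, cols):
    """Matrix of uniform random numbers in [0, 1), stored as a list of rows."""
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    """Matrix of zeros, stored as a list of rows."""
    return [[0.0] * cols for _ in range(rows)]


def gemm(alpha, A, B, C):
    """In-place update C = alpha * A x B + C (naive triple loop)."""
    for i in range(len(A)):
        for k in range(len(B)):
            aik = alpha * A[i][k]
            for j in range(len(B[0])):
                C[i][j] += aik * B[k][j]
    return C


if __name__ == "__main__":
    a_rows, a_cols, b_cols = parse_dims("5 5 5".split())
    A = random_matrix(a_rows, a_cols)
    B = random_matrix(a_cols, b_cols)
    C = zeros(a_rows, b_cols)
    gemm(1.0, A, B, C)  # a = 1 and C = zeros, so C = A x B
    print(f"C is {len(C)} x {len(C[0])}")
```

With `a = 1` and a zero-initialized `C`, the call reduces to `C = A x B`, matching the special case stated above.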
CPU multithreading:

- `GemmDenseThreads`: native Julia Threads implementation

```
$ cd GemmDenseThreads
$ julia -t 4 gemm-dense-threads.jl 5 5 5
```

- `GemmDenseThreads.py`: native Python Numba Threads implementation

```
$ cd python/GemmDenseThreads
$ NUMBA_NUM_THREADS=4 python3 GemmDenseThreads.py 5 5 5
```

- `GemmDenseBlas`: uses `LinearAlgebra.jl` (super-fast); if compiled with `OpenBLAS`, set `OPENBLAS_NUM_THREADS`

```
$ cd GemmDenseThreads
$ OPENBLAS_NUM_THREADS=4 julia gemm-dense-blas.jl 5 5 5
```

GPU:

- `GemmDenseCUDA`: uses `CUDA.jl`, which uses the optimized `cuBLAS` (very fast) on NVIDIA GPUs
```
$ cd GemmDenseCUDA
$ julia gemm-dense-cuda.jl 5 5 5
```
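
One common way to multithread a dense GEMM on the CPU is to give each thread a disjoint block of rows of `C`, so no two threads write the same element. The sketch below illustrates that partitioning with Python's standard `concurrent.futures`. It is an assumption that the repository's kernels divide work this way, and the names `gemm_rows` and `gemm_threads` are hypothetical. Pure-Python threads are limited by the interpreter lock, so this shows the structure only, not the speedup a Numba or Julia Threads kernel would give.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_rows(alpha, A, B, C, row_start, row_stop):
    """Update rows [row_start, row_stop) of C in place: C = alpha * A x B + C."""
    for i in range(row_start, row_stop):
        for k in range(len(B)):
            aik = alpha * A[i][k]
            for j in range(len(B[0])):
                C[i][j] += aik * B[k][j]


def gemm_threads(alpha, A, B, C, num_threads=4):
    """Split the rows of C into contiguous chunks and submit one task per chunk."""
    rows = len(A)
    chunk = max(1, -(-rows // num_threads))  # ceiling division
    bounds = [(s, min(s + chunk, rows)) for s in range(0, rows, chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(gemm_rows, alpha, A, B, C, s, e) for s, e in bounds]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return C
```

Because each task owns its rows of `C`, no locking is needed; the thread count plays the role that `-t 4` or `NUMBA_NUM_THREADS=4` plays in the commands above.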
# Citation
If you find the repository useful, please cite the reference [2023 IPDPSW paper](https://doi.org/10.1109/IPDPSW59300.2023.00068):

```
@INPROCEEDINGS{10196600,
author={Godoy, William F. and Valero-Lara, Pedro and Dettling, T. Elise and Trefftz, Christian and Jorquera, Ian and Sheehy, Thomas and Miller, Ross G. and Gonzalez-Tallada, Marc and Vetter, Jeffrey S. and Churavy, Valentin},
booktitle={2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
title={Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes},
year={2023},
volume={},
number={},
pages={373-382},
doi={10.1109/IPDPSW59300.2023.00068}}
```