https://github.com/bruce-lee-ly/matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
- Host: GitHub
- URL: https://github.com/bruce-lee-ly/matrix_multiply
- Owner: Bruce-Lee-LY
- License: mit
- Created: 2022-06-02T11:17:19.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-02-08T15:23:45.000Z (almost 2 years ago)
- Last Synced: 2024-08-04T02:06:36.118Z (6 months ago)
- Topics: coppersmith-winograd, cpp11, cpu, cublas, cuda, kahan, matrix-multiply, naive, nvidia, reordering, shared-memory, strassen, tiling
- Language: C++
- Homepage:
- Size: 12.7 KB
- Stars: 13
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Matrix Multiply
Several common matrix multiplication methods are implemented on CPU and Nvidia GPU using C++11 and CUDA. A simple test measures the performance benefit of each optimization method.
## CPU
- naive
- reordering
- tiling
- strassen
- coppersmith-winograd
## Nvidia GPU
- cublas
- naive
- kahan
- shared_memory
# Compile
## Environment
- OS: Linux
- CMake Version: >= 3.8
- GCC Version: >= 4.8
- CUDA Version: 11.4 (best)
- CUDA Driver Version: 470.129.06 (best)
## Clone
```
git clone https://github.com/Bruce-Lee-LY/matrix_multiply.git
```
## Build
```
cd matrix_multiply
./build.sh -t Release -b OFF
./build.sh -t Debug -b ON
```
# Run Sample
```
./run_sample.sh
```
# Performance
- OS: Ubuntu 20.04.4
- CPU: i5-9400F
- GPU: NVIDIA GeForce GTX 1080 Ti
- CUDA Version: 11.4
- CUDA Driver Version: 470.129.06
- Matrix (float): A (512 * 512) * B (512 * 512) = C (512 * 512)
## CPU
|Method|Time (ms)|
|:-:|:-:|
|naive|1238.647|
|reordering|984.445|
|tiling|1000.095|
|strassen|57429.407|
|coppersmith-winograd|77668.238|
## Nvidia GPU
|Method|Time (ms)|
|:-:|:-:|
|cublas|0.100|
|naive|0.613|
|kahan|0.616|
|shared_memory|0.153|
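
For readers unfamiliar with the CPU method names, the sketch below shows what naive, reordering and tiling conventionally mean for a square, row-major float matrix. It is not taken from the repository: the `Matrix` alias, the function names and the default tile size of 32 are all assumptions made for illustration, and the project's actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: square n x n matrices in row-major order, stored flat.
// These names are hypothetical and do not come from the repository.
using Matrix = std::vector<float>;

// Naive i-j-k order: the inner loop reads B with stride n (column-wise),
// which is unfriendly to the cache for row-major storage.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Reordered i-k-j order: the inner loop walks one row of B and one row of C
// contiguously, so every inner-loop access has unit stride.
Matrix multiply_reordered(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}

// Tiled (blocked) version: works on tile x tile sub-blocks so that the
// data touched by the inner loops is more likely to stay in cache.
// The tile size is an assumed tuning parameter, not a value from the project.
Matrix multiply_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                      std::size_t tile = 32) {
    assert(a.size() == n * n && b.size() == n * n && tile > 0);
    Matrix c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

All three functions compute the same product and differ only in memory access order. In the CPU table above, reordering takes about 20% less time than naive and tiling lands close to reordering. How much tiling helps in general depends on the tile size and the cache sizes of the machine.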