https://github.com/ericlbuehler/simd_matmul
O(n^2) matmul with SIMD.
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/ericlbuehler/simd_matmul
- Owner: EricLBuehler
- License: mit
- Created: 2024-02-16T11:26:08.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-02-20T00:48:16.000Z (over 2 years ago)
- Last Synced: 2025-02-08T10:32:05.300Z (over 1 year ago)
- Language: C++
- Homepage:
- Size: 102 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SIMD Matmul
This library optimizes the naive matrix multiplication (matmul) algorithm using SIMD instructions.
## Asymptotic complexity analysis
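Under the operation-count model used here, it is $O(n^2)$. For $A$ of shape $(m,n)$, $B$ of shape $(n,p)$ and $C = A*B$ of shape $(m,p)$, each output element costs one SIMD multiplication plus $n-1$ additions, so SIMD matmul takes exactly $mp(1+n-1) = mpn$ operations. Naive matmul needs $n$ multiplications and $n-1$ additions per output element, for $mp(2n-1) = 2mpn-mp$ operations. Therefore, the precise ratio of SIMD to naive operations is $$\frac{n}{2n-1}$$
However, when calculating the Big-O complexity we ignore the addition operations and count only multiplications, so the running time is $mp$, or $O(n^2)$ for square matrices. This assumes that a row of length $n$ fits in a single SIMD vector.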
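The counts above can be checked numerically with a small sketch. The helper names `simd_ops`, `naive_ops` and `op_ratio` are hypothetical and not part of the library; they only restate the formulas from this section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers restating the formulas above; not part of the library.

// SIMD matmul: one vector multiply plus n-1 additions per output element.
inline std::uint64_t simd_ops(std::uint64_t m, std::uint64_t n, std::uint64_t p) {
    assert(n >= 1);
    return m * p * (1 + (n - 1));  // = m*p*n
}

// Naive matmul: n multiplies plus n-1 additions per output element.
inline std::uint64_t naive_ops(std::uint64_t m, std::uint64_t n, std::uint64_t p) {
    assert(n >= 1);
    return m * p * (2 * n - 1);  // = 2*m*p*n - m*p
}

// Ratio of SIMD to naive operation counts; depends only on n.
inline double op_ratio(std::uint64_t n) {
    assert(n >= 1);
    return static_cast<double>(n) / static_cast<double>(2 * n - 1);
}
```

For example, with $m = 2$, $n = 4$, $p = 3$ the counts are 24 and 42, a ratio of $4/7$. As $n$ grows the ratio approaches $1/2$.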
## Mathematical Formulation
For $A = (m,n)$ and $B = (n,p)$, I first calculate $B^T$. The inputs to the algorithm are therefore $A = (m,n)$ and $B^T = (p,n)$. I note that the transpose algorithm is also $O(n^2)$.
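A minimal sketch of the transpose step, assuming row-major storage in a flat `std::vector<float>`. The function name `transpose` and the storage layout are assumptions for illustration and may differ from the library's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a row-major n x p matrix B into a
// row-major p x n matrix Bt. Each of the n*p elements is copied once.
inline std::vector<float> transpose(const std::vector<float>& B,
                                    std::size_t n, std::size_t p) {
    assert(B.size() == n * p);
    std::vector<float> Bt(p * n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < p; ++j) {
            Bt[j * n + k] = B[k * p + j];  // Bt(j, k) = B(k, j)
        }
    }
    return Bt;
}
```

After the transpose, every column of $B$ is a contiguous row of $B^T$, so it can be loaded into a SIMD vector directly.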
The output $C'$ has shape $(m,p)$ and is equivalent to $C = A*B$: each entry $C'_{ij}$ is the dot product of row $i$ of $A$ and row $j$ of $B^T$.
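The formulation above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: $n$ is fixed to 8 floats (one 256-bit AVX register), matrices are row-major flat vectors, and the names `kLanes`, `dot_lanes` and `matmul_transposed` are hypothetical. When AVX is not enabled at compile time, the sketch falls back to a scalar multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Assumed lane count: 8 floats fill one 256-bit AVX register.
constexpr std::size_t kLanes = 8;

// Dot product of two kLanes-wide rows: one vector multiply, followed by
// a horizontal sum of the products (the n-1 additions).
inline float dot_lanes(const float* a, const float* b) {
    float prod[kLanes];
#if defined(__AVX__)
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(prod, _mm256_mul_ps(va, vb));
#else
    // Scalar fallback when AVX is unavailable.
    for (std::size_t k = 0; k < kLanes; ++k) prod[k] = a[k] * b[k];
#endif
    float sum = prod[0];
    for (std::size_t k = 1; k < kLanes; ++k) sum += prod[k];
    return sum;
}

// C (m x p) = A (m x kLanes) * B, where Bt (p x kLanes) stores B transposed.
inline std::vector<float> matmul_transposed(const std::vector<float>& A,
                                            const std::vector<float>& Bt,
                                            std::size_t m, std::size_t p) {
    assert(A.size() == m * kLanes && Bt.size() == p * kLanes);
    std::vector<float> C(m * p);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            C[i * p + j] = dot_lanes(A.data() + i * kLanes,
                                     Bt.data() + j * kLanes);
        }
    }
    return C;
}
```

The two outer loops run $mp$ times, and each iteration issues a single vector multiply, which matches the multiplication count used in the complexity analysis.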
## Advantages
- Lower theoretical Big-O than naive matmul ($O(n^2)$ instead of $O(n^3)$ when only multiplications are counted)
- Header-only library
## Disadvantages
- $n$ is constrained by the number of SIMD lanes
- Requires transposing $B$ or storing matrices in transposed form
- Requires conversion of matrix rows to SIMD vectors