https://github.com/dssgabriel/arm-sve-benchmarks
Performance comparison between small hand-written SVE kernels and compiler-generated ones.
- Host: GitHub
- URL: https://github.com/dssgabriel/arm-sve-benchmarks
- Owner: dssgabriel
- License: mit
- Created: 2022-05-24T19:01:07.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-17T18:19:51.000Z (about 4 years ago)
- Last Synced: 2025-07-07T18:52:43.734Z (12 months ago)
- Topics: arm64, assembly, benchmarks, compiler, simd, sve
- Language: C
- Homepage:
- Size: 52.7 KB
- Stars: 10
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ARM SVE Benchmarks
This repository gathers a few small kernels to benchmark the performance of hand-written SVE code against compiler-generated code.
Implemented kernels are:
- Initialization (store);
- Copy (load, store);
- Reduction (load, add);
- Dot product (load, load, mul, add);
- DAXPY (load, load, load, mul, add, store);
- Vector sum (load, load, load, add, store);
- Vector scale (load, load, load, mul, store).
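To make the comparison concrete, here is a minimal sketch of two of the kernels listed above. It is not taken from the repository: the function names (`reduc_scalar`, `daxpy_scalar`, `reduc_sve`) are hypothetical, and the hand-vectorized variant uses ACLE intrinsics from `arm_sve.h`, whereas the repository's hand-written kernels may well be written in assembly. On targets without SVE the sketch falls back to the plain C loop so it still compiles.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Plain C reduction: the kind of loop left to the compiler's auto-vectorizer. */
double reduc_scalar(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Plain C DAXPY: y[i] = a * x[i] + y[i]. */
void daxpy_scalar(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

#if defined(__ARM_FEATURE_SVE)
/* Hand-vectorized reduction (hypothetical sketch using SVE intrinsics). */
double reduc_sve(const double *x, size_t n)
{
    svfloat64_t acc = svdup_f64(0.0);
    /* svcntd() is the number of 64-bit lanes in one SVE vector. */
    for (uint64_t i = 0; i < n; i += svcntd()) {
        /* The predicate enables only lanes with index below n (handles the tail). */
        svbool_t pg = svwhilelt_b64(i, (uint64_t)n);
        svfloat64_t v = svld1_f64(pg, &x[i]);
        /* Merging add: inactive lanes keep their previous accumulator value. */
        acc = svadd_f64_m(pg, acc, v);
    }
    /* Horizontal add across all lanes of the accumulator. */
    return svaddv_f64(svptrue_b64(), acc);
}
#else
/* Fallback so the sketch builds on targets without SVE. */
double reduc_sve(const double *x, size_t n)
{
    return reduc_scalar(x, n);
}
#endif
```

The vectorized reduction accumulates per lane and sums the lanes at the end, so its rounding can differ slightly from the sequential loop. This is one reason a benchmark of this kind would compare results within an error tolerance rather than bit for bit.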
## Usage
*Note:* the provided Makefile uses the `armclang` compiler; however, both `clang` and `gcc` have been tested and can be used as well.
Keep in mind that the architecture-specific flags (`AFLAGS`) might need to be changed depending on the chosen compiler.
See the [comparison between compiler flags across architectures](https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu) for more information.
To build the benchmarks:
```
make build
```
You can then execute one of the benchmarks presented above and specify the vectors' size (in bytes), number of iterations and error tolerance through the provided option flags.
Example (reduction benchmark with 64 KiB vectors, 100k iterations and an error tolerance of $10^{-14}$):
```
target/arm_bench -k reduc -s 8192 -i 100000 -e 1e-14
```
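As an illustration of what an error tolerance such as `1e-14` could mean in practice, the sketch below checks a result against a reference value using a relative error bound. The helpers `within_tolerance` and `vector_bytes` are hypothetical and not part of the repository, whose actual validation logic may differ. The second helper only shows the arithmetic behind the example above, under the assumption that `8192` counts double-precision elements of 8 bytes each.

```c
#include <stdbool.h>
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_f64(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical check: is `result` within relative error `eps` of `reference`?
 * Falls back to an absolute bound when the reference is exactly zero. */
bool within_tolerance(double result, double reference, double eps)
{
    double scale = abs_f64(reference);
    if (scale == 0.0)
        return abs_f64(result) <= eps;
    return abs_f64(result - reference) <= eps * scale;
}

/* Memory footprint of a vector of n_elems double-precision values. */
size_t vector_bytes(size_t n_elems)
{
    return n_elems * sizeof(double);
}
```

With 8-byte doubles, `vector_bytes(8192)` gives 65536 bytes, that is 64 KiB, which would match the vector size quoted in the example if the size option is interpreted as an element count.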