https://github.com/ivan-pi/streaming_kernels
Bakeoff streaming kernels
- Host: GitHub
- URL: https://github.com/ivan-pi/streaming_kernels
- Owner: ivan-pi
- Created: 2025-07-04T19:58:03.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-16T22:59:59.000Z (6 months ago)
- Last Synced: 2025-08-30T18:57:09.697Z (4 months ago)
- Language: Fortran
- Size: 46.9 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# Streaming Kernels
## Description
This repository provides streaming benchmarks designed to measure memory bandwidth and implementation efficiency.
The benchmarks included are:
- **BS1:** Vector Copy
- **BS2:** Vector AXPBY
- **BS3:** Vector Norm
- **BS4:** Vector Inner Product
- **BS5:** Fused CG Update
These operations are commonly found in iterative solvers for linear systems, such as the conjugate gradient (CG) method. BS1 and BS2 are similar to the classic STREAM Copy and Triad kernels, respectively.
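For reference, the five kernels can be written as plain loops. The following is a minimal C sketch, not the repository's Fortran code: the function names (`bs1_copy`, `bs5_cg_update`, ...) are hypothetical, and the BS5 form (x ← x + αp, r ← r − αq with q = Ap precomputed, followed by ρ = rᵀr) is an assumption based on a standard CG step. The comments give the memory traffic per element in 8-byte words, ignoring write-allocate traffic. The `omp` pragmas are ignored unless the file is compiled with OpenMP enabled.

```c
/* Hypothetical reference kernels (illustration only). */

/* BS1 copy: y = x                         (1 read + 1 write = 2 words) */
void bs1_copy(long n, const double *x, double *y)
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] = x[i];
}

/* BS2 AXPBY: y = alpha*x + beta*y         (2 reads + 1 write = 3 words) */
void bs2_axpby(long n, double alpha, const double *x, double beta, double *y)
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] = alpha * x[i] + beta * y[i];
}

/* BS3 norm: returns sum of x_i^2          (1 read = 1 word)
   Apply sqrt() to the result to obtain the 2-norm itself. */
double bs3_sumsq(long n, const double *x)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i)
        s += x[i] * x[i];
    return s;
}

/* BS4 inner product: returns x . y        (2 reads = 2 words) */
double bs4_dot(long n, const double *x, const double *y)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

/* BS5 fused CG update (assumed form), with q = A*p already computed:
   x += alpha*p; r -= alpha*q; returns r . r
   All three updates share one pass     (4 reads + 2 writes = 6 words) */
double bs5_cg_update(long n, double alpha, const double *p, const double *q,
                     double *x, double *r)
{
    double rr = 0.0;
    #pragma omp parallel for reduction(+:rr)
    for (long i = 0; i < n; ++i) {
        x[i] += alpha * p[i];
        r[i] -= alpha * q[i];
        rr += r[i] * r[i];
    }
    return rr;
}
```

Multiplying these word counts by 8 bytes and by the vector length gives the data volume that an effective-bandwidth figure is based on.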
The benchmarks presented here follow the work of:
> Chalmers, N., & Warburton, T. (2020). Portable high-order finite element kernels I: Streaming Operations. https://arxiv.org/abs/2009.10917
The code accompanying the Chalmers & Warburton article can be found in the [StreamParanumal](https://github.com/paranumal/streamparanumal) repository.
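Effective bandwidth is the number of bytes a kernel must move divided by its run time. The sketch below shows one common way to measure it, using a copy kernel as the example and following the STREAM convention of discarding the first (warm-up) repetition and reporting the best of the remaining ones. The timer (POSIX `clock_gettime`) and the function names are assumptions for illustration, not taken from this repository.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#endif
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double wall_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Kernel under test: BS1-style copy, y = x. */
static void copy_kernel(long n, const double *x, double *y)
{
    for (long i = 0; i < n; ++i)
        y[i] = x[i];
}

/* Hypothetical driver: runs the copy k times, discards the first (warm-up)
   run, and returns the best effective bandwidth in GB/s. Each call moves
   2 * n * sizeof(double) bytes (one read stream, one write stream).
   Returns 0.0 if k < 2, since no timed run would remain. */
double best_copy_bandwidth(long n, const double *x, double *y, int k)
{
    if (k < 2)
        return 0.0;

    double best = -1.0;  /* shortest timed run so far */
    for (int rep = 0; rep < k; ++rep) {
        double t0 = wall_time();
        copy_kernel(n, x, y);
        double t = wall_time() - t0;
        if (rep == 0)
            continue;  /* warm-up: first-touch page faults, cold caches */
        if (best < 0.0 || t < best)
            best = t;
    }
    double bytes = 2.0 * (double)n * sizeof(double);
    return bytes / best / 1e9;
}
```

Reporting the minimum time (maximum bandwidth) rather than the mean reduces the influence of operating-system noise; the spread across repetitions is still worth reporting alongside it.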
## How to Build and Run the Benchmark
Currently, a simple Makefile setup is in place.
The default configuration assumes gfortran is available:
```sh
make
```
The default target builds the executable `./streaming_kernels`.
See the Makefile for other available targets.
The executable has the following options available:
```txt
Usage: ./streaming_kernels [OPTIONS]
Options:
-d, --device Device number
-n Number of elements (mutually exclusive with --range, --log-range)
--float Enable FP32 stream test
-r, --range a:b[:step] Linear range (e.g. 10:100:5)
--log-range a:b:n Logarithmic range with n points (e.g. 1:1000:4)
-t, --tests Comma-separated list of tests in quotes (e.g. "BS1,BS3")
-k, --repeat Number of repetitions
--stats Show statistics
--help Show this help message
--version Show program version
```
(Note: The options are still subject to change.)
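The `--range` and `--log-range` options sweep over array sizes instead of using a single `-n`. How the program expands them is not documented here, so the following C sketch rests on assumptions: the linear range `a:b[:step]` is taken to include `b`, and `step` defaults to 1 when omitted. Under similar assumptions, `--log-range a:b:n` would place n points geometrically at a·(b/a)^(j/(n−1)) for j = 0, ..., n−1, which turns the example `1:1000:4` into 1, 10, 100, 1000. The helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical parser for "a:b" or "a:b:step" (not the program's own code).
   Returns 0 on success, -1 on malformed input.
   Assumption: step defaults to 1 when omitted. */
int parse_linear_range(const char *spec, long *a, long *b, long *step)
{
    *step = 1;  /* sscanf leaves this untouched if the third field is absent */
    int got = sscanf(spec, "%ld:%ld:%ld", a, b, step);
    if (got < 2 || *step <= 0 || *b < *a)
        return -1;
    return 0;
}

/* Write the sizes a, a+step, ... up to and including b into out
   (capacity cap). Returns the number of sizes written. */
int expand_linear_range(long a, long b, long step, long *out, int cap)
{
    int count = 0;
    for (long v = a; v <= b && count < cap; v += step)
        out[count++] = v;
    return count;
}
```

With the help text's example `10:100:5`, this sketch yields 19 sizes: 10, 15, ..., 100.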
### Optional dependencies
- BLAS
  * [OpenBLAS](https://github.com/OpenMathLib/OpenBLAS)
  * [Arm Performance Libraries](https://developer.arm.com/documentation/101004/latest/)
  * [Accelerate BLAS](https://developer.apple.com/documentation/accelerate/blas-library) (Apple-only)
- [BLIS](https://github.com/flame/blis)
- [Eigen](https://eigen.tuxfamily.org/index.php?title=Main_Page)
Benchmarks BS1–BS5 can also be formulated in terms of BLAS Level 1 operations.
By linking against different BLAS libraries, we can compare their implementation quality and
how well they interact with OpenMP.
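To see why implementation quality and kernel fusion matter, one can count the memory traffic per element (in 8-byte words, ignoring write-allocate traffic) when BS2 and BS5 are composed from standard Level 1 routines instead of fused loops. This assumes BS2 is built as `scal` followed by `axpy` (reference BLAS has no AXPBY routine, although some libraries provide one as an extension) and BS5 as two `axpy` calls plus one `dot`, with BS5 taking the assumed CG-update form x ← x + αp, r ← r − αq (q = Ap), ρ = rᵀr:

```math
\begin{aligned}
&\text{BS2, fused:} && y \leftarrow \alpha x + \beta y && 2 + 1 = 3 \\
&\text{BS2, scal + axpy:} && y \leftarrow \beta y;\ y \leftarrow \alpha x + y && (1 + 1) + (2 + 1) = 5 \\
&\text{BS5, fused:} && x \leftarrow x + \alpha p,\ r \leftarrow r - \alpha q,\ \rho = r^{\top} r && 4 + 2 = 6 \\
&\text{BS5, axpy + axpy + dot:} && \text{same updates in three passes} && (2 + 1) + (2 + 1) + 1 = 7
\end{aligned}
```

At the same sustained bandwidth, the composed BS2 would therefore take about 5/3 ≈ 1.67 times as long as the fused kernel, and the composed BS5 about 7/6 ≈ 1.17 times as long.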
---
The remainder of this file is currently a "link dump".
## Related
- [STREAM](https://www.cs.virginia.edu/stream/)
- [BabelStream](https://github.com/UoB-HPC/BabelStream)
- [TheBandwidthBenchmark](https://github.com/RRZE-HPC/TheBandwidthBenchmark)
  * Fortran version: https://github.com/RRZE-HPC/TheBandwidthBenchmark-F90
- [RaiderSTREAM](https://github.com/michael-beebe/RaiderSTREAM)
- [Livermore Loops](https://www.netlib.org/benchmark/livermorec)
Other bandwidth-related benchmarks:
- [TeaLeaf](https://github.com/UK-MAC/TeaLeaf)
- [Himeno](https://i.riken.jp/en/supercom/documents/himenobmt/)
- [libParanumal](https://github.com/paranumal/libparanumal)
Vendor benchmarks:
- AMD:
  * https://www.amd.com/en/developer/zen-software-studio/applications/spack/stream-benchmark.html
- Nvidia:
  * https://docs.nvidia.com/nvidia-hpc-benchmarks/STREAM_Benchmark.html
  * https://nvidia.github.io/grace-cpu-benchmarking-guide/foundations/STREAM/index.html
- Intel:
  * https://github.com/intel/memory-bandwidth-benchmarks
  * https://dgpu-docs.intel.com/solutions/max-sw/hpc/BabelSTREAM.html
## Literature
- https://arxiv.org/abs/2009.10917
- https://doi.org/10.1109/PMBS56514.2022.00013
- https://arxiv.org/pdf/2309.05445
- https://blogs.fau.de/hager/archives/8263
- https://www.cs.virginia.edu/~mccalpin/papers/balance/
- https://doi.org/10.1155/1996/208679
Related to array syntax in BS5:
- https://link.springer.com/chapter/10.1007/3-540-46423-9_15
- https://doi.org/10.1145/1186632.1186637
- https://doi.org/10.1023/B:SUPE.0000049323.47732.02