https://github.com/mattkretz/vir-simd
improve the usage experience of std::experimental::simd (Parallelism TS 2)
cpp cpp17-library parallelism-ts simd simd-library
- Host: GitHub
- URL: https://github.com/mattkretz/vir-simd
- Owner: mattkretz
- License: lgpl-3.0
- Created: 2022-09-15T09:04:32.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2025-06-13T13:32:31.000Z (about 1 year ago)
- Last Synced: 2025-06-13T14:44:56.038Z (about 1 year ago)
- Topics: cpp, cpp17-library, parallelism-ts, simd, simd-library
- Language: C++
- Homepage: https://mattkretz.github.io/vir-simd/master/
- Size: 888 KB
- Stars: 28
- Watchers: 6
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# vir::stdx::simd
[Conan Center](https://conan.io/center/recipes/vir-simd)
[GCC](https://github.com/mattkretz/vir-simd/actions/workflows/GCC.yml)
[Clang](https://github.com/mattkretz/vir-simd/actions/workflows/Clang.yml)
[MSVC](https://github.com/mattkretz/vir-simd/actions/workflows/MSVC.yml)
[Emscripten](https://github.com/mattkretz/vir-simd/actions/workflows/Emscripten.yml)
[DOI](https://doi.org/10.5281/zenodo.7789153)
[Best Practices](https://bestpractices.coreinfrastructure.org/projects/6916)
[REUSE](https://github.com/mattkretz/vir-simd/actions/workflows/reuse.yml)
[fair-software.eu](https://fair-software.eu)
This project aims to provide a fallback std::experimental::simd (Parallelism TS 2)
implementation with additional features. Not every user can rely on GCC 11+
and its standard library to be present on all target systems. Therefore, the
header `vir/simd.h` provides a fallback implementation of the TS specification
that only implements the `scalar` and `fixed_size` ABI tags. Thus, your code
can still compile and run correctly, even if it is missing the performance
gains a proper implementation provides.
## Table of Contents
* [Installation](#installation)
* [Usage](#usage)
* [Options](#options)
* [Additional Features](#additional-features)
  - [Simple iota `simd` constants](#simple-iota-simd-constants)
  - [Making `simd` conversions more convenient](#making-simd-conversions-more-convenient)
  - [Permutations](#permutations-paper)
  - [SIMD execution policy](#simd-execution-policy-p0350)
    + [Usable algorithms](#usable-algorithms)
    + [Example](#example)
    + [Execution policy modifiers](#execution-policy-modifiers)
  - [Bitwise operators for floating-point `simd`](#bitwise-operators-for-floating-point-simd)
  - [Conversion between `std::bitset` and `simd_mask`](#conversion-between-stdbitset-and-simd_mask)
  - [vir::simd_resize and vir::simd_size_cast](#virsimd_resize-and-virsimd_size_cast)
  - [vir::simd_bit_cast](#virsimd_bit_cast)
  - [Concepts](#concepts)
  - [simdize type transformation](#simdize-type-transformation)
  - [Benchmark support functions](#benchmark-support-functions)
  - [`constexpr_wrapper`: function arguments as constant expressions](#constexpr_wrapper-function-arguments-as-constant-expressions)
    + [Example](#example-1)
  - [Testing for the version of the vir::stdx::simd (vir-simd) library](#testing-for-the-version-of-the-virstdxsimd-vir-simd-library)
    + [Semantics of version numbers](#semantics-of-version-numbers)
* [Debugging](#debugging)
## Installation
This is a header-only library. Installation is a simple copy of the headers to
wherever you want them. By default, `make install` copies the headers into
`/usr/local/include/vir/`.
Examples:
```sh
# installs to $HOME/.local/include/vir
make install prefix=~/.local
# installs to $HOME/src/myproject/3rdparty/vir
make install includedir=~/src/myproject/3rdparty
```
## Usage
```c++
#include <vir/simd.h>
namespace stdx = vir::stdx;
using floatv = stdx::native_simd<float>;
// ...
```
The `vir/simd.h` header will include `<experimental/simd>` if it is available,
so you don't have to add any buildsystem support. It should just work.
## Options
* `VIR_SIMD_TS_DROPIN`: Define the macro `VIR_SIMD_TS_DROPIN` before including
`<vir/simd.h>` to define everything in the namespace specified in the
Parallelism TS 2 (namely `std::experimental::parallelism_v2`).
* `VIR_DISABLE_STDX_SIMD`: Do not include `<experimental/simd>` even if it is
available. This allows compiling your code with the `<vir/simd.h>`
implementation unconditionally. This is useful for testing.
## Additional Features
The TS curiously forgot to add `simd_cast` and `static_simd_cast` overloads for
`simd_mask`. With `vir::stdx::(static_)simd_cast`, casts will also work for
`simd_mask`. This does not require any additional includes.
### Simple iota `simd` constants
*Requires Concepts (C++20).*
```c++
#include <vir/simd_iota.h>
constexpr auto a = vir::iota_v<stdx::simd<float>> * 3; // 0, 3, 6, 9, ...
```
The variable template `vir::iota_v<T>` can be instantiated with arithmetic
types, array types (`std::array` and C-arrays), and `simd` types. In all cases,
the elements of the variable will be initialized to `0, 1, 2, 3, 4, ...`,
depending on the number of elements in `T`. For arithmetic types
`vir::iota_v<T>` is always just `0`.
### Making `simd` conversions more convenient
*Requires Concepts (C++20).*
The TS is way too strict about conversions, requiring verbose
`std::experimental::static_simd_cast<T>(x)` instead of a concise `T(x)` or
`static_cast<T>(x)`. (`std::simd` in C++26 will fix this.)
`vir::cvt(x)` provides a tool to make `x` implicitly convertible into whatever
the expression wants in order to be well-formed. This only works if the
context requires exactly one unambiguous type.
```c++
#include <vir/simd_cvt.h>
using floatv = stdx::native_simd<float>;
using intv = stdx::rebind_simd_t<int, floatv>;
void f(intv x) {
  using vir::cvt;
  // the floatv constructor and intv assignment operator clearly determine the
  // destination type:
  x = cvt(10 * sin(floatv(cvt(x))));
  // without vir::cvt, one would have to write:
  x = stdx::static_simd_cast<intv>(10 * sin(stdx::static_simd_cast<floatv>(x)));
  // probably don't do this too often:
  auto y = cvt(x); // y is a const-ref to x, but so much more convertible
  // y is of type cvt<intv>
}
```
Note that `vir::cvt` also works for `simd_mask` and non-`simd` types. Thus,
`cvt` becomes an important building block for writing "`simd`-generic" code
(i.e. well-formed for `T` and `simd<T>`).
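The idea of letting the surrounding expression pick the destination type can be illustrated without any `simd` types. The following is a minimal scalar sketch, not the vir-simd implementation: `convert_proxy` and `make_convertible` are hypothetical names, and the proxy stores a copy instead of the const-reference that `vir::cvt` is described as holding.

```cpp
// Hypothetical stand-in for the idea behind vir::cvt: wrap a value in a proxy
// whose conversion operator is a template, so the context that consumes the
// proxy determines the destination type.
template <typename From>
struct convert_proxy
{
  From value; // stored by value to keep the sketch simple

  template <typename To>
  constexpr operator To() const
  { return static_cast<To>(value); }
};

template <typename From>
constexpr convert_proxy<From> make_convertible(From x)
{ return {x}; }

// The parameter type of the callee selects the conversion target.
constexpr int twice(int x)
{ return 2 * x; }

constexpr int a = make_convertible(3.7);         // initialization asks for int
constexpr int b = twice(make_convertible(2.9f)); // parameter asks for int
constexpr double c = make_convertible(5);        // initialization asks for double
```

As with `vir::cvt`, this only works where exactly one destination type is required; `auto d = make_convertible(1);` simply yields the proxy itself.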
### Permutations ([paper](https://wg21.link/P2664))
*Requires Concepts (C++20).*
```c++
#include <vir/simd_permute.h>
// v = {0, 1, 2, 3} -> {1, 0, 3, 2}
vir::simd_permute(v, vir::simd_permutations::swap_neighbors);
// v = {1, 2, 3, 4} -> {2, 2, 2, 2}
vir::simd_permute(v, [](unsigned) { return 1; });
// v = {1, 2, 3, 4} -> {3, 3, 3, 3}
vir::simd_permute(v, [](unsigned) { return -2; });
```
The following permutations are pre-defined:
* `vir::simd_permutations::duplicate_even`: copy values at even indices to
neighboring odd position
* `vir::simd_permutations::duplicate_odd`: copy values at odd indices to
neighboring even position
* `vir::simd_permutations::swap_neighbors<N = 1>`: swap `N` consecutive values
with the following `N` consecutive values
* `vir::simd_permutations::broadcast<Idx>`: copy the value at index `Idx` to
all other values
* `vir::simd_permutations::broadcast_first`: alias for `broadcast<0>`
* `vir::simd_permutations::broadcast_last`: alias for `broadcast<-1>`
* `vir::simd_permutations::reverse`: reverse the order of all values
* `vir::simd_permutations::rotate<Offset>`: positive `Offset` rotates values to
the left, negative `Offset` rotates values to the right (i.e.
`rotate<Offset>` moves values from index `(i + Offset) % size` to `i`)
* `vir::simd_permutations::shift<Offset>`: positive `Offset` shifts values to
the left, negative `Offset` shifts values to the right; shifting in zeros.
A `vir::simd_permute(x, idx_perm)` overload, where `x` is of *vectorizable*
type, is also included, facilitating generic code.
A special permutation `vir::simd_shift_in<N>(x, ...)` shifts by `N` elements,
shifting in elements from additional `simd` objects passed via the pack.
Example:
```c++
// v = {1, 2, 3, 4}, w = {5, 6, 7, 8} -> {2, 3, 4, 5}
vir::simd_shift_in<1>(v, w);
```
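The index conventions used in this section (negative indices counting from the end, `rotate`, `shift`, and `simd_shift_in`) can be modeled on plain `std::array`. This is only a sketch of the semantics as described above, not vir-simd code: all `*_model` helpers are hypothetical names, and `shift_in_model` handles just one extra operand and a non-negative shift.

```cpp
#include <array>
#include <cstddef>

// Index-generating permutation: element i of the result is v[idx_perm(i)].
// A negative index counts from the end (-1 is the last element).
template <typename T, std::size_t N, typename F>
constexpr std::array<T, N> permute_model(const std::array<T, N>& v, F idx_perm)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    const int j = idx_perm(static_cast<unsigned>(i));
    const std::size_t src = j < 0 ? N - static_cast<std::size_t>(-j)
                                  : static_cast<std::size_t>(j);
    r[i] = v[src];
  }
  return r;
}

// rotate<Offset>: the value at index (i + Offset) % size moves to index i.
template <int Offset, typename T, std::size_t N>
constexpr std::array<T, N> rotate_model(const std::array<T, N>& v)
{
  constexpr int n = static_cast<int>(N);
  std::array<T, N> r{};
  for (int i = 0; i < n; ++i)
    r[i] = v[((i + Offset) % n + n) % n]; // keep the index non-negative
  return r;
}

// shift<Offset>: like rotate, but positions without a source stay zero.
template <int Offset, typename T, std::size_t N>
constexpr std::array<T, N> shift_model(const std::array<T, N>& v)
{
  constexpr int n = static_cast<int>(N);
  std::array<T, N> r{}; // value-initialized, so shifted-in slots are zero
  for (int i = 0; i < n; ++i) {
    const int j = i + Offset;
    if (j >= 0 && j < n)
      r[i] = v[j];
  }
  return r;
}

// shift_in<Shift>(v, w): shift left, filling the free slots from w.
template <int Shift, typename T, std::size_t N>
constexpr std::array<T, N> shift_in_model(const std::array<T, N>& v,
                                          const std::array<T, N>& w)
{
  constexpr int n = static_cast<int>(N);
  static_assert(Shift >= 0 && Shift <= n);
  std::array<T, N> r{};
  for (int i = 0; i < n; ++i) {
    const int j = i + Shift;
    r[i] = j < n ? v[j] : w[j - n];
  }
  return r;
}

constexpr std::array<int, 4> v{1, 2, 3, 4};
constexpr std::array<int, 4> w{5, 6, 7, 8};
constexpr auto second_to_last = permute_model(v, [](unsigned) { return -2; }); // {3, 3, 3, 3}
constexpr auto rotated_left = rotate_model<1>(v);   // {2, 3, 4, 1}
constexpr auto rotated_right = rotate_model<-1>(v); // {4, 1, 2, 3}
constexpr auto shifted_left = shift_model<1>(v);    // {2, 3, 4, 0}
constexpr auto shifted_right = shift_model<-1>(v);  // {0, 1, 2, 3}
constexpr auto shifted_in = shift_in_model<1>(v, w); // {2, 3, 4, 5}
```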
### SIMD execution policy ([P0350](https://wg21.link/P0350))
*Requires Concepts (C++20).*
Adds an execution policy `vir::execution::simd`. The execution policy can be
used with the algorithms implemented in the `vir` namespace. These algorithms
are additionally overloaded in the `std` namespace.
At this point, the implementation of the execution policy requires contiguous
ranges / iterators.
#### Usable algorithms
* `std::for_each` / `vir::for_each`
* `std::count_if` / `vir::count_if`
* `std::transform` / `vir::transform`
* `std::transform_reduce` / `vir::transform_reduce`
* `std::reduce` / `vir::reduce`
#### Example
```c++
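#include <vir/simd_execution.h>
#include <vector>
// take the vector by reference so the caller sees the incremented values
void increment_all(std::vector<float>& data) {
  std::for_each(vir::execution::simd, data.begin(), data.end(),
    [](auto& v) {
      v += 1.f;
    });
}
// or
void increment_all(std::vector<float>& data) {
  vir::for_each(vir::execution::simd, data,
    [](auto& v) {
      v += 1.f;
    });
}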
```
#### Execution policy modifiers
The `vir::execution::simd` execution policy supports a few settings modifying
its behavior:
* `vir::execution::simd.prefer_size<N>()`:
Start with chunking the range into parts of `N` elements, calling the
user-supplied function(s) with objects of type `resize_simd_t<N, simd<T>>`.
* `vir::execution::simd.unroll_by<M>()`:
Iterate over the range in chunks of `simd<T>::size() * M` instead of just
`simd<T>::size()`. The algorithm will execute `M` loads (or stores) together
before/after calling the user-supplied function(s). The user-supplied
function may be called with `M` `simd` objects instead of one `simd` object.
Note that prologue and epilogue will typically still call the user-supplied
function with a single `simd` object.
Algorithms like `std::count_if` require a return value from the user-supplied
function and therefore still call the function with a single `simd` (to avoid
the need for returning an `array` or `tuple` of `simd_mask`). Such algorithms
will still make use of unrolling inside their implementation.
* `vir::execution::simd.assume_matching_size()`:
Add a precondition to the algorithm, that the given range size is a multiple
of the SIMD width (but not the SIMD width multiplied by the above unroll
factor). This modifier is only valid without prologue (the following two
modifiers). The algorithm consequently does not implement an epilogue and all
given callables are called with a single simd type (same width and ABI tag).
This can reduce code size significantly.
* `vir::execution::simd.prefer_aligned()`:
Unconditionally iterate using smaller chunks, until the main iteration can
load (and store) chunks from/to aligned addresses. This can be more efficient
if the range is large, avoiding cache-line splits. (e.g. with AVX-512,
unaligned iteration leads to cache-line splits on every iteration; with AVX
on every second iteration)
* `vir::execution::simd.auto_prologue()`
(still testing its viability, may be removed):
Determine from run-time information (i.e. add a branch) whether a prologue
for alignment of the main chunked iteration might be more efficient.
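To get a feeling for what the modifiers trade off, the split of a range into unrolled iterations, single-`simd` iterations, and an epilogue can be computed by hand. The sketch below is a back-of-the-envelope model under assumptions: `chunk_plan`, `plan_chunks`, and `prologue_elements` are hypothetical helpers, the order "unrolled chunks first, then single chunks, then leftovers" is assumed rather than taken from the implementation, and the start address is assumed to be a multiple of the element size.

```cpp
#include <cstddef>

// How a range of n elements could be split for a given SIMD width and
// unroll factor (assumed strategy, not taken from vir-simd).
struct chunk_plan
{
  std::size_t unrolled_iterations; // each handles width * unroll elements
  std::size_t single_iterations;   // each handles width elements
  std::size_t epilogue_elements;   // leftover elements, fewer than width
};

constexpr chunk_plan plan_chunks(std::size_t n, std::size_t width,
                                 std::size_t unroll)
{
  const std::size_t big = width * unroll;
  const std::size_t rest = n % big;
  return {n / big, rest / width, rest % width};
}

// Number of elements a prologue would have to process before the main loop
// reaches an address that is a multiple of `alignment` (in bytes).
constexpr std::size_t prologue_elements(std::size_t address,
                                        std::size_t alignment,
                                        std::size_t element_size)
{
  return (alignment - address % alignment) % alignment / element_size;
}

// 110 floats, 8 elements per simd, unroll_by<4>:
constexpr chunk_plan plan = plan_chunks(110, 8, 4);
// 4-byte elements starting at address 0x1010, 64-byte alignment:
constexpr std::size_t head = prologue_elements(0x1010, 64, 4);
```

In this model `assume_matching_size()` corresponds to the precondition `epilogue_elements == 0`, which is why the epilogue code can be dropped entirely.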
### Bitwise operators for floating-point `simd`
```c++
#include <vir/simd_float_ops.h>
using namespace vir::simd_float_ops;
```
Then the `&`, `|`, and `^` binary operators can be used with objects of type
`simd<`floating-point`, A>`.
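Bitwise operators on floating-point values are mostly used for sign-bit tricks. As a scalar analogue of what such operators do per element, the following sketch applies `&`, `^`, and `|` to the bit pattern of a `float`. It assumes a 32-bit IEEE 754 `float`; the helper names are hypothetical and the code does not use vir-simd.

```cpp
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float is a 32-bit IEEE 754 type.
static_assert(sizeof(float) == sizeof(std::uint32_t));

inline std::uint32_t to_bits(float x)
{
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

inline float from_bits(std::uint32_t bits)
{
  float x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// AND with everything but the sign bit: |x|
inline float abs_via_and(float x)
{ return from_bits(to_bits(x) & 0x7fff'ffffu); }

// XOR with the sign bit: -x
inline float negate_via_xor(float x)
{ return from_bits(to_bits(x) ^ 0x8000'0000u); }

// OR with the sign bit: -|x|
inline float negative_abs_via_or(float x)
{ return from_bits(to_bits(x) | 0x8000'0000u); }
```

With the operators from `vir::simd_float_ops` the same masks can be applied to whole `simd` objects without the round trip through an integer type.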
### Conversion between `std::bitset` and `simd_mask`
```c++
#include <vir/simd_bitset.h>
vir::stdx::simd_mask<int> k;
std::bitset b = vir::to_bitset(k);
vir::stdx::simd_mask k2 = vir::to_simd_mask<float>(b);
```
There are two overloads of `vir::to_simd_mask`:
```c++
to_simd_mask<T, A>(bitset<simd_size_v<T, A>>)
```
and
```c++
to_simd_mask<T, N>(bitset<N>)
```
### vir::simd_resize and vir::simd_size_cast
The header
```c++
#include <vir/simd_resize.h>
```
declares the functions
* `vir::simd_resize<N>(simd)`,
* `vir::simd_resize<N>(simd_mask)`,
* `vir::simd_size_cast<V>(simd)`, and
* `vir::simd_size_cast<M>(simd_mask)`.
These functions can resize a given `simd` or `simd_mask` object. If the return
type requires more elements than the input parameter, the new elements are
default-initialized and appended at the end. Neither function allows a
change of the `value_type`. However, implicit conversions can happen on
parameter passing to `simd_size_cast`.
### vir::simd_bit_cast
The header
```c++
#include <vir/simd_bit.h>
```
declares the function `vir::simd_bit_cast<To>(from)`. This function serves the
same purpose as `std::bit_cast` but additionally works in cases where a `simd`
type is not trivially copyable.
### Concepts
*Requires Concepts (C++20).*
The header
```c++
#include <vir/simd_concepts.h>
```
defines the following concepts:
* `vir::arithmetic<T>`: What `std::arithmetic` should be: satisfied if `T`
is an arithmetic type (as specified by the C++ core language).
* `vir::vectorizable<T>`: Satisfied if `T` is a valid element type for
`stdx::simd` and `stdx::simd_mask`.
* `vir::simd_abi_tag<T>`: Satisfied if `T` is a valid ABI tag for `stdx::simd`
and `stdx::simd_mask`.
* `vir::any_simd<V>`: Satisfied if `V` is a specialization of
`stdx::simd<T, Abi>` and the types `T` and `Abi` satisfy
`vir::vectorizable<T>` and `vir::simd_abi_tag<Abi>`.
* `vir::any_simd_mask<V>`: Analogous to `vir::any_simd<V>` for
`stdx::simd_mask` instead of `stdx::simd`.
* `vir::typed_simd<V, T>`: Satisfied if `vir::any_simd<V>` and `T` is the
element type of `V`.
* `vir::sized_simd<V, Width>`: Satisfied if `vir::any_simd<V>` and `Width` is
the width of `V`.
* `vir::sized_simd_mask<V, Width>`: Analogous to `vir::sized_simd<V, Width>`
for `stdx::simd_mask` instead of `stdx::simd`.
### simdize type transformation
*Requires Concepts (C++20).*
:warning: consider this interface under :construction:
The header
```c++
#include <vir/simdize.h>
```
defines the following types and constants:
* `vir::simdize<T, N>`: `N` is optional. Type alias for a `simd` or
  `vir::simd_tuple` type determined from the type `T`.
  - If `vir::vectorizable<T>` is satisfied, then `stdx::simd<T, Abi>` is
    produced. `Abi` is determined from `N` and will be `simd_abi::native` if
    `N` was omitted.
  - If `T` is a `std::tuple` or aggregate that can be reflected, then a
    specialization of `vir::simd_tuple` is produced. If `T` is a template
    specialization (without NTTPs), the metafunction tries vectorization via
    applying `simdize` to all template arguments. If this doesn't yield the
    same data structure layout as member-only vectorization, then the type
    behaves similarly to a `std::tuple` with additional API to make the type
    similar to `stdx::simd` (see below).
    This specialization will be derived from `std::tuple` and the tuple
    elements will either be `vir::simd_tuple` or `stdx::simd` types.
    `vir::simdize` is applied recursively to the `std::tuple`/aggregate data
    members.
  - Otherwise, if `T` cannot be simdized (e.g. void, no data members,
    `std::tuple<>`), no transformation is applied and `simdize<T>` is an
    alias for `T`.
  - If `N` was omitted, the resulting width of *all* `simd` types in the
    resulting type will match the largest `native_simd` width.
    Example: `vir::simdize<std::tuple<double, short>>` produces a tuple with
    the element types `stdx::rebind_simd_t<double, stdx::native_simd<short>>`
    and `stdx::native_simd<short>`.
* `vir::simd_tuple`: Don't use this class
  template directly. Let `vir::simdize` instantiate specializations of this
  class template. `vir::simd_tuple` mostly behaves like a `std::tuple` and adds
  the following interface on top of `std::tuple`:
  - `value_type`
  - `mask_type`
  - `size`
  - tuple-like constructors
  - broadcast and/or conversion constructors
  - load constructor
  - `as_tuple()`: Returns the data members as a `std::tuple`.
  - `operator[](size_t)`: Copy of a single `T` stored in the `simd_tuple`. This
    is not a cheap operation because there are no `T` objects stored in the
    `simd_tuple`.
  - `copy_from(std::contiguous_iterator)`: :construction: unoptimized load from
    a contiguous array of struct (e.g. `std::vector<T>`).
  - `copy_to(std::contiguous_iterator)`: :construction: unoptimized store to a
    contiguous array of struct.
* `vir::simd_tuple`: TODO
* `vir::get<I>(simd_tuple)`: Access to the `I`-th data member (a `simd`).
* `vir::simdize_size<T>`, `vir::simdize_size_v<T>`
### Benchmark support functions
*Requires Concepts (C++20) and GNU compatible inline-asm.*
The header
```c++
#include <vir/simd_benchmarking.h>
```
defines the following functions:
* `vir::fake_modify(...)`: Let the compiler assume that all arguments passed to
this function are modified. This inhibits constant propagation, hoisting of
code sections, and dead-code elimination.
* `vir::fake_read(...)`: Let the compiler assume that all arguments passed to
this function are read (in the cheapest manner). This inhibits dead-code
elimination leading up to the results passed to this function.
### `constexpr_wrapper`: function arguments as constant expressions
The header
```c++
#include <vir/constexpr_wrapper.h>
```
defines the following tools:
* `vir::constexpr_value` (concept): Satisfied by any type with a static
`::value` member that can be used in a constant expression.
* `vir::constexpr_wrapper<auto>` (class template): A type storing the value of
its NTTP (non-type template parameter) and overloading all operators to
return another `constexpr_wrapper`. `constexpr_wrapper` objects are
implicitly convertible to their value type (a `constexpr_wrapper`
automatically unwraps its constant expression).
* `vir::cw<auto>` (variable template): Shorthand for producing
`constexpr_wrapper` objects with the given value.
* `vir::literals` (namespace with `_cw` UDL): Shorthand for producing
`constexpr_wrapper` objects of the integer literal in front of the `_cw`
suffix. The type will be deduced automatically from the value of the literal
to be the smallest signed integral type, or if the value is larger, `unsigned
long long`. If the value is too large for an `unsigned long long`, the
program is ill-formed.
`constexpr_wrapper` may appear unrelated to `simd`. However, it is an important
tool used in many places in the implementation and on interfaces of vir-simd
tools. `vir::constexpr_wrapper` is very similar to `std::integral_constant`,
which is used in the `simd` TS interface for generator constructors.
#### Example
```c++
#include <vir/constexpr_wrapper.h>
#include <array>
auto f(vir::constexpr_value auto N)
{
  std::array<int, N> x = {};
  return x;
}
std::array a = f(vir::cw<4>); // array<int, 4>
using namespace vir::literals;
std::array b = f(10_cw); // array<int, 10>
```
This example cannot work with a signature `constexpr auto f(int n)` (or
`consteval`) because `n` will never be considered a constant expression in the
body of the function.
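The same principle can be tried with the standard library alone, since `std::integral_constant` also carries its value in the type of the argument. The following C++17 sketch is an analogue, not vir-simd code; `make_zeroed_array` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// The value is part of the argument's type, so it remains a constant
// expression inside the function body and may be used as an array size.
template <typename T, T N>
constexpr auto make_zeroed_array(std::integral_constant<T, N>)
{
  std::array<int, N> x = {};
  return x;
}

// By contrast, a plain function parameter is never a constant expression:
//   constexpr auto g(int n) { std::array<int, n> x = {}; return x; } // ill-formed

constexpr auto zeros = make_zeroed_array(std::integral_constant<int, 4>{});
```

Compared with this, `vir::cw<4>` and `4_cw` are shorter to spell, and `constexpr_wrapper` additionally keeps the result of operators a constant expression.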
### Testing for the version of the vir::stdx::simd (vir-simd) library
The header
```c++
#include <vir/simd_version.h>
```
(which is also included from `<vir/simd.h>`) defines the type and constant
```c++
namespace vir
{
struct simd_version_t { int major, minor, patchlevel; };
constexpr simd_version_t simd_version;
}
```
in addition to the macros `VIR_SIMD_VERSION`, `VIR_SIMD_VERSION_MAJOR`,
`VIR_SIMD_VERSION_MINOR`, and `VIR_SIMD_VERSION_PATCHLEVEL`.
`simd_version_t` implements all comparison operators, allowing e.g.
```c++
static_assert(vir::simd_version >= vir::simd_version_t{0,4,0});
```
#### Semantics of version numbers
* An increment of the major version number implies a breaking change.
* An increment of the minor version number implies new features without
breaking changes.
* An increment of the patchlevel is used for bug fixes.
* Odd patchlevel numbers indicate a development (not released) version.
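These rules translate into simple checks. The sketch below uses a local stand-in type with the same three fields as `vir::simd_version_t`; `version_model`, `is_release`, and `is_compatible` are hypothetical names, and the lexicographic ordering (major, then minor, then patchlevel) is an assumption about how the comparison operators behave.

```cpp
// Local stand-in mirroring the fields of vir::simd_version_t.
struct version_model { int major, minor, patchlevel; };

// Assumed ordering: compare major first, then minor, then patchlevel.
constexpr bool operator<(version_model a, version_model b)
{
  if (a.major != b.major) return a.major < b.major;
  if (a.minor != b.minor) return a.minor < b.minor;
  return a.patchlevel < b.patchlevel;
}

constexpr bool operator>=(version_model a, version_model b)
{ return !(a < b); }

// Odd patchlevel numbers mark development (unreleased) versions.
constexpr bool is_release(version_model v)
{ return v.patchlevel % 2 == 0; }

// Same major version and not older: no breaking change in between, and all
// features of `wanted` should be present.
constexpr bool is_compatible(version_model have, version_model wanted)
{ return have.major == wanted.major && have >= wanted; }
```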
## Debugging
Compile with `-D _GLIBCXX_DEBUG_UB` to get runtime checks for undefined
behavior in the `simd` implementation(s). Otherwise, `-fsanitize=undefined`
without the macro definition will also find the problems, but without
additional error message.
Preconditions in the vir::stdx::simd implementation and extensions are
controlled via the `-D VIR_CHECK_PRECONDITIONS=N` macro, which defaults to `3`.
Compile-time diagnostics are only possible if the compiler's optimizer can
detect the precondition failure. If you get a bogus compile-time failure, you
need to introduce the necessary assumption into your calling function, which is
typically a missing precondition check in your function.
| **Option** | **at compile-time** | **at run-time** |
|:--------------------------|:-------------------:|:---------------:|
| `-DVIR_CHECK_PRECONDITIONS=0` | warning | invoke UB/unreachable |
| `-DVIR_CHECK_PRECONDITIONS=1` | error | invoke UB/unreachable |
| `-DVIR_CHECK_PRECONDITIONS=2` | warning | trap |
| `-DVIR_CHECK_PRECONDITIONS=3` | error | trap |
| `-DVIR_CHECK_PRECONDITIONS=4` | warning | print error and abort |
| `-DVIR_CHECK_PRECONDITIONS=5` | error | print error and abort |
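The table follows a regular pattern: odd values turn compile-time detection into an error, and the value divided by two selects the run-time behavior. The sketch below merely encodes this reading of the table; the enum and function names are hypothetical, and nothing here is taken from the library's implementation.

```cpp
enum class compile_time_action { warning, error };
enum class run_time_action { unreachable, trap, print_and_abort };

struct precondition_policy
{
  compile_time_action at_compile_time;
  run_time_action at_run_time;
};

// Decode a VIR_CHECK_PRECONDITIONS value in the range 0..5 according to the
// pattern visible in the table above.
constexpr precondition_policy decode_precondition_level(int n)
{
  return {n % 2 == 0 ? compile_time_action::warning
                     : compile_time_action::error,
          n / 2 == 0   ? run_time_action::unreachable
          : n / 2 == 1 ? run_time_action::trap
                       : run_time_action::print_and_abort};
}

// The documented default is 3: error at compile time, trap at run time.
constexpr precondition_policy default_policy = decode_precondition_level(3);
```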