https://github.com/mattkretz/vir-simd

improve the usage experience of std::experimental::simd (Parallelism TS 2)
Host: GitHub
URL: https://github.com/mattkretz/vir-simd
Owner: mattkretz
License: lgpl-3.0
Created: 2022-09-15T09:04:32.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2025-06-13T13:32:31.000Z (about 1 year ago)
Last Synced: 2025-06-13T14:44:56.038Z (about 1 year ago)
Topics: cpp, cpp17-library, parallelism-ts, simd, simd-library
Language: C++
Homepage: https://mattkretz.github.io/vir-simd/master/
Size: 888 KB
Stars: 28
Watchers: 6
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff

README

# vir::stdx::simd

[![Conan Center](https://img.shields.io/conan/v/vir-simd)](https://conan.io/center/recipes/vir-simd)
[![GCC](https://github.com/mattkretz/vir-simd/actions/workflows/GCC.yml/badge.svg)](https://github.com/mattkretz/vir-simd/actions/workflows/GCC.yml)
[![Clang](https://github.com/mattkretz/vir-simd/actions/workflows/Clang.yml/badge.svg)](https://github.com/mattkretz/vir-simd/actions/workflows/Clang.yml)
[![MSVC](https://github.com/mattkretz/vir-simd/actions/workflows/MSVC.yml/badge.svg)](https://github.com/mattkretz/vir-simd/actions/workflows/MSVC.yml)
[![Emscripten](https://github.com/mattkretz/vir-simd/actions/workflows/Emscripten.yml/badge.svg)](https://github.com/mattkretz/vir-simd/actions/workflows/Emscripten.yml)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7789153.svg)](https://doi.org/10.5281/zenodo.7789153)
[![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/6916/badge)](https://bestpractices.coreinfrastructure.org/projects/6916)
[![REUSE status](https://github.com/mattkretz/vir-simd/actions/workflows/reuse.yml/badge.svg)](https://github.com/mattkretz/vir-simd/actions/workflows/reuse.yml)
[![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu)

This project aims to provide a fallback std::experimental::simd (Parallelism
TS 2) implementation with additional features. Not every user can rely on
GCC 11+ and its standard library being present on all target systems.
Therefore, the header `vir/simd.h` provides a fallback implementation of the TS
specification that only implements the `scalar` and `fixed_size<N>` ABI tags.
Thus, your code can still compile and run correctly, even if it is missing the
performance gains a proper implementation provides.

## Table of Contents

* [Installation](#installation)
* [Usage](#usage)
* [Options](#options)
* [Additional Features](#additional-features)
  - [Simple iota `simd` constants](#simple-iota-simd-constants)
  - [Making `simd` conversions more
    convenient](#making-simd-conversions-more-convenient)
  - [Permutations](#permutations-paper)
  - [SIMD execution policy](#simd-execution-policy-p0350)
    + [Usable algorithms](#usable-algorithms)
    + [Example](#example)
    + [Execution policy modifiers](#execution-policy-modifiers)
  - [Bitwise operators for floating-point
    `simd`](#bitwise-operators-for-floating-point-simd)
  - [Conversion between `std::bitset` and
    `simd_mask`](#conversion-between-stdbitset-and-simd_mask)
  - [vir::simd_resize and
    vir::simd_size_cast](#virsimd_resize-and-virsimd_size_cast)
  - [vir::simd_bit_cast](#virsimd_bit_cast)
  - [Concepts](#concepts)
  - [simdize type transformation](#simdize-type-transformation)
  - [Benchmark support functions](#benchmark-support-functions)
  - [`constexpr_wrapper`: function arguments as constant
    expressions](#constexpr_wrapper-function-arguments-as-constant-expressions)
    + [Example](#example-1)
  - [Testing for the version of the vir::stdx::simd (vir-simd)
    library](#testing-for-the-version-of-the-virstdxsimd-vir-simd-library)
    + [Semantics of version numbers](#semantics-of-version-numbers)
* [Debugging](#debugging)

## Installation

This is a header-only library. Installation is a simple copy of the headers to
wherever you want them. By default, `make install` copies the headers into
`/usr/local/include/vir/`.

Examples:

```sh
# installs to $HOME/.local/include/vir
make install prefix=~/.local

# installs to $HOME/src/myproject/3rdparty/vir
make install includedir=~/src/myproject/3rdparty
```

## Usage

```c++
#include <vir/simd.h>

namespace stdx = vir::stdx;

using floatv = stdx::native_simd<float>;
// ...
```

The `vir/simd.h` header will include `<experimental/simd>` if it is available,
so you don't have to add any build system support. It should just work.

## Options

* `VIR_SIMD_TS_DROPIN`: Define the macro `VIR_SIMD_TS_DROPIN` before including
  `<vir/simd.h>` to define everything in the namespace specified in the
  Parallelism TS 2 (namely `std::experimental::parallelism_v2`).

* `VIR_DISABLE_STDX_SIMD`: Do not include `<experimental/simd>` even if it is
  available. This allows compiling your code with the `<vir/simd.h>`
  implementation unconditionally. This is useful for testing.

## Additional Features

The TS curiously forgot to add `simd_cast` and `static_simd_cast` overloads for
`simd_mask`. With `vir::stdx::(static_)simd_cast`, casts will also work for
`simd_mask`. This does not require any additional includes.

### Simple iota `simd` constants

*Requires Concepts (C++20).*

```c++
#include <vir/simd_iota.h>

constexpr auto a = vir::iota_v<stdx::simd<float>> * 3; // 0, 3, 6, 9, ...
```

The variable template `vir::iota_v<T>` can be instantiated with arithmetic
types, array types (`std::array` and C-arrays), and `simd` types. In all cases,
the elements of the variable will be initialized to `0, 1, 2, 3, 4, ...`,
depending on the number of elements in `T`. For arithmetic types
`vir::iota_v<T>` is always just `0`.
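
The element values described above can be reproduced in plain standard C++,
which may help when reasoning about what an expression like `iota_v * 3`
contains. This is a minimal sketch, not the library's code: `iota_array` is a
hypothetical helper and `std::array` stands in for a `simd` type.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of vir-simd): builds {0, 1, 2, ...} scaled by
// `factor`, mirroring the element values of an iota constant times a scalar.
template <typename T, std::size_t N>
constexpr std::array<T, N> iota_array(T factor = T(1))
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>(i) * factor;
  return r;
}

// Four elements scaled by 3: 0, 3, 6, 9
constexpr auto scaled = iota_array<float, 4>(3.f);
static_assert(scaled[0] == 0.f && scaled[1] == 3.f);
static_assert(scaled[2] == 6.f && scaled[3] == 9.f);
```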

### Making `simd` conversions more convenient

*Requires Concepts (C++20).*

The TS is way too strict about conversions, requiring verbose
`std::experimental::static_simd_cast<T>(x)` instead of a concise `T(x)` or
`static_cast<T>(x)`. (`std::simd` in C++26 will fix this.)

`vir::cvt(x)` provides a tool to make `x` implicitly convertible into whatever
the expression wants in order to be well-formed. This only works if there is
an unambiguous type that is required.

```c++
#include <vir/simd_cvt.h>

using floatv = stdx::native_simd<float>;
using intv = stdx::rebind_simd_t<int, floatv>;

void f(intv x) {
  using vir::cvt;
  // the floatv constructor and intv assignment operator clearly determine the
  // destination type:
  x = cvt(10 * sin(floatv(cvt(x))));

  // without vir::cvt, one would have to write:
  x = stdx::static_simd_cast<intv>(10 * sin(stdx::static_simd_cast<floatv>(x)));

  // probably don't do this too often:
  auto y = cvt(x); // y is a const-ref to x, but so much more convertible
                   // y is of type cvt<intv>
}
```

Note that `vir::cvt` also works for `simd_mask` and non-`simd` types. Thus,
`cvt` becomes an important building block for writing "`simd`-generic" code
(i.e. well-formed for `T` and `simd<T>`).
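
The general technique of letting the surrounding expression pick the
destination type can be sketched with a templated conversion operator. The
names `convertible_to_any` and `cvt_model` are hypothetical, the wrapper stores
a copy instead of a reference, and nothing here is taken from the vir-simd
implementation.

```cpp
// Hypothetical model of the idea (not the library's code): a wrapper whose
// conversion operator is a template, so the destination type is deduced from
// the context in which the wrapper is used.
template <typename From>
struct convertible_to_any
{
  From value; // stored by value here for simplicity

  template <typename To>
  constexpr operator To() const
  { return static_cast<To>(value); }
};

template <typename From>
constexpr convertible_to_any<From> cvt_model(From x)
{ return {x}; }

// The type of the initialized variable determines the conversion target.
constexpr int from_double = cvt_model(3.9); // To = int, truncates to 3
constexpr double from_int = cvt_model(7);   // To = double
static_assert(from_double == 3 && from_int == 7.0);
```

As with `vir::cvt`, this only works where exactly one destination type is
required; `auto y = cvt_model(x);` simply keeps the wrapper type.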

### Permutations ([paper](https://wg21.link/P2664))

*Requires Concepts (C++20).*

```c++
#include <vir/simd_permute.h>

// v = {0, 1, 2, 3} -> {1, 0, 3, 2}
vir::simd_permute(v, vir::simd_permutations::swap_neighbors<>);

// v = {1, 2, 3, 4} -> {2, 2, 2, 2}
vir::simd_permute(v, [](unsigned) { return 1; });

// v = {1, 2, 3, 4} -> {3, 3, 3, 3}
vir::simd_permute(v, [](unsigned) { return -2; });
```

The following permutations are pre-defined:

* `vir::simd_permutations::duplicate_even`: copy values at even indices to
  neighboring odd position

* `vir::simd_permutations::duplicate_odd`: copy values at odd indices to
  neighboring even position

* `vir::simd_permutations::swap_neighbors<N>`: swap `N` consecutive values with
  the following `N` consecutive values

* `vir::simd_permutations::broadcast<Idx>`: copy the value at index `Idx` to
  all other values

* `vir::simd_permutations::broadcast_first`: alias for `broadcast<0>`

* `vir::simd_permutations::broadcast_last`: alias for `broadcast<-1>`

* `vir::simd_permutations::reverse`: reverse the order of all values

* `vir::simd_permutations::rotate<Offset>`: positive `Offset` rotates values to
  the left, negative `Offset` rotates values to the right (i.e.
  `rotate<Offset>` moves values from index `(i + Offset) % size` to `i`)

* `vir::simd_permutations::shift<Offset>`: positive `Offset` shifts values to
  the left, negative `Offset` shifts values to the right; shifting in zeros.

A `vir::simd_permute(x, idx_perm)` overload, where `x` is of *vectorizable*
type, is also included, facilitating generic code.
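
The index semantics above (the callable returns the source index for each
destination index, and negative indices count from the end) can be checked with
a scalar model. The following is a sketch on `std::array` with hypothetical
names (`permute_model`, `rotate_model`, `shift_model`); it is not how the
library implements permutations.

```cpp
#include <array>
#include <cstddef>

// Element i of the result is taken from index idx(i) of the input; a negative
// index counts from the end, so -1 names the last element.
template <typename T, std::size_t N, typename F>
constexpr std::array<T, N> permute_model(const std::array<T, N>& v, F idx)
{
  constexpr int n = static_cast<int>(N);
  std::array<T, N> r{};
  for (int i = 0; i < n; ++i) {
    const int j = idx(static_cast<unsigned>(i));
    r[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(j < 0 ? n + j : j)];
  }
  return r;
}

// rotate<Offset>: the value at index (i + Offset) % size moves to index i.
template <int Offset, typename T, std::size_t N>
constexpr std::array<T, N> rotate_model(const std::array<T, N>& v)
{
  return permute_model(v, [](unsigned i) {
    const int n = static_cast<int>(N);
    return ((static_cast<int>(i) + Offset) % n + n) % n; // non-negative modulo
  });
}

// shift<Offset>: like rotate, but slots without a source element become zero.
template <int Offset, typename T, std::size_t N>
constexpr std::array<T, N> shift_model(const std::array<T, N>& v)
{
  constexpr int n = static_cast<int>(N);
  std::array<T, N> r{}; // value-initialized, so vacated slots stay zero
  for (int i = 0; i < n; ++i) {
    const int j = i + Offset;
    if (j >= 0 && j < n)
      r[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(j)];
  }
  return r;
}

constexpr std::array<int, 4> v{1, 2, 3, 4};
static_assert(permute_model(v, [](unsigned) { return -2; })[0] == 3); // {3, 3, 3, 3}
static_assert(rotate_model<1>(v)[3] == 1);                            // {2, 3, 4, 1}
static_assert(shift_model<-1>(v)[0] == 0);                            // {0, 1, 2, 3}
```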

A special permutation `vir::simd_shift_in<N>(x, ...)` shifts by `N` elements,
shifting in elements from additional `simd` objects passed via the pack.
Example:

```c++
// v = {1, 2, 3, 4}, w = {5, 6, 7, 8} -> {2, 3, 4, 5}
vir::simd_shift_in<1>(v, w);
```
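
For a single additional operand and a non-negative shift, the result can be
thought of as a window into the concatenation of both inputs. A scalar sketch
under exactly those assumptions (`shift_in_model` is a hypothetical name, and
the real function accepts a pack of further operands):

```cpp
#include <array>
#include <cstddef>

// Result element i is element (i + Shift) of the concatenation v ++ w.
template <int Shift, typename T, std::size_t N>
constexpr std::array<T, N> shift_in_model(const std::array<T, N>& v,
                                          const std::array<T, N>& w)
{
  static_assert(Shift >= 0 && Shift <= static_cast<int>(N),
                "this sketch only models 0 <= Shift <= N");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t j = i + static_cast<std::size_t>(Shift);
    r[i] = j < N ? v[j] : w[j - N];
  }
  return r;
}

constexpr std::array<int, 4> v{1, 2, 3, 4};
constexpr std::array<int, 4> w{5, 6, 7, 8};
// {1, 2, 3, 4} shifted by 1 with {5, 6, 7, 8} shifted in: {2, 3, 4, 5}
static_assert(shift_in_model<1>(v, w)[0] == 2 && shift_in_model<1>(v, w)[3] == 5);
```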

### SIMD execution policy ([P0350](https://wg21.link/P0350))

*Requires Concepts (C++20).*

Adds an execution policy `vir::execution::simd`. The execution policy can be
used with the algorithms implemented in the `vir` namespace. These algorithms
are additionally overloaded in the `std` namespace.

At this point, the implementation of the execution policy requires contiguous
ranges / iterators.

#### Usable algorithms

* `std::for_each` / `vir::for_each`
* `std::count_if` / `vir::count_if`
* `std::transform` / `vir::transform`
* `std::transform_reduce` / `vir::transform_reduce`
* `std::reduce` / `vir::reduce`

#### Example

```c++
#include <vir/simd_execution.h>
#include <vector>

// take the vector by reference so that the caller sees the increments
void increment_all(std::vector<float>& data) {
  std::for_each(vir::execution::simd, data.begin(), data.end(),
    [](auto& v) {
      v += 1.f;
    });
}

// or

void increment_all(std::vector<float>& data) {
  vir::for_each(vir::execution::simd, data,
    [](auto& v) {
      v += 1.f;
    });
}
```

#### Execution policy modifiers

The `vir::execution::simd` execution policy supports a few settings modifying
its behavior:

* `vir::execution::simd.prefer_size<N>()`:
  Start with chunking the range into parts of `N` elements, calling the
  user-supplied function(s) with objects of type `resize_simd_t<N, simd<T>>`.

* `vir::execution::simd.unroll_by<M>()`:
  Iterate over the range in chunks of `simd<T>::size() * M` instead of just
  `simd<T>::size()`. The algorithm will execute `M` loads (or stores) together
  before/after calling the user-supplied function(s). The user-supplied
  function may be called with `M` `simd` objects instead of one `simd` object.
  Note that prologue and epilogue will typically still call the user-supplied
  function with a single `simd` object.
  Algorithms like `std::count_if` require a return value from the user-supplied
  function and therefore still call the function with a single `simd` (to avoid
  the need for returning an `array` or `tuple` of `simd_mask`). Such algorithms
  will still make use of unrolling inside their implementation.

* `vir::execution::simd.assume_matching_size()`:
  Add a precondition to the algorithm, that the given range size is a multiple
  of the SIMD width (but not the SIMD width multiplied by the above unroll
  factor). This modifier is only valid without prologue (the following two
  modifiers). The algorithm consequently does not implement an epilogue and all
  given callables are called with a single simd type (same width and ABI tag).
  This can reduce code size significantly.

* `vir::execution::simd.prefer_aligned()`:
  Unconditionally iterate using smaller chunks, until the main iteration can
  load (and store) chunks from/to aligned addresses. This can be more efficient
  if the range is large, avoiding cache-line splits. (e.g. with AVX-512,
  unaligned iteration leads to cache-line splits on every iteration; with AVX
  on every second iteration)

* `vir::execution::simd.auto_prologue()`
  (still testing its viability, may be removed):
  Determine from run-time information (i.e. add a branch) whether a prologue
  for alignment of the main chunked iteration might be more efficient.
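
To get a feeling for how these modifiers interact, it can help to count how a
range might be partitioned. The arithmetic below is only an illustrative model:
`chunk_plan` and `plan_chunks` are hypothetical names, and the split into
prologue, unrolled iterations, single-width iterations and epilogue is an
assumption about one plausible strategy, not a description of the library's
internals.

```cpp
#include <cstddef>

struct chunk_plan
{
  std::size_t prologue; // leading elements handled before the aligned main loop
  std::size_t unrolled; // iterations that process width * unroll elements
  std::size_t single;   // iterations that process width elements
  std::size_t epilogue; // trailing elements that do not fill a full chunk
};

// `misaligned` models prefer_aligned(): the number of elements in front of the
// first aligned address (0 when no prologue is requested).
constexpr chunk_plan plan_chunks(std::size_t n, std::size_t width,
                                 std::size_t unroll = 1, std::size_t misaligned = 0)
{
  const std::size_t prologue = misaligned < n ? misaligned : n;
  std::size_t rest = n - prologue;
  const std::size_t unrolled = rest / (width * unroll);
  rest -= unrolled * width * unroll;
  const std::size_t single = rest / width;
  rest -= single * width;
  return {prologue, unrolled, single, rest};
}

// 100 floats, 8 per simd, unrolled by 4: three 32-element steps, 4 left over.
static_assert(plan_chunks(100, 8, 4).unrolled == 3);
static_assert(plan_chunks(100, 8, 4).epilogue == 4);
// A size that is a multiple of the width (but not of width * unroll) needs no
// epilogue, which is the situation assume_matching_size() promises.
static_assert(plan_chunks(72, 8, 4).single == 1 && plan_chunks(72, 8, 4).epilogue == 0);
```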

### Bitwise operators for floating-point `simd`

```c++
#include <vir/simd_float_ops.h>

using namespace vir::simd_float_ops;
```

Then the `&`, `|`, and `^` binary operators can be used with objects of type
`simd<`*floating-point*`, A>`.
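
A typical use of such operators is manipulating the sign or exponent bits
directly. The scalar sketch below shows the per-element effect of a bitwise AND
on a `float`; `bitwise_and` and `abs_via_mask` are hypothetical helpers, and
the code assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Applies a bitwise AND to the object representation of a float.
inline float bitwise_and(float x, std::uint32_t mask)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits); // well-defined way to read the bits
  bits &= mask;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Clearing the sign bit yields the absolute value (IEEE 754 binary32 assumed).
inline float abs_via_mask(float x)
{ return bitwise_and(x, 0x7fff'ffffu); }

// Small self-check of the model.
inline void check_abs_via_mask()
{
  assert(abs_via_mask(-1.5f) == 1.5f);
  assert(abs_via_mask(2.0f) == 2.0f);
}
```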

### Conversion between `std::bitset` and `simd_mask`

```c++
#include <vir/simd_bitset.h>

vir::stdx::simd_mask<float> k;
std::bitset b = vir::to_bitset(k);
vir::stdx::simd_mask k2 = vir::to_simd_mask<float>(b);
```

There are two overloads of `vir::to_simd_mask`:

```c++
to_simd_mask<T, A>(bitset<simd_size_v<T, A>>)
```

and

```c++
to_simd_mask<T, N>(bitset<N>)
```

### vir::simd_resize and vir::simd_size_cast

The header

```c++
#include <vir/simd_resize.h>
```

declares the functions

* `vir::simd_resize<N>(simd)`,

* `vir::simd_resize<N>(simd_mask)`,

* `vir::simd_size_cast<V>(simd)`, and

* `vir::simd_size_cast<M>(simd_mask)`.

These functions can resize a given `simd` or `simd_mask` object. If the return
type requires more elements than the input parameter, the new elements are
default-initialized and appended at the end. Neither function allows a
change of the `value_type`. However, implicit conversions can happen on
parameter passing to `simd_size_cast`.
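
The truncate-or-extend behavior can be pictured with a scalar model on
`std::array`. `resize_model` is a hypothetical name; the sketch zero-fills the
appended elements for determinism, which is an assumption made for the
illustration only.

```cpp
#include <array>
#include <cstddef>

// Copies the leading elements; when growing, the appended elements are zero.
template <std::size_t NewSize, typename T, std::size_t N>
constexpr std::array<T, NewSize> resize_model(const std::array<T, N>& v)
{
  std::array<T, NewSize> r{}; // value-initialized
  for (std::size_t i = 0; i < (N < NewSize ? N : NewSize); ++i)
    r[i] = v[i];
  return r;
}

constexpr std::array<int, 4> v{1, 2, 3, 4};
static_assert(resize_model<2>(v)[1] == 2);                                // {1, 2}
static_assert(resize_model<6>(v)[3] == 4 && resize_model<6>(v)[5] == 0);  // {1, 2, 3, 4, 0, 0}
```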

### vir::simd_bit_cast

The header

```c++
#include <vir/simd_bit_cast.h>
```

declares the function `vir::simd_bit_cast<To>(from)`. This function serves the
same purpose as `std::bit_cast` but additionally works in cases where a `simd`
type is not trivially copyable.

### Concepts

*Requires Concepts (C++20).*

The header

```c++
#include <vir/simd_concepts.h>
```

defines the following concepts:

* `vir::arithmetic<T>`: What `std::arithmetic<T>` should be: satisfied if `T`
  is an arithmetic type (as specified by the C++ core language).

* `vir::vectorizable<T>`: Satisfied if `T` is a valid element type for
  `stdx::simd` and `stdx::simd_mask`.

* `vir::simd_abi_tag<T>`: Satisfied if `T` is a valid ABI tag for `stdx::simd`
  and `stdx::simd_mask`.

* `vir::any_simd<V>`: Satisfied if `V` is a specialization of
  `stdx::simd<T, Abi>` and the types `T` and `Abi` satisfy `vir::vectorizable`
  and `vir::simd_abi_tag`.

* `vir::any_simd_mask<V>`: Analogue to `vir::any_simd` for `stdx::simd_mask`
  instead of `stdx::simd`.

* `vir::typed_simd<V, T>`: Satisfied if `vir::any_simd<V>` and `T` is the
  element type of `V`.

* `vir::sized_simd<V, Width>`: Satisfied if `vir::any_simd<V>` and `Width` is
  the width of `V`.

* `vir::sized_simd_mask<V, Width>`: Analogue to `vir::sized_simd` for
  `stdx::simd_mask` instead of `stdx::simd`.

### simdize type transformation

*Requires Concepts (C++20).*

:warning: consider this interface under :construction:

The header

```c++
#include <vir/simdize.h>
```

defines the following types and constants:

* `vir::simdize<T, N>`: `N` is optional. Type alias for a `simd` or
  `vir::simd_tuple` type determined from the type `T`.

  - If `vir::vectorizable<T>` is satisfied, then `stdx::simd<T, Abi>` is
    produced. `Abi` is determined from `N` and will be `simd_abi::native<T>` if
    `N` was omitted.

  - If `T` is a `std::tuple` or aggregate that can be reflected, then a
    specialization of `vir::simd_tuple` is produced. If `T` is a template
    specialization (without NTTPs), the metafunction tries vectorization via
    applying `simdize` to all template arguments. If this doesn't yield the
    same data structure layout as member-only vectorization, then the type
    behaves similarly to a `std::tuple` with additional API to make the type
    similar to `stdx::simd` (see below).
    This specialization will be derived from `std::tuple` and the tuple
    elements will either be `vir::simd_tuple` or `stdx::simd` types.
    `vir::simdize` is applied recursively to the `std::tuple`/aggregate data
    members.

  - Otherwise, if `T` cannot be simdized (e.g. void, no data members,
    `std::tuple<>`), no transformation is applied and `simdize<T>` is an
    alias for `T`.

  - If `N` was omitted, the resulting width of *all* `simd` types in the
    resulting type will match the largest `native_simd` width.

  Example: `vir::simdize<std::tuple<double, short>>` produces a tuple with the
  element types `stdx::rebind_simd_t<double, stdx::native_simd<short>>` and
  `stdx::native_simd<short>`.

* `vir::simd_tuple<reflectable_struct T, size_t N>`: Don't use this class
  template directly. Let `vir::simdize` instantiate specializations of this
  class template. `vir::simd_tuple` mostly behaves like a `std::tuple` and adds
  the following interface on top of `std::tuple`:

  - `value_type`

  - `mask_type`

  - `size`

  - tuple-like constructors

  - broadcast and/or conversion constructors

  - load constructor

  - `as_tuple()`: Returns the data members as a `std::tuple`.

  - `operator[](size_t)`: Copy of a single `T` stored in the `simd_tuple`. This
    is not a cheap operation because there are no `T` objects stored in the
    `simd_tuple`.

  - `copy_from(std::contiguous_iterator)`: :construction: unoptimized load from
    a contiguous array of struct (e.g. `std::vector<T>`).

  - `copy_to(std::contiguous_iterator)`: :construction: unoptimized store to a
    contiguous array of struct.

* `vir::simd_tuple<vectorizable_struct_template T, size_t N>`: TODO

* `vir::get<I>(simd_tuple)`: Access to the `I`-th data member (a `simd`).

* `vir::simdize_size<T>`, `vir::simdize_size_v<T>`
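
Conceptually, this transformation turns an array-of-structs element into a
struct-of-arrays block. The hand-written sketch below shows that layout change
for a two-member struct and why extracting a single element is not free.
`Point`, `PointV` and `load_points` are hypothetical stand-ins, with
`std::array` used in place of `simd`.

```cpp
#include <array>
#include <cstddef>

struct Point { float x, y; }; // scalar element of an array of structs

// Hand-written stand-in for what a simdized Point of width W looks like.
template <std::size_t W>
struct PointV
{
  std::array<float, W> x, y; // one W-wide group per data member

  // No Point object is stored, so one has to be assembled from both members.
  constexpr Point operator[](std::size_t i) const
  { return {x[i], y[i]}; }
};

// Unoptimized load from W contiguous Point objects (element-wise gather).
template <std::size_t W>
constexpr PointV<W> load_points(const Point* first)
{
  PointV<W> r{};
  for (std::size_t i = 0; i < W; ++i) {
    r.x[i] = first[i].x;
    r.y[i] = first[i].y;
  }
  return r;
}

constexpr Point pts[4] = {{1.f, 10.f}, {2.f, 20.f}, {3.f, 30.f}, {4.f, 40.f}};
constexpr auto pv = load_points<4>(pts);
static_assert(pv.x[2] == 3.f);  // all x values are now adjacent
static_assert(pv[1].y == 20.f); // element access rebuilds a Point
```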

### Benchmark support functions

*Requires Concepts (C++20) and GNU compatible inline-asm.*

The header

```c++
#include <vir/simd_benchmarking.h>
```

defines the following functions:

* `vir::fake_modify(...)`: Let the compiler assume that all arguments passed to
  this function are modified. This inhibits constant propagation, hoisting of
  code sections, and dead-code elimination.

* `vir::fake_read(...)`: Let the compiler assume that all arguments passed to
  this function are read (in the cheapest manner). This inhibits dead-code
  elimination leading up to the results passed to this function.
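
Helpers of this kind are commonly built from empty GNU-style inline-asm
statements. The sketch below shows that general idiom with hypothetical names
(`fake_modify_model`, `fake_read_model`); it uses memory constraints for
simplicity and is not claimed to match the vir-simd implementation.

```cpp
#include <cassert>

// The empty asm statement claims to read and write x in memory ("+m"), so the
// compiler can no longer assume it knows the value of x afterwards.
template <typename T>
inline void fake_modify_model(T& x)
{
  asm volatile("" : "+m"(x));
}

// The empty asm statement claims to read x ("m" input), so the computation
// producing x cannot be eliminated as dead code.
template <typename T>
inline void fake_read_model(const T& x)
{
  asm volatile("" : : "m"(x));
}

// A benchmark kernel that the optimizer cannot constant-fold away.
inline int add_opaque(int a, int b)
{
  fake_modify_model(a);
  fake_modify_model(b);
  const int r = a + b;
  fake_read_model(r);
  return r;
}

// The helpers emit no instructions that change values; only the optimizer's
// knowledge is affected.
inline void check_add_opaque()
{
  assert(add_opaque(2, 3) == 5);
}
```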

### `constexpr_wrapper`: function arguments as constant expressions

The header

```c++
#include <vir/constexpr_wrapper.h>
```

defines the following tools:

* `vir::constexpr_value` (concept): Satisfied by any type with a static
  `::value` member that can be used in a constant expression.

* `vir::constexpr_wrapper<auto>` (class template): A type storing the value of
  its NTTP (non-type template parameter) and overloading all operators to
  return another `constexpr_wrapper`. `constexpr_wrapper` objects are
  implicitly convertible to their value type (a `constexpr_wrapper`
  automatically unwraps its constant expression).

* `vir::cw<auto>` (variable template): Shorthand for producing
  `constexpr_wrapper` objects with the given value.

* `vir::literals` (namespace with `_cw` UDL): Shorthand for producing
  `constexpr_wrapper` objects of the integer literal in front of the `_cw`
  suffix. The type will be deduced automatically from the value of the literal
  to be the smallest signed integral type, or if the value is larger, `unsigned
  long long`. If the value is too large for an `unsigned long long`, the
  program is ill-formed.

`constexpr_wrapper` may appear unrelated to `simd`. However, it is an important
tool used in many places in the implementation and on interfaces of vir-simd
tools. `vir::constexpr_wrapper` is very similar to `std::integral_constant`,
which is used in the `simd` TS interface for generator constructors.

#### Example

```c++
#include <vir/constexpr_wrapper.h>
#include <array>

auto f(vir::constexpr_value auto N)
{
  std::array<int, N> x = {};
  return x;
}

std::array a = f(vir::cw<4>); // array<int, 4>

using namespace vir::literals;
std::array b = f(10_cw); // array<int, 10>
```

This example cannot work with a signature `constexpr auto f(int n)` (or
`consteval`) because `n` will never be considered a constant expression in the
body of the function.
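
The same principle can be demonstrated with `std::integral_constant` alone,
without the library: the value travels in the *type* of the argument and is
therefore available as a constant expression inside the function. `make_zeros`
is a hypothetical function used only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// The array size is taken from the argument's type, not from its run-time
// value, so it can be used as a template argument in the return type.
template <typename T, T Value>
constexpr std::array<int, Value> make_zeros(std::integral_constant<T, Value>)
{
  return {};
}

// By contrast, this does not compile: a function parameter is not a constant
// expression inside the function body.
// constexpr auto broken(int n) { return std::array<int, n>{}; }

constexpr auto zeros = make_zeros(std::integral_constant<int, 4>{});
static_assert(std::is_same_v<decltype(zeros), const std::array<int, 4>>);
static_assert(zeros.size() == 4);
```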

### Testing for the version of the vir::stdx::simd (vir-simd) library

The header

```c++
#include <vir/simd_version.h>
```

(which is also included from `<vir/simd.h>`) defines the type and constant

```c++
namespace vir
{
  struct simd_version_t { int major, minor, patchlevel; };

  constexpr simd_version_t simd_version;
}
```

in addition to the macros `VIR_SIMD_VERSION`, `VIR_SIMD_VERSION_MAJOR`,
`VIR_SIMD_VERSION_MINOR`, and `VIR_SIMD_VERSION_PATCHLEVEL`.

`simd_version_t` implements all comparison operators, allowing e.g.

```c++
static_assert(vir::simd_version >= vir::simd_version_t{0,4,0});
```

#### Semantics of version numbers

* An increment of the major version number implies a breaking change.

* An increment of the minor version number implies new features without
  breaking changes.

* An increment of the patchlevel is used for bug fixes.

* Odd patchlevel numbers indicate a development (not released) version.
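
These rules can be expressed as two small compile-time predicates. The sketch
uses a stand-alone `version_model` struct with the same three members;
`at_least` and `is_release` are hypothetical helpers, not functions provided by
the library.

```cpp
// Stand-alone model of a {major, minor, patchlevel} version triple.
struct version_model { int major, minor, patchlevel; };

// Lexicographic "v >= minimum" comparison, as used in a feature check.
constexpr bool at_least(const version_model& v, const version_model& minimum)
{
  if (v.major != minimum.major)
    return v.major > minimum.major;
  if (v.minor != minimum.minor)
    return v.minor > minimum.minor;
  return v.patchlevel >= minimum.patchlevel;
}

// Per the numbering scheme, an odd patchlevel marks a development version.
constexpr bool is_release(const version_model& v)
{ return v.patchlevel % 2 == 0; }

static_assert(at_least(version_model{0, 4, 2}, version_model{0, 4, 0}));
static_assert(!at_least(version_model{0, 3, 9}, version_model{0, 4, 0}));
static_assert(is_release(version_model{0, 4, 0}));
static_assert(!is_release(version_model{0, 4, 1}));
```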

## Debugging

Compile with `-D _GLIBCXX_DEBUG_UB` to get runtime checks for undefined
behavior in the `simd` implementation(s). Otherwise, `-fsanitize=undefined`
without the macro definition will also find the problems, but without an
additional error message.

Preconditions in the vir::stdx::simd implementation and extensions are
controlled via the `-D VIR_CHECK_PRECONDITIONS=N` macro, which defaults to `3`.
Compile-time diagnostics are only possible if the compiler's optimizer can
detect the precondition failure. If you get a bogus compile-time failure, you
need to introduce the necessary assumption into your calling function, which is
typically a missing precondition check in your function.

| **Option**                    | **at compile-time** | **at run-time**       |
|:------------------------------|:-------------------:|:---------------------:|
| `-DVIR_CHECK_PRECONDITIONS=0` | warning             | invoke UB/unreachable |
| `-DVIR_CHECK_PRECONDITIONS=1` | error               | invoke UB/unreachable |
| `-DVIR_CHECK_PRECONDITIONS=2` | warning             | trap                  |
| `-DVIR_CHECK_PRECONDITIONS=3` | error               | trap                  |
| `-DVIR_CHECK_PRECONDITIONS=4` | warning             | print error and abort |
| `-DVIR_CHECK_PRECONDITIONS=5` | error               | print error and abort |
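
The table follows a regular pattern: the lowest bit of `N` selects warning or
error at compile-time, and `N / 2` selects the run-time behavior. The decoder
below merely restates the table for values 0 to 5; all names in it are
hypothetical, and whether the library interprets the macro in exactly this way
is an assumption.

```cpp
enum class compile_time_action { warning, error };
enum class run_time_action { unreachable, trap, print_and_abort };

struct precondition_mode
{
  compile_time_action at_compile_time;
  run_time_action at_run_time;
};

// Restates the table above for 0 <= n <= 5.
constexpr precondition_mode decode_check_level(int n)
{
  return {n % 2 == 0 ? compile_time_action::warning : compile_time_action::error,
          n / 2 == 0   ? run_time_action::unreachable
          : n / 2 == 1 ? run_time_action::trap
                       : run_time_action::print_and_abort};
}

// The default level 3 means: error at compile-time, trap at run-time.
static_assert(decode_check_level(3).at_compile_time == compile_time_action::error);
static_assert(decode_check_level(3).at_run_time == run_time_action::trap);
```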