https://github.com/root-project/veccore

C++ Library for Portable SIMD Vectorization
simd veccore vectorization

Host: GitHub
URL: https://github.com/root-project/veccore
Owner: root-project
License: other
Created: 2016-12-01T18:22:42.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2024-11-22T14:30:04.000Z (about 1 month ago)
Last Synced: 2024-12-24T10:26:16.889Z (8 days ago)
Topics: simd, veccore, vectorization
Language: C++
Homepage: https://root-project.github.io/veccore
Size: 15.8 MB
Stars: 80
Watchers: 17
Forks: 22
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

# VecCore

**VecCore** is a simple abstraction layer on top of other vectorization
libraries. It provides an architecture-independent [API](doc/api.md) for
expressing vector operations on data. Code written with this API can then
be dispatched to one of several [backends](doc/backends.md) implemented using
libraries like [Vc](https://github.com/VcDevel/Vc),
[UME::SIMD](https://github.com/edanor/umesimd), or a scalar implementation.
This allows one to get the best performance on platforms supported by Vc and
UME::SIMD without losing portability to unsupported architectures such as
PowerPC, where the scalar backends can be used instead without requiring
changes in user code. Another advantage is that, unlike with compiler intrinsics,
the same code can be compiled for SSE, AVX2, AVX512, etc., without modifications.

With the addition of new backends, such as the new backend based on C++20 and
`std::experimental::simd`, users can automatically take advantage of new
features and better performance. This backend supports AVX512 on Intel/AMD64 and
NEON on ARM/ARM64, with best performance in most cases. However, it does require
compiling code in C++20 mode, which may not always be possible, so there is
still an advantage in using it through VecCore, which provides a fallback
when C++20 is not available.
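
To make the idea of writing a kernel once and dispatching it to different backends more concrete, here is a minimal, self-contained sketch. It does not use VecCore itself: `Float4`, `Traits`, and `saxpy` are hypothetical names invented for this illustration, and `Float4` is a plain array standing in for a real SIMD type. The point is only that the kernel is written against a small per-type interface, so switching between a scalar and a vector type requires no change to the kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane vector type standing in for a real SIMD backend type.
struct Float4 {
    std::array<float, 4> lane;
};

inline Float4 operator+(const Float4& a, const Float4& b) {
    Float4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

inline Float4 operator*(const Float4& a, const Float4& b) {
    Float4 r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Per-type traits: the only place that knows how each "backend" type works.
template <typename T> struct Traits;

template <> struct Traits<float> {
    static constexpr std::size_t size = 1;
    static float broadcast(float s) { return s; }
    static float load(const float* p) { return *p; }
    static void store(float v, float* p) { *p = v; }
};

template <> struct Traits<Float4> {
    static constexpr std::size_t size = 4;
    static Float4 broadcast(float s) { return Float4{{s, s, s, s}}; }
    static Float4 load(const float* p) { return Float4{{p[0], p[1], p[2], p[3]}}; }
    static void store(const Float4& v, float* p) {
        for (std::size_t i = 0; i < 4; ++i) p[i] = v.lane[i];
    }
};

// Kernel written once against the traits interface: y = a * x + y.
template <typename T>
void saxpy(float a, const float* x, float* y, std::size_t n) {
    using B = Traits<T>;
    assert(n % B::size == 0);  // sketch only: no handling of a partial tail
    const T va = B::broadcast(a);
    for (std::size_t i = 0; i < n; i += B::size)
        B::store(va * B::load(x + i) + B::load(y + i), y + i);
}
```

Calling `saxpy<float>(a, x, y, n)` or `saxpy<Float4>(a, x, y, n)` selects the scalar or the 4-lane path at compile time. VecCore's real backends play the role that `Traits` plays here, with the lane-wise loops replaced by actual SIMD operations.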

## Example

The [bench](bench/) directory of the repository has several usage examples of
the VecCore API that are used to compare how different backends perform in
various circumstances. Below we show how to convert a scalar function to compute
a [Julia Set](https://en.wikipedia.org/wiki/Julia_set) to work with SIMD instructions:

#### Scalar Implementation

```cpp

void julia(float xmin, float xmax, int nx, float ymin, float ymax, int ny,
           int max_iter, unsigned char *image, float real, float im)
{
    // Pixel spacing along each axis.
    float dx = (xmax - xmin) / nx;
    float dy = (ymax - ymin) / ny;

    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            int k = 0;
            float x = xmin + i * dx, cr = real, zr = x;
            float y = ymin + j * dy, ci = im, zi = y;

            // Iterate z = z^2 + c until |z|^2 >= 4 or max_iter is reached.
            do {
                x  = zr*zr - zi*zi + cr;
                y  = 2.0f * zr*zi + ci;
                zr = x;
                zi = y;
            } while (++k < max_iter && (zr*zr + zi*zi < 4.0f));

            image[ny*i + j] = k;
        }
    }
}

```

#### SIMD Implementation using VecCore

```cpp

template<typename T>
void julia_v(Scalar<T> xmin, Scalar<T> xmax, size_t nx,
             Scalar<T> ymin, Scalar<T> ymax, size_t ny,
             Scalar<Index<T>> max_iter, unsigned char *image,
             Scalar<T> real, Scalar<T> im)
{
    // iota = {0, 1, 2, ...}: per-lane offsets along the y direction.
    T iota(0.0);
    for (size_t i = 0; i < VectorSize<T>(); ++i)
        Set<T>(iota, i, i);

    T dx = T(xmax - xmin) / T(nx);
    T dy = T(ymax - ymin) / T(ny), dyv = iota * dy;

    for (size_t i = 0; i < nx; ++i) {
        // Each step handles VectorSize<T>() pixels at once
        // (as written, ny should be a multiple of the vector size).
        for (size_t j = 0; j < ny; j += VectorSize<T>()) {
            Scalar<Index<T>> k(0);
            T x = xmin + T(i) * dx,       cr = real, zr = x;
            T y = ymin + T(j) * dy + dyv, ci = im,   zi = y;

            Index<T> kv(0);
            Mask<T> m(true);

            // Lanes whose mask bit is cleared keep their last zr, zi, and kv.
            do {
                x = zr*zr - zi*zi + cr;
                y = T(2.0) * zr*zi + ci;
                MaskedAssign<T>(zr, m, x);
                MaskedAssign<T>(zi, m, y);
                MaskedAssign<Index<T>>(kv, m, ++k);
                m = zr*zr + zi*zi < T(4.0);
            } while (k < max_iter && !MaskEmpty(m));

            for (size_t k = 0; k < VectorSize<T>(); ++k)
                image[ny*i + j + k] = (unsigned char) Get(kv, k);
        }
    }
}

```

The differences appear where branching is required and masks need to be used
instead of simple conditionals. In some places, casting scalars to the correct
type is also necessary in order to enable their promotion to the correct SIMD
vector type.
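
The switch from a conditional to a mask can be shown in isolation with a small, self-contained sketch. It does not use VecCore: `FloatV`, `IntV`, `MaskV`, `masked_assign`, `mask_empty`, and the halving loop are hypothetical stand-ins chosen for this illustration, with plain arrays playing the role of SIMD registers. The scalar loop leaves as soon as its condition fails, while the masked loop keeps running for all lanes and uses the mask to freeze the lanes that have already finished.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical vector width
using FloatV = std::array<float, kLanes>;
using IntV   = std::array<int, kLanes>;
using MaskV  = std::array<bool, kLanes>;

// dst[i] = src[i] where mask[i] is set; other lanes keep their value.
template <typename V>
void masked_assign(V& dst, const MaskV& mask, const V& src) {
    for (std::size_t i = 0; i < kLanes; ++i)
        if (mask[i]) dst[i] = src[i];
}

// True when no lane is active any more.
bool mask_empty(const MaskV& mask) {
    for (bool b : mask)
        if (b) return false;
    return true;
}

// Scalar version: a plain conditional ends the loop for this one value.
int halvings_scalar(float x, int max_iter) {
    int k = 0;
    while (k < max_iter && x >= 1.0f) {
        x *= 0.5f;
        ++k;
    }
    return k;
}

// Masked version: every lane executes each iteration, but only lanes
// that are still active have their state and counter updated.
IntV halvings_masked(FloatV x, int max_iter) {
    assert(max_iter >= 0);
    IntV k{};  // all counters start at zero
    MaskV active;
    for (std::size_t i = 0; i < kLanes; ++i) active[i] = x[i] >= 1.0f;

    for (int iter = 0; iter < max_iter && !mask_empty(active); ++iter) {
        FloatV next_x;
        IntV next_k;
        for (std::size_t i = 0; i < kLanes; ++i) {
            next_x[i] = x[i] * 0.5f;
            next_k[i] = k[i] + 1;
        }
        masked_assign(x, active, next_x);
        masked_assign(k, active, next_k);
        for (std::size_t i = 0; i < kLanes; ++i) active[i] = x[i] >= 1.0f;
    }
    return k;
}
```

For each lane, `halvings_masked` returns the same count as `halvings_scalar` applied to that lane's input. The loop runs as long as the slowest lane needs, which is the source of the efficiency loss when lanes finish at very different times.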

#### Performance

Gains in performance usually depend not only on the code being vectorized, but
also on the runtime characteristics of the actual computations. For example,
when computing Julia sets, the structure of the set matters, as it determines
how much coherence there is between nearby pixels. The more iterations
that are computed in vector mode for nearby pixels, the greater the performance
improvement. Conversely, when more iterations are performed with elements
masked out, the speedup is lower. Therefore, the fractal with the largest interior
consisting of points that never escape (shown in black) has the largest speedup. The
figure below illustrates this for different fractals (left) by showing the
speedup as the point where the lines cross the axis of the radial plot (right).
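
The effect of coherence can also be estimated without any SIMD code. The sketch below is not part of VecCore: `julia_iterations` and `lane_utilization` are hypothetical helpers written for this illustration. It groups `width` consecutive pixels along `j`, as the vectorized loop does, and compares the iterations each pixel actually needs with the iterations issued for the whole group, which is `width` times the count of the slowest pixel. The resulting ratio is only a rough indicator of how much of the vector work is useful, not a prediction of measured speedup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Iteration count for one pixel, mirroring the scalar do/while loop.
int julia_iterations(float zr, float zi, float cr, float ci, int max_iter) {
    int k = 0;
    do {
        float x = zr*zr - zi*zi + cr;
        float y = 2.0f * zr*zi + ci;
        zr = x;
        zi = y;
    } while (++k < max_iter && (zr*zr + zi*zi < 4.0f));
    return k;
}

// Fraction of issued lane-iterations that do useful work when `width`
// neighbouring pixels along j are processed together (1.0 = no waste).
double lane_utilization(float xmin, float xmax, int nx,
                        float ymin, float ymax, int ny,
                        int max_iter, float real, float im, int width) {
    assert(nx > 0 && ny > 0 && max_iter > 0);
    assert(width > 0 && ny % width == 0);  // sketch only: no partial groups

    const float dx = (xmax - xmin) / nx;
    const float dy = (ymax - ymin) / ny;
    long long useful = 0;  // iterations each pixel needs on its own
    long long issued = 0;  // iterations executed by all lanes of a group

    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; j += width) {
            int longest = 0;
            for (int l = 0; l < width; ++l) {
                const int k = julia_iterations(xmin + i * dx, ymin + (j + l) * dy,
                                               real, im, max_iter);
                useful += k;
                longest = std::max(longest, k);
            }
            issued += static_cast<long long>(width) * longest;
        }
    }
    return static_cast<double>(useful) / static_cast<double>(issued);
}
```

With `width` set to 1 the ratio is exactly 1, since nothing is ever masked out. For larger widths the ratio drops in regions where neighbouring pixels need very different iteration counts, which is the loss of coherence described above.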


## Supported Platforms

VecCore supports Linux, Mac OS X, and Windows. To compile software using
VecCore, you will need a compiler with support for C++17. We recommend using at
least the following compiler versions:

 - GCC 11.0
 - Clang 14.0
 - AppleClang 15.0
 - Intel® C/C++ Compiler 19.1
 - Microsoft Visual Studio 17 2019

Additionally, you will need CMake 3.16 or later, and you may want to install
a SIMD library such as

 - [Vc](https://github.com/VcDevel/Vc) (version 1.4 or later)
 - [UME::SIMD](https://github.com/edanor/umesimd) (version 0.8.1 or later)
 - [std::experimental::simd](https://gcc.gnu.org/gcc-11/changes.html#libstdcxx)
   (included in libstdc++ from GCC 11 or later)

and/or

 - [Nvidia's CUDA SDK](http://developer.nvidia.com/cuda) (version 11.0 or later).

## Documentation

The documentation can be generated with Doxygen by passing `-DBUILD_DOCS=True`
when configuring, then building the `doxygen` target with `make doxygen`. It is
also available online at https://root-project.github.io/veccore.

## Publications

A list of publications is available [here](doc/publications.md).