# VecCore

**VecCore** is a simple abstraction layer on top of other vectorization
libraries. It provides an architecture-independent [API](doc/api.md) for
expressing vector operations on data. Code written with this API can then
be dispatched to one of several [backends](doc/backends.md) implemented using
libraries like [Vc](https://github.com/VcDevel/Vc),
[UME::SIMD](https://github.com/edanor/umesimd), or a scalar implementation.
This allows one to get the best performance on platforms supported by Vc and
UME::SIMD without losing portability to unsupported architectures like PowerPC,
for example, where the scalar backend can be used instead without requiring
changes in user code. Another advantage is that, unlike with compiler intrinsics,
the same code can be compiled for SSE, AVX2, AVX512, etc., without modifications.
With the addition of new backends, such as the one based on C++20 and
`std::experimental::simd`, users can automatically take advantage of new
features and better performance.
This backend supports AVX512 on Intel/AMD64 and
NEON on ARM/ARM64, with the best performance in most cases. However, it requires
compiling code in C++20 mode, which may not always be possible, so there is
still an advantage in using it through VecCore, which can fall back to another
backend when C++20 is not available.

## Example

The [bench](bench/) directory of the repository contains several usage examples
of the VecCore API, used to compare how different backends perform in various
circumstances. Below we show how to convert a scalar function that computes
a [Julia Set](https://en.wikipedia.org/wiki/Julia_set) to work with SIMD instructions:

#### Scalar Implementation

```cpp
void julia(float xmin, float xmax, int nx, float ymin, float ymax, int ny,
           int max_iter, unsigned char *image, float real, float im)
{
    float dx = (xmax - xmin) / nx;
    float dy = (ymax - ymin) / ny;

    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            int k = 0;
            float x = xmin + i * dx, cr = real, zr = x;
            float y = ymin + j * dy, ci = im, zi = y;

            do {
                x  = zr*zr - zi*zi + cr;
                y  = 2.0f * zr*zi + ci;
                zr = x;
                zi = y;
            } while (++k < max_iter && (zr*zr + zi*zi < 4.0f));

            image[ny*i + j] = k;
        }
    }
}
```

#### SIMD Implementation using VecCore

```cpp
template<typename T>
void julia_v(Scalar<T> xmin, Scalar<T> xmax, size_t nx, Scalar<T> ymin, Scalar<T> ymax, size_t ny,
             Scalar<Index<T>> max_iter, unsigned char *image, Scalar<T> real, Scalar<T> im)
{
    T iota(0.0);
    for (size_t i = 0; i < VectorSize<T>(); ++i)
        Set<T>(iota, i, i);

    T dx = T(xmax - xmin) / T(nx);
    T dy = T(ymax - ymin) / T(ny), dyv = iota * dy;

    for (size_t i = 0; i < nx; ++i) {
        for (size_t j = 0; j < ny; j += VectorSize<T>()) {
            Scalar<Index<T>> k(0);
            T x = xmin + T(i) * dx,       cr = real, zr = x;
            T y = ymin + T(j) * dy + dyv, ci = im,   zi = y;

            Index<T> kv(0);
            Mask<T> m(true);

            do {
                x = zr*zr - zi*zi + cr;
                y = T(2.0) * zr*zi + ci;
                MaskedAssign<T>(zr, m, x);
                MaskedAssign<T>(zi, m, y);
                MaskedAssign<Index<T>>(kv, m, ++k);
                m = zr*zr + zi*zi < T(4.0);
            } while (k < max_iter && !MaskEmpty(m));

            for (size_t k = 0; k < VectorSize<T>(); ++k)
                image[ny*i + j + k] = (unsigned char) Get(kv, k);
        }
    }
}
```

The differences appear where branching is required and masks must be used
instead of simple conditionals. In some places, casting scalars to the correct
type is also necessary in order to enable their promotion to the correct SIMD
vector type.

#### Performance

Gains in performance usually depend not only on the code being vectorized, but
also on the runtime characteristics of the actual computations. For example,
when computing Julia sets, the structure of the set matters, as it determines
how much coherence there is between nearby pixels. That is, the more iterations
that are computed in vector mode for nearby pixels, the larger the performance
improvement. On the other hand, when more iterations are performed with elements
masked out, the speedup is lower. Therefore, the fractal with the largest
interior of diverging points (shown in black) has the largest speedup.
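The key transformation above is replacing each scalar conditional with a masked
assignment. The sketch below illustrates that idea in isolation with a
hand-rolled four-lane "vector"; it is not the VecCore implementation, and
`Vec4`, `Mask4`, `masked_assign`, and `increment_below` are made-up names used
only for illustration:

```cpp
#include <array>
#include <cstddef>

// A toy four-lane "vector" and mask, standing in for a SIMD backend type.
using Vec4  = std::array<float, 4>;
using Mask4 = std::array<bool, 4>;

// dst[i] = src[i] only in lanes where the mask is true; other lanes keep
// their previous value. This is the operation MaskedAssign expresses.
inline void masked_assign(Vec4 &dst, const Mask4 &m, const Vec4 &src)
{
    for (std::size_t i = 0; i < 4; ++i)
        if (m[i]) dst[i] = src[i];
}

// The scalar logic `if (x < limit) x += 1.0f;` applied lane-wise:
inline Vec4 increment_below(Vec4 x, float limit)
{
    Mask4 m;
    Vec4 inc;
    for (std::size_t i = 0; i < 4; ++i) {
        m[i]   = x[i] < limit;  // build the mask from a lane-wise comparison
        inc[i] = x[i] + 1.0f;   // compute the "taken" branch for all lanes
    }
    masked_assign(x, m, inc);   // commit results only in the active lanes
    return x;
}
```

For `x = {0, 1, 2, 3}` and `limit = 2`, only the first two lanes are
incremented, yielding `{1, 2, 2, 3}`. Real backends build the mask and perform
the blend with single SIMD instructions rather than a per-lane loop.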
The figure below illustrates this for different fractals (left) by showing the
speedup as the point where each line crosses the axis of the radial plot (right).

<p align=center>
<img src="doc/julia.gif" alt="Julia Set Animation" width="50%" />
&nbsp;
&nbsp;
<img src="doc/julia-speedup.gif" alt="Julia Set Speedup" width="40%" />
</p>

## Supported Platforms

VecCore supports Linux, macOS, and Windows. To compile software using
VecCore, you will need a compiler with support for C++17. We recommend at
least the following compiler versions:

 - GCC 11.0
 - Clang 14.0
 - AppleClang 15.0
 - Intel® C/C++ Compiler 19.1
 - Microsoft Visual Studio 2019

Additionally, you will need CMake 3.16 or later, and you may want to install
a SIMD library such as

 - [Vc](https://github.com/VcDevel/Vc) (version 1.4 or later)
 - [UME::SIMD](https://github.com/edanor/umesimd) (version 0.8.1 or later)
 - [std::experimental::simd](https://gcc.gnu.org/gcc-11/changes.html#libstdcxx)
   (included in libstdc++ from GCC 11 or later)

and/or

 - [Nvidia's CUDA SDK](http://developer.nvidia.com/cuda) (version 11.0 or later).

## Documentation

The documentation can be generated by Doxygen by passing `-DBUILD_DOCS=True`
when configuring, then building the `doxygen` target with `make doxygen`. It is
also available online at https://root-project.github.io/veccore.

## Publications

A list of publications is available [here](doc/publications.md).