{"id":17981157,"url":"https://github.com/rocm/rocprim","last_synced_at":"2026-04-02T11:46:58.829Z","repository":{"id":34830208,"uuid":"114167580","full_name":"ROCm/rocPRIM","owner":"ROCm","description":"ROCm Parallel Primitives","archived":false,"fork":false,"pushed_at":"2025-05-14T20:20:46.000Z","size":8968,"stargazers_count":172,"open_issues_count":15,"forks_count":76,"subscribers_count":46,"default_branch":"develop","last_synced_at":"2025-05-16T05:03:32.182Z","etag":null,"topics":["amd","cuda","gpu","hip","parallel","primitive","rocm"],"latest_commit_sha":null,"homepage":"https://rocm.docs.amd.com/projects/rocPRIM/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ROCm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-12-13T20:51:12.000Z","updated_at":"2025-05-13T15:37:50.000Z","dependencies_parsed_at":"2023-10-11T21:58:25.215Z","dependency_job_id":"aafe144a-3860-4ed0-90b8-98210e571db1","html_url":"https://github.com/ROCm/rocPRIM","commit_stats":{"total_commits":1128,"total_committers":60,"mean_commits":18.8,"dds":0.7402482269503546,"last_synced_commit":"e55fbf53ac07f5ffadfab6c52ae544af5914ba84"},"previous_names":["rocm/rocprim","rocmsoftwareplatform/rocprim"],"tags_count":76,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2FrocPRIM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2FrocPRIM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/ROCm%2FrocPRIM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2FrocPRIM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ROCm","download_url":"https://codeload.github.com/ROCm/rocPRIM/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amd","cuda","gpu","hip","parallel","primitive","rocm"],"created_at":"2024-10-29T18:08:16.039Z","updated_at":"2026-04-02T11:46:58.799Z","avatar_url":"https://github.com/ROCm.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# rocPRIM\n\n\u003e [!NOTE]\n\u003e The published rocPRIM documentation is available [here](https://rocm.docs.amd.com/projects/rocPRIM/en/latest/) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).\n\nrocPRIM is a header-only library that provides HIP parallel primitives. 
You can use this library to\ndevelop performant GPU-accelerated code on AMD ROCm platforms.\n\n## Requirements\n\n* Git\n* CMake (3.16 or later)\n* AMD [ROCm](https://rocm.docs.amd.com/en/latest/) platform (1.8.2 or later)\n  * Including\n    [HIP-clang](https://github.com/ROCm/HIP/blob/master/INSTALL.md#hip-clang)\n    compiler\n* C++17\n* Python 3.6 or higher (HIP on Windows only, required only for install script)\n* Visual Studio 2019 with Clang support (HIP on Windows only)\n* Strawberry Perl (HIP on Windows only)\n\nOptional:\n\n* [GoogleTest](https://github.com/google/googletest)\n  * Required only for tests. Building tests is on by default.\n  * This is automatically downloaded and built by the CMake script.\n* [Google Benchmark](https://github.com/google/benchmark)\n  * Required only for benchmarks. Building benchmarks is off by default.\n  * This is automatically downloaded and built by the CMake script.\n\n## Build and install\n\nYou can build and install rocPRIM on Linux or Windows.\n\n* Linux:\n\n  ```shell\n  git clone https://github.com/ROCm/rocPRIM.git\n\n  # Go to rocPRIM directory, create and go to the build directory.\n  cd rocPRIM; mkdir build; cd build\n\n  # Configure rocPRIM, set up options for your system.\n  # Build options:\n  #   ONLY_INSTALL - OFF by default. If this flag is on, the build ignores the BUILD_* flags.\n  #   BUILD_TEST - OFF by default,\n  #   BUILD_EXAMPLE - OFF by default,\n  #   BUILD_BENCHMARK - OFF by default.\n  #   BENCHMARK_CONFIG_TUNING - OFF by default. 
The purpose of this flag is to find the best kernel config parameters.\n  #     When ON, the compilation time can increase significantly.\n  #   AMDGPU_TARGETS - list of AMD architectures, default: gfx803;gfx900;gfx906;gfx908.\n  #     You can make compilation faster if you want to test/benchmark only on one architecture,\n  #     for example, add -DAMDGPU_TARGETS=gfx906 to 'cmake' parameters.\n  #   AMDGPU_TEST_TARGETS - list of AMD architectures, default: \"\" (default system device)\n  #     If you want to detect failures on a per GFX IP basis, setting it to some set of IPs will create\n  #     separate tests with the IP name embedded into the test name. Building for all, but selecting\n  #     tests only of a specific architecture is possible, e.g.: ctest -R gfx803|gfx900\n  #\n  # ! IMPORTANT !\n  # Set C++ compiler to HIP-clang. You can do it by adding 'CXX=\u003cpath-to-compiler\u003e'\n  # before 'cmake' or setting cmake option 'CMAKE_CXX_COMPILER' to path to the compiler.\n  # Using HIP-clang:\n  [CXX=hipcc] cmake -DBUILD_BENCHMARK=ON ../.\n\n  # Build\n  make -j4\n\n  # Optionally, run tests if they're enabled.\n  ctest --output-on-failure\n\n  # Install\n  [sudo] make install\n  ```\n\n* Windows:\n\n  We've added initial support for HIP on Windows; to install, use the provided `rmake.py` Python script:\n\n  ```shell\n  git clone https://github.com/ROCm/rocPRIM.git\n  cd rocPRIM\n\n  # the -i option will install rocPRIM to C:\\hipSDK by default\n  python rmake.py -i\n\n  # the -c option will build all clients including unit tests\n  python rmake.py -c\n  ```\n\n### Using rocPRIM\n\nInclude the `\u003crocprim/rocprim.hpp\u003e` header:\n\n```cpp\n#include \u003crocprim/rocprim.hpp\u003e\n```\n\nWe recommend including rocPRIM in a CMake project by using the package configuration files.\nThe rocPRIM package name is `rocprim`.\n\n```cmake\n# \"/opt/rocm\" - default install prefix\nfind_package(rocprim REQUIRED CONFIG PATHS 
\"/opt/rocm/rocprim\")\n\n...\n\n# Includes only rocPRIM headers, HIP libraries have\n# to be linked manually by user\ntarget_link_libraries(\u003cyour_target\u003e roc::rocprim)\n\n# Include rocPRIM headers and required HIP dependencies\n# - If using HIP language support (USE_HIPCXX=ON):\ntarget_link_libraries(\u003cyour_target\u003e hip::host)\n\n# - Otherwise:\ntarget_link_libraries(\u003cyour_target\u003e hip::device)\n```\n\nFor more information on `hip::host` and `hip::device`, please see the [ROCm documentation](https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html#consuming-the-hip-api-in-c-code).\n\n## Running unit tests\n\nUnit tests are implemented in terms of GoogleTest. Collections of tests are wrapped and invoked from\nCTest.\n\n```shell\n# Go to rocPRIM build directory\ncd rocPRIM; cd build\n\n# List available tests\nctest --show-only\n\n# To run all tests\nctest\n\n# Run specific test(s)\nctest -R \u003cregex\u003e\n\n# To run the Google Test manually\n./test/rocprim/test_\u003cunit-test-name\u003e\n```\n\n### Using multiple GPUs concurrently for testing\n\nThis feature requires using CMake 3.16+ for building and testing.\n\n```note\nPrior versions of CMake can't assign IDs to tests when running in parallel. Assigning tests to distinct\ndevices could only be done at the cost of extreme complexity.\n```\n\nUnit tests can make use of the\n[CTest resource allocation](https://cmake.org/cmake/help/latest/manual/ctest.1.html#resource-allocation)\nfeature, which you can use to distribute tests across multiple GPUs in an intelligent manner. This\nfeature can accelerate testing when multiple GPUs of the same family are in a system. It can also test\nmultiple product families from one invocation without having to use the `HIP_VISIBLE_DEVICES`\nenvironment variable. 
The feature relies on the presence of a resource specifications file.\n\n```important\nTrying to use `RESOURCE_GROUPS` and `--resource-spec-file` with CMake and CTest for versions prior\nto 3.16 silently omits the feature. No warnings are issued about unknown properties or command-line\narguments. Make sure that the `cmake` and `ctest` versions you invoke are sufficiently recent.\n```\n\n#### Auto resource specification generation\n\nYou can independently call the utility script located in the repository using the following code:\n\n```shell\n# Go to rocPRIM build directory\ncd rocPRIM; cd build\n\n# Invoke directly or use CMake script mode via cmake -P\n../cmake/GenerateResourceSpec.cmake\n\n# Assuming you have 2 compatible GPUs in the system\nctest --resource-spec-file ./resources.json --parallel 2\n```\n\n#### Manual\n\nAssuming you have two GPUs from the gfx900 family and they are the first devices enumerated by the\nsystem, you can use `-D AMDGPU_TEST_TARGETS=gfx900` during configuration to specify that only\none family will be tested. Leaving this var empty (default) results in targeting the default device in the\nsystem. To let CMake know there are two GPUs that should be targeted, you have to provide a `JSON`\nfile to CTest via the `--resource-spec-file \u003cpath_to_file\u003e` flag. 
For example:\n\n```json\n{\n  \"version\": {\n    \"major\": 1,\n    \"minor\": 0\n  },\n  \"local\": [\n    {\n      \"gfx900\": [\n        {\n          \"id\": \"0\"\n        },\n        {\n          \"id\": \"1\"\n        }\n      ]\n    }\n  ]\n}\n```\n\nInvoking CTest as `ctest --resource-spec-file \u003cpath_to_file\u003e --parallel 2` allows two tests to run\nconcurrently, distributed between the two GPUs.\n\n### Using custom seeds for the tests\n\nModify the `rocPRIM/test/rocprim/test_seed.hpp` file.\n\n```cpp\n//(1)\nstatic constexpr int random_seeds_count = 10;\n\n//(2)\nstatic constexpr unsigned int seeds [] = {0, 2, 10, 1000};\n\n//(3)\nstatic constexpr size_t seed_size = sizeof(seeds) / sizeof(seeds[0]);\n```\n\n(1) Defines a constant that sets how many passes over the tests will be done with runtime-generated\nseeds. Modify at will.\n\n(2) Defines the user-generated seeds. Each of the array elements will be used as seed for all tests.\nModify at will. If you don't want any static seeds, leave the array empty.\n\n```cpp\nstatic constexpr unsigned int seeds [] = {};\n```\n\n(3) Never modify this line.\n\n## Running benchmarks\n\n```shell\n# Go to rocPRIM build directory\ncd rocPRIM; cd build\n\n# To run benchmark for warp functions:\n# Further option can be found using --help\n# [] Fields are optional\n./benchmark/benchmark_warp_\u003cfunction_name\u003e [--size \u003csize\u003e] [--trials \u003ctrials\u003e]\n\n# To run benchmark for block functions:\n# Further option can be found using --help\n# [] Fields are optional\n./benchmark/benchmark_block_\u003cfunction_name\u003e [--size \u003csize\u003e] [--trials \u003ctrials\u003e]\n\n# To run benchmark for device functions:\n# Further option can be found using --help\n# [] Fields are optional\n./benchmark/benchmark_device_\u003cfunction_name\u003e [--size \u003csize\u003e] [--trials \u003ctrials\u003e]\n```\n\n### Performance configuration\n\nMost device-specific primitives provided by rocPRIM can be 
tuned for other AMD devices, and\ndifferent types and operations, by passing compile-time configuration structures as a template\nparameter. The main \"knobs\" are usually the size of the block and the number of items processed by a\nsingle thread.\n\nrocPRIM has built-in default configurations for each of its primitives. These are used automatically\nbased on the input types and the target architecture from the stream used.\n\n## hipCUB\n\n[hipCUB](https://github.com/ROCm/hipCUB/) is a thin wrapper library on top of\n[rocPRIM](https://github.com/ROCm/rocPRIM) or\n[CUB](https://github.com/NVlabs/cub). You can use it to port projects that use the CUB library to the\n[HIP](https://github.com/ROCm/HIP) layer and run them on AMD hardware. In the\n[ROCm](https://rocm.docs.amd.com/en/latest/) environment, hipCUB uses the rocPRIM library as a\nbackend; on CUDA platforms, it uses CUB as a backend.\n\n## Building the documentation locally\n\n### Requirements\n\n#### Doxygen\n\nThe build system uses Doxygen [version 1.9.4](https://github.com/doxygen/doxygen/releases/tag/Release_1_9_4). You can try using a newer version, but that might cause issues.\n\nAfter you have downloaded Doxygen version 1.9.4:\n\n```shell\n# Add doxygen to your PATH\necho 'export PATH=\u003cdoxygen 1.9.4 path\u003e/bin:$PATH' \u003e\u003e ~/.bashrc\n\n# Apply the updated .bashrc\nsource ~/.bashrc\n\n# Confirm that you are using version 1.9.4\ndoxygen --version\n```\n\n#### Python\n\nThe build system uses Python version 3.10. 
You can try using a newer version, but that might cause issues.\n\nYou can install Python 3.10 alongside your other Python versions using [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation):\n\n```shell\n# Install Python 3.10\npyenv install 3.10\n\n# Create a Python 3.10 virtual environment\npyenv virtualenv 3.10 venv_rocprim\n\n# Activate the virtual environment\npyenv activate venv_rocprim\n```\n\n### Building\n\nAfter cloning this repository, and `cd`ing into it:\n\n```shell\n# Install Python dependencies\npython3 -m pip install -r docs/sphinx/requirements.txt\n\n# Build the documentation\npython3 -m sphinx -T -E -b html -d docs/_build/doctrees -D language=en docs docs/_build/html\n```\n\nYou can then open `docs/_build/html/index.html` in your browser to view the documentation.\n\n### Build documentation via CMake\n\nInstall [rocm-cmake](https://github.com/ROCm/rocm-cmake/)\n\n```shell\n# Change directory to rocPRIM\ncd rocPRIM\n\n# Install documentation dependencies\npython3 -m pip install -r docs/sphinx/requirements.txt\n\n# Set C++ compiler\n# This example uses hipcc and assumes it is at the path /usr/bin\nexport CXX=hipcc\nexport PATH=/usr/bin:$PATH\n\n# Configure the project\ncmake -S . -B ./build -D BUILD_DOCS=ON\n\n# Build the documentation\ncmake --build ./build --target doc\n\n# To serve the HTML docs locally\ncd ./build/docs/html\npython3 -m http.server\n```\n\n## Support\n\nYou can report bugs and feature requests through our GitHub\n[issue tracker](https://github.com/ROCm/rocPRIM/issues).\n\n## Contributions and license\n\nContributions of any kind are most welcome! 
Contribution instructions are in\n[CONTRIBUTING](./CONTRIBUTING.md).\n\nLicensing information is in [LICENSE](./LICENSE.txt).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Frocprim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frocm%2Frocprim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Frocprim/lists"}