{"id":15047979,"url":"https://github.com/nvidia/cccl","last_synced_at":"2026-02-05T23:07:18.589Z","repository":{"id":39987438,"uuid":"296416761","full_name":"NVIDIA/cccl","owner":"NVIDIA","description":"CUDA Core Compute Libraries","archived":false,"fork":false,"pushed_at":"2025-05-12T18:28:20.000Z","size":84298,"stargazers_count":1632,"open_issues_count":1060,"forks_count":212,"subscribers_count":35,"default_branch":"main","last_synced_at":"2025-05-12T18:32:26.880Z","etag":null,"topics":["accelerated-computing","cpp","cpp-programming","cuda","cuda-cpp","cuda-kernels","cuda-library","cuda-programming","gpu","gpu-acceleration","gpu-computing","gpu-programming","hpc","modern-cpp","nvidia","nvidia-gpu","parallel-algorithm","parallel-computing","parallel-programming"],"latest_commit_sha":null,"homepage":"https://nvidia.github.io/cccl/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.md","codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-09-17T18:58:41.000Z","updated_at":"2025-05-12T16:40:02.000Z","dependencies_parsed_at":"2023-09-27T02:48:18.827Z","dependency_job_id":"8cc912ad-7264-4589-a13b-bebf5bd59c41","html_url":"https://github.com/NVIDIA/cccl","commit_stats":{"total_commits":9059,"total_committers":213,"mean_commits":42.53051643192488,"dds":0.7644331603929794,"last_synced_commit":"d9aa6f54e7457b4c6d7eae821d487abe93c0f998"},"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"repository_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fcccl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fcccl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fcccl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fcccl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA","download_url":"https://codeload.github.com/NVIDIA/cccl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254029008,"owners_count":22002284,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accelerated-computing","cpp","cpp-programming","cuda","cuda-cpp","cuda-kernels","cuda-library","cuda-programming","gpu","gpu-acceleration","gpu-computing","gpu-programming","hpc","modern-cpp","nvidia","nvidia-gpu","parallel-algorithm","parallel-computing","parallel-programming"],"created_at":"2024-09-24T21:06:27.432Z","updated_at":"2025-05-13T21:12:12.506Z","avatar_url":"https://github.com/NVIDIA.png","language":"C++","readme":"[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/NVIDIA/cccl?quickstart=1\u0026devcontainer_path=.devcontainer%2Fdevcontainer.json)\n\n|[Contributor Guide](https://github.com/NVIDIA/cccl/blob/main/CONTRIBUTING.md)|[Dev Containers](https://github.com/NVIDIA/cccl/blob/main/.devcontainer/README.md)|[Discord](https://discord.gg/nvidiadeveloper)|[Godbolt](https://godbolt.org/z/x4G73af9a)|[GitHub 
Project](https://github.com/orgs/NVIDIA/projects/6)|[Documentation](https://nvidia.github.io/cccl)|\n|-|-|-|-|-|-|\n\n# CUDA Core Compute Libraries (CCCL)\n\nWelcome to the CUDA Core Compute Libraries (CCCL) where our mission is to make CUDA more delightful.\n\nThis repository unifies three essential CUDA C++ libraries into a single, convenient repository:\n\n- [Thrust](thrust) ([former repo](https://github.com/nvidia/thrust))\n- [CUB](cub) ([former repo](https://github.com/nvidia/cub))\n- [libcudacxx](libcudacxx) ([former repo](https://github.com/nvidia/libcudacxx))\n\nThe goal of CCCL is to provide CUDA C++ developers with building blocks that make it easier to write safe and efficient code.\nBringing these libraries together streamlines your development process and broadens your ability to leverage the power of CUDA C++.\nFor more information about the decision to unify these projects, see the [announcement here](https://github.com/NVIDIA/cccl/discussions/520).\n\n## Overview\n\nThe concept for the CUDA Core Compute Libraries (CCCL) grew organically out of the Thrust, CUB, and libcudacxx projects that were developed independently over the years with a similar goal: to provide high-quality, high-performance, and easy-to-use C++ abstractions for CUDA developers.\nNaturally, there was a lot of overlap among the three projects, and it became clear the community would be better served by unifying them into a single repository.\n\n- **Thrust** is the C++ parallel algorithms library which inspired the introduction of parallel algorithms to the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs via configurable backends that allow using multiple parallel programming frameworks (such as CUDA, TBB, and OpenMP).\n\n- **CUB** is a lower-level, CUDA-specific library designed for speed-of-light parallel algorithms across all GPU architectures. 
In addition to device-wide algorithms, it provides *cooperative algorithms* like block-wide reduction and warp-wide scan, providing CUDA kernel developers with building blocks to create speed-of-light, custom kernels.\n\n- **libcudacxx** is the CUDA C++ Standard Library. It provides an implementation of the C++ Standard Library that works in both host and device code. Additionally, it provides abstractions for CUDA-specific hardware features like synchronization primitives, cache control, atomics, and more.\n\nThe main goal of CCCL is to fill a role similar to the one the Standard C++ Library fills for Standard C++: provide general-purpose, speed-of-light tools to CUDA C++ developers, allowing them to focus on solving the problems that matter.\nUnifying these projects is the first step towards realizing that goal.\n\n## Example\n\nThis is a simple example demonstrating the use of CCCL functionality from Thrust, CUB, and libcudacxx.\n\nIt shows how to use Thrust/CUB/libcudacxx to implement a simple parallel reduction kernel.\nEach thread block computes the sum of a subset of the array using `cub::BlockReduce`.\nThe sum of each block is then reduced to a single value using an atomic add via `cuda::atomic_ref` from libcudacxx.\n\nIt then shows how the same reduction can be done using Thrust's `reduce` algorithm and compares the results.\n\n[Try it live on Godbolt!](https://godbolt.org/z/aMx4j9f4T)\n\n```cpp\n#include \u003cthrust/execution_policy.h\u003e\n#include \u003cthrust/device_vector.h\u003e\n#include \u003ccub/block/block_reduce.cuh\u003e\n#include \u003ccuda/atomic\u003e\n#include \u003ccuda/cmath\u003e\n#include \u003ccuda/std/span\u003e\n#include \u003ccassert\u003e\n#include \u003ccstdio\u003e\n\ntemplate \u003cint block_size\u003e\n__global__ void reduce(cuda::std::span\u003cint const\u003e data, cuda::std::span\u003cint\u003e result) {\n  using BlockReduce = cub::BlockReduce\u003cint, block_size\u003e;\n  __shared__ typename BlockReduce::TempStorage temp_storage;\n\n  int const index = 
threadIdx.x + blockIdx.x * blockDim.x;\n  int sum = 0;\n  if (index \u003c data.size()) {\n    sum += data[index];\n  }\n  sum = BlockReduce(temp_storage).Sum(sum);\n\n  if (threadIdx.x == 0) {\n    cuda::atomic_ref\u003cint, cuda::thread_scope_device\u003e atomic_result(result.front());\n    atomic_result.fetch_add(sum, cuda::memory_order_relaxed);\n  }\n}\n\nint main() {\n\n  // Allocate and initialize input data\n  int const N = 1000;\n  thrust::device_vector\u003cint\u003e data(N);\n  thrust::fill(data.begin(), data.end(), 1);\n\n  // Allocate output data\n  thrust::device_vector\u003cint\u003e kernel_result(1);\n\n  // Compute the sum reduction of `data` using a custom kernel\n  constexpr int block_size = 256;\n  int const num_blocks = cuda::ceil_div(N, block_size);\n  reduce\u003cblock_size\u003e\u003c\u003c\u003cnum_blocks, block_size\u003e\u003e\u003e(cuda::std::span\u003cint const\u003e(thrust::raw_pointer_cast(data.data()), data.size()),\n                                                 cuda::std::span\u003cint\u003e(thrust::raw_pointer_cast(kernel_result.data()), 1));\n\n  auto const err = cudaDeviceSynchronize();\n  if (err != cudaSuccess) {\n    std::printf(\"Error: %s\\n\", cudaGetErrorString(err));\n    return -1;\n  }\n\n  int const custom_result = kernel_result[0];\n\n  // Compute the same sum reduction using Thrust\n  int const thrust_result = thrust::reduce(thrust::device, data.begin(), data.end(), 0);\n\n  // Ensure the two solutions are identical\n  std::printf(\"Custom kernel sum: %d\\n\", custom_result);\n  std::printf(\"Thrust reduce sum: %d\\n\", thrust_result);\n  assert(custom_result == thrust_result);\n  return 0;\n}\n```\n\n## Getting Started\n\n### Users\n\nEverything in CCCL is header-only.\nTherefore, users need only concern themselves with how they get the header files and how they incorporate them into their build system.\n\n#### CUDA Toolkit\nThe easiest way to get started using CCCL is 
via the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) which includes the CCCL headers.\nWhen you compile with `nvcc`, it automatically adds CCCL headers to your include path so you can simply `#include` any CCCL header in your code with no additional configuration required.\n\nIf compiling with another compiler, you will need to update your build system's include search path to point to the CCCL headers in your CTK install (e.g., `/usr/local/cuda/include`).\n\n```cpp\n#include \u003cthrust/device_vector.h\u003e\n#include \u003ccub/cub.cuh\u003e\n#include \u003ccuda/std/atomic\u003e\n```\n\n#### GitHub\n\nUsers who want to stay on the cutting edge of CCCL development are encouraged to use CCCL from GitHub.\nUsing a newer version of CCCL with an older version of the CUDA Toolkit is supported, but not the other way around.\nFor complete information on compatibility between CCCL and the CUDA Toolkit, see [our platform support](#platform-support).\n\nEverything in CCCL is header-only, so cloning and including it in a simple project is as easy as the following:\n```bash\ngit clone https://github.com/NVIDIA/cccl.git\nnvcc -Icccl/thrust -Icccl/libcudacxx/include -Icccl/cub main.cu -o main\n```\n\u003e **Note**\n\u003e Use `-I` and not `-isystem` to avoid collisions with the CCCL headers implicitly included by `nvcc` from the CUDA Toolkit. 
All CCCL headers use `#pragma system_header` to ensure warnings will still be silenced as if using `-isystem`, see https://github.com/NVIDIA/cccl/issues/527 for more information.\n\n##### Installation\n\nA minimal build that only generates installation rules can be configured using the `install` CMake preset:\n```bash\ngit clone https://github.com/NVIDIA/cccl.git\ncd cccl\ncmake --preset install -DCMAKE_INSTALL_PREFIX=/usr/local/\ncd build/install\nninja install\n```\n\nTo include experimental libraries in the installation, use the `install-unstable` preset and build directory.\n\nTo install **only** the experimental libraries, use the `install-unstable-only` preset and build directory.\n\n#### Conda\n\nCCCL also provides conda packages of each release via the `conda-forge` channel:\n\n```bash\nconda config --add channels conda-forge\nconda install cccl\n```\n\nThis will install the latest CCCL to the conda environment's `$CONDA_PREFIX/include/` and `$CONDA_PREFIX/lib/cmake/` directories.\nIt is discoverable by CMake via `find_package(CCCL)` and can be used by any compilers in the conda environment.\nFor more information, see [this introduction to conda-forge](https://conda-forge.org/docs/user/introduction/).\n\nIf you want to use the same CCCL version that shipped with a particular CUDA Toolkit, e.g. 
CUDA 12.4, you can install CCCL with:\n\n```bash\nconda config --add channels conda-forge\nconda install cuda-cccl cuda-version=12.4\n```\n\nThe `cuda-cccl` metapackage installs the `cccl` version that shipped with the CUDA Toolkit corresponding to `cuda-version`.\nIf you wish to update to the latest `cccl` after installing `cuda-cccl`, uninstall `cuda-cccl` before updating `cccl`:\n\n```bash\nconda uninstall cuda-cccl\nconda install -c conda-forge cccl\n```\n\n\u003e **Note**\n\u003e There are also conda packages with names like `cuda-cccl_linux-64`.\n\u003e Those packages contain the CCCL versions shipped as part of the CUDA Toolkit, but are designed for internal use by the CUDA Toolkit.\n\u003e Install `cccl` or `cuda-cccl` instead, for compatibility with conda compilers.\n\u003e For more information, see the [cccl conda-forge recipe](https://github.com/conda-forge/cccl-feedstock/blob/main/recipe/meta.yaml).\n\n##### CMake Integration\n\nCCCL uses [CMake](https://cmake.org/) for all build and installation infrastructure, including tests as well as targets to link against in other CMake projects.\nTherefore, CMake is the recommended way to integrate CCCL into another project.\n\nFor a complete example of how to do this using CMake Package Manager see [our basic example project](examples/basic).\n\nOther build systems should work, but only CMake is tested.\nContributions to simplify integrating CCCL into other build systems are welcome.\n\n### Contributors\n\nInterested in contributing to making CCCL better? 
Check out our [Contributing Guide](CONTRIBUTING.md) for a comprehensive overview of everything you need to know to set up your development environment, make changes, run tests, and submit a PR.\n\n## Platform Support\n\n**Objective:** This section describes where users can expect CCCL to compile and run successfully.\n\nIn general, CCCL should work everywhere the CUDA Toolkit is supported; however, the devil is in the details.\nThe sections below describe the details of support and testing for different versions of the CUDA Toolkit, host compilers, and C++ dialects.\n\n### CUDA Toolkit (CTK) Compatibility\n\n**Summary:**\n- The latest version of CCCL is backward compatible with the current and preceding CTK major version series\n- CCCL is never forward compatible with any version of the CTK. Always use a CCCL version that is the same as or newer than the one included with your CTK.\n- Minor version CCCL upgrades won't break existing code, but new features may not support all CTK versions\n\nCCCL users are encouraged to capitalize on the latest enhancements and [\"live at head\"](https://www.youtube.com/watch?v=tISy7EJQPzI) by always using the newest version of CCCL.\nFor a seamless experience, you can upgrade CCCL independently of the entire CUDA Toolkit.\nThis is possible because CCCL maintains backward compatibility with the latest patch release of every minor CTK release from both the current and previous major version series.\nIn some exceptional cases, the minimum supported minor version of the CUDA Toolkit release may need to be newer than the oldest release within its major version series.\n\nWhen a new major CTK is released, we drop support for the oldest supported major version.\n\n| CCCL Version | Supports CUDA Toolkit Version                  |\n|--------------|------------------------------------------------|\n| 2.x          | 11.1 - 11.8, 12.x (only latest patch releases) |\n| 3.x          | 12.x, 13.x  (only latest patch releases)       |\n\n[Well-behaved 
code](#compatibility-guidelines) using the latest CCCL should compile and run successfully with any supported CTK version.\nExceptions may occur for new features that depend on new CTK features, so those features would not work on older versions of the CTK.\n\nUsers can integrate a newer version of CCCL into an older CTK, but not the other way around.\nThis means an older version of CCCL is not compatible with a newer CTK.\nIn other words, **CCCL is never forward compatible with the CUDA Toolkit.**\n\nThe table below summarizes compatibility of the CTK and CCCL:\n\n| CTK Version | Included CCCL Version |    Desired CCCL     | Supported? |                           Notes                           |\n|:-----------:|:---------------------:|:--------------------:|:----------:|:--------------------------------------------------------:|\n|  CTK `X.Y`  |  CCCL `MAJOR.MINOR`   | CCCL `MAJOR.MINOR+n` |    ✅     |            Some new features might not work              |\n|  CTK `X.Y`  |  CCCL `MAJOR.MINOR`   | CCCL `MAJOR+1.MINOR` |    ✅     | Possible breaks; some new features might not be available|\n|  CTK `X.Y`  |  CCCL `MAJOR.MINOR`   | CCCL `MAJOR+2.MINOR` |    ❌     |    CCCL supports only two CTK major versions             |\n|  CTK `X.Y`  |  CCCL `MAJOR.MINOR`   | CCCL `MAJOR.MINOR-n` |    ❌     |          CCCL isn't forward compatible                   |\n|  CTK `X.Y`  |  CCCL `MAJOR.MINOR`   | CCCL `MAJOR-n.MINOR` |    ❌     |          CCCL isn't forward compatible                   |\n\nFor more information on CCCL versioning, API/ABI compatibility, and breaking changes see the [Versioning](#versioning) section below.\n\n### Operating Systems\n\nUnless otherwise specified, CCCL supports all the same operating systems as the CUDA Toolkit, which are documented here:\n - [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements)\n - 
[Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#system-requirements)\n\n### Host Compilers\n\nUnless otherwise specified, CCCL supports the same host compilers as the latest CUDA Toolkit, which are documented here:\n- [Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy)\n- [Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#system-requirements)\n\nFor GCC on Linux, at least 7.x is required.\n\nWhen using older CUDA Toolkits, we also only support the host compilers of the latest CUDA Toolkit,\nbut at least the most recent host compiler of any supported older CUDA Toolkit.\n\nWe may retain support of additional compilers and will accept corresponding patches from the community with reasonable fixes.\nBut we will not invest significant time in triaging or fixing issues for older compilers.\n\nIn the spirit of \"You only support what you test\", see our [CI Overview](https://github.com/NVIDIA/cccl/blob/main/ci-overview.md) for more information on exactly what we test.\n\n### C++ Dialects\n- C++17\n- C++20\n\n### GPU Architectures\n\nUnless otherwise specified, CCCL supports all the same GPU architectures/Compute Capabilities as the CUDA Toolkit, which are documented here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability\n\nNote that some features may only support certain architectures/Compute Capabilities.\n\n### Testing Strategy\n\nCCCL's testing strategy strikes a balance between testing as many configurations as possible and maintaining reasonable CI times.\n\nFor CUDA Toolkit versions, testing is done against both the oldest and the newest supported versions.\nFor instance, if the latest version of the CUDA Toolkit is 12.6, tests are conducted against 11.1 and 12.6.\nFor each CUDA version, builds are completed against all supported host compilers with all supported C++ dialects.\n\nThe testing 
strategy and matrix are constantly evolving.\nThe matrix defined in the [`ci/matrix.yaml`](ci/matrix.yaml) file is the definitive source of truth.\nFor more information about our CI pipeline, see [here](ci-overview.md).\n\n## Versioning\n\n**Objective:** This section describes how CCCL is versioned, API/ABI stability guarantees, and compatibility guidelines to minimize upgrade headaches.\n\n**Summary**\n- The entirety of CCCL's API shares a common semantic version across all components\n- Only the most recently released version is supported and fixes are not backported to prior releases\n- API breaking changes and incrementing CCCL's major version will only coincide with a new major version release of the CUDA Toolkit\n- Not all source breaking changes are considered breaking changes of the public API that warrant bumping the major version number\n- Do not rely on ABI stability of entities in the `cub::` or `thrust::` namespaces\n- ABI breaking changes for symbols in the `cuda::` namespace may happen at any time, but will be reflected by incrementing the ABI version which is embedded in an inline namespace for all `cuda::` symbols. 
Multiple ABI versions may be supported concurrently.\n\n**Note:** Prior to merging Thrust, CUB, and libcudacxx into this repository, each library was independently versioned according to semantic versioning.\nStarting with the 2.1 release, all three libraries synchronized their release versions in their separate repositories.\nMoving forward, CCCL will continue to be released under a single [semantic version](https://semver.org/), with 2.2.0 being the first release from the [nvidia/cccl](https://github.com/nvidia/cccl) repository.\n\n### Breaking Change\n\nA Breaking Change is a change to **explicitly supported** functionality between released versions that would require a user to do work in order to upgrade to the newer version.\n\nIn the limit, [_any_ change](https://www.hyrumslaw.com/) has the potential to break someone somewhere.\nAs a result, not all possible source breaking changes are considered Breaking Changes to the public API that warrant bumping the major semantic version.\n\nThe sections below describe the details of breaking changes to CCCL's API and ABI.\n\n### Application Programming Interface (API)\n\nCCCL's public API is the entirety of the functionality _intentionally_ exposed to provide the utility of the library.\n\nIn other words, CCCL's public API goes beyond just function signatures and includes (but is not limited to):\n- The location and names of headers intended for direct inclusion in user code\n- The namespaces intended for direct use in user code\n- The declarations and/or definitions of functions, classes, and variables located in headers and intended for direct use in user code\n- The semantics of functions, classes, and variables intended for direct use in user code\n\nMoreover, CCCL's public API does **not** include any of the following:\n- Any symbol prefixed with `_` or `__`\n- Any symbol whose name contains `detail` including the `detail::` namespace or a macro\n- Any header file contained in a `detail/` directory or sub-directory 
thereof\n- The header files implicitly included by any header part of the public API\n\nIn general, the goal is to avoid breaking anything in the public API.\nSuch changes are made only if they offer users better performance, easier-to-understand APIs, and/or more consistent APIs.\n\nAny breaking change to the public API will require bumping CCCL's major version number.\nIn keeping with [CUDA Minor Version Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility),\nAPI breaking changes and CCCL major version bumps will only occur coinciding with a new major version release of the CUDA Toolkit.\n\nAnything not part of the public API may change at any time without warning.\n\n#### API Versioning\n\nThe public API of all CCCL's components shares a unified semantic version of `MAJOR.MINOR.PATCH`.\n\nOnly the most recently released version is supported.\nAs a rule, features and bug fixes are not backported to previously released versions or branches.\n\nThe preferred method for querying the version is to use `CCCL_[MAJOR/MINOR/PATCH_]VERSION` as described below.\nFor backwards compatibility, the Thrust/CUB/libcudacxx version definitions are available and will always be consistent with `CCCL_VERSION`.\nNote that Thrust/CUB use a `MMMmmmpp` scheme whereas CCCL and libcudacxx use `MMMmmmppp`.\n\n|                        | CCCL                                   | libcudacxx                                | Thrust                       | CUB                       |\n|------------------------|----------------------------------------|-------------------------------------------|------------------------------|---------------------------|\n| Header                 | `\u003ccuda/version\u003e`                       | `\u003ccuda/std/version\u003e`                      | `\u003cthrust/version.h\u003e`         | `\u003ccub/version.h\u003e`         |\n| Major Version          | `CCCL_MAJOR_VERSION`                   | 
`_LIBCUDACXX_CUDA_API_VERSION_MAJOR`      | `THRUST_MAJOR_VERSION`       | `CUB_MAJOR_VERSION`       |\n| Minor Version          | `CCCL_MINOR_VERSION`                   | `_LIBCUDACXX_CUDA_API_VERSION_MINOR`      | `THRUST_MINOR_VERSION`       | `CUB_MINOR_VERSION`       |\n| Patch/Subminor Version | `CCCL_PATCH_VERSION`                   | `_LIBCUDACXX_CUDA_API_VERSION_PATCH`      | `THRUST_SUBMINOR_VERSION`    | `CUB_SUBMINOR_VERSION`    |\n| Concatenated Version   | `CCCL_VERSION (MMMmmmppp)`             | `_LIBCUDACXX_CUDA_API_VERSION (MMMmmmppp)`| `THRUST_VERSION (MMMmmmpp)`  | `CUB_VERSION (MMMmmmpp)`  |\n\n### Application Binary Interface (ABI)\n\nThe Application Binary Interface (ABI) is a set of rules for:\n- How a library's components are represented in machine code\n- How those components interact across different translation units\n\nA library's ABI includes, but is not limited to:\n- The mangled names of functions and types\n- The size and alignment of objects and types\n- The semantics of the bytes in the binary representation of an object\n\nAn **ABI Breaking Change** is any change that results in a change to the ABI of a function or type in the public API.\nFor example, adding a new data member to a struct is an ABI Breaking Change as it changes the size of the type.\n\nIn CCCL, the guarantees about ABI are as follows:\n\n- Symbols in the `thrust::` and `cub::` namespaces may break ABI at any time without warning.\n- The ABI of `thrust::` and `cub::` [symbols includes the CUDA architectures used for compilation](https://nvidia.github.io/cccl/cub/developer_overview.html#symbols-visibility). Therefore, a `thrust::` or `cub::` symbol may have a different ABI if:\n    - compiled with different architectures\n    - compiled as a CUDA source file (`-x cu`) vs C++ source (`-x cpp`)\n- Symbols in the `cuda::` namespace may also break ABI at any time. However, `cuda::` symbols embed an ABI version number that is incremented whenever an ABI break occurs. 
Multiple ABI versions may be supported concurrently, and therefore users have the option to revert to a prior ABI version. For more information, see [here](libcudacxx/docs/releases/versioning.md).\n\n**Who should care about ABI?**\n\nIn general, CCCL users only need to worry about ABI issues when building or using a binary artifact (like a shared library) whose API directly or indirectly includes types provided by CCCL.\n\nFor example, consider if `libA.so` was built using CCCL version `X` and its public API includes a function like:\n```c++\nvoid foo(cuda::std::optional\u003cint\u003e);\n```\n\nIf another library, `libB.so`, is compiled using CCCL version `Y` and uses `foo` from `libA.so`, then this can fail if there was an ABI break between version `X` and `Y`.\nUnlike with API breaking changes, ABI breaks usually do not require code changes and only require recompiling everything to use the same ABI version.\n\nTo learn more about ABI and why it is important, see [What is ABI, and What Should C++ Do About It?](https://wg21.link/P2028R0).\n\n### Compatibility Guidelines\n\nAs mentioned above, not all possible source breaking changes constitute a Breaking Change that would require incrementing CCCL's API major version number.\n\nUsers are encouraged to adhere to the following guidelines in order to minimize the risk of disruptions from accidentally depending on parts of CCCL that are not part of the public API:\n\n- Do not add any declarations to, or specialize any template from, the `thrust::`, `cub::`, `nv::`, or `cuda::` namespaces unless an exception is noted for a specific symbol, e.g., specializing `cuda::std::iterator_traits`\n    - **Rationale**: This would cause conflicts if a symbol or specialization is added with the same name.\n- Do not take the address of any API in the `thrust::`, `cub::`, `cuda::`, or `nv::` namespaces.\n    - **Rationale**: This would prevent adding overloads of these APIs.\n- Do not forward declare any API in the `thrust::`, 
`cub::`, `cuda::`, or `nv::` namespaces.\n    - **Rationale**: This would prevent adding overloads of these APIs.\n- Do not directly reference any symbol prefixed with `_`, `__`, or with `detail` anywhere in its name including a `detail::` namespace or macro\n     - **Rationale**: These symbols are for internal use only and may change at any time without warning.\n- Include what you use. For every CCCL symbol that you use, directly `#include` the header file that declares that symbol. In other words, do not rely on headers implicitly included by other headers.\n     - **Rationale**: Internal includes may change at any time.\n\nPortions of this section were inspired by [Abseil's Compatibility Guidelines](https://abseil.io/about/compatibility).\n\n## Deprecation Policy\n\nWe will do our best to notify users prior to making any breaking changes to the public API, ABI, or modifying the supported platforms and compilers.\n\nAs appropriate, deprecations will come in the form of programmatic warnings which can be disabled.\n\nThe deprecation period will depend on the impact of the change, but will usually last at least 2 minor version releases.\n\n\n## Mapping to CTK Versions\n\n| CCCL version | CTK version |\n|--------------|-------------|\n| 3.0          | 13.0        |\n| ...          | ...         
|\n| 2.8          | 12.9        |\n| 2.7          | 12.8        |\n| 2.5          | 12.6        |\n| 2.4          | 12.5        |\n| 2.3          | 12.4        |\n\nTest yourself: https://cuda.godbolt.org/z/K818M4Y9f\n\nCTKs before 12.4 shipped Thrust, CUB and libcudacxx as individual libraries.\n\n| Thrust/CUB/libcudacxx version | CTK version |\n|-------------------------------|-------------|\n| 2.2                           | 12.3        |\n| 2.1                           | 12.2        |\n| 2.0/2.0/1.9                   | 12.1        |\n| 2.0/2.0/1.9                   | 12.0        |\n\n\n## CI Pipeline Overview\n\nFor a detailed overview of the CI pipeline, see [ci-overview.md](ci-overview.md).\n\n## Related Projects\n\nProjects that are related to CCCL's mission to make CUDA more delightful:\n- [cuCollections](https://github.com/NVIDIA/cuCollections) - GPU accelerated data structures like hash tables\n- [NVBench](https://github.com/NVIDIA/nvbench) - Benchmarking library tailored for CUDA applications\n- [stdexec](https://github.com/nvidia/stdexec) - Reference implementation for Senders asynchronous programming model\n\n## Projects Using CCCL\n\nDoes your project use CCCL? 
[Open a PR to add your project to this list!](https://github.com/NVIDIA/cccl/edit/main/README.md)\n\n- [AmgX](https://github.com/NVIDIA/AMGX) - Multi-grid linear solver library\n- [ColossalAI](https://github.com/hpcaitech/ColossalAI) - Tools for writing distributed deep learning models\n- [cuDF](https://github.com/rapidsai/cudf) - Algorithms and file readers for ETL data analytics\n- [cuGraph](https://github.com/rapidsai/cugraph) - Algorithms for graph analytics\n- [cuML](https://github.com/rapidsai/cuml) - Machine learning algorithms and primitives\n- [CuPy](https://cupy.dev) - NumPy \u0026 SciPy for GPU\n- [cuSOLVER](https://developer.nvidia.com/cusolver) - Dense and sparse linear solvers\n- [GooFit](https://github.com/GooFit/GooFit) - Library for maximum-likelihood fits\n- [HeavyDB](https://github.com/heavyai/heavydb) - SQL database engine\n- [HOOMD](https://github.com/glotzerlab/hoomd-blue) - Monte Carlo and molecular dynamics simulations\n- [HugeCTR](https://github.com/NVIDIA-Merlin/HugeCTR) - GPU-accelerated recommender framework\n- [Hydra](https://github.com/MultithreadCorner/Hydra) - High-energy Physics Data Analysis\n- [Hypre](https://github.com/hypre-space/hypre) - Multigrid linear solvers\n- [LightSeq](https://github.com/bytedance/lightseq) - Training and inference for sequence processing and generation\n- [MatX](https://github.com/NVIDIA/matx) - Numerical computing library using expression templates to provide efficient, Python-like syntax\n- [PyTorch](https://github.com/pytorch/pytorch) - Tensor and neural network computations\n- [Qiskit](https://github.com/Qiskit/qiskit-aer) - High performance simulator for quantum circuits\n- [QUDA](https://github.com/lattice/quda) - Lattice quantum chromodynamics (QCD) computations\n- [RAFT](https://github.com/rapidsai/raft) - Algorithms and primitives for machine learning\n- [TensorFlow](https://github.com/tensorflow/tensorflow) - End-to-end platform for machine learning\n- 
[TensorRT](https://github.com/NVIDIA/TensorRT) - Deep learning inference\n- [tsne-cuda](https://github.com/CannyLab/tsne-cuda) - Stochastic Neighborhood Embedding library\n- [Visualization Toolkit (VTK)](https://gitlab.kitware.com/vtk/vtk) - Rendering and visualization library\n- [XGBoost](https://github.com/dmlc/xgboost) - Gradient boosting machine learning algorithms\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fcccl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia%2Fcccl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fcccl/lists"}