https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
https://github.com/perhuepenbecker/cudyn
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: about 1 month ago
JSON representation
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
- Host: GitHub
- URL: https://github.com/perhuepenbecker/cudyn
- Owner: PerHuepenbecker
- Created: 2025-04-05T09:39:36.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-04T20:38:31.000Z (12 months ago)
- Last Synced: 2025-06-05T02:09:07.327Z (12 months ago)
- Topics: cpp, cuda, cuda-library, cuda-programming, gpu-computing, gpu-programming, irregular
- Language: Cuda
- Homepage:
- Size: 44.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Cudyn: Dynamic Task Scheduling for Irregular Applications on GPUs
## Short Description
**Cudyn (CUDA Dynamic)** is a generic C++ CUDA library designed for the efficient execution of irregular applications on NVIDIA GPUs. At its core, the library features a dynamic task scheduling mechanism at the thread block level
## Motivation
Modern GPUs offer enormous potential for parallel computations but are primarily optimized for regular applications where control flow and memory accesses are predictable. Irregular applications, characterized by data-dependent control flow and memory access patterns (e.g., in graph algorithms, sparse matrix operations, or simulations with stochastic runtimes), pose a significant challenge for efficient GPU utilization. *Cudyn* addresses this challenge by providing a flexible and user-friendly framework for dynamically scheduling such irregular tasks.
## Core Features
* **Dynamic Task Scheduling:** Implements a block-internal dynamic scheduler that allows GPU threads to atomically fetch tasks from a shared pool as needed. This aims for better utilization and reduction of idle times, especially with highly variable task runtimes.
* **Genericity via C++ Templates:** Core components are templated to support arbitrary user-defined logic (as functors or lambdas) and data types, enabling broad applicability.
* **Policy-Based Design:** The `Launcher` and `Schedulers` are designed using the policy idiom, offering high modularity and extensibility for future scheduling strategies.
* **Abstraction and User-Friendliness:**
* `Cudyn::Utils::Memory::CudynDevicePointer`: An RAII-based wrapper for safe and simplified GPU memory management.
* `Cudyn::Utils::GridConfiguration`: A component for flexible definition and validation of kernel launch configurations.
* **Integrated SpMV Support:**
* `Cudyn::CSR`: A module for handling Sparse Matrix-Vector Multiplication (SpMV) for matrices in the Compressed Sparse Row (CSR) format.
* `Cudyn::MatrixMarketParser`: A parser for reading matrices from the common MatrixMarket format.
* **Profiling Tools:** Includes a `launchProfiled` function for easy measurement and analysis of kernel runtimes.
## Architectural Overview
*Cudyn* has a modular structure, primarily organized through C++ namespaces. A deep classical object-oriented hierarchy has been intentionally avoided in favor of flexibility and template-based composition.
The main components are:
* **`Cudyn::Scheduler`**: Contains the logic for dynamic task distribution within the GPU kernel (e.g., `genericIrregularKernel`, `genericIrregularKernelLowAtomics`).
* **`Cudyn::Launcher`**: Serves as the central API interface for invoking kernels via a chosen scheduling policy.
* **`Cudyn::Utils`**:
* `GridConfiguration`: For managing and validating grid and block dimensions.
* `Memory`: Provides the `CudynDevicePointer` wrapper for memory management.
* **`Cudyn::CSR`** (optional): Offers data structures and kernel functors for SpMV operations.
* **`Cudyn::MatrixMarketParser`** (optional): Enables loading of matrices.
## Prerequisites
* CUDA Toolkit (Version 12.x or newer recommended)
* A C++ compiler with support for C++17 (or newer)
## Installation
git clone [https://github.com/PerHuepenbecker/Cudyn.git](https://github.com/PerHuepenbecker/Cudyn.git)