{"id":28328697,"url":"https://github.com/perhuepenbecker/cudyn","last_synced_at":"2026-04-28T00:32:02.242Z","repository":{"id":294423899,"uuid":"960891137","full_name":"PerHuepenbecker/Cudyn","owner":"PerHuepenbecker","description":"CUDA library for irregular tasks using a dynamic block-internal balancing mechanism ","archived":false,"fork":false,"pushed_at":"2025-06-04T20:38:31.000Z","size":46290,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-05T02:09:07.327Z","etag":null,"topics":["cpp","cuda","cuda-library","cuda-programming","gpu-computing","gpu-programming","irregular"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PerHuepenbecker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-05T09:39:36.000Z","updated_at":"2025-05-20T10:55:37.000Z","dependencies_parsed_at":"2025-05-20T11:38:14.938Z","dependency_job_id":"ec3a3e38-0f86-49b4-ae48-107be8f458a4","html_url":"https://github.com/PerHuepenbecker/Cudyn","commit_stats":null,"previous_names":["perhuepenbecker/cudyn"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PerHuepenbecker/Cudyn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PerHuepenbecker%2FCudyn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PerHuepenbecker%2FCudyn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PerHuepenbecker%2FCudyn/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PerHuepenbecker%2FCudyn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PerHuepenbecker","download_url":"https://codeload.github.com/PerHuepenbecker/Cudyn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PerHuepenbecker%2FCudyn/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261103452,"owners_count":23109932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cuda","cuda-library","cuda-programming","gpu-computing","gpu-programming","irregular"],"created_at":"2025-05-26T08:16:39.961Z","updated_at":"2026-04-28T00:32:02.237Z","avatar_url":"https://github.com/PerHuepenbecker.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cudyn: Dynamic Task Scheduling for Irregular Applications on GPUs\n\n## Short Description\n\n**Cudyn (CUDA Dynamic)** is a generic C++ CUDA library designed for the efficient execution of irregular applications on NVIDIA GPUs. At its core, the library features a dynamic task scheduling mechanism at the thread block level.\n\n## Motivation\n\nModern GPUs offer enormous potential for parallel computations but are primarily optimized for regular applications where control flow and memory accesses are predictable. 
Irregular applications, characterized by data-dependent control flow and memory access patterns (e.g., in graph algorithms, sparse matrix operations, or simulations with stochastic runtimes), pose a significant challenge for efficient GPU utilization. *Cudyn* addresses this challenge by providing a flexible and user-friendly framework for dynamically scheduling such irregular tasks.\n\n## Core Features\n\n* **Dynamic Task Scheduling:** Implements a block-internal dynamic scheduler that allows GPU threads to atomically fetch tasks from a shared pool as needed. This aims for better utilization and reduction of idle times, especially with highly variable task runtimes.\n* **Genericity via C++ Templates:** Core components are templated to support arbitrary user-defined logic (as functors or lambdas) and data types, enabling broad applicability.\n* **Policy-Based Design:** The `Launcher` and `Schedulers` are designed using the policy idiom, offering high modularity and extensibility for future scheduling strategies.\n* **Abstraction and User-Friendliness:**\n    * `Cudyn::Utils::Memory::CudynDevicePointer`: An RAII-based wrapper for safe and simplified GPU memory management.\n    * `Cudyn::Utils::GridConfiguration`: A component for flexible definition and validation of kernel launch configurations.\n* **Integrated SpMV Support:**\n    * `Cudyn::CSR`: A module for handling Sparse Matrix-Vector Multiplication (SpMV) for matrices in the Compressed Sparse Row (CSR) format.\n    * `Cudyn::MatrixMarketParser`: A parser for reading matrices from the common MatrixMarket format.\n* **Profiling Tools:** Includes a `launchProfiled` function for easy measurement and analysis of kernel runtimes.\n\n## Architectural Overview\n\n*Cudyn* has a modular structure, primarily organized through C++ namespaces. 
A deep classical object-oriented hierarchy has been intentionally avoided in favor of flexibility and template-based composition.\n\nThe main components are:\n* **`Cudyn::Scheduler`**: Contains the logic for dynamic task distribution within the GPU kernel (e.g., `genericIrregularKernel`, `genericIrregularKernelLowAtomics`).\n* **`Cudyn::Launcher`**: Serves as the central API interface for invoking kernels via a chosen scheduling policy.\n* **`Cudyn::Utils`**:\n    * `GridConfiguration`: For managing and validating grid and block dimensions.\n    * `Memory`: Provides the `CudynDevicePointer` wrapper for memory management.\n* **`Cudyn::CSR`** (optional): Offers data structures and kernel functors for SpMV operations.\n* **`Cudyn::MatrixMarketParser`** (optional): Enables loading of matrices.\n\n## Prerequisites\n\n* CUDA Toolkit (Version 12.x or newer recommended)\n* A C++ compiler with support for C++17 (or newer)\n\n## Installation\n\ngit clone https://github.com/PerHuepenbecker/Cudyn.git\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperhuepenbecker%2Fcudyn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperhuepenbecker%2Fcudyn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperhuepenbecker%2Fcudyn/lists"}