{"id":20379948,"url":"https://github.com/kerneltuner/kernel_launcher","last_synced_at":"2025-04-12T08:33:34.503Z","repository":{"id":54969594,"uuid":"521251695","full_name":"KernelTuner/kernel_launcher","owner":"KernelTuner","description":"Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner","archived":false,"fork":false,"pushed_at":"2024-04-25T11:24:34.000Z","size":5056,"stargazers_count":20,"open_issues_count":2,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-26T03:41:34.886Z","etag":null,"topics":["cpp","cuda","gpu","kernel-tuner"],"latest_commit_sha":null,"homepage":"https://KernelTuner.github.io/kernel_launcher/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KernelTuner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-08-04T12:08:55.000Z","updated_at":"2025-02-06T21:37:02.000Z","dependencies_parsed_at":"2023-11-23T13:38:35.982Z","dependency_job_id":"85fa31dd-fc36-4685-a76c-ea6482818044","html_url":"https://github.com/KernelTuner/kernel_launcher","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_launcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_launcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_launcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/KernelTuner%2Fkernel_launcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KernelTuner","download_url":"https://codeload.github.com/KernelTuner/kernel_launcher/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248540553,"owners_count":21121376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cuda","gpu","kernel-tuner"],"created_at":"2024-11-15T02:05:42.073Z","updated_at":"2025-04-12T08:33:34.483Z","avatar_url":"https://github.com/KernelTuner.png","language":"C++","readme":"# Kernel Launcher\n\n![Kernel Launcher logo](https://kerneltuner.github.io/kernel_launcher/_images/logo.png)\n\n\n[![github](https://img.shields.io/badge/github-repo-000.svg?logo=github\u0026labelColor=gray\u0026color=blue)](https://github.com/KernelTuner/kernel_launcher/)\n![GitHub branch checks state](https://img.shields.io/github/actions/workflow/status/KernelTuner/kernel_launcher/docs.yml)\n![GitHub](https://img.shields.io/github/license/KernelTuner/kernel_launcher)\n![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/KernelTuner/kernel_launcher)\n![GitHub Repo stars](https://img.shields.io/github/stars/KernelTuner/kernel_launcher?style=social)\n\n\n\n\n_Kernel Launcher_ is a C++ library that enables dynamic compilation of _CUDA_ kernels at run time (using [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html)) and launching them in an easy type-safe way using C++ magic.\nOn top of that, Kernel Launcher supports _capturing_ kernel launches, to enable tuning by 
[Kernel Tuner](https://github.com/KernelTuner/kernel_tuner), and importing the tuning results, known as _wisdom_ files, back into the application.\nThe result: highly efficient GPU applications with maximum portability.\n\n\n\n\n## Installation\n\nThe recommended way to install is via CMake. See the [installation guide](https://kerneltuner.github.io/kernel_launcher/install.html).\n\n## Example\n\nThere are many ways of using Kernel Launcher. See the documentation for [examples](https://kerneltuner.github.io/kernel_launcher/example.html) or check out the [examples/](https://github.com/KernelTuner/kernel_launcher/tree/master/examples) directory.\n\n\n### Pragma-based API\nThe example below uses the pragma-based API, which allows existing CUDA kernels to be annotated with Kernel-Launcher-specific directives.\n\n**kernel.cu**\n```cpp\n#pragma kernel tune(threads_per_block=32, 64, 128, 256, 512, 1024)\n#pragma kernel block_size(threads_per_block)\n#pragma kernel problem_size(n)\n#pragma kernel buffers(A[n], B[n], C[n])\ntemplate \u003ctypename T, int threads_per_block\u003e\n__global__ void vector_add(int n, T *C, const T *A, const T *B) {\n    int i = blockIdx.x * threads_per_block + threadIdx.x;\n    if (i \u003c n) {\n        C[i] = A[i] + B[i];\n    }\n}\n```\n\n**main.cpp**\n```cpp\n#include \"kernel_launcher.h\"\n\nint main() {\n    // Initialize CUDA memory. This is outside the scope of kernel_launcher.\n    unsigned int n = 1000000;\n    float *dev_A, *dev_B, *dev_C;\n    /* cudaMalloc, cudaMemcpy, ... */\n\n    // Namespace alias.\n    namespace kl = kernel_launcher;\n\n    // Launch the kernel! 
The grid size and block size do not need to\n    // be specified; they are calculated from the kernel specifications and\n    // run-time arguments.\n    kl::launch(\n        kl::PragmaKernel(\"vector_add\", \"kernel.cu\", {\"float\"}),\n        n, dev_C, dev_A, dev_B\n    );\n}\n\n```\n\n\n### Builder-based API\nThe example below uses the `KernelBuilder`-based API.\nThis offers more flexibility than the pragma-based API, but is also more verbose:\n\n**kernel.cu**\n```cpp\ntemplate \u003ctypename T\u003e\n__global__ void vector_add(int n, T *C, const T *A, const T *B) {\n    int i = blockIdx.x * blockDim.x + threadIdx.x;\n    if (i \u003c n) {\n        C[i] = A[i] + B[i];\n    }\n}\n```\n\n**main.cpp**\n```cpp\n#include \"kernel_launcher.h\"\n\nint main() {\n    // Namespace alias.\n    namespace kl = kernel_launcher;\n\n    // Define the variables that can be tuned for this kernel.\n    auto space = kl::ConfigSpace();\n    auto threads_per_block = space.tune(\"block_size\", {32, 64, 128, 256, 512, 1024});\n\n    // Create a kernel builder and set kernel properties such as block size,\n    // grid divisor, template arguments, etc.\n    auto builder = kl::KernelBuilder(\"vector_add\", \"kernel.cu\", space);\n    builder\n        .template_args(kl::type_of\u003cfloat\u003e())\n        .problem_size(kl::arg0)\n        .block_size(threads_per_block);\n\n    // Define the kernel\n    auto vector_add_kernel = kl::WisdomKernel(builder);\n\n    // Initialize CUDA memory. This is outside the scope of kernel_launcher.\n    unsigned int n = 1000000;\n    float *dev_A, *dev_B, *dev_C;\n    /* cudaMalloc, cudaMemcpy, ... */\n\n    // Launch the kernel! Note that the kernel is compiled on the first call.\n    // The grid size and block size do not need to be specified; they are\n    // derived from the kernel specifications and run-time arguments.\n    vector_add_kernel(n, dev_C, dev_A, dev_B);\n}\n```\n\n\n\n## License\n\nLicensed under Apache 2.0. 
See [LICENSE](https://github.com/KernelTuner/kernel_launcher/blob/master/LICENSE).\n\n\n## Citation\n\nIf you use Kernel Launcher in your work, please cite the following publication:\n\n\u003e S. Heldens, B. van Werkhoven (2023), \"Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications\", The Eighteenth International Workshop on Automatic Performance Tuning (iWAPT2023) co-located with IPDPS 2023\n\nAs BibTeX:\n\n```bibtex\n@inproceedings{heldens2023kernellauncher,\n  title={Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications},\n  author={Heldens, Stijn and van Werkhoven, Ben},\n  booktitle={The Eighteenth International Workshop on Automatic Performance Tuning (iWAPT2023) co-located with IPDPS 2023},\n  year={2023}\n}\n```\n\n## Related Work\n\n* [Kernel Tuner](https://github.com/KernelTuner/kernel_tuner)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkerneltuner%2Fkernel_launcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkerneltuner%2Fkernel_launcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkerneltuner%2Fkernel_launcher/lists"}