{"id":13738977,"url":"https://github.com/kerneltuner/kernel_tuner","last_synced_at":"2025-05-15T04:08:07.939Z","repository":{"id":11409483,"uuid":"54894320","full_name":"KernelTuner/kernel_tuner","owner":"KernelTuner","description":"Kernel Tuner","archived":false,"fork":false,"pushed_at":"2025-05-14T11:32:09.000Z","size":42962,"stargazers_count":336,"open_issues_count":21,"forks_count":54,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-05-15T04:08:00.766Z","etag":null,"topics":["auto-tuning","autotuning","c","cplusplus","cuda","cuda-kernels","gpu","gpu-computing","kernel-tuner","machine-learning","opencl","opencl-kernels","optimization","python","software-development","testing"],"latest_commit_sha":null,"homepage":"https://kerneltuner.github.io/kernel_tuner/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KernelTuner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2016-03-28T13:32:17.000Z","updated_at":"2025-05-13T09:47:13.000Z","dependencies_parsed_at":"2023-12-20T14:35:51.357Z","dependency_job_id":"ad872be5-9766-4deb-879d-50105fd9f6d7","html_url":"https://github.com/KernelTuner/kernel_tuner","commit_stats":{"total_commits":1579,"total_committers":32,"mean_commits":49.34375,"dds":0.4186193793540215,"last_synced_commit":"5465a0905f894ab111a0617b69a5cedd1ebce9d1"},"previous_names":[],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTu
ner%2Fkernel_tuner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_tuner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_tuner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KernelTuner%2Fkernel_tuner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KernelTuner","download_url":"https://codeload.github.com/KernelTuner/kernel_tuner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270656,"owners_count":22042860,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-tuning","autotuning","c","cplusplus","cuda","cuda-kernels","gpu","gpu-computing","kernel-tuner","machine-learning","opencl","opencl-kernels","optimization","python","software-development","testing"],"created_at":"2024-08-03T04:00:22.259Z","updated_at":"2025-05-15T04:08:02.926Z","avatar_url":"https://github.com/KernelTuner.png","language":"Python","readme":"\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"500px\" src=\"https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/KernelTuner-logo.png\"/\u003e\n\u003c/div\u003e\n\n---\n[![Build Status](https://github.com/KernelTuner/kernel_tuner/actions/workflows/test-python-package.yml/badge.svg)](https://github.com/KernelTuner/kernel_tuner/actions/workflows/test-python-package.yml)\n[![CodeCov 
Badge](https://codecov.io/gh/KernelTuner/kernel_tuner/branch/master/graph/badge.svg)](https://codecov.io/gh/KernelTuner/kernel_tuner)\n[![PyPi Badge](https://img.shields.io/pypi/v/kernel_tuner.svg?colorB=blue)](https://pypi.python.org/pypi/kernel_tuner/)\n[![Zenodo Badge](https://zenodo.org/badge/54894320.svg)](https://zenodo.org/badge/latestdoi/54894320)\n[![SonarCloud Badge](https://sonarcloud.io/api/project_badges/measure?project=KernelTuner_kernel_tuner\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=KernelTuner_kernel_tuner)\n[![OpenSSF Badge](https://bestpractices.coreinfrastructure.org/projects/6573/badge)](https://bestpractices.coreinfrastructure.org/projects/6573)\n[![FairSoftware Badge](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu)\n---\n\n\nCreate optimized GPU applications in any mainstream GPU \nprogramming language (CUDA, HIP, OpenCL, OpenACC).\n\nWhat Kernel Tuner does:\n\n- Works as an external tool to benchmark and optimize GPU kernels in isolation\n- Can be used directly on existing kernel code without extensive changes \n- Can be used with applications in any host programming language\n- Blazing fast search space construction\n- More than 20 [optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html) to speed up tuning\n- Energy measurements and optimizations [(power capping, clock frequency tuning)](https://arxiv.org/abs/2211.07260)\n- ... and much more! 
For example, [caching](https://kerneltuner.github.io/kernel_tuner/stable/cache_files.html), [output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html), [tuning host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html), [user defined metrics](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html), see [the full documentation](https://kerneltuner.github.io/kernel_tuner/stable/index.html).\n\n\n\n## Installation\n\n- First, make sure you have your [CUDA](https://kerneltuner.github.io/kernel_tuner/stable/install.html#cuda-and-pycuda), [OpenCL](https://kerneltuner.github.io/kernel_tuner/stable/install.html#opencl-and-pyopencl), or [HIP](https://kerneltuner.github.io/kernel_tuner/stable/install.html#hip-and-hip-python) compiler installed\n- Then type: `pip install kernel_tuner[cuda]`, `pip install kernel_tuner[opencl]`, or `pip install kernel_tuner[hip]`\n- or why not all of them: `pip install kernel_tuner[cuda,opencl,hip]`\n\nMore information on installation, also for other languages, in the [installation guide](http://kerneltuner.github.io/kernel_tuner/stable/install.html).\n\n## Example\n\n```python\nimport numpy as np\nfrom kernel_tuner import tune_kernel\n\nkernel_string = \"\"\"\n__global__ void vector_add(float *c, float *a, float *b, int n) {\n    int i = blockIdx.x * block_size_x + threadIdx.x;\n    if (i\u003cn) {\n        c[i] = a[i] + b[i];\n    }\n}\n\"\"\"\n\nn = np.int32(10000000)\n\na = np.random.randn(n).astype(np.float32)\nb = np.random.randn(n).astype(np.float32)\nc = np.zeros_like(a)\n\nargs = [c, a, b, n]\n\ntune_params = {\"block_size_x\": [32, 64, 128, 256, 512]}\n\ntune_kernel(\"vector_add\", kernel_string, n, args, tune_params)\n```\n\nMore [examples here](https://kerneltuner.github.io/kernel_tuner/stable/examples.html).\n\n## Resources\n\n- [Full documentation](https://kerneltuner.github.io/kernel_tuner/stable/)\n- Guides:\n  - [Getting 
Started](https://kerneltuner.github.io/kernel_tuner/stable/quickstart.html)\n  - [Convolution](https://kerneltuner.github.io/kernel_tuner/stable/convolution.html)\n  - [Diffusion](https://kerneltuner.github.io/kernel_tuner/stable/diffusion.html)\n  - [Matrix Multiplication](https://kerneltuner.github.io/kernel_tuner/stable/matrix_multiplication.html)\n- Features \u0026 Use cases:\n  - [Full list of examples](https://kerneltuner.github.io/kernel_tuner/stable/examples.html)\n  - [Output verification](https://kerneltuner.github.io/kernel_tuner/stable/correctness.html)\n  - [Test GPU code from Python](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/test_vector_add.py)\n  - [Tune code in both host and device code](https://kerneltuner.github.io/kernel_tuner/stable/hostcode.html)\n  - [Optimization algorithms](https://kerneltuner.github.io/kernel_tuner/stable/optimization.html)\n  - [Mixed-precision \u0026 Accuracy tuning](https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/accuracy.py)\n  - [Custom metrics \u0026 tuning objectives](https://kerneltuner.github.io/kernel_tuner/stable/metrics.html)\n- **Kernel Tuner Tutorial** slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2022_SURF/SURF22-Kernel-Tuner-Tutorial.pdf), hands-on:\n  - Vector add example [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/00_Kernel_Tuner_Introduction.ipynb)\n  - Tuning thread block dimensions [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)] [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/01_Kernel_Tuner_Getting_Started.ipynb)\n  - Search space restrictions \u0026 output verification [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/02_Kernel_Tuner_Intermediate.ipynb)\n  - Visualization \u0026 search space optimization [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/hands-on/cuda/03_Kernel_Tuner_Advanced.ipynb)\n- **Energy Efficient GPU Computing** tutorial slides [[PDF]](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/slides/2023_Supercomputing/SC23.pdf), hands-on:\n  - Kernel Tuner for GPU energy measurements [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/00_Kernel_Tuner_Introduction.ipynb)\n  - Code optimizations for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/01_Code_Optimizations_for_Energy.ipynb)\n  - Mixed precision and accuracy tuning 
[[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/02_Mixed_precision_programming.ipynb)\n  - Optimizing for time vs for energy [[.ipynb](https://github.com/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)] [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/KernelTuner/kernel_tuner_tutorial/blob/master/energy/03_energy_efficient_computing.ipynb)\n\n\n## Kernel Tuner ecosystem\n\n\u003cimg width=\"250px\" src=\"https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_launcher.png\"/\u003e\u003cbr /\u003eC++ magic to integrate auto-tuned kernels into C++ applications \n\n\u003cimg width=\"250px\" src=\"https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_float.png\"/\u003e\u003cbr /\u003eC++ data types for mixed-precision CUDA kernel programming\n\n\u003cimg width=\"275px\" src=\"https://raw.githubusercontent.com/KernelTuner/kernel_tuner/master/doc/images/kernel_dashboard.png\"/\u003e\u003cbr /\u003eMonitor, analyze, and visualize auto-tuning runs\n\n\n## Communication \u0026 Contribution\n\n- GitHub [Issues](https://github.com/KernelTuner/kernel_tuner/issues): Bug reports, install issues, feature requests, work in progress\n- GitHub [Discussion group](https://github.com/orgs/KernelTuner/discussions): General questions, Q\u0026A, thoughts\n\nContributions are welcome! 
For feature requests, bug reports, or usage problems, please feel free to create an issue.\nFor more extensive contributions, check the [contribution guide](http://kerneltuner.github.io/kernel_tuner/stable/contributing.html).\n\n## Citation\n\nIf you use Kernel Tuner in research or research software, please cite the most relevant among the [publications on Kernel \nTuner](https://kerneltuner.github.io/kernel_tuner/stable/#citation). To refer to the project as a whole, please cite:\n\n```latex\n@article{kerneltuner,\n  author  = {Ben van Werkhoven},\n  title   = {Kernel Tuner: A search-optimizing GPU code auto-tuner},\n  journal = {Future Generation Computer Systems},\n  year = {2019},\n  volume  = {90},\n  pages = {347-358},\n  url = {https://www.sciencedirect.com/science/article/pii/S0167739X18313359},\n  doi = {https://doi.org/10.1016/j.future.2018.08.004}\n}\n```\n\n","funding_links":[],"categories":["Green Software Awesome List"],"sub_categories":["Contents"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkerneltuner%2Fkernel_tuner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkerneltuner%2Fkernel_tuner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkerneltuner%2Fkernel_tuner/lists"}