https://github.com/NVIDIA/cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
- Host: GitHub
- URL: https://github.com/NVIDIA/cutile-python
- Owner: NVIDIA
- License: other
- Created: 2025-06-13T22:07:17.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-05T07:22:12.000Z (about 1 month ago)
- Last Synced: 2025-12-07T09:20:10.966Z (about 1 month ago)
- Language: Python
- Size: 474 KB
- Stars: 576
- Watchers: 5
- Forks: 24
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Security: SECURITY.md
README
cuTile Python
=============
cuTile Python is a programming language for NVIDIA GPUs. The official documentation can be found
on [docs.nvidia.com](https://docs.nvidia.com/cuda/cutile-python),
or built from source located in the [docs](docs/) folder.
Example
-------
```python
# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`.
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads
import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16


# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    # Each block handles one tile of TILE_SIZE elements, selected by its block ID.
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)


# Generate input arrays
a = cupy.random.uniform(-5, 5, 128)
b = cupy.random.uniform(-5, 5, 128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)
```
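The tile indexing in the kernel can be illustrated without a GPU. The following is a CPU-only sketch in plain Python, not cuTile code: it assumes that a tile index `i` with shape `(TILE_SIZE,)` covers elements `[i * TILE_SIZE, (i + 1) * TILE_SIZE)`, and it uses a loop iteration to stand in for each block. The helper names `cdiv` and `tiled_vector_add` are illustrative only.

```python
def cdiv(n, d):
    """Ceiling division: number of size-d tiles needed to cover n elements."""
    return (n + d - 1) // d


def tiled_vector_add(a, b, tile_size):
    """Emulate the tiled kernel on the CPU; each iteration plays one block."""
    result = [0.0] * len(a)
    for block_id in range(cdiv(len(a), tile_size)):
        # Assumed mapping from tile index to element range.
        start = block_id * tile_size
        stop = min(start + tile_size, len(a))
        # "Load" two tiles, add them element-wise, and "store" the result.
        result[start:stop] = [x + y for x, y in zip(a[start:stop], b[start:stop])]
    return result


a = [float(i) for i in range(128)]
b = [2.0 * i for i in range(128)]
result = tiled_vector_add(a, b, 16)
```

With 128 elements and a tile size of 16, `cdiv` yields a grid of 8 blocks, matching the first grid dimension computed in the example above.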
More examples can be found at [Samples](samples/) and [TileGym](https://github.com/NVIDIA/TileGym).
System Requirements
-------------------
cuTile Python generates kernels based on [Tile IR](https://docs.nvidia.com/cuda/tile-ir/),
which requires NVIDIA Driver r580 or later to run.
Furthermore, in the 13.1 release the `tileiras` compiler supports only Blackwell GPUs; this
restriction will be removed in upcoming versions.
Check out the [prerequisites](https://docs.nvidia.com/cuda/cutile-python/quickstart.html#prerequisites)
for the full list of requirements.
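As a rough sketch of how the driver requirement could be checked in a script, the helper below compares the major component of a driver version string against 580. It assumes the string has the dotted form reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; the function name and the sample version strings are illustrative and not part of cuTile.

```python
MIN_DRIVER_MAJOR = 580  # minimum driver branch stated in the requirements above


def driver_meets_minimum(version, minimum_major=MIN_DRIVER_MAJOR):
    """Return True if a dotted driver version string is at or above the minimum."""
    # Only the leading (major) component is compared.
    major = int(version.strip().split(".")[0])
    return major >= minimum_major


# Sample strings for illustration only.
print(driver_meets_minimum("580.65.06"))
print(driver_meets_minimum("570.124.06"))
```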
Installing from PyPI
--------------------
cuTile Python is published on [PyPI](https://pypi.org/) under the
[cuda-tile](https://pypi.org/project/cuda-tile/) package name and can be installed with `pip`:
```
pip install cuda-tile
```
Currently, the [CUDA Toolkit 13.1+](https://developer.nvidia.com/cuda-downloads) is required
and needs to be installed separately. On a Debian-based system, use `apt-get install
cuda-tileiras-13.1 cuda-compiler-13.1` instead of `apt-get install cuda-toolkit-13.1`
if you wish to avoid installing the full CUDA Toolkit.
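To confirm that the package was installed into the active environment, one option is to query the distribution metadata with the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper, and it checks only that the `cuda-tile` distribution is present, not that the CUDA Toolkit or driver is usable.

```python
from importlib import metadata


def installed_version(dist_name="cuda-tile"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("cuda-tile is not installed in this environment")
else:
    print(f"cuda-tile {version} is installed")
```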
Building from Source
--------------------
cuTile is written mostly in Python, but includes a C++ extension which needs to be built.
You will need:
- A C++17-capable compiler, such as GNU C++ or MSVC;
- CMake 3.18+;
- GNU Make on Linux or msbuild on Windows;
- Python 3.10+ with development headers (`venv` module is recommended but optional);
- [CUDA Toolkit 13.1+](https://developer.nvidia.com/cuda-downloads)
On an Ubuntu system, the first four dependencies can be installed with APT:
```
sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv
```
The CMakeLists.txt script will also automatically download
the [DLPack](https://github.com/dmlc/dlpack) dependency from GitHub.
If you wish to disable this behavior and provide your own copy of DLPack,
set the `CUDA_TILE_CMAKE_DLPACK_PATH` environment variable to a local path
to the DLPack source tree.
Unless you are already using a Python virtual environment, it is recommended to create one
in order to avoid installing cuTile globally:
```
python3 -m venv env
source env/bin/activate
```
Once the build dependencies are in place, the simplest way to build cuTile is to install it
in editable mode by running the following command in the source root directory:
```
pip install -e .
```
This will create the `build` directory and invoke the CMake-based build process.
In editable mode, the compiled extension module will be placed in the build directory,
and then a symbolic link to it will be created in the source directory.
As a result, the `pip install -e .` command above is needed only once. After changing the
C++ code, the extension can be recompiled with `make -C build`, which is much faster.
This logic is defined in [setup.py](./setup.py).
Experimental Features (Optional)
--------------------------------
cuTile now provides an experimental package containing APIs that are still under active development.
These are **not** part of the stable `cuda.tile` API and may change.
To enable the experimental features when working from a source checkout, install the experimental
package from the repository root:
```
pip install ./experimental
```
You can also install it directly from a GitHub repository subdirectory:
```
pip install \
"git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental"
```
Once installed, the experimental namespace can be imported, for example to use the autotuner:
```
from cuda.tile_experimental import autotune_launch, clear_autotune_cache
```
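Because the experimental package is optional, code that should also run without it may want to guard the import. The sketch below shows one way to do that; the `HAVE_EXPERIMENTAL` flag is an illustrative name, and no assumptions are made here about the signatures of the imported functions.

```python
try:
    # Available only when the optional experimental package is installed.
    from cuda.tile_experimental import autotune_launch, clear_autotune_cache
    HAVE_EXPERIMENTAL = True
except ImportError:
    # Fall back gracefully so callers can test the flag before using the APIs.
    autotune_launch = None
    clear_autotune_cache = None
    HAVE_EXPERIMENTAL = False

print("experimental APIs available:", HAVE_EXPERIMENTAL)
```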
Running Tests
-------------
cuTile uses the [pytest](https://pytest.org) framework for testing.
Tests have extra dependencies, such as PyTorch, which can be installed with:
```
pip install -r test/requirements.txt
```
The tests are located in the [test/](test/) directory. To run a specific test file,
for example `test_copy.py`, use the following command:
```
pytest test/test_copy.py
```
Copyright and License Information
---------------------------------
Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
cuTile-Python is licensed under the Apache 2.0 license. See the [LICENSES](LICENSES/) folder for the full license text.