https://github.com/infinitensor/ninetoothed
A domain-specific language (DSL) based on Triton but providing higher-level abstractions.
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/infinitensor/ninetoothed
- Owner: InfiniTensor
- License: apache-2.0
- Created: 2024-07-25T06:36:57.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-10-22T09:24:26.000Z (over 1 year ago)
- Last Synced: 2024-10-23T09:59:56.281Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 72.3 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NineToothed

NineToothed is a Triton-based domain-specific language (DSL). By introducing **tensor-oriented meta-programming (TOM)**, it makes writing high-performance GPU kernels easier.
## Installation
We can use `pip` to install `ninetoothed`.
```shell
pip install ninetoothed
```
This installs `ninetoothed` itself. To use it fully, you also need a deep learning framework that `ninetoothed` supports. For trial purposes, we recommend installing `torch`.
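One quick way to confirm which of these packages are present is to query the installed distributions. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of `ninetoothed`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the DSL itself and on the framework recommended for trials.
for name in ("ninetoothed", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```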
## Usage
Thanks to tensor-oriented meta-programming, NineToothed kernels can be written in the **arrange-and-apply** paradigm: you define `arrangement`, `application`, and `tensors` separately, then pass them to `ninetoothed.make` to generate the kernel.
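To build intuition for this split, here is a toy, CPU-only sketch in plain Python. It is not how NineToothed is implemented and uses none of its API: `toy_make` is a hypothetical stand-in, the tensors are ordinary lists, and the tiles are processed sequentially, whereas on a GPU each tile would typically be handled by its own program instance. The point is only that the arrangement decides how data is split into tiles, while the application describes the work done on one tile.

```python
def arrangement(x, y, out, block_size):
    """Split each 1-D list into tiles of at most `block_size` elements."""

    def tile(values):
        return [values[i:i + block_size] for i in range(0, len(values), block_size)]

    return tile(x), tile(y), tile(out)


def application(x_tile, y_tile, out_tile):
    """Compute one output tile from the matching input tiles."""
    for i in range(len(out_tile)):
        out_tile[i] = x_tile[i] + y_tile[i]


def toy_make(arrangement, application):
    """Hypothetical stand-in for a kernel builder; runs the tiles one by one."""

    def kernel(x, y, block_size):
        out = [0] * len(x)
        x_tiles, y_tiles, out_tiles = arrangement(x, y, out, block_size)
        for x_tile, y_tile, out_tile in zip(x_tiles, y_tiles, out_tiles):
            application(x_tile, y_tile, out_tile)
        # Slicing copied the data, so stitch the computed tiles back together.
        return [value for tile in out_tiles for value in tile]

    return kernel


add = toy_make(arrangement, application)
print(add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], block_size=2))
# [11, 22, 33, 44, 55]
```

Changing `block_size` changes only how the work is partitioned, not the result, which is why the two concerns can be written separately.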
### Matrix Multiplication
Here is the code we need for matrix multiplication:
```python
import ninetoothed
import ninetoothed.language as ntl
from ninetoothed import Tensor, block_size

# Meta-parameters for the tile sizes along M, N, and K.
BLOCK_SIZE_M = block_size()
BLOCK_SIZE_N = block_size()
BLOCK_SIZE_K = block_size()


def arrangement(input, other, output):
    # Tile the output into (BLOCK_SIZE_M, BLOCK_SIZE_N) blocks.
    output_arranged = output.tile((BLOCK_SIZE_M, BLOCK_SIZE_N))

    # Tile the input, group each row of tiles, and expand the groups
    # across the columns of the output tile grid.
    input_arranged = input.tile((BLOCK_SIZE_M, BLOCK_SIZE_K))
    input_arranged = input_arranged.tile((1, -1))
    input_arranged = input_arranged.expand((-1, output_arranged.shape[1]))
    input_arranged.dtype = input_arranged.dtype.squeeze(0)

    # Do the same for the other operand, grouping each column of tiles
    # and expanding across the rows of the output tile grid.
    other_arranged = other.tile((BLOCK_SIZE_K, BLOCK_SIZE_N))
    other_arranged = other_arranged.tile((-1, 1))
    other_arranged = other_arranged.expand((output_arranged.shape[0], -1))
    other_arranged.dtype = other_arranged.dtype.squeeze(1)

    return input_arranged, other_arranged, output_arranged


def application(input, other, output):
    # Accumulate in float32 over the tiles along the K dimension.
    accumulator = ntl.zeros(output.shape, dtype=ntl.float32)

    for k in range(input.shape[0]):
        accumulator += ntl.dot(input[k], other[k])

    output = accumulator


# Three 2-D tensors: input, other, and output.
tensors = (Tensor(2), Tensor(2), Tensor(2))

kernel = ninetoothed.make(arrangement, application, tensors)
```
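As a plain-Python reference for the computation this kernel describes, the sketch below multiplies two small matrices one output tile at a time, accumulating over tiles along the K dimension. It is an illustration only: the function name `blocked_matmul` and its parameters are assumptions, it uses nested lists rather than NineToothed tensors, and it runs sequentially on the CPU.

```python
def blocked_matmul(a, b, block_m, block_n, block_k):
    """Multiply a (M x K) by b (K x N), one output tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]

    for i0 in range(0, m, block_m):  # row offset of the output tile
        for j0 in range(0, n, block_n):  # column offset of the output tile
            # Everything below computes a single output tile.
            for k0 in range(0, k, block_k):  # walk K one tile at a time
                for i in range(i0, min(i0 + block_m, m)):
                    for j in range(j0, min(j0 + block_n, n)):
                        for p in range(k0, min(k0 + block_k, k)):
                            out[i][j] += a[i][p] * b[p][j]

    return out


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b, block_m=1, block_n=2, block_k=2))
# [[58.0, 64.0], [139.0, 154.0]]
```

The two outer loops play the role of the output tiling, and the `k0` loop corresponds to the `for k in range(input.shape[0])` accumulation in `application`.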
## Useful Links
- [NineToothed Documentation](https://ninetoothed.org/)
- [NineToothed Operators](https://github.com/InfiniTensor/ntops)
- [NineToothed Examples](https://github.com/InfiniTensor/ninetoothed-examples)
## License
This project is distributed under the Apache-2.0 license. See the included [LICENSE](LICENSE) file for details.