https://github.com/loreloc/triturus
A collection of Triton kernels of increasing complexity, for learning and exploring Triton and GPU programming.
- Host: GitHub
- URL: https://github.com/loreloc/triturus
- Owner: loreloc
- License: mit
- Created: 2025-06-07T12:50:59.000Z
- Default Branch: main
- Last Pushed: 2025-08-01T23:20:24.000Z
- Last Synced: 2025-08-11T00:03:18.085Z
- Topics: cuda, pytorch, triton
- Language: Python
- Homepage:
- Size: 126 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🦎 Triturus 🦎
The following table describes the implemented kernels.
| Kernel ID | Description | Operation | Source |
| ------------ | -------------------------------------------- | ------------------------------------------------------ | ---------------------------- |
| vadd | Vector addition | $a_i+b_i$ | [add](triturus/add.py) |
| vamax | Vector maximum | $\max_i a_i$ | [max](triturus/max.py) |
| vmax | Vector maximum with indices | $(\max_i a_i, \arg\max_i a_i)$ | [max](triturus/max.py) |
| matmax | Matrix maximum along one axis | $\max_i a_{ij}$ or $\max_j a_{ij}$ | [max](triturus/max.py) |
| mm | Matrix multiplication | $\sum_j a_{ij}b_{jk}$ | [mm](triturus/mm.py) |
| lm2exp | Batch log-matmul, one matrix in log-space | $\log(\sum_j a_{rij} \exp b_{rjk})$ | [lm2exp](triturus/lm2exp.py) |
| lt2exp | Batch log-Tucker2, two matrices in log-space | $\log(\sum_{i,j} w_{rsij} \exp a_{rik} \exp b_{rjk})$ | [lt2exp](triturus/lt2exp.py) |
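
To make the `lm2exp` and `lt2exp` rows concrete, here is a minimal pure-Python sketch of the two formulas from the table. It is only a reference for the semantics, not the Triton code in this repository. The function names `lm2exp_reference` and `lt2exp_reference` are hypothetical, the inputs are nested lists rather than tensors, and the sketch assumes non-negative weights and finite log-space entries. Subtracting the per-column maximum before exponentiating is the usual log-sum-exp stabilization; the table does not state whether the kernels do the same.

```python
import math


def lm2exp_reference(a, b):
    """log(sum_j a[r][i][j] * exp(b[r][j][k])) for every batch index r.

    a: R x I x J nested lists of non-negative linear-space weights.
    b: R x J x K nested lists of finite log-space values.
    Returns an R x I x K nested list.
    """
    out = []
    for a_r, b_r in zip(a, b):
        num_j, num_k = len(b_r), len(b_r[0])
        # Per-column maximum of b_r, subtracted before exponentiating.
        b_max = [max(b_r[j][k] for j in range(num_j)) for k in range(num_k)]
        out_r = []
        for a_ri in a_r:
            row = []
            for k in range(num_k):
                s = sum(
                    a_ri[j] * math.exp(b_r[j][k] - b_max[k])
                    for j in range(num_j)
                )
                # An all-zero weighted sum maps to log(0) = -inf.
                row.append(b_max[k] + math.log(s) if s > 0.0 else -math.inf)
            out_r.append(row)
        out.append(out_r)
    return out


def lt2exp_reference(w, a, b):
    """log(sum_{i,j} w[r][s][i][j] * exp(a[r][i][k]) * exp(b[r][j][k])).

    w: R x S x I x J nested lists of non-negative linear-space weights.
    a: R x I x K and b: R x J x K nested lists of finite log-space values.
    Returns an R x S x K nested list.
    """
    out = []
    for w_r, a_r, b_r in zip(w, a, b):
        num_i, num_j, num_k = len(a_r), len(b_r), len(a_r[0])
        # Per-column maxima of both log-space inputs.
        a_max = [max(a_r[i][k] for i in range(num_i)) for k in range(num_k)]
        b_max = [max(b_r[j][k] for j in range(num_j)) for k in range(num_k)]
        out_r = []
        for w_rs in w_r:
            row = []
            for k in range(num_k):
                s = sum(
                    w_rs[i][j]
                    * math.exp(a_r[i][k] - a_max[k])
                    * math.exp(b_r[j][k] - b_max[k])
                    for i in range(num_i)
                    for j in range(num_j)
                )
                shift = a_max[k] + b_max[k]
                row.append(shift + math.log(s) if s > 0.0 else -math.inf)
            out_r.append(row)
        out.append(out_r)
    return out
```

With log-space inputs around 1000, a direct `math.exp` call would overflow, whereas `lm2exp_reference([[[1.0, 1.0]]], [[[1000.0], [1000.0]]])` stays finite because only differences from the column maximum are exponentiated.
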
## Benchmarks Gallery
| Kernel ID | Benchmark Description | Baselines | Results |
| ------------ | -------------------------------------------------------- | ----------- | ---------------------------- |
| vmax | Vector maximum with and without indices | torch | [here](#benchmark-of-vmax) |
| matmax | Matrix maximum along rows and columns | torch | [here](#benchmark-of-matmax) |
| mm | Matrix multiplication with square matrices | torch | [here](#benchmark-of-mm) |
| lm2exp | Batch log-matmul, square and rectangular batch matrices | torch + jit | [here](#benchmark-of-lm2exp) |
| lt2exp | Batch log-Tucker2, square and rectangular batch matrices | torch + jit | [here](#benchmark-of-lt2exp) |
---
### Benchmark of vmax

### Benchmark of matmax

### Benchmark of mm

### Benchmark of lm2exp

### Benchmark of lt2exp