https://github.com/cbalint13/rvv-kernels

RISCV Vector Kernel C/LLVM-IR generator
int8 kernel llvm math riscv rvv tvm vector

Host: GitHub
URL: https://github.com/cbalint13/rvv-kernels
Owner: cbalint13
License: apache-2.0
Created: 2024-03-13T09:00:04.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-16T18:31:13.000Z (10 months ago)
Last Synced: 2025-03-25T23:05:19.790Z (7 months ago)
Topics: int8, kernel, llvm, math, riscv, rvv, tvm, vector
Language: C
Homepage:
Size: 13.8 MB
Stars: 7
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## High performance RVV kernel generator to C & LLVM-IR dialects

This is a C/LLVM-IR kernel generator that addresses RVV ISA versions unsupported by LLVM or other toolchains.

### Benchmark

| XuanTie TH1520 | SpacemiT K1 X60 |
| ------------------------------------------------------- | --------------------------------------------------- |
| ![INT8-v0.7.1-BENCHMARK](benchmark-v0.7.1-int8.log.png) | ![INT8-v1.0-BENCHMARK](benchmark-v1.0-int8.log.png) |
| ![FP16-v0.7.1-BENCHMARK](benchmark-v0.7.1-fp16.log.png) | ![FP16-v1.0-BENCHMARK](benchmark-v1.0-fp16.log.png) |
| ![FP32-v0.7.1-BENCHMARK](benchmark-v0.7.1-fp32.log.png) | ![FP32-v1.0-BENCHMARK](benchmark-v1.0-fp32.log.png) |

### Usage

* Prepare a Docker image with an rv64 cross compiler
```
$ git clone https://github.com/cbalint13/rvv-kernels
$ cd rvv-kernels
$ docker build --file Dockerfile.ML.fedora --tag th1520-rvv .
```

* Generate a kernel
```
$ docker run -it --rm -v "$PWD":/opt/src th1520-rvv bash
[root@b8032fd28a75 src]# ./make.sh 32 4 int8 v0.7.1 cbalint@192.168.1.45

(x) Naive kernel:
HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
O[] = 00010416 00026288 00042160 00058032

(x) MACC operations: elems[32] x lanes[4] = 256 Ops

(x) RVV kernel:
HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
O[] = 00010416 00026288 00042160 00058032

RVV bench: 25.600 GOPS in 2.215818 secs
RVV speed: 11.553 GOPS/sec

[root@b8032fd28a75 src]# ls -l dot_int8_kernel.*
-rw-r--r-- 1 1000 1000 3867 Mar 13 18:03 dot_int8_kernel.c
-rw-r--r-- 1 1000 1000 5034 Mar 13 18:03 dot_int8_kernel.ir
```

* Optional benchmark logs & graph
```
[root@b8032fd28a75 src]# ./script/0-explore.sh
[root@b8032fd28a75 src]# ls -l benchmark-int8.log
-rw-r--r-- 1 1000 1000 5731 Mar 13 17:38 benchmark-int8.log

[root@b8032fd28a75 src]# ./script/1-plotgraph.py --logs benchmark-int8.log --title 'RVV v0.7.1 int8 kernels benchmark (TH1520)'
[root@b8032fd28a75 src]# ls -l benchmark-int8.log.png
-rw-r--r-- 1 1000 1000 58380 Mar 13 18:47 benchmark-int8.log.png
```
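
As a rough illustration of what the `Naive kernel` and the `MACC operations` / `RVV speed` lines in the run above refer to, here is a minimal scalar sketch in C. The data layout (one shared `int8` input vector, one `int8` weight row per lane, one `int32` output per lane), the function names, and the "one MACC = two operations" counting are assumptions inferred from the log, not taken from the generator itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: `input` holds `elems` int8 values shared by all lanes,
 * `weights` holds `lanes` rows of `elems` int8 values stored back to back,
 * and `out` receives one int32 dot product per lane. */
static void naive_dot_int8(const int8_t *input, const int8_t *weights,
                           int32_t *out, size_t elems, size_t lanes)
{
    assert(input && weights && out);
    for (size_t l = 0; l < lanes; ++l) {
        int32_t acc = 0;
        for (size_t e = 0; e < elems; ++e) {
            /* Widen to int32 before multiplying and accumulating. */
            acc += (int32_t)input[e] * (int32_t)weights[l * elems + e];
        }
        out[l] = acc;
    }
}

/* Assumption: each multiply-accumulate is counted as two operations,
 * which is what makes elems[32] x lanes[4] come out as 256 Ops. */
static uint64_t macc_ops(uint64_t elems, uint64_t lanes)
{
    return 2u * elems * lanes;
}

/* Throughput in GOPS/sec from total work (in GOPS) and elapsed seconds. */
static double gops_per_sec(double total_gops, double seconds)
{
    assert(seconds > 0.0);
    return total_gops / seconds;
}
```

With the figures from the log, `macc_ops(32, 4)` gives 256 and `gops_per_sec(25.600, 2.215818)` gives about 11.553, which matches the reported `RVV speed` line.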

### Notes

* This generator emits C / LLVM-IR kernels with encoded instructions, which makes it RVV version agnostic
* T-Head 1520 (C906, also others) implements the older RVV v0.7.1 ISA, now unsupported by upstream LLVM
* The TH1520 ```vsetvli``` ASIC implementation is slow, see the comments in a dynamic kernel: [trials/riscv-asm.c](trials/riscv-asm.c)
* The ```vsetvli``` slowness forces the kernels away from the SVE (scalable vector) concept, in order to avoid frequent ```vsetvli``` calls
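
To make the "encoded instructions" note concrete, the sketch below builds the 32-bit word for a `vsetvli` under both ISA versions. It is only an illustration: the helper names are hypothetical, the bit layouts follow my reading of the RVV v1.0 and v0.7.1 specifications and should be checked against them, and the README does not show how the generator itself encodes or emits instructions.

```c
#include <assert.h>
#include <stdint.h>

/* vtype immediate as laid out in RVV v1.0:
 * vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7. */
static uint32_t vtype_v10(uint32_t vsew, uint32_t vlmul,
                          uint32_t vta, uint32_t vma)
{
    assert(vsew < 8 && vlmul < 8 && vta < 2 && vma < 2);
    return (vma << 7) | (vta << 6) | (vsew << 3) | vlmul;
}

/* vtype immediate as laid out in RVV v0.7.1:
 * vlmul in bits 1:0, vsew in bits 4:2, vediv in bits 6:5 (kept at zero). */
static uint32_t vtype_v071(uint32_t vsew, uint32_t vlmul)
{
    assert(vsew < 8 && vlmul < 4);
    return (vsew << 2) | vlmul;
}

/* vsetvli rd, rs1, vtypei:
 * bit 31 = 0, vtypei in bits 30:20, rs1 in bits 19:15,
 * funct3 = 0b111 in bits 14:12, rd in bits 11:7, opcode 0x57 (OP-V). */
static uint32_t encode_vsetvli(uint32_t rd, uint32_t rs1, uint32_t vtypei)
{
    assert(rd < 32 && rs1 < 32 && vtypei < (1u << 11));
    return (vtypei << 20) | (rs1 << 15) | (0x7u << 12) | (rd << 7) | 0x57u;
}
```

For `rd = t0` (x5), `rs1 = a0` (x10) and an e8/m4 configuration, `encode_vsetvli(5, 10, vtype_v071(0, 2))` yields `0x002572d7`, while `encode_vsetvli(5, 10, vtype_v10(0, 2, 1, 1))` yields `0x0c2572d7`. The same mnemonic maps to different words, so a kernel that carries raw words (for example through a `.word` directive in inline assembly) does not depend on which RVV version the assembler understands.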

The [trials/riscv-asm.c](trials/riscv-asm.c) sample kernel follows the **SVE concept** of **runtime dynamism**,
but for the reasons tested and described in its comments, on T-Head's particular C906 RVV ASIC implementation the
context-switching ```vsetvli``` severely drags down overall performance, so ```vsetvli``` calls should be minimized
for this particular target.

For RVV 0.7.1 there is a limit on how, and which, vector registers can be used for a given LMUL (register group multiplier).
Filling the maximum vector width of 64 x ```int8``` and reducing it into 2 lanes is therefore not possible: it would require
the **e8/m4** mode, which leaves room for only 4 x vregs (v0, v8, v16, v24), an insufficient number of registers.
The maximum usable width is 32 x ```int8``` elements for RVV 0.7.1.

The generated kernel calls ```vsetvli``` once and unrolls computations across the vector registers.
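
The difference between the dynamic style and the "configure once, then unroll" style can be sketched with a scalar model in plain C. This is not RVV code and not the generator's output: `model_vsetvl` is a hypothetical stand-in that only counts how often the vector length would be reconfigured, and `VLMAX = 16` is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELEMS ((size_t)32) /* int8 elements per lane */
#define LANES ((size_t)4)  /* output lanes */
#define VLMAX ((size_t)16) /* illustrative elements per modelled vector op */

static unsigned vsetvl_calls; /* how many times the model "ran vsetvli" */

/* Scalar stand-in for vsetvli: returns min(avl, VLMAX) and counts the call. */
static size_t model_vsetvl(size_t avl)
{
    ++vsetvl_calls;
    return avl < VLMAX ? avl : VLMAX;
}

/* Dynamic (strip-mined) style: the length is requested for every strip. */
static void dot_dynamic(const int8_t *in, const int8_t *w, int32_t *out)
{
    for (size_t l = 0; l < LANES; ++l) {
        int32_t acc = 0;
        for (size_t e = 0; e < ELEMS; ) {
            size_t vl = model_vsetvl(ELEMS - e);
            for (size_t i = 0; i < vl; ++i)
                acc += (int32_t)in[e + i] * (int32_t)w[l * ELEMS + e + i];
            e += vl;
        }
        out[l] = acc;
    }
}

/* Fixed-shape style: the length is requested once, lanes are unrolled. */
static void dot_fixed(const int8_t *in, const int8_t *w, int32_t *out)
{
    size_t vl = model_vsetvl(VLMAX);
    assert(vl == VLMAX && ELEMS % VLMAX == 0);
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (size_t e = 0; e < ELEMS; e += vl) {
        for (size_t i = 0; i < vl; ++i) {
            int32_t x = in[e + i];
            acc0 += x * (int32_t)w[0 * ELEMS + e + i];
            acc1 += x * (int32_t)w[1 * ELEMS + e + i];
            acc2 += x * (int32_t)w[2 * ELEMS + e + i];
            acc3 += x * (int32_t)w[3 * ELEMS + e + i];
        }
    }
    out[0] = acc0;
    out[1] = acc1;
    out[2] = acc2;
    out[3] = acc3;
}
```

For the 32 x 4 shape in this model, `dot_dynamic` triggers 8 modelled reconfigurations and `dot_fixed` triggers 1, while both produce the same outputs. On a target where each reconfiguration is expensive, that count is the cost the fixed-shape kernel avoids.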

### Changelog

* **16 Dec 2024** full int8/fp16/fp32 benchmarks for RVV v1.0 & v0.7.1
* **06 Jun 2024** release ```fp16``` & ```fp32``` for RVV 0.7.1 version
* **13 Mar 2024** initial release, for now ```int8``` with RVV 0.7.1 version
