https://github.com/cbalint13/rvv-kernels
RISCV Vector Kernel C/LLVM-IR generator
int8 kernel llvm math riscv rvv tvm vector
- Host: GitHub
- URL: https://github.com/cbalint13/rvv-kernels
- Owner: cbalint13
- License: apache-2.0
- Created: 2024-03-13T09:00:04.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-06-05T23:50:09.000Z (5 months ago)
- Last Synced: 2024-09-30T03:41:02.293Z (about 1 month ago)
- Topics: int8, kernel, llvm, math, riscv, rvv, tvm, vector
- Language: C
- Homepage:
- Size: 13.7 MB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## High performance RVV kernel generator to C & LLVM-IR dialects
This is a C/LLVM-IR kernel generator that targets RVV ISA versions unsupported by LLVM or other toolchains.
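
For orientation, the computation behind the ```int8``` kernel shown in the Usage log can be sketched in plain C as a multi-lane dot product with 32-bit accumulation. This is a scalar reference sketch, not the generator's output: the function names and the ramp input pattern (```a[i] = i```, ```b[l][i] = l * elems + i```) are assumptions made for illustration. With 32 elements and 4 lanes that pattern happens to give the same four values as the ```O[]``` line of the log (10416, 26288, 42160, 58032).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: out[l] = sum over i of a[i] * b[l * elems + i],
 * accumulated in int32. `b` holds `lanes` rows of `elems` int8 values. */
void dot_int8_naive(const int8_t *a, const int8_t *b, int32_t *out,
                    int elems, int lanes)
{
    for (int l = 0; l < lanes; l++) {
        int32_t acc = 0;
        for (int i = 0; i < elems; i++)
            acc += (int32_t)a[i] * (int32_t)b[l * elems + i];
        out[l] = acc;
    }
}

/* Hypothetical test pattern (an assumption, not taken from the repository):
 * a[i] = i and b[l][i] = l * elems + i. */
void fill_ramp(int8_t *a, int8_t *b, int elems, int lanes)
{
    /* The largest value written must still fit in a signed 8-bit element. */
    assert(elems > 0 && lanes > 0 && lanes * elems - 1 <= INT8_MAX);
    for (int i = 0; i < elems; i++)
        a[i] = (int8_t)i;
    for (int l = 0; l < lanes; l++)
        for (int i = 0; i < elems; i++)
            b[l * elems + i] = (int8_t)(l * elems + i);
}
```

Calling ```fill_ramp(a, b, 32, 4)``` followed by ```dot_int8_naive(a, b, out, 32, 4)``` performs 32 multiply-accumulates per lane, which is consistent with the 256 Ops in the log if each multiply-accumulate is counted as two operations.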
### Benchmark
![INT8-BENCHMARK](benchmark-int8.log.png)
![FP16-BENCHMARK](benchmark-fp16.log.png)
![FP32-BENCHMARK](benchmark-fp32.log.png)

### Usage
* Prepare a Docker image with an rv64 cross compiler
```
$ git clone https://github.com/cbalint13/rvv-kernels
$ cd rvv-kernels
$ docker build --file Dockerfile.ML.fedora --tag th1520-rvv .
```

* Generate a kernel
```
$ docker run -it --rm -v "$PWD":/opt/src th1520-rvv bash
[root@b8032fd28a75 src]# ./make.sh 32 4 int8 v0.7.1 [email protected]

(x) Naive kernel:
HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
O[] = 00010416 00026288 00042160 00058032

(x) MACC operations: elems[32] x lanes[4] = 256 Ops
(x) RVV kernel:
HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
O[] = 00010416 00026288 00042160 00058032

RVV bench: 25.600 GOPS in 2.215818 secs
RVV speed: 11.553 GOPS/sec

[root@b8032fd28a75 src]# ls -l dot_int8_kernel.*
-rw-r--r-- 1 1000 1000 3867 Mar 13 18:03 dot_int8_kernel.c
-rw-r--r-- 1 1000 1000 5034 Mar 13 18:03 dot_int8_kernel.ir
```

* Optional benchmark logs & graph
```
[root@b8032fd28a75 src]# ./script/0-explore.sh
[root@b8032fd28a75 src]# ls -l benchmark-int8.log
-rw-r--r-- 1 1000 1000 5731 Mar 13 17:38 benchmark-int8.log

[root@b8032fd28a75 src]# ./script/1-plotgraph.py --logs benchmark-int8.log --title 'RVV v0.7.1 int8 kernels benchmark (TH1520)'
[root@b8032fd28a75 src]# ls -l benchmark-int8.log.png
-rw-r--r-- 1 1000 1000 58380 Mar 13 18:47 benchmark-int8.log.png
```

### Notes
* This generator emits C / LLVM-IR kernels with the instructions already encoded, which makes it RVV version agnostic
* T-Head 1520 (C906, also others) implements the older v0.7.1 RVV ISA, now unsupported by LLVM upstream
* The TH1520 ```vsetvli``` ASIC implementation is slow, see the comments on a dynamic kernel: [trials/riscv-asm.c](trials/riscv-asm.c)
* Because ```vsetvli``` is slow, the SVE-like (scalable vector) style has to be given up in order to avoid frequent ```vsetvli``` calls

The [trials/riscv-asm.c](trials/riscv-asm.c) sample kernel would cope with the **SVE concept** of **runtime dynamism**,
but for the reasons tested and mentioned in that file, on T-Head's C906 RVV ASIC implementation in particular, the context
switching done by ```vsetvli``` severely drags down overall performance, so ```vsetvli``` calls should be minimized
for this particular target.
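
Why a runtime-dynamic kernel issues many vector-length configuration calls (```vsetvli``` in RVV assembly) can be illustrated with a scalar model of the generic strip-mining loop. This is not the code from trials/riscv-asm.c: ```vsetvl_model``` is a hypothetical stand-in that simply grants ```min(remaining, vlmax)``` elements, and the sketch only counts how many configuration calls such a loop would make.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of a vector-length request:
 * the hardware grants at most `vlmax` elements per strip. */
size_t vsetvl_model(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Generic strip-mined loop over `n` elements. The vector work itself is
 * omitted; only the number of configuration calls is counted. */
size_t count_strip_mined_calls(size_t n, size_t vlmax)
{
    assert(vlmax > 0);
    size_t calls = 0;
    while (n > 0) {
        size_t vl = vsetvl_model(n, vlmax); /* one configuration call per strip */
        calls++;
        n -= vl;
    }
    return calls;
}
```

In this model the loop pays the configuration cost on every strip, whereas a kernel generated for one fixed width needs it only once.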
For RVV 0.7.1 there is a limit on how and which vector registers can be used in the context of MUL (the register
group multiplier), so the maximum vector fill width of 64 x ```int8``` reduced into x2 lanes is not possible: it would
require the **e8/m4** MUL mode, which leaves room for only 4 x vregs (v0, v8, v16, v24), an insufficient number of registers.
The maximum usable vector width is therefore 32 x ```int8``` elements for RVV 0.7.1.

The generated kernel calls ```vsetvli``` once and unrolls the computations across the vector registers.
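
The register arithmetic behind this limit can be sketched as follows. The sketch assumes a 128-bit vector register length (```VLEN```), and it assumes that the (v0, v8, v16, v24) list comes from widened results occupying register groups twice as large as their ```int8``` sources. Both are assumptions of this example, and the helper names are hypothetical.

```c
#include <assert.h>

#define NUM_VREGS 32 /* RVV defines 32 architectural vector registers */

/* Smallest register group multiplier (1, 2, 4 or 8) for which `elems`
 * elements of `sew_bits` each fit into one group; 0 if none does. */
int lmul_for(int elems, int sew_bits, int vlen_bits)
{
    assert(elems > 0 && sew_bits > 0 && vlen_bits > 0);
    for (int lmul = 1; lmul <= 8; lmul *= 2)
        if (elems * sew_bits <= lmul * vlen_bits)
            return lmul;
    return 0;
}

/* Number of register groups left when every widened result needs a group
 * of 2 * lmul registers (assumption: element width doubles on widening). */
int widened_groups(int lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4);
    return NUM_VREGS / (2 * lmul);
}
```

Under these assumptions, 64 x ```int8``` needs ```lmul_for(64, 8, 128) == 4``` and leaves ```widened_groups(4) == 4``` groups, matching the four registers listed above, while 32 x ```int8``` needs m2 and leaves 8 groups.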
### Changelog
* **06 Jun 2024** release ```fp16``` & ```fp32``` for RVV 0.7.1 version
* **13 Mar 2024** initial release, for now ```int8``` with RVV 0.7.1 version