{"id":14991156,"url":"https://github.com/cbalint13/rvv-kernels","last_synced_at":"2025-04-12T03:30:21.703Z","repository":{"id":227449295,"uuid":"771414101","full_name":"cbalint13/rvv-kernels","owner":"cbalint13","description":"RISCV Vector Kernel C/LLVM-IR generator","archived":false,"fork":false,"pushed_at":"2024-12-16T18:31:13.000Z","size":14457,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-25T23:05:19.790Z","etag":null,"topics":["int8","kernel","llvm","math","riscv","rvv","tvm","vector"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cbalint13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-13T09:00:04.000Z","updated_at":"2024-12-16T19:51:55.000Z","dependencies_parsed_at":"2024-03-17T01:16:07.772Z","dependency_job_id":"8f2b9de7-4318-4832-9cfb-9df0ce5ba4dc","html_url":"https://github.com/cbalint13/rvv-kernels","commit_stats":{"total_commits":16,"total_committers":1,"mean_commits":16.0,"dds":0.0,"last_synced_commit":"b78b0c5b812710b512f8995a7a5fd494ee81053e"},"previous_names":["cbalint13/rvv-kernels"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbalint13%2Frvv-kernels","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbalint13%2Frvv-kernels/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbalint13%2Frvv-kernels/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cbalint13%2Frvv-kernels/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cbalint13","download_url":"https://codeload.github.com/cbalint13/rvv-kernels/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248512412,"owners_count":21116597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["int8","kernel","llvm","math","riscv","rvv","tvm","vector"],"created_at":"2024-09-24T14:21:36.420Z","updated_at":"2025-04-12T03:30:21.688Z","avatar_url":"https://github.com/cbalint13.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## High performance RVV kernel generator to C \u0026 LLVM-IR dialects\n\n  This is a C/LLVM-IR kernel generator that address unsupported RVV ISA versions for LLVM or any other toolchains.\n\n### Benchmark\n\n|                    XuanTie TH1520                       |                SpacemiT K1 X60                      |\n| ------------------------------------------------------- | --------------------------------------------------- |\n| ![INT8-v0.7.1-BENCHMARK](benchmark-v0.7.1-int8.log.png) | ![INT8-v1.0-BENCHMARK](benchmark-v1.0-int8.log.png) |\n| ![FP16-v0.7.1-BENCHMARK](benchmark-v0.7.1-fp16.log.png) | ![FP16-v1.0-BENCHMARK](benchmark-v1.0-fp16.log.png) |\n| ![FP32-v0.7.1-BENCHMARK](benchmark-v0.7.1-fp32.log.png) | ![FP32-v1.0-BENCHMARK](benchmark-v1.0-fp32.log.png) |\n\n### Usage\n\n* Prepare a docker image with rv64 cross compiler\n```\n$ git clone 
https://github.com/cbalint13/rvv-kernels\n$ cd rvv-kernels\n$ docker build --file Dockerfile.ML.fedora --tag th1520-rvv .\n```\n\n* Generate a kernel\n```\n$ docker run -it --rm -v \"$PWD\":/opt/src th1520-rvv bash\n[root@b8032fd28a75 src]# ./make.sh 32 4 int8 v0.7.1 cbalint@192.168.1.45\n\n(x) Naive kernel:\n  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00\n  O[] = 00010416 00026288 00042160 00058032\n\n(x) MACC operations: elems[32] x lanes[4] = 256 Ops\n\n(x) RVV kernel:\n  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00\n  O[] = 00010416 00026288 00042160 00058032\n\nRVV bench: 25.600 GOPS in 2.215818 secs\nRVV speed: 11.553 GOPS/sec\n\n[root@b8032fd28a75 src]# ls -l dot_int8_kernel.*\n-rw-r--r-- 1 1000 1000 3867 Mar 13 18:03 dot_int8_kernel.c\n-rw-r--r-- 1 1000 1000 5034 Mar 13 18:03 dot_int8_kernel.ir\n```\n\n* Optional benchmark logs \u0026 graph\n```\n[root@b8032fd28a75 src]# ./script/0-explore.sh\n[root@b8032fd28a75 src]# ls -l benchmark-int8.log\n-rw-r--r-- 1 1000 1000 5731 Mar 13 17:38 benchmark-int8.log\n\n[root@b8032fd28a75 src]# ./script/1-plotgraph.py --logs benchmark-int8.log --title 'RVV v0.7.1 int8 kernels benchmark (TH1520)'\n[root@b8032fd28a75 src]# ls -l benchmark-int8.log.png\n-rw-r--r-- 1 1000 1000 58380 Mar 13 18:47 benchmark-int8.log.png\n```\n\n\n### Notes\n\n  * This generator emits C / LLVM-IR kernels with encoded instructions, making them RVV version agnostic\n  * T-Head 1520 (C906, also others) implements the older v0.7.1 RVV ISA, now unsupported by upstream LLVM\n  * The TH1520 ```setvli``` ASIC implementation is slow; see the comments on a dynamic kernel: [trials/riscv-asm.c](trials/riscv-asm.c)\n  * The ```setvli``` slowness forces the SVE (scalable vector) concept to avoid frequent ```setvli``` calls\n\n  The [trials/riscv-asm.c](trials/riscv-asm.c) sample kernel would cope with the **SVE concept** of **runtime dynamism**,\nbut for the reasons tested and mentioned here, on T-Head's particular C906 RVV ASIC implementation, the 
context\nswitching ```setvli``` severely degrades overall performance, so ```setvli``` calls should be minimized\nfor this particular target.\n  \n  For RVV 0.7.1 there is a limit on how \u0026 which vector registers can be used in the context of MUL (multiplier),\nso the maximum vector fill width of 64 x ```int8``` reduced into x2 lanes is not possible: it would require\n**e8/m4** MUL mode, which leaves room for only 4 x vregs (v0, v8, v16, v24), an insufficient number of registers.\nThe maximum usable ```int8``` element width is 32 for RVV 0.7.1.\n\n  The generated kernel executes ```setvli``` once and unrolls the computation across the vector registers.\n\n\n### Changelog\n\n  * **16 Dec 2024** benchmark full int8/fp16/fp32 RVV v1.0 \u0026 v0.7.1\n  * **06 Jun 2024** release ```fp16``` \u0026 ```fp32``` for RVV 0.7.1 version\n  * **13 Mar 2024** initial release, for now ```int8``` with RVV 0.7.1 version\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcbalint13%2Frvv-kernels","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcbalint13%2Frvv-kernels","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcbalint13%2Frvv-kernels/lists"}