https://github.com/openmachine-ai/huggingfive
HuggingFive 🖐️ is a collection of ML functions and libraries written in RISC-V assembly and C.
ai assembly machine-learning ml risc-v riscv-assembly riscv32
- Host: GitHub
- URL: https://github.com/openmachine-ai/huggingfive
- Owner: OpenMachine-ai
- License: mit
- Created: 2023-07-22T09:22:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-29T20:00:05.000Z (over 2 years ago)
- Last Synced: 2025-04-13T23:37:00.135Z (7 months ago)
- Topics: ai, assembly, machine-learning, ml, risc-v, riscv-assembly, riscv32
- Homepage:
- Size: 76.2 KB
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# HuggingFive :raised_hand_with_fingers_splayed:
HuggingFive :raised_hand_with_fingers_splayed: is a collection of ML functions and libraries written in RISC-V assembly and C. This includes neural network layers, activation functions, as well as entire neural networks. Think of it as a low-level HuggingFace for RISC-V assembly code. The table below includes performance numbers for benchmarking. The hope is to eventually roll all these handwritten tricks into the existing compiler toolchains (or have AI generate even better assembly code).
Performance numbers (columns Config through Memory size) are for an exemplary config.

| Description | Author | RV config | Data types | Config | MACs | Ops | Register utilization | Memory size (B) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Conv2D 1x1 | OpenMachine | RV32IF | FP32 | C=32, F=32, R=6 | C·F·R² = 36,864 | 57,953 | 8/31 x-regs, 21/32 f-regs | | |
| Conv2D 3x3 | OpenMachine | RV32IF | FP32 | C=3, F=8, R=12, stride=1 | 9·C·F·R² = 31,104 | TBD | TBD | | |
| Depthwise Conv2D 3x3 | OpenMachine | RV32IF | FP32 | C=4, R=6, stride=1 | 9·C·R² = 1,296 | TBD | TBD | | |
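
The MACs entries above can be reproduced with a few helper functions. This is a minimal sketch in C; the function names are hypothetical and are not part of the repository. It uses the symbols C, F and R as defined in the list below and assumes stride 1 with the output resolution equal to the input resolution.

```c
#include <stdint.h>

/* Hypothetical helpers (names are illustrative, not from the repository).
 * They evaluate the MAC formulas from the table, assuming stride 1 and
 * ignoring any savings from zero-padding. */

/* Conv2D 1x1: C * F * R^2 */
uint64_t macs_conv2d_1x1(uint64_t c, uint64_t f, uint64_t r) {
    return c * f * r * r;
}

/* Conv2D 3x3: 9 * C * F * R^2 (nine kernel taps per output value) */
uint64_t macs_conv2d_3x3(uint64_t c, uint64_t f, uint64_t r) {
    return 9u * c * f * r * r;
}

/* Depthwise Conv2D 3x3: 9 * C * R^2 (each channel is filtered on its own) */
uint64_t macs_dwconv2d_3x3(uint64_t c, uint64_t r) {
    return 9u * c * r * r;
}
```

For example, `macs_conv2d_1x1(32, 32, 6)` evaluates to 36,864, which matches the Conv2D 1x1 row.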
- C : input channels
- F : output channels (or filters), only used if F is not the same as C
- R : square root of input resolution (e.g. R=6 for image resolution of 6x6 pixels)
- Q : square root of output resolution, only used if Q is not the same as R
- MACs : number of fused multiply-accumulate operations required by the neural-network layer (can be used as a lower bound for the total number of ops; this number ignores possible savings from zero-padding for conv layers)
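
To make the MACs definition concrete, here is a plain-C reference for a 1x1 convolution that counts one MAC per inner-loop iteration. It is only a sketch: the function name and the memory layouts are assumptions chosen for illustration and are not taken from the repository's assembly code.

```c
#include <stddef.h>

/* Plain-C reference for a 1x1 convolution (no bias, stride 1).
 * Assumed layouts (illustrative, not from the repository):
 *   in  : [C][R*R]   input feature map, channel-major
 *   w   : [F][C]     one weight per (filter, input channel) pair
 *   out : [F][R*R]   output feature map, channel-major
 * Returns the number of multiply-accumulates performed. */
unsigned long conv2d_1x1_ref(const float *in, const float *w, float *out,
                             size_t c, size_t f, size_t r) {
    const size_t pixels = r * r;
    unsigned long macs = 0;
    for (size_t fi = 0; fi < f; fi++) {
        for (size_t p = 0; p < pixels; p++) {
            float acc = 0.0f;
            for (size_t ci = 0; ci < c; ci++) {
                acc += w[fi * c + ci] * in[ci * pixels + p]; /* one MAC */
                macs++;
            }
            out[fi * pixels + p] = acc;
        }
    }
    return macs;
}
```

The returned count is always C·F·R², so a routine like this could serve as a correctness baseline when comparing against a handwritten assembly version.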
## Contribute
Please add your functions and routines to HuggingFive :raised_hand_with_fingers_splayed:. Add a link to your code in the
table and submit a PR, which will get approved promptly because there are no rules here.
More details coming soon ...