{"id":22097428,"url":"https://github.com/nuclei-software/nuclei-ai-library","last_synced_at":"2025-07-24T22:33:07.980Z","repository":{"id":252909690,"uuid":"840138481","full_name":"Nuclei-Software/nuclei-ai-library","owner":"Nuclei-Software","description":"Nuclei AI Library Optimized For RISC-V Vector","archived":false,"fork":false,"pushed_at":"2024-08-26T10:33:47.000Z","size":185,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2024-08-27T13:08:01.373Z","etag":null,"topics":["ai-library","risc-v","riscv","rvv"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nuclei-Software.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-09T03:54:34.000Z","updated_at":"2024-08-26T10:33:50.000Z","dependencies_parsed_at":"2024-08-26T12:31:30.921Z","dependency_job_id":null,"html_url":"https://github.com/Nuclei-Software/nuclei-ai-library","commit_stats":null,"previous_names":["nuclei-software/nuclei-ai-library"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nuclei-Software%2Fnuclei-ai-library","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nuclei-Software%2Fnuclei-ai-library/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nuclei-Software%2Fnuclei-ai-library/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nuclei-Software%2Fnucle
i-ai-library/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nuclei-Software","download_url":"https://codeload.github.com/Nuclei-Software/nuclei-ai-library/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227482406,"owners_count":17779968,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-library","risc-v","riscv","rvv"],"created_at":"2024-12-01T04:15:32.561Z","updated_at":"2025-07-24T22:33:07.966Z","avatar_url":"https://github.com/Nuclei-Software.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nuclei AI Library\n\nNuclei AI Library is a set of ONNX AI operators optimized for Nuclei RISC-V processors that support the RISC-V Vector instruction set.\n\nThe ONNX operators are implemented in pure C code, together with RISC-V Vector optimized implementations; see the source code in `src` for details.\n\nTest code is also provided to evaluate the ONNX operator implementations; it can be built and run with Nuclei SDK.\n\n## Supported ONNX Operators\n\n\u003e Some operator implementations may support only a subset of the corresponding ONNX operator.\n\n**VPU Lite**: VPU Lite is a lightweight VPU implementation, which **does not support** the following features in whole or in part:\n\n- segment load/store\n- vslide/vgather/vcompress\n- ELEN=64\n\nIn the chart below, the `VPU Lite compatibility` column shows whether each operator is compatible with VPU Lite. The symbol `√` indicates that the operator is fully compatible with VPU Lite. 
Otherwise, the chart gives the reason why the operator is not compatible.\n\n| Operator           | VPU Lite compatibility | FP32 | FP16 | BF16 | FP8 | INT32 | INT8 | INT4 | Boolean |\n| --                 | --                     | --   | --   | --   | --  | --    | --   | --   | --      |\n| Abs                | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Add                | √                      | √    | √    | ×    | ×   | ×     |  √   | ×    |   |\n| BatchNormalization | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Clamp              | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Concat             | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| ConvInteger        | invoke segment load    | ×    | ×    | ×    | ×   | ×     |  √   | ×    |   |\n| Cos                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Div                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Elu                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Erf                |                        | ×    | ×    | ×    | ×   | ×     |  ×   | ×    |   |\n| Flip               | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| GatherElements     | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Gelu               |                        | ×    | ×    | ×    | ×   | ×     |  ×   | ×    |   |\n| LayerNormalization | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Log                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| MatMul             | √                      | √    | √    | ×    | ×   | ×     |  √   | ×    |   |\n| Mul                | 
√                      | √    | √    | ×    | ×   | ×     |  √   | ×    |   |\n| Negate             | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Pad                | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Pow                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Reciprocal         | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| ReduceAll          | √                      | ×    | ×    | ×    | ×   | ×     |  ×   | ×    | √ |\n| ReduceAny          | √                      | ×    | ×    | ×    | ×   | ×     |  ×   | ×    | √ |\n| ReduceMax          | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| ReduceMin          | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| ReduceProd         | invoke vslide          | √    | √    | ×    | ×   |       |      | ×    |   |\n| ReduceSum          | √                      | √    | √    | ×    | ×   |       |      | ×    |   |\n| Relu               | √                      | √    | √    | ×    | ×   |       |      | ×    |   |\n| RMSNormalization   | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Rsqrt              | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| ScatterElements    | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Silu               | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Sin                | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Slice              | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| Softmax            | √                      | √    | √    | ×    | ×   | ×     |  ×   | ×    |   |\n| Sqrt               | √                      | √    | √    | ×    | ×   | ×     |  
×   | ×    |   |\n| Sub                | √                      | √    | √    | ×    | ×   | ×     |  √   | ×    |   |\n| Tile               | √                      | √    | √    | ×    | ×   | √     |  √   | ×    |   |\n| TopK               | invoke vslide          | √    | √    | ×    | ×   | √     |  ×   | ×    |   |\n\n## File Structure\n\n| Directory | Description |\n| --------- | ----------- |\n| src       | Source files with the operator implementations; each file corresponds to one operator |\n| inc       | Header files with the operator declarations |\n| test      | Test files; each file corresponds to one kind of operator (except [main.c](./test/main.c)) |\n\n## How to Use\n\n### Prerequisites\n\nWe recommend using the latest version of the Nuclei SDK and the associated toolchain for optimal performance and compatibility. For this project, we use the following versions:\n\n- [Nuclei SDK version 0.6.0](https://github.com/Nuclei-Software/nuclei-sdk/releases/tag/0.6.0)\n- [Nuclei Studio IDE for Linux version 2024.06](https://download.nucleisys.com/upload/files/nucleistudio/NucleiStudio_IDE_202406-lin64.tgz)\n\nPlease follow the instructions in the [Setup Tools and Environment](https://doc.nucleisys.com/nuclei_sdk/quickstart.html#get-and-setup-nuclei-sdk) section to prepare your Nuclei SDK and toolchain for use. 
Both Linux and Windows operating systems are supported; the examples below use Ubuntu 20.04.\n\nIt is recommended to set the `NUCLEI_SDK_ROOT` environment variable to point to `/path/to/nuclei-sdk`.\n\n```shell\nexport NUCLEI_SDK_ROOT=/path/to/nuclei-sdk\n```\n\nAfter that, you can run `make` to build and run the test program wherever this project is located.\n\nOtherwise, place this project in the `$NUCLEI_SDK_ROOT/application/baremetal` directory:\n\n```shell\n# if you have cloned this project to your local directory\nmv /path/to/nuclei-ai-library /path/to/nuclei-sdk/application/baremetal\n# if you haven't cloned this project to your local directory\ngit clone -b develop https://github.com/Nuclei-Software/nuclei-ai-library.git /path/to/nuclei-sdk/application/baremetal/nuclei-ai-library\n```\n\nAfter that, the files should be organized as follows:\n\n```shell\n$NUCLEI_SDK_ROOT\n├── application\n│   ├── baremetal\n│   │   ├── nuclei-ai-library\n│   │   │   ├── ci\n│   │   │   ├── evalsoc.ld\n│   │   │   ├── inc\n│   │   │   ├── Makefile\n│   │   │   ├── README.md\n│   │   │   ├── src\n│   │   │   └── test\n│   │   │   ...\n```\n\n### Build\n\nTo build the test program for rv64, run the following commands:\n\n```shell\ncd /path/to/nuclei-ai-library\nmake CORE=nx900fd ARCH_EXT=v_zfh_zvfh all\n```\n\nIf `CORE` and `ARCH_EXT` are not specified, `CORE=nx900fd` and `ARCH_EXT=v_zfh_zvfh` are used by default.\n\nTo build for rv32, specify `CORE` and `ARCH_EXT` as follows:\n\n```shell\nmake CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f all\n```\n\nAfter `make` completes, the binary file `ailib_bench.elf` is generated in the root directory of this project.\n\n### Run Test\n\n#### Test on QEMU\n\nTo run the test program with QEMU, run one of the following commands:\n\n```shell\n# run test on qemu for rv64\nmake CORE=nx900fd ARCH_EXT=v_zfh_zvfh SIMU=qemu clean all run_qemu\n# run 
test on qemu for rv32\nmake CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f SIMU=qemu clean all run_qemu\n```\n\nThese commands rebuild the test program with `SIMU=qemu` and then run it on QEMU. When `SIMU=qemu` is specified, QEMU terminates automatically when the test completes. In other cases, you need to press `CTRL+C` to exit QEMU manually once the test is completed.\n\n#### Test on Hardware\n\n**Check Binary**. A binary built with `SIMU=qemu` must not be used on hardware. Run `make clean` and rebuild your binary file **without** `SIMU=qemu` before running.\n\n**Check Hardware**. The hardware should meet the following requirements:\n\n- 1024kB ilm and 1024kB dlm\n- support v extension (rv64) or _zve32f extension (rv32)\n- support _zfh extension\n- support _zvfh extension\n\nOnce the hardware is connected to your host locally, run one of the following commands:\n\n```shell\n# when the hardware is rv64\nmake CORE=nx900fd ARCH_EXT=v_zfh_zvfh clean all upload\n# when the hardware is rv32\nmake CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f clean all upload\n```\n\nTo learn more about running applications on hardware, refer to the [Build, Run and Debug Sample Application](https://doc.nucleisys.com/nuclei_sdk/quickstart.html#build-run-and-debug-sample-application) section in the Nuclei SDK documentation.\n\n#### Test Results\n\nHowever you run the test program, the test results are shown in the terminal like this:\n\n```shell\n...\nCSV, Tile_float32_axis0, 5064\nCSV, Tile_float32_rvv_axis0, 1548\nCSV, Tile_float32_axis1, 7066\nCSV, Tile_float32_rvv_axis1, 1286\nCSV, Tile_float32_bothaxes, 9638\nCSV, Tile_float32_rvv_bothaxes, 2380\nCSV, Topk_int32, 116255\nCSV, Topk_int32_rvv, 84587\nAll test done!\n-------------\nAll tests passed!\n```\n\nEach line starting with `CSV` corresponds to a test case and adheres to the CSV format. 
After `CSV, ` comes the name of the test case, and the final number is the number of cycles consumed by that test.\n\n**Test Case Naming Rules**: \\\u003cOperator\\\u003e\\_\\\u003cDataType\\\u003e[\\_rvv][\\_CaseName]\n\n- **Operator**(required): The name of the ONNX operator.\n- **DataType**(required): The data type of the input and output.\n- **_rvv**(optional): Present if the operator implementation is optimized with the RISC-V Vector extension.\n- **_CaseName**(optional): The name of the test case subdivision.\n\n## Reference\n\n- https://github.com/onnx/onnx/tree/main/onnx/reference/ops\n- https://github.com/microsoft/onnxruntime/tree/main/onnxruntime\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuclei-software%2Fnuclei-ai-library","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnuclei-software%2Fnuclei-ai-library","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuclei-software%2Fnuclei-ai-library/lists"}