{"id":13645898,"url":"https://github.com/uTensor/uTensor","last_synced_at":"2025-04-21T17:31:32.133Z","repository":{"id":25720856,"uuid":"104375922","full_name":"uTensor/uTensor","owner":"uTensor","description":"TinyML AI inference library","archived":false,"fork":false,"pushed_at":"2025-04-19T09:02:51.000Z","size":77934,"stargazers_count":1798,"open_issues_count":56,"forks_count":232,"subscribers_count":102,"default_branch":"master","last_synced_at":"2025-04-19T15:23:02.863Z","etag":null,"topics":["cortex-m","deep-learning","edge-computing","embedded","iot","iot-middleware","machine-learning","mbed","microcontroller","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uTensor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-21T17:04:59.000Z","updated_at":"2025-04-19T11:16:33.000Z","dependencies_parsed_at":"2024-01-07T00:06:23.031Z","dependency_job_id":"29cf640e-2606-4c8a-8d30-a6241f358353","html_url":"https://github.com/uTensor/uTensor","commit_stats":{"total_commits":847,"total_committers":22,"mean_commits":38.5,"dds":0.6162927981109799,"last_synced_commit":"6aa081bf06c0106205b846291e1a6495f103f477"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uTensor%2FuTensor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uTensor%2FuTensor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uTen
sor%2FuTensor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uTensor%2FuTensor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uTensor","download_url":"https://codeload.github.com/uTensor/uTensor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250100448,"owners_count":21374943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cortex-m","deep-learning","edge-computing","embedded","iot","iot-middleware","machine-learning","mbed","microcontroller","tensorflow"],"created_at":"2024-08-02T01:02:44.505Z","updated_at":"2025-04-21T17:31:32.114Z","avatar_url":"https://github.com/uTensor.png","language":"C++","readme":"# uTensor - Test Release\n[![CircleCI](https://circleci.com/gh/uTensor/uTensor.svg?style=svg)](https://circleci.com/gh/uTensor/uTensor)\nNote: If you are looking for stable releases, check out master.\n\n## Tutorials\n\n### Building Tutorial Examples\n\nMake sure `cmake` is available on your system and run the following commands:\n\n```bash\n$ mkdir build\n$ cd build\n$ cmake -DPACKAGE_TUTORIALS=ON ..\n$ make\n```\n\nAfter the build finishes, you should find the tutorial executables under the `build/tutorials/` directory.\n\nFollow the instructions in the `README.md` in each tutorial directory to learn how to use `uTensor`.\n\nHere are the links to the tutorials:\n\n1. [Error Handling with uTensor](tutorials/error_handling)\n2. 
[Custom Operator](tutorials/custom_operator)\n\n## Introduction\n\n### What is it?\nuTensor is an extremely lightweight machine learning inference framework built on TensorFlow and optimized for Arm targets. It consists of a runtime library and an offline tool that handles most of the model translation work. This repo holds the core runtime and some example implementations of operators, memory managers/schedulers, and more; the core runtime itself is only ~2KB!\n\n| Module                       |         .text |       .data |        .bss |\n|------------------------------|---------------|-------------|-------------|\n| uTensor/src/uTensor/core     |   1275(+1275) |       4(+4) |     28(+28) |\n| uTensor/src/uTensor/tensors  |     791(+791) |       0(+0) |       0(+0) |\n\n\n### How does the uTensor workflow work?\n\u003cdiv\u003e\u003cimg src=docs/img/uTensorFlow.jpg width=600 align=center/\u003e\u003c/div\u003e\n\nA model is constructed and trained in TensorFlow. uTensor takes the model and produces a .cpp and .hpp file. These files contain the generated C++11 code needed for inference. 
Working with uTensor on the embedded side is as easy as copy-and-paste.\n\n### How does the uTensor runtime work?\n[Check out the detailed description here](src/uTensor/README.md)\n\n\n## Release Note\nThe rearchitecture is fundamentally centered on a few key ideas, and the structure of the code base and build tools naturally followed.\nOld key points:\n- Tensors describe how data is accessed and where from\n  - Performance of ops depends on which tensors are used\n- Operators are Tensor-agnostic\n  - High-performance ops can fetch blocks of data at once\n- Strive for low total power in execution\n- Low static and dynamic footprint, be small\n  - Low cost per Tensor throughout the entire system, since most generated models have 100+ tensors, including intermediates; this also impacts the dynamic footprint\n  - Lightweight class hierarchy\n  - Duh\n\nNew additional key ideas:\n- System safety\n  - All tensor metadata and actual data are owned in dedicated regions\n    - This can either be user-provided, or one we create\n  - We can guarantee that the runtime will use no more than N bytes of RAM at code gen time or at compile time!\n  - Generally should not collide with userspace or system space memory, i.e. 
don't share heaps\n  - General implication: a safe runtime means we can safely update models remotely\n  - As many compile time errors as possible!\n    - Mismatched inputs, outputs, or numbers\n    - Wrong sizes used\n    - Impossible memory accesses\n    - etc.\n- Clear, Concise, and Debuggable\n  - The previous iteration of uTensor relied too heavily on codegen; making changes to a model for any reason was nearly impossible\n  - A developer should be able to make changes to the model without relying on code gen\n  - A developer should be able to look at a model file and immediately understand what the graph looks like, without a massive amount of jumping around\n  - Default tensor interface should behave like a higher level language, but exploit the speed of C++\n    - Generally: No more pointer bullshit! C is super error prone, fight me\n      - Only specialized operators have access to raw data blocks, and these ops will be wicked fast\n  - Extensible, configurable, and optimize-outable error handling\n  - GDB debugging IS NOW TRIVIAL\n\nAs mentioned before, these key ideas need to be reflected not only in the code, but in the code structure in such a way that it is Maintainable, Hackable, and User-extensible. Pretty much everything in the uTensor runtime can be divided into two components: core, and everything else. The core library contains all the deep low-level functionality needed for the runtime to make the above guarantees, as well as the interfaces required for concrete implementation. Furthermore, the overhead of this core engine should be negligible relative to the system operation. Everything not in the core library should just be thought of as reasonable defaults: for example, tensor implementations, default operators, example memory allocators, or even possible logging systems and error handlers. 
These modules should be the primary area for future optimization, especially before model deployment.\n\n## High level API\n\n```c++\nusing namespace uTensor;\n\nconst uint8_t s_a[4] = {1, 2, 3, 4};\nconst uint8_t s_b[4] = {5, 6, 7, 8};\nconst uint8_t s_c_ref[4] = {19, 22, 43, 50};\n\n// These can also be embedded in models\n// Recommended: do not put these on the heap or stack directly, as they can be large\nlocalCircularArenaAllocator\u003c256\u003e meta_allocator; // All tensor metadata gets stored here automatically, even when new is called\nlocalCircularArenaAllocator\u003c256\u003e ram_allocator;  // All temporary storage gets allocated here\n\nvoid foo() {\n  // Tell the uTensor context which allocators to use\n  Context::get_default_context()-\u003eset_metadata_allocator(\u0026meta_allocator);\n  Context::get_default_context()-\u003eset_ram_data_allocator(\u0026ram_allocator);\n\n  // Tensors are simply handles for accessing data as necessary; they are no larger than a pointer\n  // RomTensor(TensorShape, data_type, data*);\n  Tensor a = new /*const*/ RomTensor({2, 2}, u8, s_a);\n  Tensor b = new /*const*/ RomTensor({2, 2}, u8, s_b);\n  Tensor c_ref = new RomTensor({2,2}, u8, s_c_ref);\n  // RamTensors are held internally and can be moved or cleared depending on the memory schedule (optional)\n  Tensor c = new RamTensor({2, 2}, u8);\n\n  // Operators take in a fixed-size map of (input_name -\u003e parameter); this gives compile-time errors on input mismatching\n  // Also, the name binding + lack of parameter ordering makes ctags jumping and GDB sessions significantly more intuitive\n  MatrixMultOperator\u003cuint8_t\u003e mult_AB;\n  mult_AB\n      .set_inputs({{MatrixMultOperator\u003cuint8_t\u003e::a, a}, {MatrixMultOperator\u003cuint8_t\u003e::b, b}})\n      .set_outputs({{MatrixMultOperator\u003cuint8_t\u003e::c, c}})\n      .eval();\n\n  // Compare results\n  TensorShape\u0026 c_shape = c-\u003eget_shape();\n  for (int i = 0; i \u003c c_shape[0]; i++) {\n 
   for (int j = 0; j \u003c c_shape[1]; j++) {\n      // Just need to cast the access to the expected type\n      if( static_cast\u003cuint8_t\u003e(c(i, j)) != static_cast\u003cuint8_t\u003e(c_ref(i, j)) ) {\n        printf(\"Oh crap!\\n\");\n        exit(-1);\n      }\n    }\n  }\n}\n```\n\n## Building and testing locally\n\n```\ngit clone git@github.com:uTensor/uTensor.git\ncd uTensor/\ngit checkout proposal/rearch\ngit submodule init\ngit submodule update\nmkdir build\ncd build/\ncmake -DPACKAGE_TESTS=ON -DCMAKE_BUILD_TYPE=Debug ..\nmake\nmake test\n```\n\n## Building and running on Arm Mbed OS\n\nThe uTensor core library is configured as an Mbed library out of the box, so we just need to import it into our project and build as normal.\n\n```\nmbed new my_project\ncd my_project\nmbed import https://github.com/uTensor/uTensor.git\n# Create main file\n# Run uTensor-cli workflow and copy model directory here\nmbed compile # as normal\n```\n\n## Building and running on Arm systems\nTODO\nNote: CMake support for Arm is currently experimental\nhttps://stackoverflow.com/questions/46916611/cross-compiling-googletest-for-arm64\n\nDefault build\n```\nmkdir build \u0026\u0026 cd build\ncmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=../extern/CMSIS_5/CMSIS/DSP/gcc.cmake  ..\n```\n\nWith CMSIS optimized kernels\n```\nmkdir build \u0026\u0026 cd build\ncmake -DARM_PROJECT=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_TOOLCHAIN_FILE=../extern/CMSIS_5/CMSIS/DSP/gcc.cmake  ..\n```\n\n## Further Reading\n- [Why Edge Computing](https://towardsdatascience.com/why-machine-learning-on-the-edge-92fac32105e6)\n- [Why the Future of Machine Learning is Tiny](https://petewarden.com/2018/06/11/why-the-future-of-machine-learning-is-tiny/)\n- [TensorFlow](https://www.tensorflow.org)\n- [Mbed](https://developer.mbed.org)\n- [Node-Viewer](https://github.com/neil-tan/tf-node-viewer/)\n- [How to Quantize Neural Networks with 
TensorFlow](https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/)\n- [mxnet Handwritten Digit Recognition](https://mxnet.incubator.apache.org/tutorials/python/mnist.html)\n","funding_links":[],"categories":["Machine Learning \u0026 AI on MCU","C++","Uncategorized","微控制器 MCU 端","Data processing","Networks","Libraries","Library"],"sub_categories":["USB","Uncategorized","Awesome-Embedded Repository","AI ML","Inference Framework"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FuTensor%2FuTensor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FuTensor%2FuTensor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FuTensor%2FuTensor/lists"}