https://github.com/unit-mesh/edge-infer
EdgeInfer enables efficient edge intelligence by running small AI models, including embeddings and ONNX models, on resource-constrained devices like Android, iOS, or MCUs for real-time decision-making.
- Host: GitHub
- URL: https://github.com/unit-mesh/edge-infer
- Owner: unit-mesh
- License: MIT
- Created: 2023-11-08T05:40:33.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-04-17T09:23:00.000Z (over 1 year ago)
- Last Synced: 2025-04-01T07:54:14.530Z (9 months ago)
- Topics: inference, llm
- Language: Rust
- Homepage:
- Size: 273 KB
- Stars: 43
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Edge Infer
> EdgeInfer enables efficient edge intelligence by running small AI models, including embeddings and ONNX models, on
> resource-constrained devices like Android, iOS, or MCUs for real-time decision-making.
Architecture:

Platform support (by design):
- Android, iOS
- Linux, Windows, macOS
- Raspberry Pi, MCU
## Todos
- [x] Inference wrapper (see the tokenizer/pooling sketch after this list)
- [x] Onnx Runtime
- [x] Tokenizer
- [x] [UniFFI](https://github.com/mozilla/uniffi-rs), a toolkit for building cross-platform software components in Rust (see the binding sketch after this list)
- [ ] GRPC server with [tonic](https://github.com/hyperium/tonic)
- [ ] Multiple OS support:
- Desktop: Windows, Mac, Linux (x86, x64)
- Mobile: Android, iOS, Linux (ARM)
- Embedded Linux (ARM).
- [ ] Flexible Configuration: easily configurable via command-line parameters, including listening port, batch size, thread count, and others (see the CLI sketch after this list)
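The completed items above (inference wrapper, ONNX Runtime, tokenizer) amount to a tokenize → run encoder → pool pipeline. The sketch below illustrates only the tokenizer and pooling halves, assuming the Hugging Face `tokenizers` and `ndarray` crates; the ONNX call itself is omitted, and all names are illustrative rather than EdgeInfer's actual API.

```rust
// Assumed dependencies (not necessarily what EdgeInfer uses):
// tokenizers = "0.19", ndarray = "0.15"
use ndarray::{Array2, Axis};
use tokenizers::Tokenizer;

/// Tokenize one sentence into the id/mask pair a BERT-style ONNX encoder expects.
fn tokenize(tok: &Tokenizer, text: &str) -> tokenizers::Result<(Vec<i64>, Vec<i64>)> {
    let enc = tok.encode(text, true)?; // true = add special tokens ([CLS]/[SEP])
    let ids = enc.get_ids().iter().map(|&x| x as i64).collect();
    let mask = enc.get_attention_mask().iter().map(|&x| x as i64).collect();
    Ok((ids, mask))
}

/// Mean-pool per-token embeddings (seq_len x hidden, as returned by the encoder)
/// into a single L2-normalized sentence vector, the usual post-processing for
/// MiniLM-style models.
fn mean_pool(token_embeddings: &Array2<f32>, mask: &[i64]) -> Vec<f32> {
    let hidden = token_embeddings.ncols();
    let mut sum = vec![0.0f32; hidden];
    let mut count = 0.0f32;
    for (row, &m) in token_embeddings.axis_iter(Axis(0)).zip(mask) {
        if m == 1 {
            for (s, v) in sum.iter_mut().zip(row.iter()) {
                *s += *v;
            }
            count += 1.0;
        }
    }
    for s in sum.iter_mut() {
        *s /= count.max(1.0);
    }
    let norm = sum.iter().map(|v| v * v).sum::<f32>().sqrt().max(1e-12);
    sum.iter().map(|v| v / norm).collect()
}
```

A full wrapper would feed `ids`/`mask` into the ONNX session and pass its last-hidden-state output to `mean_pool`.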
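For the UniFFI item, a hypothetical exported function might look like the following, using UniFFI's proc-macro interface (available in recent releases). The function name, signature, and embedding size are illustrative, not EdgeInfer's real surface.

```rust
// Assumed dependency: uniffi = "0.28" (proc-macro interface), built as a
// cdylib/staticlib so Kotlin/Swift bindings can be generated against it.
uniffi::setup_scaffolding!();

/// Hypothetical embedding entry point exposed to Android/iOS callers.
#[uniffi::export]
pub fn embed(text: String) -> Vec<f32> {
    // Placeholder body: a real implementation would tokenize `text` and run
    // the ONNX encoder; here it just returns a fixed-size zero vector.
    let _ = text;
    vec![0.0; 384]
}
```

Running `uniffi-bindgen` against the built library then generates the Kotlin/Swift glue that calls this function across the FFI boundary.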
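The flexible-configuration item maps naturally onto a small CLI. A minimal sketch using the `clap` derive API, with illustrative flag names:

```rust
// Assumed dependency: clap = { version = "4", features = ["derive"] }
use clap::Parser;

/// Illustrative command-line options; the real flag names may differ.
#[derive(Parser, Debug)]
#[command(name = "edge-infer")]
struct Args {
    /// Port the local inference server listens on.
    #[arg(long, default_value_t = 8080)]
    port: u16,
    /// Number of requests grouped into one model call.
    #[arg(long, default_value_t = 1)]
    batch_size: usize,
    /// Worker threads used for inference.
    #[arg(long, default_value_t = 4)]
    threads: usize,
}

fn main() {
    let args = Args::parse();
    println!(
        "listening on :{} (batch_size={}, threads={})",
        args.port, args.batch_size, args.threads
    );
}
```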
## Use Cases
- [ ] SearchEverywhere: Search for anything, anywhere, anytime (see the similarity sketch after this list).
- Model: Embedding,
like [Sentence-Transformers MiniLM](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- Extra: Local Indexing
- [ ] Visualization
- Model: [Ultralytics YOLOv9](https://github.com/ultralytics/ultralytics)
- [ ] AutoComplete
- Model: Embedding with ??
- [ ] Summarization
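For SearchEverywhere, once sentence embeddings are produced on-device, a small local index can be searched with plain cosine similarity. A minimal, dependency-free sketch (names are illustrative):

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the indices of the `k` indexed documents most similar to `query`,
/// best match first.
fn top_k(query: &[f32], index: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, doc)| (i, cosine(query, doc)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

For larger corpora this linear scan would give way to an approximate nearest-neighbour index, which is presumably what the "Local Indexing" extra refers to.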
## Resources
Examples:
- Modern cross-platform telemetry: [Glean](https://github.com/mozilla/glean)
### MCU
To spike:
- ESP32: [esp-rs](https://github.com/esp-rs)
- Raspberry Pi Classic
- [built-onnxruntime-for-raspberrypi-linux](https://github.com/nknytk/built-onnxruntime-for-raspberrypi-linux)
- [ONNX Runtime IoT Deployment on Raspberry Pi](https://onnxruntime.ai/docs/tutorials/iot-edge/rasp-pi-cv.html)
Not working:
- Arduino M0 Pro, Flash: 256 KB, SRAM: 32 KB
- Official: [Arduino M0 Pro](https://docs.arduino.cc/retired/boards/arduino-m0-pro)
- Rust's [cortex-m-quickstart](https://github.com/rust-embedded/cortex-m-quickstart)
- Raspberry Pi Zero W, RAM: 512 MB
- Official: [Raspberry Pi Zero W](https://www.raspberrypi.com/products/raspberry-pi-zero/)
- [Using Rust to Control a Raspberry Pi Zero W Rover](https://disconnected.systems/blog/rust-powered-rover/)
- Not working reason: see [inference_rpi](inference_rpi/README.md)
## License
This project is licensed under the MIT License; see [LICENSE](LICENSE) for the full license text.