[English](README_EN.md) | Simplified Chinese

# nndeploy: An Easy-to-Use and High-Performance AI Deployment Framework

[Documentation](https://nndeploy-zh.readthedocs.io/zh/latest/) | Ask DeepWiki | WeChat | Discord
---
## Introduction
nndeploy is an easy-to-use, high-performance AI deployment framework. It solves the problem of deploying AI algorithms on devices, covering desktop (Windows, macOS), mobile (Android, iOS), edge computing devices (NVIDIA Jetson, Ascend310B, RK series, etc.), and single-node servers (RTX series, T4, Ascend310P, etc.). **Built on visual workflows and multi-backend inference, it lets AI algorithms land on the above platforms and hardware more efficiently and with higher performance.**
**For large models above 10B parameters (such as large language models and AIGC generative models), nndeploy is best suited as a visual workflow tool.**
### **Easy to Use**
- **Visual workflows**: Deploy AI algorithms by dragging and dropping nodes; parameters can be tuned in real time, and the effect is immediately visible.
- **Custom nodes**: Python and C++ custom nodes are both supported. Whether you implement preprocessing in Python or write high-performance nodes in C++/CUDA, they integrate seamlessly into the visual workflow (a rough sketch follows the demo captions below).
- **One-click deployment**: Workflows can be exported as JSON and invoked through the C++/Python API on Linux, Windows, macOS, Android, and other platforms.
*Demo: building an AI workflow on desktop*

*Demo: deployment on mobile*
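To give the custom-node idea above a concrete shape, here is a minimal sketch of what a Python node might look like: a class with a compute hook that reads inputs and writes outputs. The class name, method name, and dictionary-based I/O here are assumptions for illustration only; the actual interface is defined in the [Python custom node development guide](docs/zh_cn/quick_start/plugin_python.md).

```python
# A minimal sketch only: the class shape, method name, and I/O convention are
# hypothetical; see docs/zh_cn/quick_start/plugin_python.md for the real API.
import numpy as np

class GrayscaleNode:
    """Toy preprocessing node: convert an RGB frame to grayscale."""

    def run(self, inputs: dict) -> dict:
        frame = inputs["frame"]                    # H x W x 3 uint8 array
        weights = np.array([0.299, 0.587, 0.114])  # standard luma coefficients
        gray = (frame.astype(np.float32) @ weights).astype(np.uint8)
        return {"frame": gray}

# Standalone usage, just to show the node is plain Python:
node = GrayscaleNode()
out = node.run({"frame": np.zeros((480, 640, 3), dtype=np.uint8)})
```

In the framework itself, such a node would additionally be registered so it appears in the visual editor's node palette alongside the built-in C++/CUDA nodes.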
### **High Performance**
- **Parallel optimization**: Supports serial, pipeline-parallel, and task-parallel execution modes.
- **Memory optimization**: Zero-copy, memory pools, memory reuse, and other optimization strategies.
- **High-performance kernels**: Built-in nodes optimized with C++/CUDA/Ascend C/SIMD.
- **Multi-backend inference**: One workflow adapts to multiple inference backends. 13 mainstream inference frameworks are deeply integrated, covering cloud servers, desktop applications, mobile devices, and edge computing. You can choose the inference engine flexibly and compile only what you need to reduce dependencies; a standalone mode for plugging in custom inference frameworks is also supported (see the sketch after the table below).
| Inference Framework | Status |
| :------------------------------------------------------------------------------- | :--- |
| [ONNXRuntime](https://github.com/microsoft/onnxruntime) | ✅ |
| [TensorRT](https://github.com/NVIDIA/TensorRT) | ✅ |
| [OpenVINO](https://github.com/openvinotoolkit/openvino) | ✅ |
| [MNN](https://github.com/alibaba/MNN) | ✅ |
| [TNN](https://github.com/Tencent/TNN) | ✅ |
| [ncnn](https://github.com/Tencent/ncnn) | ✅ |
| [CoreML](https://github.com/apple/coremltools) | ✅ |
| [AscendCL](https://www.hiascend.com/zh/) | ✅ |
| [RKNN](https://www.rock-chips.com/a/cn/downloadcenter/BriefDatasheet/index.html) | ✅ |
| [SNPE](https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk) | ✅ |
| [TVM](https://github.com/apache/tvm) | ✅ |
| [PyTorch](https://pytorch.org/) | ✅ |
| [nndeploy built-in inference submodule](docs/zh_cn/inference/README_INFERENCE.md) | ✅ |
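To make "one workflow, many backends" concrete, here is a short sketch that reuses only the Graph API shown in the Quick Start below. It assumes, hypothetically, that the same detection workflow has been exported three times from the editor with a different backend configured on the inference node each time; the file names are illustrative.

```python
import nndeploy.dag

# Hypothetical exports of one workflow, differing only in the backend
# recorded on the inference node inside each JSON file.
for json_file in [
    "yolo_onnxruntime.json",
    "yolo_openvino.json",
    "yolo_tensorrt.json",
]:
    graph = nndeploy.dag.Graph("")
    graph.remove_in_out_node()
    graph.load_file(json_file)  # the backend choice travels with the JSON
    graph.init()
    graph.run()                 # identical calling code for every backend
    graph.deinit()
```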
### **Out-of-the-Box Algorithms**
We have already deployed many classes of AI models and developed 100+ visual nodes for an out-of-the-box experience. As more nodes are deployed, the node library becomes increasingly reusable, which significantly lowers the development cost of deploying subsequent algorithms. We will continue to deploy more algorithms of practical value.
| Application Scenario | Available Models | Remarks |
| -------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| **Large Language Models** | **QWen-2.5**, **QWen-3** | Supports the smaller-parameter variants |
| **Image/Video Generation** | Stable Diffusion 1.5, Stable Diffusion XL, Stable Diffusion 3, HunyuanDiT, etc. | Supports text-to-image, image-to-image, and image inpainting, based on **diffusers** |
| **Face Swapping** | **deep-live-cam** | |
| **OCR** | **Paddle OCR** | |
| **Object Detection** | **YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv11, YOLOx** | |
| **Object Tracking** | FairMot | |
| **Image Segmentation** | RMBGv1.4, PPMatting, **Segment Anything** | |
| **Classification** | ResNet, MobileNet, EfficientNet, PPLcNet, GhostNet, ShuffleNet, SqueezeNet | |
| **API Services** | OpenAI, DeepSeek, Moonshot | Supports LLM and AIGC services |
> See the [detailed list of deployed models](docs/zh_cn/quick_start/model_list.md) for more.
## Quick Start
- **Step 1: Install**
```bash
pip install --upgrade nndeploy
```
- **Step 2: Launch the visual interface**
```bash
# Option 1: command line
nndeploy-app --port 8000
# Option 2: launch from source
cd path/to/nndeploy
python app.py --port 8000
```
Once started, open http://localhost:8000 to access the workflow editor. There you can drag and drop nodes, tune parameters, and preview results in real time; what you see is what you get.
- **Step 3: Save, load, and run**
Once you have built and debugged the workflow in the visual interface, click save and it is exported as a JSON file that encapsulates the entire processing pipeline. You can run it in **production** in either of two ways:
- Option 1: run from the command line
Useful for debugging:
```bash
# Python CLI
nndeploy-run-json --json_file path/to/workflow.json
# C++ CLI
nndeploy_demo_run_json --json_file path/to/workflow.json
```
- Option 2: load and run from Python/C++ code
You can integrate the JSON file into your existing Python or C++ project. Here is example code that loads and runs an LLM workflow:
- Load and run the LLM workflow with the Python API
```Python
import nndeploy.dag
import nndeploy.tokenizer

# Build an empty graph and load the exported workflow JSON
graph = nndeploy.dag.Graph("")
graph.remove_in_out_node()
graph.load_file("path/to/llm_workflow.json")
graph.init()

# Write the prompt into the graph's input edge
input = graph.get_input(0)
text = nndeploy.tokenizer.TokenizerText()
text.texts_ = [ "<|im_start|>user\nPlease introduce NBA superstar Michael Jordan<|im_end|>\n<|im_start|>assistant\n" ]
input.set(text)

# Run the workflow and read the generated text from the output edge
status = graph.run()
output = graph.get_output(0)
result = output.get_graph_output()
graph.deinit()
```
- Load and run the LLM workflow with the C++ API
```C++
// Assumes the nndeploy headers are included and the nndeploy::base,
// nndeploy::dag, and nndeploy::tokenizer namespaces are in scope.
std::shared_ptr<dag::Graph> graph = std::make_shared<dag::Graph>("");
base::Status status = graph->loadFile("path/to/llm_workflow.json");
graph->removeInOutNode();
status = graph->init();
// Write the prompt into the graph's input edge
dag::Edge* input = graph->getInput(0);
tokenizer::TokenizerText* text = new tokenizer::TokenizerText();
text->texts_ = {
    "<|im_start|>user\nPlease introduce NBA superstar Michael Jordan<|im_end|>\n<|im_start|>assistant\n"};
input->set(text, false);
// Run the workflow and read the generated text from the output edge
status = graph->run();
dag::Edge* output = graph->getOutput(0);
tokenizer::TokenizerText* result =
    output->getGraphOutput<tokenizer::TokenizerText>();
status = graph->deinit();
```
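One detail worth noting in the C++ example: `input->set(text, false)` takes a second boolean that the Python example does not pass. Given the zero-copy design mentioned under High Performance, this flag presumably controls whether the edge copies or merely references the caller's object, but that reading is an assumption; check the C++ headers before relying on it.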
> Requires Python 3.10+. ONNXRuntime and MNN are included by default; for more inference backends, use developer mode.
## Documentation
- [Build](docs/zh_cn/quick_start/build.md)
- [Visual Workflow](docs/zh_cn/quick_start/workflow.md)
- [Best Practices](docs/zh_cn/quick_start/deploy.md)
- [Python Custom Node Development Guide](docs/zh_cn/quick_start/plugin_python.md)
- [C++ Custom Node Development Guide](docs/zh_cn/quick_start/plugin.md)
- [Adding a New Inference Framework](docs/zh_cn/developer_guide/how_to_support_new_inference.md)
## Benchmarks
Test environment: Ubuntu 22.04, i7-12700, RTX 3060
- **Pipeline-parallel speedup**: end-to-end total latency of the YOLOv11s workflow, serial vs. pipeline parallel (speedup = (serial − parallel) / serial)

| Execution mode \ Engine | ONNXRuntime | OpenVINO  | TensorRT  |
| ----------------------- | ----------- | --------- | --------- |
| Serial                  | 54.803 ms   | 34.139 ms | 13.213 ms |
| Pipeline parallel       | 47.283 ms   | 29.666 ms | 5.681 ms  |
| Speedup                 | 13.7%       | 13.1%     | 57%       |
- **Task-parallel speedup**: end-to-end total latency of a combined task (RMBGv1.4 segmentation + YOLOv11s detection + ResNet50 classification), serial vs. task parallel (a sketch of selecting the execution mode follows the table)

| Execution mode \ Engine | ONNXRuntime | OpenVINO   | TensorRT  |
| ----------------------- | ----------- | ---------- | --------- |
| Serial                  | 654.315 ms  | 489.934 ms | 59.140 ms |
| Task parallel           | 602.104 ms  | 435.181 ms | 51.883 ms |
| Speedup                 | 7.98%       | 11.2%      | 12.2%     |
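For a rough picture of how these execution modes are selected from user code, the sketch below assumes the graph exposes a parallel-type setter in the Python binding (`set_parallel_type` with a `ParallelType` enum, mirroring C++-style naming); both names are assumptions, so treat this as illustrative rather than the definitive API.

```python
import nndeploy.base
import nndeploy.dag

graph = nndeploy.dag.Graph("")
graph.remove_in_out_node()
graph.load_file("path/to/yolo_workflow.json")  # hypothetical export

# Assumed API: choose the execution mode before init. Pipeline parallelism
# overlaps pre-processing, inference, and post-processing across consecutive
# inputs; task parallelism runs independent branches of the graph concurrently.
graph.set_parallel_type(nndeploy.base.ParallelType.kParallelTypePipeline)

graph.init()
graph.run()
graph.deinit()
```

Consistent with the tables above, pipeline parallelism pays off most on a chained workflow fed a stream of inputs (note the 57% gain on TensorRT), while task parallelism helps when the graph contains independent branches, such as the segmentation + detection + classification combination.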
## Roadmap
- [Workflow ecosystem](https://github.com/nndeploy/nndeploy/issues/191)
- [On-device large-model inference](https://github.com/nndeploy/nndeploy/issues/161)
- [Architecture optimization](https://github.com/nndeploy/nndeploy/issues/189)
- [AI Box](https://github.com/nndeploy/nndeploy/issues/190)
## Contact Us
- If you love open source and enjoy tinkering, whether for learning or because you have better ideas, you are welcome to join us.
- WeChat: Always031856 (feel free to add us and join the group chat; note "nndeploy\_your name")
## Acknowledgements
- Thanks to the following projects: [TNN](https://github.com/Tencent/TNN), [FastDeploy](https://github.com/PaddlePaddle/FastDeploy), [opencv](https://github.com/opencv/opencv), [CGraph](https://github.com/ChunelFeng/CGraph), [tvm](https://github.com/apache/tvm), [mmdeploy](https://github.com/open-mmlab/mmdeploy), [FlyCV](https://github.com/PaddlePaddle/FlyCV), [oneflow](https://github.com/Oneflow-Inc/oneflow), [flowgram.ai](https://github.com/bytedance/flowgram.ai), [deep-live-cam](https://github.com/hacksider/Deep-Live-Cam).
- Thanks to [HelloGithub](https://hellogithub.com/repository/nndeploy/nndeploy) for the recommendation.
## Contributors
[](https://star-history.com/#nndeploy/nndeploy)