{"id":16794734,"url":"https://github.com/NetEase-Media/grps","last_synced_at":"2025-11-04T08:30:23.978Z","repository":{"id":247395687,"uuid":"824124601","full_name":"NetEase-Media/grps","owner":"NetEase-Media","description":"[Deep learning model serving framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks; supports dynamic batching and streaming modes; supports both Python and C++; limitable, extensible, high-performance. Helps users quickly deploy models to production and serve them via HTTP/RPC interfaces.","archived":false,"fork":false,"pushed_at":"2025-01-10T02:21:03.000Z","size":71059,"stargazers_count":154,"open_issues_count":0,"forks_count":13,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-02-16T19:11:17.307Z","etag":null,"topics":["dynamic-batching","serving","tensorflow","tensorrt","tensorrt-llm","torch","triton-inference-server","vllm"],"latest_commit_sha":null,"homepage":"https://zhuanlan.zhihu.com/p/707491462","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NetEase-Media.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-04T12:12:03.000Z","updated_at":"2025-02-14T06:02:26.000Z","dependencies_parsed_at":"2024-11-08T08:39:17.114Z","dependency_job_id":null,"html_url":"https://github.com/NetEase-Media/grps","commit_stats":null,"previous_names":["netease-media/grps"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase-Media%2Fgrps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase-Media%2Fgrps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase-Media%2Fgrps/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NetEase-Media%2Fgrps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NetEase-Media","download_url":"https://codeload.github.com/NetEase-Media/grps/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239428066,"owners_count":19636886,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic-batching","serving","tensorflow","tensorrt","tensorrt-llm","torch","triton-inference-server","vllm"],"created_at":"2024-10-13T09:01:45.791Z","updated_at":"2025-11-04T08:30:23.912Z","avatar_url":"https://github.com/NetEase-Media.png","language":"C++","readme":"![grps.png](./docs/grps.png)\n\n# GRPS(Generic Realtime Prediction Service)\n\n![grps_outline.png](./docs/grps_outline.png)\n\n## 1. 
Introduction\n\nA stable, high-performance online model serving framework that supports ```tf/torch/trt/vllm/trtllm``` and more ```nn``` frameworks. Its core goal is to help users quickly build an online model inference service, deploy models to production, and serve them via ```HTTP/RPC``` interfaces.\n\ngrps offers the following features:\n\n* Generality: a generic service interface and customization framework; custom extensions place no restrictions on model type, deep learning framework, or pre/post-processing.\n* Ease of use: built-in ```tensorflow```, ```torch```, and ```tensorrt``` (with multi-stream support) inference backends enable one-command deployment; custom extension is straightforward.\n* Dual-language: supports both a high-performance pure ```c++``` service and a lightweight Python service; the one-command quick-deploy feature is implemented in pure ```c++```.\n* Extensible: provides custom project templates in both ```c++``` and ```py```, so users can customize pre/post-processing, model inference, ```http``` interface formats, and more.\n* Limitable: GPU memory limits for ```tf``` and ```torch``` can be set via configuration, useful for shared-GPU scenarios; service concurrency limits can also be configured.\n* Monitorable: provides a user logging system and a built-in metrics monitoring system with a clean web dashboard for conveniently observing ```qps```, ```latency```, ```cpu```, and ```gpu``` usage.\n* Flexible access: automatically adapts to ```http```, ```gRPC```, and ```bRPC``` access, with client examples in ```py```, ```c++```, ```java```, and more.\n* Batching: supports ```dynamic batching``` to make full use of ```gpu``` resources and improve inference performance and throughput.\n* Streaming: supports continuous inference with incremental results, suitable for natural language generation, video processing, and similar scenarios.\n* Multi-model support: multiple models can be deployed, either combined into a single service or served individually.\n* Multi-GPU support: the GPUs used for model deployment can be selected via configuration, and usage across multiple ```gpu```s can be monitored.\n* Better performance: ```rpc``` support, a pure ```c++``` service, ```tensorrt``` multi-stream inference, ```dynamic batching```, and more allow the service to reach higher performance.\n* LLM support: trtllm/vllm are currently supported via custom backend plugins; see [grps-trtllm](https://github.com/NetEase-Media/grps_trtllm) and [grps-vllm](https://github.com/NetEase-Media/grps_vllm). The former is compatible with the OpenAI protocol; OpenAI protocol compatibility for the latter is under development.\n\n## 2. Directory structure\n\n```\n|--apis: interface definitions\n|--deps: environment dependencies\n|--docker: docker-related files\n|--docs: documentation\n|--grpst: grps toolchain\n|--server: grps server implementation\n|--template: grps custom project templates\n|--grps_client_env.sh: grps client environment setup script\n|--grps_client_install.sh: grps client installation script\n|--grps_env.sh: grps environment setup script\n|--grps_install.sh: grps installation script\n```\n\n## 3. 
Documentation\n\n* [Quick start](./docs/1_QuickStart.md)\n* [Interface reference](./docs/2_Interface.md)\n* [grpst toolchain guide](./docs/3_Grpst.md)\n* [Quick deployment](./docs/4_QuickDeploy.md)\n* [Custom model projects](./docs/5_Customized.md)\n* [Pre/post-processing converters](./docs/6_InternalConverter.md)\n* [NN inference backends](./docs/7_InternalInferer.md)\n* [Client guide](./docs/8_Client.md)\n* [Logging system](./docs/9_Logger.md)\n* [Service metrics monitoring](./docs/10_Monitor.md)\n* [Custom HTTP](./docs/11_CustomizedHttp.md)\n* [Streaming](./docs/12_Streaming.md)\n* [Batching](./docs/13_Batching.md)\n* [TRT multi-stream mode](./docs/20_TrtMultiStream.md)\n* [Multi-model support](./docs/14_MultiModels.md)\n* [Service limits](./docs/15_ServiceLimit.md)\n* [Docker deployment](./docs/16_DockerDeploy.md)\n* [Building from source](./docs/17_BuildFromSources.md)\n* [Remote development and debugging](./docs/18_RemoteDev.md)\n* [Image list](./docs/19_ImageList.md)\n* [FAQ](./docs/90_FAQ.md)\n* [Examples](https://github.com/NetEase-Media/grps_examples)\n\n## 4. TODO\n\nThe framework is under active development; future versions are planned to support:\n\n* New versions of ```nn``` frameworks; file an ```issue``` to request support.\n* More inference backends, e.g. ```onnx-runtime```.\n* More ```batching``` algorithms, e.g. ```continuous batching```.\n* Distributed service composition, assembling a complete service from multiple inference backends.\n* A model inference performance profiling tool.\n","funding_links":[],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNetEase-Media%2Fgrps","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNetEase-Media%2Fgrps","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNetEase-Media%2Fgrps/lists"}