{"id":19054672,"url":"https://github.com/ssbuild/aigc_serving","last_synced_at":"2025-10-14T17:08:37.311Z","repository":{"id":182705814,"uuid":"668935486","full_name":"ssbuild/aigc_serving","owner":"ssbuild","description":"aigc_serving lightweight and efficient Language service model reasoning","archived":false,"fork":false,"pushed_at":"2024-06-12T16:14:11.000Z","size":1308,"stargazers_count":24,"open_issues_count":5,"forks_count":2,"subscribers_count":1,"default_branch":"dev","last_synced_at":"2025-04-18T12:17:50.002Z","etag":null,"topics":["aigc","aigc-serving","chat","gpt-serving","inference","langchain","llm","llm-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssbuild.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-21T00:43:36.000Z","updated_at":"2024-12-08T13:25:42.000Z","dependencies_parsed_at":"2023-10-17T06:55:40.558Z","dependency_job_id":"49ac6207-83ec-496a-82d2-90812cad686e","html_url":"https://github.com/ssbuild/aigc_serving","commit_stats":null,"previous_names":["ssbuild/aigc_serving","ssbuild/localserving"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Faigc_serving","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Faigc_serving/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Faigc_serving/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/ssbuild%2Faigc_serving/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssbuild","download_url":"https://codeload.github.com/ssbuild/aigc_serving/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250552866,"owners_count":21449288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc","aigc-serving","chat","gpt-serving","inference","langchain","llm","llm-model"],"created_at":"2024-11-08T23:39:20.522Z","updated_at":"2025-10-14T17:08:32.266Z","avatar_url":"https://github.com/ssbuild.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## Introduction\n\n   aigc_serving: lightweight and efficient inference serving for language models\n\n\n   \n   ![llm.png](assets/llm.png)\n   \u003cdiv align=\"center\"\u003e Image from the paper: [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf) \u003c/div\u003e\n\n## update information\n```text\n    2024-04-22 0.3.0\n    2023-12-07 support sus-chat and deepseek-coder\n    2023-12-04 support auto-gptq , such as 4bits 8bits\n    2023-12-03 limit model_max_length for inputs\n    2023-12-02 support qwen 1.8b 7b 14b 72b and \n          support chat chat_stream is for openai_chat ,  \n          generate generate_stream is for openai , \n          support batch for generate method\n    2023-11-28 support autoawq , such as 4bits 8bits\n    2023-11-27 yi model_type change to llama\n    2023-11-22 support sentence_transformers , such as bge,m3e and so on\n    2023-11-20 support seed for generator sample and support bianque2 , 
lingowhale\n    2023-11-06 fix pydantic 2 and support api_keys in config\n    2023-11-04 support yi aigc-zoo\u003e=0.2.7.post2 , support pydantic \u003e= 2\n    2023-11-01 support bluelm aigc-zoo\u003e=0.2.7.post1\n    2023-10-31 support chatglm3,CausalLM,skywork , aigc-zoo\u003e=0.2.7.post0\n    2023-10-11 support t5\n    2023-09-13 support model aliases\n    2023-09-11 add ptv2 support , for chatglm,chatglm2 only\n    2023-09-06 support baichuan2\n    2023-09-03 add tiger , openbuddy templates , test openbuddy-70b passed\n    2023-08-26 fix same group stream order\n    2023-08-25 aigc-zoo 0.2.0.post1 support xverse-13b-chat , implement stop for existing models\n    2023-08-20 support embedding\n    2023-08-17 add tiger-chat-13b\n    2023-08-16 optionally use Rope NtkScale at inference , extending the inference length without training\n    2023-08-14 support switching between the base model and lora heads for lora models\n    2023-08-12 add tool-calling examples for the Tongyi Qianwen (qwen) model , supporting the **`function call`** feature ; for usage see [email sender assistant](./tests/email_sender.py), [definite integral calculator](./tests/quad_calculator.py), [SQL query](./tests/sql_querier.py)\n    2023-08-11 official qwen config files updated ; please use aigc-zoo 0.1.17.post0 and update the official config.json , generation_config.json , etc.\n    2023-08-10 0.1.17 release , fix new bugs\n    2023-08-08 support xverse-13b , requires deep_training 0.1.15.rc2\n    2023-08-07 support quantized inference for llama llama2 , requires deep_training 0.1.15.rc1\n    2023-08-05 minimum aigc_zoo version 0.1.14\n    2023-08-03 support qwen\n    2023-08-02 support multi lora infer , upgrade aigc_zoo manually : pip install -U git+https://github.com/ssbuild/deep_training.zoo.git --force-reinstall --no-deps\n    2023-07-27 support openai client\n    2023-07-26 support streaming\n    2023-07-24 support chat\n    2023-07-23 support deepspeed , accelerate\n```\n\n## Note\n   - Recommended environment: linux, python \u003e= 3.10, torch \u003e= 2.0.1\n\n## install\npip install -r requirements.txt\n\n## Supported models\nThe following models are supported, among others; in principle the whole transformer family is supported\n\n| Model          | 16bit | 4bit | ptv2 | deepspeed | accelerate | hf |\n|----------------|-------|------|------|-----------|------------|----|\n| baichuan-7b    | √     | √    | ×    | √         | √          | √  |\n| baichuan-13b   | 
√     | √    | ×    | √         | √          | √  |\n| baichuan2-7b   | √     | √    | ×    | √         | √          | √  |\n| baichuan2-13b  | √     | √    | ×    | √         | √          | √  |\n| bloom          | √     | ×    | ×    | √         | √          | √  |\n| casuallm       | √     | √    | ×    | √         | √          | √  |\n| chatglm        | √     | √    | √    | √         | √          | √  |\n| chatglm2       | √     | √    | √    | √         | √          | √  |\n| chatglm3       | √     | √    | √    | √         | √          | √  |\n| internlm       | √     | √    | ×    | √         | √          | √  |\n| llama          | √     | √    | ×    | √         | √          | √  |\n| moss           | √     | √    | ×    | √         | √          | √  |\n| openbuddy      | √     | √    | ×    | √         | √          | √  |\n| opt            | √     | ×    | ×    | √         | √          | √  |\n| qwen           | √     | √    | ×    | √         | √          | √  |\n| rwkv           | √     | ×    | ×    | √         | √          | √  |\n| t5             | √     | ×    | ×    | √         | √          | √  |\n| tiger          | √     | ×    | ×    | √         | √          | √  |\n| xverse         | √     | √    | ×    | √         | √          | √  |\n| bluelm         | √     | √    | ×    | √         | √          | √  |\n| yi             | √     | √    | ×    | √         | √          | √  |\n| bianque2       | √     | √    | ×    | √         | √          | √  |\n| lingowhale     | √     | √    | ×    | √         | √          | √  |\n| sus_chat       | √     | √    | ×    | √         | √          | √  |\n| deepseek       | √     | √    | ×    | √         | √          | √  |\n| deepseek_coder | √     | √    | ×    | √         | √          | √  |\n\n## docker\n\n### build\n```commandline\ncd aigc_serving\ndocker build -f docker/Dockerfile -t aigc_serving ..\n```\n### docker run\n```commandline\ndocker run -it --runtime=nvidia --name aigc_serving 
aigc_serving:latest /bin/bash\n```\n## Model configuration\n[config.yaml](config/config.yaml)\nFor more model configurations, see assets/template\n\n## Dependencies\n - [aigc-zoo](https://pypi.org/project/aigc-zoo/#history)\n - [deep-training](https://pypi.org/project/deep-training/#history)\n\n\n\n\n## Starting and stopping the service\n\n```commandline\n# start\ncd script\nbash start.sh\n# stop\ncd script\nbash stop.sh\n```\n## Starting and stopping the encrypted service\n### Step 1: encrypt the project\n```commandline\npip install -U se_imports\ncd serving/cc\npython cc.py\n```\n### Step 2: deploy the encrypted project\n```commandline\npip install -U se_imports\n# start\ncd script_se\nbash start.sh\n# stop\ncd script_se\nbash stop.sh\n```\n\n\n## Recommended model evaluation tools\n -  [openai/evals](https://github.com/openai/evals)\n -  [ssbuild/aigc_evals](https://github.com/ssbuild/aigc_evals) \n\n## Recommended UI: ChatGPT-Next-Web or dify\n![UI](assets/t5.png)\n![UI](assets/1.png)\n![UI](assets/moss.png)\n![UI](assets/xverse.png)\n\n\n\n## Client tests\n\n## openai API\n### chat demo tests/test_openai_chat.py\n\n```python\n# Uses the legacy openai\u003c1.0 client API (openai.api_base, openai.Completion)\nimport openai\n\nopenai.api_key = \"EMPTY\"\nopenai.api_base = \"http://192.168.2.180:8081/v1\"\nmodel = \"chatglm2-6b-int4\"\nmodel = \"qwen-7b-chat-int4\"\n\n# # Test list models API\n# models = openai.Model.list()\n# print(\"Models:\", models)\n\n# Test completion API\nstream = False\n\ndata = {\n    \"model\": model,\n    \"adapter_name\": None, # lora head\n    \"prompt\": [\"你是谁?\"], # \"Who are you?\"\n    \"top_p\": 0.8,\n    \"temperature\": 1.0,\n    \"frequency_penalty\": 1.01,\n    \"stream\": stream,\n    \"nchar\": 1, # stream characters\n    \"n\": 1, # return n choices\n    # \"stop\": [\"Observation:\",\"Observation:\\n\"]\n}\n\n\ncompletion = openai.Completion.create(**data)\nif stream:\n    # Print each streamed chunk, then the accumulated text\n    text = ''\n    for chunk in completion:\n        c = chunk.choices[0]\n        text += c.text\n        print(c.text)\n    print(text)\nelse:\n    for choice in completion.choices:\n        print(\"result:\", choice.text)\n\n```\n\n\n### embedding tests/test_openai_embedding.py\n```python\nimport openai\n# new version\nopenai.api_key = \"EMPTY\"\nopenai.api_base = 
\"http://192.168.2.180:8081/v1\"\n\nmodel = \"chatglm2-6b-int4\"\nmodel = \"qwen-7b-chat-int4\"\n\n# # Test list models API\n# models = openai.Model.list()\n# print(\"Models:\", models)\n\n# Test embedding API\ndata = {\n    \"model\": model,\n    \"adapter_name\": None, # lora head\n    \"input\": [\"你是谁\",], # \"Who are you\"\n}\n\n\ncompletion = openai.Embedding.create(**data)\n\nfor d in completion.data:\n    print(d)\n```\n\n## \n    Pure and clean code\n\n\n## Notes\n - 1. When using deepspeed, make sure num_attention_heads % len(device_id) == 0\n - 2. Model keys must start with the model name (case-insensitive)\n\n\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=ssbuild/aigc_serving\u0026type=Date)](https://star-history.com/#ssbuild/aigc_serving\u0026Date)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssbuild%2Faigc_serving","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssbuild%2Faigc_serving","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssbuild%2Faigc_serving/lists"}