{"id":19730459,"url":"https://github.com/hcd233/aris-ai-model-server","last_synced_at":"2025-05-08T03:13:12.769Z","repository":{"id":248265570,"uuid":"826081439","full_name":"hcd233/Aris-AI-Model-Server","owner":"hcd233","description":"An OpenAI Compatible API which integrates LLM, Embedding and Reranker. 一个集成 LLM、Embedding 和 Reranker 的 OpenAI 兼容 API","archived":false,"fork":false,"pushed_at":"2025-04-17T07:44:44.000Z","size":1101,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-08T03:13:07.661Z","etag":null,"topics":["ai","awq","embedding","fastapi","gptq","llm","mlx","openai-compatible-api","rag","reranker","sentence-transformers","vllm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hcd233.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-09T04:35:43.000Z","updated_at":"2025-04-28T01:20:54.000Z","dependencies_parsed_at":"2024-07-13T15:22:25.355Z","dependency_job_id":"1b80b73f-79ee-444f-9f14-f5fe0331ec5a","html_url":"https://github.com/hcd233/Aris-AI-Model-Server","commit_stats":null,"previous_names":["hcd233/aris-ai-model-server"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcd233%2FAris-AI-Model-Server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcd233%2FAris-AI-Model-Server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcd233%2FAris-AI-Model-Server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcd233%2FAris-AI-Model-Server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hcd233","download_url":"https://codeload.github.com/hcd233/Aris-AI-Model-Server/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252990003,"owners_count":21836668,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","awq","embedding","fastapi","gptq","llm","mlx","openai-compatible-api","rag","reranker","sentence-transformers","vllm"],"created_at":"2024-11-12T00:16:27.171Z","updated_at":"2025-05-08T03:13:12.754Z","avatar_url":"https://github.com/hcd233.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Aris-AI-Model-Server\n\n[ English | [简体中文](README_zh.md) ]\n\n## Introduction\n\nIn AI application development, we often need to deploy several models for different tasks: a chat service needs an LLM, while knowledge-base retrieval needs Embedding and Reranker models. `Aris-AI-Model-Server` was created to bring these model services together in one server, giving users simple, convenient access to all of them. 
The project name comes from the character Aris in Blue Archive, shown in the figure below:\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/93331597.png\" width=\"50%\"\u003e\n  \u003cbr\u003eAris: Character from Blue Archive\n\u003c/p\u003e\n\n---\n\n## Changelog\n\n- [2024-07-13] Aris-AI-Model-Server was officially open-sourced.\n\n- [2024-06-23] We released the [Aris-14B-Chat series models](https://huggingface.co/collections/Aris-AI/aris-chat-arcturus-6642fd11069310a4467db222), which are based on [Qwen1.5-14B-Chat](https://huggingface.co/Qwen/Qwen1.5-14B-Chat) and were fine-tuned with SFT and DPO on our private dataset of 140K entries. When using these models, please comply with the Qwen open-source license.\n\n## Technology Stack\n\n### Model Backend\n\n#### Embedding\n\n- Sentence Transformers\n\n#### Reranker\n\n- Sentence Transformers\n\n#### LLM\n\n- vLLM\n- MLX\n\n### API Backend\n\n- FastAPI\n\n## API Endpoints\n\n| Route | Method | Authentication | OpenAI Compatible | Description |\n| --- | --- | --- | --- | --- |\n| / | GET | ❌ | ❌ | Root endpoint |\n| /v1/embeddings | GET | ✅ | ❌ | List all Embedding models |\n| /v1/embeddings | POST | ✅ | ✅ | Embed text with an Embedding model |\n| /v1/rerankers | GET | ✅ | ❌ | List all Reranker models |\n| /v1/rerankers | POST | ✅ | ❌ | Rerank documents with a Reranker model |\n| /v1/models | GET | ✅ | ✅ | List all LLMs |\n| /v1/chat/completions | POST | ✅ | ✅ | Generate chat completions with an LLM |\n\n## Project Structure\n\n```text\n.\n├── assets\n│   └── 110531412.jpg\n├── config # Environment variables and model configuration\n│   ├── .env.template\n│   └── models.yaml.template\n├── dockerfile\n├── main.py\n├── poetry.lock\n├── pyproject.toml\n├── scripts # AWQ and GPTQ quantization scripts\n│   ├── autoawq.py\n│   ├── autoawq.sh\n│   ├── autogptq.py\n│   └── autogptq.sh\n└── src\n    ├── api # OpenAI Compatible API\n    │   ├── auth\n    │   │   └── bearer.py\n    │   ├── model\n    │   │   ├── 
chat_cmpl.py\n    │   │   ├── embedding.py\n    │   │   ├── reranker.py\n    │   │   └── root.py\n    │   └── router\n    │       ├── __init__.py\n    │       ├── root.py\n    │       └── v1\n    │           ├── chat_cmpl.py\n    │           ├── embedding.py\n    │           ├── __init__.py\n    │           └── reranker.py\n    ├── config\n    │   ├── arg.py # Command-line arguments\n    │   ├── env.py # Environment variables\n    │   ├── gbl.py # Global variables\n    │   ├── __init__.py\n    │   └── model.py # Model configuration\n    ├── controller\n    │   ├── controller.py # Engine controller\n    │   └── __init__.py\n    ├── engine # Model inference engines\n    │   ├── base.py\n    │   ├── embedding.py\n    │   ├── mlx.py\n    │   ├── reranker.py\n    │   └── vllm.py\n    ├── logger # Logging\n    │   └── __init__.py\n    ├── middleware # Middleware\n    │   └── logger\n    │       └── __init__.py\n    └── utils\n        ├── formatter.py # Prompt formatter (adapted from the LLaMA-Factory implementation)\n        └── template.py # Prompt templates (adapted from the LLaMA-Factory implementation)\n```\n\n## Local Deployment\n\n### Clone Repository\n\n```bash\ngit clone https://github.com/hcd233/Aris-AI-Model-Server.git\ncd Aris-AI-Model-Server\n```\n\n### Create Virtual Environment (Optional)\n\nYou can skip this step, but make sure you are using Python 3.11.\n\n```bash\nconda create -n aris python=3.11.0\nconda activate aris\n```\n\n### Install Dependencies\n\n#### Install Poetry\n\n```bash\npip install poetry\n```\n\n#### Install Extras as Needed\n\n| Dependency | Description | Command |\n| --- | --- | --- |\n| base | Basic dependencies required to start the API | `poetry install` |\n| reranker | Dependencies for serving reranker models | `poetry install -E reranker` |\n| embedding | Dependencies for serving embedding models | `poetry install -E embedding` |\n| vllm | Dependencies for the vLLM backend | `poetry install -E vllm` 
|\n| mlx | Dependencies for the MLX backend | `poetry install -E mlx` |\n| awq | Dependencies for AWQ quantization | `poetry install -E awq` |\n| gptq | Dependencies for GPTQ quantization | `poetry install -E gptq` |\n\nExtras can be combined. For example, to serve an embedding model, quantize with AWQ, and serve LLMs with vLLM, install the dependencies with:\n\n```bash\npoetry install -E embedding -E awq -E vllm\n```\n\n### Configure models.yaml and .env\n\nCopy the templates, then edit the copies for your setup (details are omitted here; refer to the template files for the available options).\n\n```bash\ncp config/models.yaml.template models.yaml\ncp config/.env.template .env\n```\n\n### Start the API\n\n```bash\npython main.py --config_path models.yaml\n```\n\n### Model Quantization\n\n#### AWQ\n\n```bash\nbash scripts/autoawq.sh\n```\n\n#### GPTQ\n\n```bash\nbash scripts/autogptq.sh\n```\n\n## Docker Deployment\n\nNot available yet.\n\n## Project Outlook\n\n### Goals\n\n1. Architecture: expand from a single-machine version to a distributed, Kubernetes-based version.\n2. More backends: support additional model backends, such as Triton and ONNX.\n\n### Author Status\n\nBecause the author is busy with work, progress may be slow and updates occasional. PRs and issues are welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcd233%2Faris-ai-model-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhcd233%2Faris-ai-model-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcd233%2Faris-ai-model-server/lists"}