{"id":19085963,"url":"https://github.com/openlmlab/moss_vortex","last_synced_at":"2025-10-09T20:03:51.152Z","repository":{"id":154464529,"uuid":"631192909","full_name":"OpenLMLab/MOSS_Vortex","owner":"OpenLMLab","description":"Moss Vortex is a lightweight and high-performance deployment and inference backend engineered specifically for MOSS 003, providing a wealth of features aimed at enhancing performance and functionality, built upon the foundations of MOSEC and Torch.","archived":false,"fork":false,"pushed_at":"2023-04-25T05:28:01.000Z","size":266,"stargazers_count":37,"open_issues_count":4,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-10-09T20:03:32.428Z","etag":null,"topics":["chatgpt","deployment","inference","natural-language-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenLMLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-22T08:24:32.000Z","updated_at":"2024-01-18T08:43:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"ab1d8826-3ea2-488c-87ac-a6b4140e7f3c","html_url":"https://github.com/OpenLMLab/MOSS_Vortex","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenLMLab/MOSS_Vortex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenLMLab%2FMOSS_Vortex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenLMLab%2FMOSS_Vortex/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenLMLab%2FMOSS_Vortex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenLMLab%2FMOSS_Vortex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenLMLab","download_url":"https://codeload.github.com/OpenLMLab/MOSS_Vortex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenLMLab%2FMOSS_Vortex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002012,"owners_count":26083258,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","deployment","inference","natural-language-processing"],"created_at":"2024-11-09T02:57:57.120Z","updated_at":"2025-10-09T20:03:51.146Z","avatar_url":"https://github.com/OpenLMLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MOSS Vortex\n\nMoss Vortex is a lightweight, fast, simple, and high-performance deployment and inference backend engineered specifically for MOSS 003, providing a wealth of features aimed at enhancing performance and functionality, built upon the foundations of MOSEC and Torch.  
\n\nA few commands and a few minutes are enough to deploy a MOSS 003 inference server on your own GPU server.\n\n\nFeatures:\n\n* WebSocket-based streaming output: MOSS Vortex uses WebSockets for real-time, bidirectional communication between the server and clients, so output is streamed to the client as it is generated rather than after the full response is complete.\n\n* Multiple sampling strategies for LLM generation: The application supports several sampling strategies for Large Language Models (LLMs) to improve the quality and diversity of generated content. This gives better control over the output and helps tune the results to specific requirements.\n\n* Infinite conversation loops: MOSS Vortex is designed to handle extended dialogues between users and the AI, with no limit on the number of exchanges.\n\n* Support for custom tools: The application supports multiple custom tools, allowing users to integrate additional functionality based on their needs, so MOSS Vortex can adapt to a wide range of use cases.\n\n* ONNX model acceleration: MOSS Vortex uses the ONNX (Open Neural Network Exchange) format for model acceleration, which reduces inference time and makes more efficient use of resources.\n\n* Model parallelism: The application uses model parallelism to distribute the workload across multiple GPUs or other processing units. This improves scalability and performance, particularly for large-scale models and data.  \n\nThe main limitation of MOSS Vortex is that it does not yet implement _Token Batching_, which is crucial for LLM inference; I will implement it shortly.  
\n\n\n\u003cimg src=\"./img/search_case.jpeg\" alt=\"example\" width=\"400\" height=\"400\"\u003e\n\n\n## QuickStart\n\nTo quickly deploy Moss Vortex using Docker:  \n\n\n```\ngit clone https://github.com/piglaker/vortex.git\ncd vortex\nbash install_run.sh\n```\n## Test\nTo run a test on MOSS Vortex:  \n```\nbash scripts/test.sh\n```  \n\n## Interface\n\n```\ncurl -X POST http://127.0.0.1:21333/inference -d \\\n'{\"x\": \"\u003c|Human|\u003e: hello\u003ceoh\u003e\\n\u003c|Inner Thoughts|\u003e: None.\u003ceot\u003e\\n\u003c|Commands|\u003e: None\u003ceoc\u003e\\n\u003c|Results|\u003e: None\u003ceor\u003e\\n\u003c|MOSS|\u003e:\",\n\"max_iterations\":\"128\",\n\"temperature\":\"0.7\",\n\"repetition_penalty\":\"1.1\"\n}'\n```\n\nReturn format:\n```\n\u003e\u003e bash scripts/short_vortex_test.sh\n#date\n{\n  \"pred\": \"\u003c|Human|\u003e: hello\u003ceoh\u003e\\n\u003c|Inner Thoughts|\u003e: None.\u003ceot\u003e\\n\u003c|Commands|\u003e: None\u003ceoc\u003e\\n\u003c|Results|\u003e: None\u003ceor\u003e\\n\u003c|MOSS|\u003e: Hello! How may I assist you today?\u003ceom\u003e\",\n  \"input_token_num\": 331,\n  \"new_generations_token_num\": 10,\n  \"new_generations\": \" Hello! 
How may I assist you today?\u003ceom\u003e\"\n}\n#date\n```\n\n\n## Metrics\nTo check the metrics:  \n```\nhttp 127.0.0.1:21333/metrics\n```\n\nLog format:\n```\n2023-04-18 00:50:46,707 - 210 - INFO - mosec_server.py:652 - \u003c|Human|\u003e: 写一段python快排代码\u003ceoh\u003e\n\u003c|Inner Thoughts|\u003e: None\u003ceot\u003e\n\u003c|Commands|\u003e: None\u003ceoc\u003e\n\u003c|Results|\u003e: None\u003ceor\u003e\n\u003c|MOSS|\u003e: 这里是一个简单的Python快速排序的代码示例：\n\n`python\ndef quick_sort(arr):\n    if len(arr) \u003c= 1:\n        return arr\n\n    pivot = arr[0]  # 选择第一个元素作为基准点\n    left = [x for x in arr[1:] if x \u003c= pivot]\n    right = [x for x in arr[1:] if x \u003e pivot]\n\n    return quick_sort(left) + right\n\n# 示例\nprint(quick_sort([3, 6, 8, 10, 1, 2]))\n`\n\n输出结果为 `[1, 2, 3, 6, 8, 10]`。\u003ceom\u003e\n2023-04-18 00:51:30,113 - 213 - INFO - mosec_server.py:743 - [MOSEC] [FORWARD] First Token Generation Cost: 0.09637761116027832\n2023-04-18 00:51:30,461 - 213 - INFO - mosec_server.py:747 - [MOSEC] [FORWARD] Recent Token Generation Cost: 0.04331459999084473\n2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:542 - [MOSEC] [STREAM] Graceful close websockets \n2023-04-18 00:51:30,912 - 213 - INFO - mosec_server.py:623 - [MOSEC] [INFER] Request Cost: 0.8990638256072998\n```\n\n## Configuration\n\nMoss Vortex uses the following environment and default settings:  \n- CUDA Version: 11.7  \n- GPU: 8 * A800 (Recommended)  \n- Default Batch Size: 8   \n- Default Wait Time for Batching: 10  \n- Default Infer Timeout: 70,000 ms  \n- Port: 21333 (used for Nginx)  \n- Mosec Version: 0.6  \n\n\n## Citation\nIf you use Moss Vortex in your work, please cite it as follows:\n```\n@software{MOSS_Vortex2023,  \n  title = {{Moss Vortex: An advanced deployment and inference backend for MOSS based on MOSEC and Torch}},  \n  author = {Xiaotian Zhang and Zhengfu He and Tianxiang Sun},  \n  url = {https://github.com/piglaker/Vortex},  \n  year = {2023}  
\n}\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenlmlab%2Fmoss_vortex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenlmlab%2Fmoss_vortex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenlmlab%2Fmoss_vortex/lists"}