{"id":18673789,"url":"https://github.com/opencsgs/llm-inference","last_synced_at":"2025-04-12T01:32:00.383Z","repository":{"id":226007525,"uuid":"764401124","full_name":"OpenCSGs/llm-inference","owner":"OpenCSGs","description":"llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.","archived":false,"fork":false,"pushed_at":"2024-04-14T13:51:16.000Z","size":508,"stargazers_count":31,"open_issues_count":13,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-14T16:25:16.572Z","etag":null,"topics":["deepspeed","llama-cpp","llm-inference","ray","transformer","vllm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenCSGs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"Roadmap.md","authors":null,"dei":null}},"created_at":"2024-02-28T02:15:07.000Z","updated_at":"2024-04-16T15:30:11.214Z","dependencies_parsed_at":"2024-04-16T15:30:04.017Z","dependency_job_id":null,"html_url":"https://github.com/OpenCSGs/llm-inference","commit_stats":null,"previous_names":["opencsgs/llm-inference"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fllm-inference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fllm-inference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fllm-inference/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenCSGs%2Fllm-inference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenCSGs","download_url":"https://codeload.github.com/OpenCSGs/llm-inference/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248504291,"owners_count":21115142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepspeed","llama-cpp","llm-inference","ray","transformer","vllm"],"created_at":"2024-11-07T09:16:33.558Z","updated_at":"2025-04-12T01:32:00.026Z","avatar_url":"https://github.com/OpenCSGs.png","language":"Python","readme":"# LLM Inference - Quickly Deploy Production-Ready LLM Services\n\n[Chinese documentation (中文文档)](./README_cn.md)\n\n`LLM Inference` is a large language model serving solution for deploying production-ready LLM services.\n\nWe gained a great deal of inspiration and motivation from [this open source project](https://github.com/ray-project/ray-llm). 
We are incredibly grateful to them for giving us the chance to further explore and innovate by standing on the shoulders of giants.\n\n\u003cimg src=\"./docs/llm-inference.png\" alt=\"image\" width=600 height=\"auto\"\u003e\n\n### TL;DR\n\n`llm-inference` is a platform for deploying and managing LLM (Large Language Model) inference tasks, with the following features:\n\n- Uses Ray to organize multiple nodes into a cluster, centralizing the management of computational resources and allocating the resources each inference task requires.\n- Provides a comprehensive management interface to monitor the state of LLM inference tasks, including resource utilization, replica counts, logs, and more.\n- Automatically scales inference tasks out and in, dynamically adjusting computational resources to match request volume and optimize resource usage.\n- Implements serverless inference by automatically shutting down resources when no inference tasks are active, preventing unnecessary resource waste.\n- Supports various inference frameworks and formats, including Hugging Face Transformers (PyTorch), DeepSpeed, GGUF, vLLM, etc., with support for more frameworks on the way.\n- Establishes user-friendly standards for publishing inference tasks: YAML configurations describe how a model is loaded and executed, including the framework used, batch size, serverless scaling policies, and more, lowering the barrier to entry.\n- Provides both a RESTful API and a user interface (UI) for accessing and managing model inference tasks.\n- Supports streaming responses.\n- Supports retrieving models from multiple sources, including the OpenCSG Model Hub, the Hugging Face Hub, custom S3 storage, and local storage.\n\nMore features are coming soon; see the [Roadmap](./Roadmap.md).\n\n\n## Deployment\n\n### Install `LLM Inference` and dependencies\n\nYou can start by 
cloning the repository and installing `llm-serve` with pip. Python 3.10+ is recommended for deploying `llm-serve`.\n\n```\ngit clone https://github.com/OpenCSGs/llm-inference.git\ncd llm-inference\n```\n\nInstall the dependencies for specific components:\n\n```\npip install '.[backend]'\n```\n\n**Note:** the `vllm` component is optional, since it requires a GPU:\n\n```\npip install '.[vllm]'\n```\n\nInstall `llm-inference`:\n\n```\npip install .\n```\n\n### Start a Ray cluster locally\n\n```\nray start --head --port=6379 --dashboard-host=0.0.0.0 --dashboard-port=8265\n```\n\n### Quick start\n\nYou can follow the [quick start](./docs/quick_start.md) to run an end-to-end example.\n\n\n## FAQ\n\n### How to use a model from a local path, a Git server, S3 storage, or the OpenCSG Hub\n\nSee the [guide](./docs/git_server_s3_storage.md) for how to use a model from a local path, a Git server, or S3 storage.\n\n### How to add new models using the LLMServe Model Registry\n\nLLMServe lets you add new models by writing a single configuration file.\nTo learn more about how to customize or add new models, see the [LLMServe Model Registry](./models/README.md).\n\n### Developer Guide\n\nSee the [Developer Guide](./docs/developer.md) for how to set up a development environment so you can start contributing.\n\n### Common Issues\n\nSee [this document](./docs/common_issues.md) for some common issues.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencsgs%2Fllm-inference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencsgs%2Fllm-inference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencsgs%2Fllm-inference/lists"}