{"id":13633238,"url":"https://github.com/tensorchord/modelz-llm","last_synced_at":"2025-04-05T05:03:46.349Z","repository":{"id":167122157,"uuid":"641211854","full_name":"tensorchord/modelz-llm","owner":"tensorchord","description":"OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others)","archived":false,"fork":false,"pushed_at":"2023-10-11T01:47:08.000Z","size":94,"stargazers_count":275,"open_issues_count":12,"forks_count":27,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-29T04:05:28.823Z","etag":null,"topics":["llm","nlp","openai-api","transformer"],"latest_commit_sha":null,"homepage":"https://modelz.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tensorchord.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-05-16T02:34:17.000Z","updated_at":"2025-03-25T21:36:28.000Z","dependencies_parsed_at":"2024-01-14T08:54:40.438Z","dependency_job_id":"40bd29d0-2421-45e4-bcba-fe7a313f1534","html_url":"https://github.com/tensorchord/modelz-llm","commit_stats":null,"previous_names":["tensorchord/modelz-llm"],"tags_count":35,"template":false,"template_full_name":"tensorchord/modelz-template-mosec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fmodelz-llm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fmodelz-llm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fmodelz-llm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorchord%2Fmodelz-llm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tensorchord","download_url":"https://codeload.github.com/tensorchord/modelz-llm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289409,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","nlp","openai-api","transformer"],"created_at":"2024-08-01T23:00:31.483Z","updated_at":"2025-04-05T05:03:46.329Z","avatar_url":"https://github.com/tensorchord.png","language":"Python","funding_links":[],"categories":["Serving","Large Model Serving"],"sub_categories":["Large Model Serving"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Modelz LLM\n\n\u003c/div\u003e\n\n\u003cp align=center\u003e\n\u003ca href=\"https://discord.gg/KqswhpVgdU\"\u003e\u003cimg alt=\"discord invitation link\" src=\"https://dcbadge.vercel.app/api/server/KqswhpVgdU?style=flat\"\u003e\u003c/a\u003e\n\u003ca href=\"https://twitter.com/TensorChord\"\u003e\u003cimg src=\"https://img.shields.io/twitter/follow/tensorchord?style=social\" alt=\"Twitter follow\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nModelz LLM is an inference server that lets you run open source large language models (LLMs), such as FastChat T5, LLaMA, and ChatGLM, in either **local or cloud-based** environments behind an **OpenAI compatible API**.\n\n## Features\n\n- **OpenAI compatible API**: Modelz LLM provides an OpenAI compatible API for LLMs, so you can use the OpenAI Python SDK or LangChain to interact with the model.\n- **Self-hosted**: Modelz LLM can be deployed in either local or cloud-based environments.\n- **Open source LLMs**: Modelz LLM supports open source LLMs, such as FastChat T5, Vicuna, LLaMA, ChatGLM, and Bloomz.\n- **Cloud native**: We provide Docker images for different LLMs, which can be deployed on Kubernetes or other cloud-based environments (e.g. [Modelz](https://modelz.ai)).\n\n## Quick Start\n\n### Install\n\n```bash\npip install modelz-llm\n# or install from source, with the gpu extra\npip install \"modelz-llm[gpu] @ git+https://github.com/tensorchord/modelz-llm.git\"\n```\n\n### Run the self-hosted API server\n\nFirst, start the self-hosted API server. For example, to serve `bigscience/bloomz-560m` on CPU:\n\n```bash\nmodelz-llm -m bigscience/bloomz-560m --device cpu\n```\n\nCurrently, we support the following models:\n\n| Model Name | Hugging Face Model | Docker Image | Recommended Hardware |\n| ---------- | ----------- | ---------------- | -- |\n| FastChat T5 | `lmsys/fastchat-t5-3b-v1.0` | [modelzai/llm-fastchat-t5-3b](https://hub.docker.com/repository/docker/modelzai/llm-fastchat-t5-3b/general) | NVIDIA L4 (24GB) |\n| Vicuna 7B Delta V1.1 | `lmsys/vicuna-7b-delta-v1.1` | [modelzai/llm-vicuna-7b](https://hub.docker.com/repository/docker/modelzai/llm-vicuna-7b/general) | NVIDIA A100 (40GB) |\n| LLaMA 7B | `decapoda-research/llama-7b-hf` | [modelzai/llm-llama-7b](https://hub.docker.com/repository/docker/modelzai/llm-llama-7b/general) | NVIDIA A100 (40GB) |\n| ChatGLM 6B INT4 | `THUDM/chatglm-6b-int4` | [modelzai/llm-chatglm-6b-int4](https://hub.docker.com/repository/docker/modelzai/llm-chatglm-6b-int4/general) | NVIDIA T4 (16GB) |\n| ChatGLM 6B | `THUDM/chatglm-6b` | [modelzai/llm-chatglm-6b](https://hub.docker.com/repository/docker/modelzai/llm-chatglm-6b/general) | NVIDIA L4 (24GB) |\n| Bloomz 560M | `bigscience/bloomz-560m` | [modelzai/llm-bloomz-560m](https://hub.docker.com/repository/docker/modelzai/llm-bloomz-560m/general) | CPU |\n| Bloomz 1.7B | `bigscience/bloomz-1b7` | | CPU |\n| Bloomz 3B | `bigscience/bloomz-3b` | | NVIDIA L4 (24GB) |\n| Bloomz 7.1B | `bigscience/bloomz-7b1` | | NVIDIA A100 (40GB) |\n\n### Use OpenAI Python SDK\n\nOnce the server is running, you can use the OpenAI Python SDK to interact with the model:\n\n```python\n# This example uses the legacy (pre-1.0) openai SDK interface\nimport openai\n\n# Point the SDK at the local Modelz LLM server instead of api.openai.com\nopenai.api_base = \"http://localhost:8000\"\nopenai.api_key = \"any\"\n\n# create a chat completion\nchat_completion = openai.ChatCompletion.create(model=\"any\", messages=[{\"role\": \"user\", \"content\": \"Hello world\"}])\nprint(chat_completion.choices[0].message.content)\n```\n\n### Integrate with LangChain\n\nYou can also use Modelz LLM from LangChain:\n\n```python\nimport openai\n\n# Configure the openai module to target the local Modelz LLM server\nopenai.api_base = \"http://localhost:8000\"\nopenai.api_key = \"any\"\n\nfrom langchain.llms import OpenAI\n\nllm = OpenAI()\n\nllm.generate(prompts=[\"Could you please recommend some movies?\"])\n```\n\n## Deploy on Modelz\n\nYou can also deploy Modelz LLM directly on [Modelz](https://docs.modelz.ai):\n\n[![Deploy on Modelz](./docs/images/deploy.svg)](https://cloud.modelz.ai/deployment/template?templateId=5e884bb3-6c32-468e-bc62-95cee55c17d4)\n\n## Supported APIs\n\nModelz LLM exposes the following OpenAI compatible endpoints for interacting with open source large language models:\n\n- `/completions`\n- `/chat/completions`\n- `/embeddings`\n- `/engines/\u003cany\u003e/embeddings`\n- `/v1/completions`\n- `/v1/chat/completions`\n- `/v1/embeddings`\n\n## Acknowledgements\n\n- [FastChat](https://github.com/lm-sys/FastChat) for the prompt generation logic.\n- [Mosec](https://github.com/mosecorg/mosec) for the inference engine.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorchord%2Fmodelz-llm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftensorchord%2Fmodelz-llm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorchord%2Fmodelz-llm/lists"}