Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bentoml/OpenLLM
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint, locally and in the cloud.
ai bentoml falcon fine-tuning llama llama2 llm llm-inference llm-ops llm-serving llmops mistral ml mlops model-inference mpt open-source-llm openllm stablelm vicuna
Last synced: 23 days ago
JSON representation
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint, locally and in the cloud.
- Host: GitHub
- URL: https://github.com/bentoml/OpenLLM
- Owner: bentoml
- License: apache-2.0
- Created: 2023-04-19T00:27:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-13T01:36:10.000Z (7 months ago)
- Last Synced: 2024-04-14T06:06:05.896Z (7 months ago)
- Topics: ai, bentoml, falcon, fine-tuning, llama, llama2, llm, llm-inference, llm-ops, llm-serving, llmops, mistral, ml, mlops, model-inference, mpt, open-source-llm, openllm, stablelm, vicuna
- Language: Python
- Homepage: https://bentoml.com
- Size: 38.3 MB
- Stars: 8,657
- Watchers: 51
- Forks: 541
- Open Issues: 83
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Codeowners: .github/CODEOWNERS
- Security: .github/SECURITY.md
Awesome Lists containing this project
- Awesome-LLM-Productization - OpenLLM from BentoML - an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications. (Models and Tools / LLM Deployment)
- Self-Hosting-Guide - OpenLLM - Fine-tune, serve, deploy, and monitor any LLMs with ease. (Tools for Self-Hosting / LLMs)
- awesome-langchain-zh - OpenLLM
- awesome-llm-list - OpenLLM
- awesome-llmops - OpenLLM - Fine-tune, serve, deploy, and monitor any LLMs with ease. | ![GitHub Badge](https://img.shields.io/github/stars/bentoml/OpenLLM.svg?style=flat-square) | (Large Scale Deployment / ML Platforms)
- awesome-langchain - OpenLLM - Fine-tune, serve, deploy, and monitor any LLMs with ease using OpenLLM. ![GitHub Repo stars](https://img.shields.io/github/stars/bentoml/OpenLLM?style=social) (Other LLM Frameworks / Videos Playlists)
- StarryDivineSky - bentoml/OpenLLM
- Awesome-LLM - OpenLLM - Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at [BentoML](https://bentoml.com/) for LLMs-based applications. (LLM Deployment)
- awesome-local-llms - OpenLLM - Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud. (Open-Source Local LLM Projects)
- awesome-production-machine-learning - OpenLLM - OpenLLM allows developers to run any open-source LLMs (Llama 3.1, Qwen2, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command. (Deployment and Serving)
- AiTreasureBox - bentoml/OpenLLM - An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. (Repos)
- Awesome-LLM-RAG-Application - OpenLLM
- awesome-ai-papers - OpenLLM \[[mlc-llm](https://github.com/mlc-ai/mlc-llm)\]\[[ollama](https://github.com/jmorganca/ollama)\]\[[open-webui](https://github.com/open-webui/open-webui)\]\[[torchchat](https://github.com/pytorch/torchchat)\] (NLP / 3. Pretraining)
- awesome-LLM-resourses - OpenLLM - Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud. (Inference)
- awesome-homelab - OpenLLM - Run any open-source LLMs, such as Llama, Gemma, as OpenAI compatible API endpoint in the cloud. (Apps / AI)
- awesome-ai-data-github-repos - OpenLLM: An open platform for operating large language models (LLMs) in production
README
# 🦾 OpenLLM: Self-Hosting LLMs Made Easy
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202-green.svg)](https://github.com/bentoml/OpenLLM/blob/main/LICENSE)
[![Releases](https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/openllm)
[![CI](https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg)](https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main)
[![X](https://badgen.net/badge/icon/@bentomlai/000000?icon=twitter&label=Follow)](https://twitter.com/bentomlai)
[![Community](https://badgen.net/badge/icon/Community/562f5d?icon=slack&label=Join)](https://l.bentoml.com/join-slack)

OpenLLM allows developers to run **any open-source LLMs** (Llama 3.2, Qwen2.5, Phi3 and [more](#supported-models)) or **custom models** as **OpenAI-compatible APIs** with a single command. It features a [built-in chat UI](#chat-ui), state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Docker, Kubernetes, and [BentoCloud](#deploy-to-bentocloud).
Understand the [design philosophy of OpenLLM](https://www.bentoml.com/blog/from-ollama-to-openllm-running-llms-in-the-cloud).
## Get Started
Run the following commands to install OpenLLM and explore it interactively.
```bash
pip install openllm # or pip3 install openllm
openllm hello
```

![hello](https://github.com/user-attachments/assets/5af19f23-1b34-4c45-b1e0-a6798b4586d1)
## Supported models
OpenLLM supports a wide range of state-of-the-art open-source LLMs. You can also add a [model repository to run custom models](#set-up-a-custom-repository) with OpenLLM.
| Model | Parameters | Quantization | Required GPU | Start a Server |
| ---------------- | ---------- | ------------ | ------------- | ----------------------------------- |
| Llama 3.1 | 8B | - | 24G | `openllm serve llama3.1:8b` |
| Llama 3.1 | 8B | AWQ 4bit | 12G | `openllm serve llama3.1:8b-4bit` |
| Llama 3.1 | 70B | AWQ 4bit | 80G | `openllm serve llama3.1:70b-4bit` |
| Llama 3.2 | 1B | - | 12G | `openllm serve llama3.2:1b` |
| Llama 3.2 | 3B | - | 12G | `openllm serve llama3.2:3b` |
| Llama 3.2 Vision | 11B | - | 80G | `openllm serve llama3.2:11b-vision` |
| Mistral | 7B | - | 24G | `openllm serve mistral:7b` |
| Qwen 2.5 | 1.5B | - | 12G | `openllm serve qwen2.5:1.5b` |
| Gemma 2 | 9B | - | 24G | `openllm serve gemma2:9b` |
| Phi3             | 3.8B       | -            | 12G           | `openllm serve phi3:3.8b`           |
For the full model list, see the [OpenLLM models repository](https://github.com/bentoml/openllm-models).
## Start an LLM server
To start an LLM server locally, use the `openllm serve` command and specify the model version.
> [!NOTE]
> OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.
> 1. Create your Hugging Face token [here](https://huggingface.co/settings/tokens).
> 2. Request access to the gated model, such as [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).
> 3. Set your token as an environment variable by running:
> ```bash
> export HF_TOKEN=
> ```

```bash
openllm serve llama3:8b
```

The server will be accessible at [http://localhost:3000](http://localhost:3000/), providing OpenAI-compatible APIs for interaction. You can call the endpoints with different frameworks and tools that support OpenAI-compatible APIs. Typically, you may need to specify the following:
- **The API host address**: By default, the LLM is hosted at [http://localhost:3000](http://localhost:3000/).
- **The model name:** The name can be different depending on the tool you use.
- **The API key**: The API key used for client authentication. This is optional.

Here are some examples:
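Because the endpoints follow the OpenAI wire format, you can also call them with nothing but the Python standard library. A minimal sketch (it assumes the `llama3:8b` server above is running on `localhost:3000`; the commented lines perform the actual request):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local server.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Explain superconductors like I'm five years old"}
    ],
}
req = urllib.request.Request(
    "http://localhost:3000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, send the request and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```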
OpenAI Python client
```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')
# Use the following func to get the available models
# model_list = client.models.list()
# print(model_list)

chat_completion = client.chat.completions.create(
model="meta-llama/Meta-Llama-3-8B-Instruct",
messages=[
{
"role": "user",
"content": "Explain superconductors like I'm five years old"
}
],
stream=True,
)
for chunk in chat_completion:
print(chunk.choices[0].delta.content or "", end="")
```

LlamaIndex
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(api_base="http://localhost:3000/v1", model="meta-llama/Meta-Llama-3-8B-Instruct", api_key="dummy")
...
```

## Chat UI
OpenLLM provides a chat UI at the `/chat` endpoint of the launched LLM server, available at http://localhost:3000/chat.
## Chat with a model in the CLI
To start a chat conversation in the CLI, use the `openllm run` command and specify the model version.
```bash
openllm run llama3:8b
```

## Model repository
A model repository in OpenLLM represents a catalog of available LLMs that you can run. OpenLLM provides a default model repository that includes the latest open-source LLMs like Llama 3, Mistral, and Qwen2, hosted at [this GitHub repository](https://github.com/bentoml/openllm-models). To see all available models from the default and any added repository, use:
```bash
openllm model list
```

To ensure your local list of models is synchronized with the latest updates from all connected repositories, run:
```bash
openllm repo update
```

To review a model’s information, run:
```bash
openllm model get llama3:8b
```

### Add a model to the default model repository
You can contribute to the default model repository by adding new models that others can use. This involves creating and submitting a Bento of the LLM. For more information, check out this [example pull request](https://github.com/bentoml/openllm-models/pull/1).
### Set up a custom repository
You can add your own repository to OpenLLM with custom models. To do so, follow the format of the default OpenLLM model repository, using a `bentos` directory to store your custom LLMs. You need to [build your Bentos with BentoML](https://docs.bentoml.com/en/latest/guides/build-options.html) and submit them to your model repository.
First, prepare your custom models in a `bentos` directory following the guidelines provided by [BentoML to build Bentos](https://docs.bentoml.com/en/latest/guides/build-options.html). Check out the [default model repository](https://github.com/bentoml/openllm-repo) for an example and read the [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md) for details.
Then, register your custom model repository with OpenLLM:
```bash
openllm repo add
```

**Note**: Currently, OpenLLM only supports adding public repositories.
## Deploy to BentoCloud
OpenLLM supports LLM cloud deployment via BentoML, the unified model serving framework, and BentoCloud, an AI inference platform for enterprise AI teams. BentoCloud provides fully-managed infrastructure optimized for LLM inference with autoscaling, model orchestration, and observability, allowing you to run any AI model in the cloud.
[Sign up for BentoCloud](https://www.bentoml.com/) for free and [log in](https://docs.bentoml.com/en/latest/bentocloud/how-tos/manage-access-token.html). Then, run `openllm deploy` to deploy a model to BentoCloud:
```bash
openllm deploy llama3:8b
```

> [!NOTE]
> If you are deploying a gated model, make sure to set `HF_TOKEN` as an environment variable.

Once the deployment is complete, you can run model inference on the BentoCloud console.
## Community
OpenLLM is actively maintained by the BentoML team. Feel free to reach out and join us in our pursuit to make LLMs more accessible and easy to use 👉 [Join our Slack community!](https://l.bentoml.com/join-slack)
## Contributing
As an open-source project, we welcome contributions of all kinds, such as new features, bug fixes, and documentation. Here are some of the ways to contribute:
- Report a bug by [creating a GitHub issue](https://github.com/bentoml/OpenLLM/issues/new/choose).
- [Submit a pull request](https://github.com/bentoml/OpenLLM/compare) or help review other developers’ [pull requests](https://github.com/bentoml/OpenLLM/pulls).
- Add an LLM to the OpenLLM default model repository so that other users can run your model. See the [pull request template](https://github.com/bentoml/openllm-models/pull/1).
- Check out the [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md) to learn more.

## Acknowledgements
This project uses the following open-source projects:
- [bentoml/bentoml](https://github.com/bentoml/bentoml) for production-level model serving
- [vllm-project/vllm](https://github.com/vllm-project/vllm) for the production-level LLM backend
- [blrchen/chatgpt-lite](https://github.com/blrchen/chatgpt-lite) for a fancy Web Chat UI
- [chujiezheng/chat_templates](https://github.com/chujiezheng/chat_templates)
- [astral-sh/uv](https://github.com/astral-sh/uv) for blazing-fast dependency installation

We are grateful to the developers and contributors of these projects for their hard work and dedication.