Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vtuber-plan/langport
Langport is a language model inference service
https://github.com/vtuber-plan/langport
api chatgpt chatgpt-api fauxpilot langchain language-model llama llama-cpp llm openai tabby
Last synced: 1 day ago
JSON representation
Langport is a language model inference service
- Host: GitHub
- URL: https://github.com/vtuber-plan/langport
- Owner: vtuber-plan
- License: mit
- Created: 2023-05-10T14:22:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-09T23:52:24.000Z (5 months ago)
- Last Synced: 2024-10-18T21:59:33.419Z (3 months ago)
- Topics: api, chatgpt, chatgpt-api, fauxpilot, langchain, language-model, llama, llama-cpp, llm, openai, tabby
- Language: Python
- Homepage:
- Size: 872 KB
- Stars: 93
- Watchers: 6
- Forks: 12
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
LangPort is a open-source large language model serving platform.
Our goal is to build a super fast LLM inference service.This project is inspired by [lmsys/fastchat](https://github.com/lm-sys/FastChat), we hope that the serving platform is lightweight and fast, but fastchat includes other features such as training and evaluation make it complicated.
The core features include:
- Huggingface transformers support.
- ggml (llama.cpp) support.
- A distributed serving system for state-of-the-art models.
- Streaming generation support with various decoding strategies.
- Batch inference for higher throughput.
- Support for encoder-only, decoder-only and encoder-decoder models.
- OpenAI-compatible RESTful APIs.
- FauxPilot-compatible RESTful APIs.
- HuggingFace-compatible RESTful APIs.
- Tabby-compatible RESTful APIs.## Support Model Architectures
* LLaMa, LLaMa2, GLM, Bloom, OPT, GPT2, GPT Neo, GPT Big Code and so on.## Tested Models
* NingYu, LLaMa, LLaMa2, Vicuna, ChatGLM, ChatGLM2, Falcon, Starcoder, WizardLM, InternLM, OpenBuddy, FireFly, CodeGen, Phoenix, RWKV, StableLM and so on.## News
- [2024/01/13] Introduce the `ChatProto`.
- [2023/08/04] Dynamic batch inference.
- [2023/07/16] Support int4 quantization.
- [2023/07/13] Support generation logprobs parameter.
- [2023/06/18] Add ggml (llama.cpp gpt.cpp starcoder.cpp etc.) worker support.
- [2023/06/09] Add LLama.cpp worker support.
- [2023/06/01] Add HuggingFace Bert embedding worker support.
- [2023/06/01] Add HuggingFace text generation API support.
- [2023/06/01] Add tabby API support.
- [2023/05/23] Add chat throughput test script.
- [2023/05/22] New distributed architecture.
- [2023/05/14] Batch inference supported.
- [2023/05/10] Langport project started.## Install
### Method 1: With pip
```bash
pip install langport
```or:
```bash
pip install git+https://github.com/vtuber-plan/langport.git
```If you need ggml generation worker, use this command:
```bash
pip install langport[ggml]
```If you want to use GPU:
```bash
CT_CUBLAS=1 pip install langport[ggml]
```### Method 2: From source
1. Clone this repository
```bash
git clone https://github.com/vtuber-plan/langport.git
cd langport
```2. Install the Package
```bash
pip install --upgrade pip
pip install -e .
```## Quick start
It is simple to start a local chat API service:First, start a worker process in the terminal:
``` bash
python -m langport.service.server.generation_worker --port 21001 --model-path
```Then, start a API service in another terminal:
``` bash
python -m langport.service.gateway.openai_api
```
Now, you can use the inference API by openai protocol.## Start the server
It is simple to start a single node chat API service:
``` bash
python -m langport.service.server.generation_worker --port 21001 --model-path
python -m langport.service.gateway.openai_api
```If you need a single node embeddings API server:
```bash
python -m langport.service.server.embedding_worker --port 21002 --model-path bert-base-chinese --gpus 0 --num-gpus 1
python -m langport.service.gateway.openai_api --port 8000 --controller-address http://localhost:21002
```If you need the embeddings API or other features, you can deploy a distributed inference cluster:
``` bash
python -m langport.service.server.dummy_worker --port 21001
python -m langport.service.server.generation_worker --model-path --neighbors http://localhost:21001
python -m langport.service.server.embedding_worker --model-path --neighbors http://localhost:21001
python -m langport.service.gateway.openai_api --controller-address http://localhost:21001
```In practice, the gateway can connect to any node to distribute inference tasks:
``` bash
python -m langport.service.server.dummy_worker --port 21001
python -m langport.service.server.generation_worker --port 21002 --model-path --neighbors http://localhost:21001
python -m langport.service.server.generation_worker --port 21003 --model-path --neighbors http://localhost:21001 http://localhost:21002
python -m langport.service.server.generation_worker --port 21004 --model-path --neighbors http://localhost:21001 http://localhost:21003
python -m langport.service.server.generation_worker --port 21005 --model-path --neighbors http://localhost:21001 http://localhost:21004
python -m langport.service.gateway.openai_api --controller-address http://localhost:21003 # 21003 is OK!
python -m langport.service.gateway.openai_api --controller-address http://localhost:21002 # Any worker is also OK!
```Run text generation with multi GPUs:
``` bash
python -m langport.service.server.generation_worker --port 21001 --model-path --gpus 0,1 --num-gpus 2
python -m langport.service.gateway.openai_api
```Run text generation with ggml worker:
```bash
python -m langport.service.server.ggml_generation_worker --port 21001 --model-path --gpu-layers
```Run OpenAI forward server:
```bash
python -m langport.service.server.chatgpt_generation_worker --port 21001 --api-url --api-key
```## License
langport is released under the Apache Software License.
## See also
- [langport-docs](https://github.com/vtuber-plan/langport/tree/main/docs)
- [langport-source](https://github.com/vtuber-plan/langport)## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=vtuber-plan/langport&type=Date)](https://star-history.com/#vtuber-plan/langport&Date)