Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/llukas22/vllm-haystack-adapter
Simply connect your Haystack pipeline to a vLLM API server.
- Host: GitHub
- URL: https://github.com/llukas22/vllm-haystack-adapter
- Owner: LLukas22
- License: mit
- Created: 2023-09-07T07:51:44.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-04T15:23:10.000Z (12 months ago)
- Last Synced: 2024-10-18T00:45:54.591Z (30 days ago)
- Language: Python
- Size: 28.3 KB
- Stars: 6
- Watchers: 1
- Forks: 3
- Open Issues: 3
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# vLLM-haystack-adapter
[![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)

Simply use [vLLM](https://github.com/vllm-project/vllm) in your Haystack pipeline to utilize fast, self-hosted LLMs.
## Installation
Install the wrapper via pip: `pip install vllm-haystack`

## Usage
This integration provides two invocation layers:
- `vLLMInvocationLayer`: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
- `vLLMLocalInvocationLayer`: To use locally hosted vLLM models

### Use a Model Hosted on a vLLM Server

To utilize the wrapper, the `vLLMInvocationLayer` has to be used. Here is a simple example of how a `PromptNode` can be created with the wrapper.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer

model = PromptModel(
    model_name_or_path="",
    invocation_layer_class=vLLMInvocationLayer,
    max_length=256,
    api_key="EMPTY",
    model_kwargs={
        "api_base": API,  # Replace this with your API-URL
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
The model to use is inferred from the model being served on the vLLM server. For more configuration examples, take a look at the unit tests.
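Once created, the node can be queried directly as a quick check. A minimal sketch, assuming the `prompt_node` from the snippet above and a vLLM server reachable at the configured `api_base`; the prompt string is only illustrative:

```python
# Hypothetical usage, assuming `prompt_node` from the snippet above and a
# running vLLM server reachable at the configured `api_base`.
answers = prompt_node("Briefly explain what vLLM is.")  # returns a list of generated strings
print(answers[0])
```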
#### Hosting a vLLM Server

To create an *OpenAI-Compatible Server* via vLLM, you can follow the steps in the Quickstart section of their [documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server).

### Use a Model Hosted Locally
⚠️ To run `vLLM` locally you need to have `vllm` installed and a supported GPU.

If you don't want to use an API server, this wrapper also provides a `vLLMLocalInvocationLayer`, which runs vLLM on the same node Haystack is running on.
Here is a simple example of how a `PromptNode` can be created with the `vLLMLocalInvocationLayer`.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

model = PromptModel(
    model_name_or_path=MODEL,
    invocation_layer_class=vLLMLocalInvocationLayer,
    max_length=256,
    model_kwargs={
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
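To actually connect either variant to a Haystack pipeline, the node is added like any other component. A minimal sketch, assuming Haystack 1.x, the `prompt_node` defined in one of the snippets above, and an illustrative query string:

```python
from haystack import Pipeline

# Minimal sketch: wire the prompt node into a Haystack (1.x) pipeline,
# assuming `prompt_node` from one of the snippets above.
pipeline = Pipeline()
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Query"])

# The query is passed to the PromptNode as the prompt.
result = pipeline.run(query="Summarize what vLLM does in one sentence.")
print(result)
```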