# vLLM-haystack-adapter
[![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)

Simply use [vLLM](https://github.com/vllm-project/vllm) in your Haystack pipeline to utilize fast, self-hosted LLMs.


*(vLLM and Haystack logos)*

## Installation
Install the wrapper via pip: `pip install vllm-haystack`

## Usage
This integration provides two invocation layers:
- `vLLMInvocationLayer`: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
- `vLLMLocalInvocationLayer`: To use locally hosted vLLM models

### Use a Model Hosted on a vLLM Server
To use a model hosted on a vLLM server, use the `vLLMInvocationLayer`.

Here is a simple example of how a `PromptNode` can be created with the wrapper.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer

model = PromptModel(model_name_or_path="", invocation_layer_class=vLLMInvocationLayer, max_length=256, api_key="EMPTY", model_kwargs={
"api_base" : API, # Replace this with your API-URL
"maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
The model to use is inferred from the model served by the vLLM server, which is why `model_name_or_path` can be left empty.
For more configuration examples, take a look at the unit tests.
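
Once the `PromptNode` is created, you can query it directly. The following is a minimal sketch that continues the example above; the prompt text is illustrative:

```python
# Send a prompt through the vLLM-backed PromptNode.
# The node is callable and returns a list of generated strings.
result = prompt_node("What is the capital of Germany?")
print(result[0])
```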

#### Hosting a vLLM Server

To create an *OpenAI-Compatible Server* via vLLM, follow the steps in the
Quickstart section of their [documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server).

### Use a Model Hosted Locally
⚠️ To run vLLM locally, you need to have `vllm` installed and a supported GPU.

If you don't want to use an API server, this wrapper also provides a `vLLMLocalInvocationLayer`, which runs vLLM on the same node Haystack is running on.

Here is a simple example of how a `PromptNode` can be created with the `vLLMLocalInvocationLayer`.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

# MODEL is the name or path of the model vLLM should load locally.
model = PromptModel(model_name_or_path=MODEL, invocation_layer_class=vLLMLocalInvocationLayer, max_length=256, model_kwargs={
    "maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
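
Since the wrapper exposes a standard `PromptNode`, it can be plugged into a Haystack pipeline like any other node. Below is a minimal sketch assuming a Haystack 1.x `Pipeline`; it continues the example above, and the query text and node name are illustrative:

```python
from haystack import Pipeline

# Build a one-node pipeline that forwards the query to the vLLM-backed PromptNode.
pipeline = Pipeline()
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Query"])

result = pipeline.run(query="Summarize what vLLM is in one sentence.")
print(result["results"])
```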