# vLLM-haystack-adapter
[![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)

Simply use [vLLM](https://github.com/vllm-project/vllm) in your Haystack pipeline to utilize fast, self-hosted LLMs.


*(vLLM and Haystack logos)*

## Installation
Install the wrapper via pip: `pip install vllm-haystack`

## Usage
This integration provides two invocation layers:
- `vLLMInvocationLayer`: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
- `vLLMLocalInvocationLayer`: To use locally hosted vLLM models

### Use a Model Hosted on a vLLM Server
To use a model hosted on a vLLM server, use the `vLLMInvocationLayer`.

Here is a simple example of how a `PromptNode` can be created with the wrapper.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer

model = PromptModel(model_name_or_path="", invocation_layer_class=vLLMInvocationLayer, max_length=256, api_key="EMPTY", model_kwargs={
"api_base" : API, # Replace this with your API-URL
"maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
The model to use is inferred from the model served by the vLLM server, which is why `model_name_or_path` can be left empty.
For more configuration examples, take a look at the unit tests.
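
Once the `PromptNode` is created, you can query it directly. The following is a minimal sketch that continues the example above; the prompt text is illustrative:

```python
# Send a prompt through the vLLM-backed PromptNode.
# The node is callable and returns a list of generated strings.
result = prompt_node("What is the capital of Germany?")
print(result[0])
```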

#### Hosting a vLLM Server

To create an *OpenAI-Compatible Server* via vLLM, follow the steps in the
Quickstart section of their [documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server).

### Use a Model Hosted Locally
⚠️ To run vLLM locally, you need to have `vllm` installed and a supported GPU.

If you don't want to use an API server, this wrapper also provides a `vLLMLocalInvocationLayer`, which runs vLLM on the same node Haystack is running on.

Here is a simple example of how a `PromptNode` can be created with the `vLLMLocalInvocationLayer`.
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

# MODEL is the name or path of the model vLLM should load locally.
model = PromptModel(model_name_or_path=MODEL, invocation_layer_class=vLLMLocalInvocationLayer, max_length=256, model_kwargs={
    "maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
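
Since the wrapper exposes a standard `PromptNode`, it can be plugged into a Haystack pipeline like any other node. Below is a minimal sketch assuming a Haystack 1.x `Pipeline`; it continues the example above, and the query text and node name are illustrative:

```python
from haystack import Pipeline

# Build a one-node pipeline that forwards the query to the vLLM-backed PromptNode.
pipeline = Pipeline()
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Query"])

result = pipeline.run(query="Summarize what vLLM is in one sentence.")
print(result["results"])
```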