https://github.com/adesoji1/vllm-docker

How vLLM and Docker are Changing the Game for LLM Deployments

Host: GitHub
URL: https://github.com/adesoji1/vllm-docker
Owner: Adesoji1
Created: 2024-11-27T15:46:36.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-27T16:35:54.000Z (8 months ago)
Last Synced: 2025-01-30T09:41:50.221Z (6 months ago)
Topics: docker, vllm
Language: Python
Homepage: https://collabnix.com/how-vllm-and-docker-are-changing-the-game-for-llm-deployments/
Size: 502 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# VLLM-Docker

How vLLM and Docker are Changing the Game for LLM Deployments

## Note

bfloat16 is only supported on GPUs with compute capability of at least 8.0. An NVIDIA GeForce RTX 2060 GPU has compute capability 7.5, so vLLM refuses to load a model in bfloat16 on it. You can use float16 instead by explicitly setting the `dtype` flag in the CLI, for example `--dtype=half`, or by passing `dtype="half"` when constructing the model in Python:

```python
# dtype="half" selects float16, which GPUs below compute capability 8.0 support.
llm = LLM(model=model_name, dtype="half")
```
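The choice between the two dtypes can also be made programmatically. The following is a minimal sketch and not code from this repository: `pick_dtype` and `detect_dtype` are hypothetical helper names, and `detect_dtype` assumes PyTorch is installed and at least one CUDA device is visible.

```python
def pick_dtype(major: int, minor: int) -> str:
    """Map a GPU compute capability to a vLLM dtype string (hypothetical helper)."""
    # bfloat16 requires compute capability 8.0 or newer; older GPUs such as
    # the RTX 2060 (7.5) fall back to float16, which vLLM calls "half".
    return "bfloat16" if (major, minor) >= (8, 0) else "half"


def detect_dtype() -> str:
    """Query the first CUDA device through PyTorch and pick a dtype for it."""
    import torch  # imported lazily so pick_dtype stays usable without PyTorch

    major, minor = torch.cuda.get_device_capability(0)
    return pick_dtype(major, minor)
```

The result could then be passed straight through, as in `LLM(model=model_name, dtype=detect_dtype())`.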

### Start the application

Create a virtual environment in the root of this project, then install the dependencies listed in [requirements.txt](app/requirements.txt):

```
pip install -r app/requirements.txt
```

Now start the application from the project root with the command below, substituting `python3` for `python` if that is how the interpreter is named on your system:

```
python app/main.py
```

Once started, the server output should resemble the screenshot below:

![Server_Screenshot](server.png)

Now interact with the LLM by making a curl request, as shown in the screenshot below:

![Client_Screenshot](client.png)

```
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello, how are you?"}'
```

### Accessing the Model Response

Once the app is running via `python app/main.py`, the Flask application serves on `http://0.0.0.0:5000`, that is, on port 5000 of every network interface, so it is reachable locally at `http://localhost:5000`. You can interact with it by sending a POST request to the `/chat` endpoint.
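For orientation, an endpoint of this shape could be wired up roughly as follows. This is a sketch under stated assumptions, not the repository's `app/main.py`: `handle_chat`, `create_app`, and the `generate` callable are hypothetical names, and the error payload is invented for illustration. Only the `/chat` route, the port, and the `message` and `response` JSON keys come from this README.

```python
from typing import Callable, Tuple


def handle_chat(payload, generate: Callable[[str], str]) -> Tuple[dict, int]:
    """Validate a /chat request body and build the JSON reply (hypothetical helper)."""
    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, str) or not message.strip():
        # Error shape is an assumption; the README documents only the success case.
        return {"error": "field 'message' is required"}, 400
    return {"response": generate(message)}, 200


def create_app(generate: Callable[[str], str]):
    """Build a Flask app exposing POST /chat (assumes Flask is installed)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])
    def chat():
        # silent=True yields None instead of raising on a malformed JSON body.
        body, status = handle_chat(request.get_json(silent=True), generate)
        return jsonify(body), status

    return app
```

With a stand-in generator, `create_app(lambda text: "echo: " + text).run(host="0.0.0.0", port=5000)` would start such a server; a real application would pass a function that calls the vLLM model instead. Keeping the validation in `handle_chat` lets it be exercised without Flask or a GPU.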

#### Example Request:

You can use `curl` to test it:

```bash
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello, how are you?"}'
```
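The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_chat_request` and `chat` are hypothetical helper names, the 60-second timeout is an arbitrary choice, and the code assumes the server is reachable at `http://localhost:5000` and replies with a JSON object containing a `response` key.

```python
import json
import urllib.request


def build_chat_request(message: str, base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the same POST /chat request that the curl command sends."""
    data = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, base_url: str = "http://localhost:5000", timeout: float = 60.0) -> str:
    """Send the message and return the text stored under the "response" key."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `chat("Hello, how are you?")` while the server is running would return the model's reply as a string.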

#### Expected Response:

If everything is set up correctly, the model will process the input message and return a response like this:

```json
{
"response": "I am just a model, but I'm doing well. How can I assist you today?"
}
```

🚀
