Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
How vLLM and Docker are Changing the Game for LLM Deployments
https://github.com/adesoji1/vllm-docker
- Host: GitHub
- URL: https://github.com/adesoji1/vllm-docker
- Owner: Adesoji1
- Created: 2024-11-27T15:46:36.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-27T16:35:54.000Z (about 1 month ago)
- Last Synced: 2024-11-27T17:33:29.331Z (about 1 month ago)
- Topics: docker, vllm
- Language: Python
- Homepage: https://collabnix.com/how-vllm-and-docker-are-changing-the-game-for-llm-deployments/
- Size: 502 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# VLLM-Docker
How vLLM and Docker are Changing the Game for LLM Deployments
## Note
bfloat16 is only supported on GPUs with a compute capability of at least 8.0. The NVIDIA GeForce RTX 2060 GPU has compute capability 7.5, so use float16 instead by explicitly setting the `dtype` flag, for example `--dtype=half` on the CLI.
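If you launch a model straight from the command line, for example with vLLM's OpenAI-compatible server, the flag is passed like this (a sketch; the model name is only a placeholder):
```bash
# Serve a model in float16 on a pre-Ampere GPU (compute capability < 8.0)
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --dtype half
```
In the Python API, the same setting looks like this: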
```python
from vllm import LLM

# model_name is defined elsewhere in the application
llm = LLM(model=model_name, dtype="half")
```

### Start the application
Create a virtual environment in the root of this project. You can see the requirements.txt file [here](app/requirements.txt).
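For example, using Python's built-in `venv` module (the environment name `.venv` is just a convention, not something the repository prescribes):
```bash
# Create and activate a virtual environment in the project root
python3 -m venv .venv
source .venv/bin/activate
```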
Then install the dependencies from requirements.txt:
```bash
pip install -r requirements.txt
```

Now proceed to start the application with the command below, depending on whether you use `python` or `python3`:
```bash
python main.py   # or: python3 main.py
```

The server starts up as shown in the image below:
![Server_Screenshot](server.png)
Now interact with the LLM by making a curl request, as seen in the image below:
![Client_Screenshot](client.png)
### Accessing the Model Response
Once your app is running using `python app/main.py`, the Flask application will start a local server on `http://0.0.0.0:5000`. You can interact with it by sending a POST request to the `/chat` endpoint.
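For reference, here is a minimal sketch of what such a `/chat` endpoint could look like with Flask and vLLM; this is an illustrative assumption about the app's structure, not the repository's actual `main.py`, and the model name is a placeholder:
```python
from flask import Flask, request, jsonify
from vllm import LLM, SamplingParams

app = Flask(__name__)

# Placeholder model; float16 keeps it runnable on compute capability 7.5 GPUs
llm = LLM(model="facebook/opt-125m", dtype="half")
sampling_params = SamplingParams(max_tokens=128)

@app.route("/chat", methods=["POST"])
def chat():
    # Read the user's message and generate a reply with vLLM
    message = request.get_json()["message"]
    outputs = llm.generate([message], sampling_params)
    return jsonify({"response": outputs[0].outputs[0].text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```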
#### Example Request:
You can use `curl` to test it:
```bash
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello, how are you?"}'
```

#### Expected Response:
If everything is set up correctly, the model will process the input message and return a response like this:
```json
{
"response": "I am just a model, but I'm doing well. How can I assist you today?"
}
```

🚀
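As an alternative to `curl`, the same request can be made from Python (a small sketch using the `requests` library, assuming the default host and port above):
```python
import requests

# Send a chat message to the running Flask server and print the model's reply
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "Hello, how are you?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```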