https://github.com/mapluisch/llava-websocket-server
Python-based WebSocket for CLI LLaVA inference.
- Host: GitHub
- URL: https://github.com/mapluisch/llava-websocket-server
- Owner: mapluisch
- License: mit
- Created: 2023-11-29T11:47:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-30T14:35:18.000Z (over 2 years ago)
- Last Synced: 2024-10-18T21:59:14.672Z (over 1 year ago)
- Topics: inference, llama, llama2, llava, llm, llm-inference, python, websocket, websockets
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLaVA-WebSocket
Python-based WebSocket server for CLI LLaVA inference. The server receives prompts and images from clients and returns the inference results over the same socket connection.
## Overview
This project is a quick and simple implementation of LLaVA ([Website](https://llava-vl.github.io/), [GitHub](https://github.com/haotian-liu/LLaVA)) CLI-based inference via a Python WebSocket. It is based on LLaVA's own `cli.py`, adding WebSocket capabilities.
When running `python llava-websocket.py`, the checkpoint shards are loaded once and stay in memory while a WebSocket server is started. Clients on your local network can then send new prompts and images for inference without re-loading the checkpoint shards and without using Gradio.
In its current form, this project is not designed for conversation. Each request is processed as an independent, one-time prompt, so different images and prompts can be sent in separate requests. I might add conversation capabilities in the near future.
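The load-once, serve-many idea can be sketched roughly as follows. This is not the actual `llava-websocket.py`: `load_model_once`, `handle_message`, and `serve` are hypothetical names, the model is replaced by a stub, and the timestamp format is only inferred from the example response in this README. The sketch also assumes a recent release of the third-party `websockets` package, where connection handlers take a single argument.

```python
import asyncio
import json
from datetime import datetime
from typing import Callable

# Hypothetical type: an inference function taking (prompt, image) and returning text.
InferFn = Callable[[str, str], str]


def load_model_once() -> InferFn:
    """Stand-in for the expensive checkpoint load; returns a stub, not LLaVA."""
    def infer(prompt: str, image: str) -> str:
        return f"(stub answer to {prompt!r})"
    return infer


def handle_message(raw: str, infer: InferFn, as_json: bool = False) -> str:
    """Turn one request (a JSON string) into one response string."""
    request = json.loads(raw)
    result = infer(request["prompt"], request["image"])
    if as_json:
        # Assumed format, chosen to match the "time" field shown in this README.
        stamp = datetime.now().strftime("%H:%M:%S.%f")
        return json.dumps({"time": stamp, "result": result})
    return result


async def serve(port: int = 1995, as_json: bool = False) -> None:
    """Load the model a single time, then answer requests until cancelled."""
    import websockets  # third-party; imported lazily so the helpers work without it

    infer = load_model_once()  # happens once, before any client connects

    async def on_connect(websocket):
        async for raw in websocket:
            await websocket.send(handle_message(raw, infer, as_json))

    async with websockets.serve(on_connect, "0.0.0.0", port):
        await asyncio.Future()  # run forever
```

Because `infer` is created before the server starts accepting connections, every request reuses the same loaded model. A real inference call would block the event loop while it runs, which is acceptable for a single-client prototype.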
This project was tested on Ubuntu.
## Setup
First follow the LLaVA tutorial so that you have the pretrained model / checkpoint shards ready. Then put `llava-websocket.py` into your LLaVA directory and start it from within the LLaVA conda environment (`conda activate llava`).
## Usage
```
python llava-websocket.py [ARGS]
```
### Arguments
Given that this project is based on LLaVA's `cli.py`, the following base arguments can be specified:
```
--model-path, default="liuhaotian/llava-v1.5-13b"
--model-base, default=None
--device, default="cuda"
--conv-mode, default=None
--temperature, default=0.2
--max-new-tokens, default=512
--load-8bit, action="store_true"
--load-4bit, action="store_true"
--debug, action="store_true"
```
Additional arguments added by this project:
```
--port, default=1995
--verbose, action="store_true"
--json, action="store_true"
```
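For illustration, the three added flags could be declared with `argparse` along these lines. This is a sketch based only on the names and defaults listed above; `build_parser` and the help strings are assumptions, not the script's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the three added flags."""
    parser = argparse.ArgumentParser(description="LLaVA WebSocket server (sketch)")
    parser.add_argument("--port", type=int, default=1995,
                        help="port the WebSocket server listens on")
    parser.add_argument("--verbose", action="store_true",
                        help="print connection info and results on the server console")
    parser.add_argument("--json", action="store_true",
                        help="wrap each response in a JSON object with a timestamp")
    return parser


# Example: the equivalent of `python llava-websocket.py --port 8080 --json`
args = build_parser().parse_args(["--port", "8080", "--json"])
print(args.port, args.verbose, args.json)
```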
Using `--port [int]`, you can specify your own WebSocket port.
Using `--verbose`, you will receive verbose output on the server-side console (WebSocket connection info, transmitted inference results).
Using `--json`, the WebSocket responses will be formatted as JSON, containing a timestamp and the inference result:
```json
{
  "time": "11:48:52.632415",
  "result": "The image features a wooden pier (...)"
}
```
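A client that should work with or without `--json` could parse responses roughly as follows. This is a minimal sketch: `parse_response` is a hypothetical helper, and the `%H:%M:%S.%f` format is inferred from the example above rather than taken from the server code.

```python
import json
from datetime import datetime, time
from typing import Optional, Tuple


def parse_response(raw: str) -> Tuple[Optional[time], str]:
    """Return (timestamp, result); timestamp is None for plain-text responses."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, raw  # server was started without --json
    if not (isinstance(payload, dict) and {"time", "result"} <= payload.keys()):
        return None, raw  # valid JSON, but not the documented response shape
    stamp = datetime.strptime(payload["time"], "%H:%M:%S.%f").time()
    return stamp, payload["result"]
```

Falling back to the raw text keeps the same client usable whichever way the server was started.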
## WebSocket Communication
On your local network, clients can reach the WebSocket at `ws://[your-ip]:1995` (the default port). To use a different port, pass `--port [int]` when starting the Python script.
You can probably also make the WebSocket accessible from outside of your LAN via port forwarding.
The WebSocket server waits for a JSON object:
```json
{
  "prompt": "Here goes your prompt (e.g., describe this image.)",
  "image": "URL to an image file, or BASE64-encoded string of an image file"
}
```
and returns the inference result as text via the socket connection.
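A client for this exchange might look like the sketch below. It assumes the third-party `websockets` package is installed and that the server accepts a raw Base64 string (without a `data:` prefix) for local images; `build_request` and `ask` are hypothetical helper names.

```python
import asyncio
import base64
import json
from pathlib import Path


def build_request(prompt: str, image: str) -> str:
    """Build the request JSON from a prompt and an image URL or local file path."""
    if image.startswith(("http://", "https://")):
        image_field = image  # URLs are passed through unchanged
    else:
        # Assumption: local files are sent as plain Base64 text.
        image_field = base64.b64encode(Path(image).read_bytes()).decode("ascii")
    return json.dumps({"prompt": prompt, "image": image_field})


async def ask(uri: str, prompt: str, image: str) -> str:
    """Send one request and wait for the single response."""
    import websockets  # third-party; imported lazily so build_request works without it

    async with websockets.connect(uri) as ws:
        await ws.send(build_request(prompt, image))
        return await ws.recv()


# Example call (requires a running server):
# reply = asyncio.run(ask("ws://127.0.0.1:1995", "Describe this image.",
#                         "https://llava-vl.github.io/static/images/view.jpg"))
# print(reply)
```

Each call opens a new connection, which matches the one-request-per-prompt model of this project.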
### Demo
https://github.com/mapluisch/LLaVA-WebSocket-Server/assets/31780571/db3639e8-1aa8-4be3-8fb5-3ab9ad0e27d0
The input image is LLaVA's test image (source: https://llava-vl.github.io/static/images/view.jpg).
## Disclaimer
This project is a prototype and serves as a basic example of using a WebSocket for LLaVA's CLI inference. Feel free to create a PR.