Web Interface for Vision Language Models Including InternVL2
https://github.com/sammcj/vlm-ui
- Host: GitHub
- URL: https://github.com/sammcj/vlm-ui
- Owner: sammcj
- Created: 2024-07-28T21:24:37.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-29T09:30:51.000Z (about 1 year ago)
- Last Synced: 2025-03-24T07:57:08.438Z (7 months ago)
- Language: Python
- Size: 1.46 MB
- Stars: 20
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
# VLM UI
VLM UI is a web-based user interface for interacting with various Vision Language Models (VLMs).
It provides a convenient way to upload images, ask questions, and receive responses from the model.

## Features
- Web-based interface using Gradio
- Support for multiple VLM models
- Image upload and processing
- Real-time streaming responses
- Dockerised deployment

## Prerequisites
- Docker
- NVIDIA GPU with CUDA support (for running models)
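Before continuing, it can help to confirm that Docker can actually reach the GPU. This check isn't part of the repository; it's the standard NVIDIA Container Toolkit smoke test:

```shell
# Verify the driver is working on the host
nvidia-smi

# Verify Docker can pass the GPU through to a container
# (requires the NVIDIA Container Toolkit to be installed)
docker run --rm --gpus all ubuntu nvidia-smi
```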
## Quick Start

1. Clone the repository:
```shell
git clone --depth=1 https://github.com/sammcj/vlm-ui.git
cd vlm-ui
```

2. Build and run the Docker container:
```shell
docker build -t vlm-ui .
docker run -d --gpus all -p 7860:7860 -e MODEL_NAME=OpenGVLab/InternVL2-8B vlm-ui
```

3. Open your browser and navigate to `http://localhost:7860` to access the VLM UI.
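The first start can take a while, as the model weights are downloaded and loaded onto the GPU. To confirm the container is up and watch progress, you can use standard Docker commands (not specific to this project):

```shell
# List running containers started from the vlm-ui image
docker ps --filter ancestor=vlm-ui

# Follow the logs while the model downloads and loads
docker logs -f "$(docker ps -q --filter ancestor=vlm-ui)"
```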
## Configuration
You can customize the behaviour of VLM UI by setting the following environment variables:
- `SYSTEM_MESSAGE`: The system message to use for the conversation (default: "Carefully follow the users request.")
- `TEMPERATURE`: Controls randomness in the model's output (default: 0.3)
- `TOP_P`: Controls diversity of the model's output (default: 0.7)
- `MAX_NEW_TOKENS`: Maximum number of tokens to generate (default: 2048)
- `MAX_INPUT_TILES`: Maximum number of image tiles to process (default: 12)
- `REPETITION_PENALTY`: Penalizes repetition in the model's output (default: 1.0)
- `MODEL_NAME`: The name of the model to use (default: OpenGVLab/InternVL2-8B)
- `LOAD_IN_8BIT`: Whether to load the model in 8-bit precision (default: 1)

Example:
```shell
docker run -d --gpus all -p 7860:7860 \
  -e MODEL_NAME=OpenGVLab/InternVL2-8B \
  -e TEMPERATURE=0.3 \
  -e MAX_NEW_TOKENS=2048 \
vlm-ui
```
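As a further illustration only (not taken from the repository), here is a hypothetical variation that pins the container to a single GPU and loads the model in full precision, assuming `LOAD_IN_8BIT=0` disables 8-bit loading:

```shell
# Hypothetical: run on GPU 0 only, without 8-bit quantisation
docker run -d --gpus '"device=0"' -p 7860:7860 \
  -e MODEL_NAME=OpenGVLab/InternVL2-8B \
  -e LOAD_IN_8BIT=0 \
  -e REPETITION_PENALTY=1.1 \
  vlm-ui
```

Pinning a device via `--gpus '"device=0"'` is standard Docker syntax; the environment-variable semantics here are assumptions based on the defaults listed above.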
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License
- Copyright Sam McLeod
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgements
This app builds on the work of the following projects:
- [haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA)
- [OpenGVLab/InternVL](https://github.com/OpenGVLab/InternVL)