https://github.com/cansik/vqa-service
VQA application that allows users to ask questions about images and receive answers.
https://github.com/cansik/vqa-service
gradio python service vqa
Last synced: 11 months ago
JSON representation
VQA application that allows users to ask questions about images and receive answers.
- Host: GitHub
- URL: https://github.com/cansik/vqa-service
- Owner: cansik
- License: mit
- Created: 2025-03-19T17:27:08.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-16T09:45:05.000Z (about 1 year ago)
- Last Synced: 2025-06-01T15:27:48.972Z (about 1 year ago)
- Topics: gradio, python, service, vqa
- Language: Python
- Homepage:
- Size: 42 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Visual Question Answering (VQA) Service
## Overview
The VQA Service is a machine learning based application that allows users to ask questions about images and receive answers.
It leverages state-of-the-art models to process images and generate accurate responses to user queries. The service is
designed to be flexible, supporting multiple backend models for different use cases.
## Installation
To install and set up the VQA Service, follow these steps:
1. **Clone the repository:**
```bash
git clone https://github.com/cansik/vqa-service.git
cd vqa-service
```
2. **Create a virtual environment:**
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. **Install the required dependencies:**
```bash
pip install -r requirements.txt
```
## Usage
To start the VQA Service, use the following command:
```bash
python -m vqa
```
### Command-line Arguments
- `--host`: Specify the service host (default: `127.0.0.1`).
- `--port`: Specify the service port (default: `7840`).
- `--backend`: Choose the VQA backend model (default: `blip`).
### Example
To run the service on a specific host and port with a chosen backend, use:
```bash
python -m vqa --host 0.0.0.0 --port 8000 --backend blip2
```
### Accessing the Service
Once the service is running, you can access it through a web interface provided by Gradio. Open your web browser and
navigate to `http://:` to interact with the service.
## Supported Models
The following VLM backends are supported:
| Backend ID | Model | Description |
|------------|-------|-------------|
| `blip` | Salesforce/blip-vqa-base | BLIP base model for visual question answering |
| `blip2` | Salesforce/blip2-opt-2.7b | BLIP2 with OPT 2.7B language model |
| `blip2-flan` | Salesforce/blip2-flan-t5-xl | BLIP2 with Flan-T5-XL language model |
| `vilt` | dandelin/vilt-b32-finetuned-vqa | ViLT model fine-tuned for VQA tasks |
| `vlmmlx` | mlx-community/Qwen2-VL-2B-Instruct-4bit | Default MLX-based VLM for Apple Silicon |
| `vlmmlx-phi35` | mlx-community/Phi-3.5-vision-instruct-4bit | Phi-3.5 Vision model optimized for MLX |
| `vlmmlx-smolvlm2` | mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision | SmolVLM2 optimized for MLX |
| `namo` | - | Namo VLM model |
| `moondream` | vikhyatk/moondream2 | Moondream2 model with GPU support |
| `moondream-cpu` | vikhyatk/moondream2 | Moondream2 model optimized for CPU inference |
| `smolvlm` | HuggingFaceTB/SmolVLM-256M-Instruct | Lightweight VLM model |
| `smolvlm2` | HuggingFaceTB/SmolVLM2-256M-Video-Instruct | SmolVLM2 with video instruction capabilities |
Each backend offers different capabilities and performance characteristics.