
πŸ“Έ Visual-Answering-Transformers-Model


An intelligent image analysis application using BLIP and Streamlit


Upload an image, ask questions, and get AI-powered answers instantly.


*(Badges: Python · Streamlit · Hugging Face — project banner)*

---

## πŸ“‹ Table of Contents

1. πŸ€– [Introduction](#introduction)
2. βš™οΈ [Tech Stack](#tech-stack)
3. πŸ”‹ [Features](#features)
4. πŸš€ [Quick Start](#quick-start)
5. πŸ•ΈοΈ [Code Snippets](#snippets)
6. πŸ”— [Links](#links)
7. πŸ“Œ [More](#more)

---

## πŸ€– Introduction

The **AI Image Question Answering** project is a **deep learning-powered web application** designed to process images and provide meaningful answers to user-posed questions. Built with **Streamlit**, this app leverages **Salesforce’s BLIP (Bootstrapping Language-Image Pre-training) model** to extract relevant insights from images.

Key functionalities include:

- **Uploading images (JPEG, PNG) for analysis.**
- **Asking both predefined and custom questions about the image.**
- **Generating AI-based answers using the BLIP model.**
- **Providing real-time inference with GPU acceleration (if available).**
- **Delivering a seamless and interactive user experience with Streamlit.**

Whether you're a researcher, developer, or AI enthusiast, this project serves as an excellent **introduction to multimodal AI** and **visual question answering (VQA) applications**.

---

## βš™οΈ Tech Stack

- **Python**
- **Streamlit**
- **Hugging Face Transformers**
- **PyTorch**
- **PIL (Pillow)**

---

## πŸ”‹ Features

πŸ‘‰ **AI-Powered Image Question Answering**: Upload an image and ask any question related to its content.

πŸ‘‰ **Predefined Questions for Instant Insights**: Select from a list of commonly asked questions for a quick analysis.

πŸ‘‰ **Custom Question Input**: Type your own question to get AI-generated responses tailored to your query.

πŸ‘‰ **Real-Time Processing with AI Feedback**: Experience **instantaneous results** with **dynamic loading indicators** while the AI processes your request.

πŸ‘‰ **Seamless Streamlit UI**: Intuitive **drag-and-drop** image upload and interactive **question submission** for smooth user experience.

πŸ‘‰ **Optimized for GPU Acceleration**: The model runs efficiently on CUDA-enabled devices for **faster inference**.

---

## πŸš€ Quick Start

Follow these steps to set up and run the project on your local machine.

### **Prerequisites**

Ensure you have the following installed:

- [Python 3.8+](https://www.python.org/downloads/)
- [pip](https://pip.pypa.io/en/stable/installation/)

### **Cloning the Repository**

```bash
git clone https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git
cd Visual-Answering-Transformers-Model
```

### **Installation**

Install all required dependencies using:

```bash
pip install -r requirements.txt
```
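If `requirements.txt` is missing or you are recreating the environment by hand, a minimal dependency set would look like the following (these are the publicly known package names for the libraries in the Tech Stack; the exact pinned versions in the repository may differ):

```text
streamlit
transformers
torch
Pillow
```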

### **Running the Project**

```bash
streamlit run streamlit_app.py
```

Open **[http://localhost:8501](http://localhost:8501)** in your browser to interact with the application.

---

## πŸ•ΈοΈ Code Snippets

### **1️⃣ Loading the BLIP Model**

```python
from transformers import BlipProcessor, BlipForQuestionAnswering
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(DEVICE)
```
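Streamlit reruns the whole script on every interaction, so the model load above should happen only once — in a Streamlit app this is typically done with `st.cache_resource`. The same pattern can be shown self-contained with the standard library's `functools.lru_cache`, using a dummy loader here in place of the real `BlipForQuestionAnswering.from_pretrained` call:

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how often the expensive load actually runs

@lru_cache(maxsize=1)
def load_model(name: str):
    """Stand-in for an expensive load such as
    BlipForQuestionAnswering.from_pretrained(name)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return f"model:{name}"  # placeholder for the real model object

# Repeated calls return the cached object without reloading.
m1 = load_model("Salesforce/blip-vqa-base")
m2 = load_model("Salesforce/blip-vqa-base")
```

With caching in place, only the first call pays the load cost; every rerun of the script reuses the same object.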

### **2️⃣ Processing Images and Questions**

```python
def get_answer(image, question):
    """Process an image and a question, returning the model's answer."""
    inputs = processor(image, question, return_tensors="pt").to(DEVICE)
    with torch.no_grad():  # inference only, no gradients needed
        output = model.generate(**inputs)
    return processor.decode(output[0], skip_special_tokens=True)
```

### **3️⃣ Implementing Streamlit UI**

```python
import streamlit as st
from PIL import Image

st.title("📸 AI Image Question Answering")

uploaded_image = st.file_uploader("Upload an image", type=["jpg", "png"])

if uploaded_image is not None:
    image = Image.open(uploaded_image)
    st.image(image, caption="Uploaded Image", use_column_width=True)

    question = st.text_input("Ask a question about the image:")
    if st.button("Get Answer"):
        answer = get_answer(image, question)
        st.success(f"**Q:** {question}")
        st.write(f"**A:** {answer}")
```
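Before handing user input to the model, it is worth guarding against empty or oversized questions. A small validation helper (illustrative — `validate_question` and its `max_len` limit are not from the repository) could look like:

```python
from typing import Optional

def validate_question(question: str, max_len: int = 200) -> Optional[str]:
    """Return a cleaned question, or None if it is empty or too long."""
    question = question.strip()
    if not question or len(question) > max_len:
        return None
    return question
```

In the UI above, the `get_answer` call would then run only when `validate_question(question)` returns a non-`None` value, with `st.warning` shown otherwise.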

---

## πŸ”— Links

- πŸ”— **Live Demo**: [Try it here](https://theoddysey-visual-answering-transformers-m-streamlit-app-vimg1u.streamlit.app/)
- πŸ“œ **Project Repository**: [GitHub](https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git)
- πŸ“š **BLIP Model**: [Hugging Face](https://huggingface.co/Salesforce/blip-vqa-base)

---

## πŸ“Œ More

πŸ”Ή **Future Enhancements**
- βœ… Improve model inference speed for real-time responses.
- βœ… Extend support for **OCR-based text recognition**.
- βœ… Enhance UI with **interactive visualization** options.

## **License**
This project is **open-source** under the **MIT License**.

---