https://github.com/theoddysey/visual-answering-transformers-model
An Intelligent Image Analysis Application using BLIP and Streamlit
- Host: GitHub
- URL: https://github.com/theoddysey/visual-answering-transformers-model
- Owner: TheODDYSEY
- Created: 2025-02-07T10:07:52.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-11T14:52:27.000Z (8 months ago)
- Last Synced: 2025-02-11T15:37:55.383Z (8 months ago)
- Topics: blip, huggingface-transformers, image-processing, image-to-text, python3, streamlit
- Language: Jupyter Notebook
- Homepage: https://theoddysey-visual-answering-transformers-m-streamlit-app-vimg1u.streamlit.app/
- Size: 47 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📸 Visual-Answering-Transformers-Model
An intelligent image analysis application using BLIP and Streamlit
Upload an image, ask questions, and get AI-powered answers instantly.
---
## 📋 Table of Contents
1. 🤖 [Introduction](#introduction)
2. ⚙️ [Tech Stack](#tech-stack)
3. 🔋 [Features](#features)
4. 🤸 [Quick Start](#quick-start)
5. 🕸️ [Code Snippets](#snippets)
6. 🔗 [Links](#links)
7. 🚀 [More](#more)

---

## 🤖 Introduction
The **AI Image Question Answering** project is a **deep learning-powered web application** designed to process images and provide meaningful answers to user-posed questions. Built with **Streamlit**, this app leverages **Salesforce's BLIP (Bootstrapping Language-Image Pre-training) model** to extract relevant insights from images.
Key functionalities include:
- **Uploading images (JPEG, PNG) for analysis.**
- **Asking both predefined and custom questions about the image.**
- **Generating AI-based answers using the BLIP model.**
- **Providing real-time inference with GPU acceleration (if available).**
- **Delivering a seamless and interactive user experience with Streamlit.**

Whether you're a researcher, developer, or AI enthusiast, this project serves as an excellent **introduction to multimodal AI** and **visual question answering (VQA) applications**.
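At its core, the app is a single BLIP VQA call. As an illustrative sketch independent of this repo's code, the same round trip can be reproduced with the Hugging Face `pipeline` API (assumes `transformers`, `torch`, and `Pillow` are installed; `demo.jpg` is a placeholder path):

```python
# Minimal VQA sketch using the Hugging Face pipeline API.
from PIL import Image
from transformers import pipeline

# Loads Salesforce/blip-vqa-base, the same checkpoint this app uses.
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

image = Image.open("demo.jpg")  # placeholder: any local JPEG/PNG
result = vqa(image=image, question="How many people are in the picture?")
print(result)  # e.g. [{'score': 0.9, 'answer': '...'}, ...]
```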
---

## ⚙️ Tech Stack
- **Python**
- **Streamlit**
- **Hugging Face Transformers**
- **PyTorch**
- **PIL (Pillow)**

---

## 🔋 Features
👉 **AI-Powered Image Question Answering**: Upload an image and ask any question related to its content.
👉 **Predefined Questions for Instant Insights**: Select from a list of commonly asked questions for a quick analysis.
👉 **Custom Question Input**: Type your own question to get AI-generated responses tailored to your query.
👉 **Real-Time Processing with AI Feedback**: Experience **instantaneous results** with **dynamic loading indicators** while the AI processes your request.
👉 **Seamless Streamlit UI**: Intuitive **drag-and-drop** image upload and interactive **question submission** for a smooth user experience.
👉 **Optimized for GPU Acceleration**: The model runs efficiently on CUDA-enabled devices for **faster inference** (an optional half-precision sketch follows this list).
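The snippets further below load the model in full precision. On a CUDA device, half precision is a common optional speed-up; this is a hedged sketch using the standard `torch_dtype` argument of `from_pretrained`, not necessarily what this app does:

```python
import torch
from transformers import BlipForQuestionAnswering, BlipProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Use float16 on GPU for faster inference; stay in float32 on CPU.
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained(
    "Salesforce/blip-vqa-base", torch_dtype=DTYPE
).to(DEVICE)

# When calling the model, cast the image tensor to the model dtype, e.g.:
# inputs = processor(image, question, return_tensors="pt").to(DEVICE)
# inputs["pixel_values"] = inputs["pixel_values"].to(DTYPE)
```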
---

## 🤸 Quick Start
Follow these steps to set up and run the project on your local machine.
### **Prerequisites**
Ensure you have the following installed:
- [Python 3.8+](https://www.python.org/downloads/)
- [pip](https://pip.pypa.io/en/stable/installation/)

### **Cloning the Repository**
```bash
git clone https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git
cd Visual-Answering-Transformers-Model
```

### **Installation**
Install all required dependencies using:
```bash
pip install -r requirements.txt
```

### **Running the Project**
```bash
streamlit run streamlit_app.py
```

Open **[http://localhost:8501](http://localhost:8501)** in your browser to interact with the application.
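If inference feels slow, the app may be running on CPU. As a quick sanity check, a hypothetical helper script (`check_env.py` is not part of the repository) can confirm that PyTorch sees a GPU and that the BLIP weights are reachable:

```python
# check_env.py — hypothetical helper, not part of this repository.
import torch
from transformers import BlipProcessor

print("CUDA available:", torch.cuda.is_available())

# Downloads (or reuses a cached copy of) the processor config and tokenizer.
BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
print("BLIP processor loaded successfully")
```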
---

## 🕸️ Code Snippets
### **1️⃣ Loading the BLIP Model**
```python
from transformers import BlipProcessor, BlipForQuestionAnswering
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(DEVICE)
```

### **2️⃣ Processing Images and Questions**
```python
def get_answer(image, question):
    """Processes an image and question to return an AI-generated response."""
    inputs = processor(image, question, return_tensors="pt").to(DEVICE)
    output = model.generate(**inputs)
    return processor.decode(output[0], skip_special_tokens=True)
```

### **3️⃣ Implementing the Streamlit UI**
```python
import streamlit as st
from PIL import Image

st.title("📸 AI Image Question Answering")
uploaded_image = st.file_uploader("Upload an image", type=["jpg", "png"])
if uploaded_image is not None:
    image = Image.open(uploaded_image)
    st.image(image, caption="Uploaded Image", use_column_width=True)

    question = st.text_input("Ask a question about the image:")

    if st.button("Get Answer"):
        answer = get_answer(image, question)
        st.success(f"**Q:** {question}")
        st.write(f"**A:** {answer}")
```
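One refinement worth noting (a sketch, not code from this repo): Streamlit reruns the entire script on every widget interaction, so wrapping the model load in `st.cache_resource` keeps BLIP in memory across reruns instead of reloading it for each question:

```python
import streamlit as st
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

@st.cache_resource  # load once per process, reuse across reruns and sessions
def load_blip():
    processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
    model = BlipForQuestionAnswering.from_pretrained(
        "Salesforce/blip-vqa-base"
    ).to(DEVICE)
    return processor, model

processor, model = load_blip()
```

---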
## 🔗 Links
- 🌐 **Live Demo**: [Try it here](https://theoddysey-visual-answering-transformers-m-streamlit-app-vimg1u.streamlit.app/)
- 📂 **Project Repository**: [GitHub](https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git)
- 📖 **BLIP Model**: [Hugging Face](https://huggingface.co/Salesforce/blip-vqa-base)

---
## 🚀 More
🔹 **Future Enhancements**
- ✅ Improve model inference speed for real-time responses.
- ✅ Extend support for **OCR-based text recognition**.
- ✅ Enhance UI with **interactive visualization** options.

## **License**

This project is **open-source** under the **MIT License**.