https://github.com/theoddysey/visual-answering-transformers-model
An Intelligent Image Analysis Application using BLIP and Streamlit
- Host: GitHub
- URL: https://github.com/theoddysey/visual-answering-transformers-model
- Owner: TheODDYSEY
- Created: 2025-02-07T10:07:52.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-11T14:52:27.000Z (8 months ago)
- Last Synced: 2025-02-11T15:37:55.383Z (8 months ago)
- Topics: blip, huggingface-transformers, image-processing, image-to-text, python3, streamlit
- Language: Jupyter Notebook
- Homepage: https://theoddysey-visual-answering-transformers-m-streamlit-app-vimg1u.streamlit.app/
- Size: 47 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📸 Visual-Answering-Transformers-Model
An intelligent image analysis application using BLIP and Streamlit
Upload an image, ask questions, and get AI-powered answers instantly.
---
## 📋 Table of Contents
1. 🤖 [Introduction](#introduction)
2. ⚙️ [Tech Stack](#tech-stack)
3. 🔋 [Features](#features)
4. 🤸 [Quick Start](#quick-start)
5. 🕸️ [Code Snippets](#snippets)
6. 🔗 [Links](#links)
7. 🚀 [More](#more)

---

## 🤖 Introduction
The **AI Image Question Answering** project is a **deep learning-powered web application** designed to process images and provide meaningful answers to user-posed questions. Built with **Streamlit**, this app leverages **Salesforce's BLIP (Bootstrapping Language-Image Pre-training) model** to extract relevant insights from images.
Key functionalities include:
- **Uploading images (JPEG, PNG) for analysis.**
- **Asking both predefined and custom questions about the image.**
- **Generating AI-based answers using the BLIP model.**
- **Providing real-time inference with GPU acceleration (if available).**
- **Delivering a seamless and interactive user experience with Streamlit.**

Whether you're a researcher, developer, or AI enthusiast, this project serves as an excellent **introduction to multimodal AI** and **visual question answering (VQA) applications**.
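At its core, the app is a single BLIP VQA call. As an illustrative sketch independent of this repo's code, the same round trip can be reproduced with the Hugging Face `pipeline` API (assumes `transformers`, `torch`, and `Pillow` are installed; `demo.jpg` is a placeholder path):

```python
# Minimal VQA sketch using the Hugging Face pipeline API.
from PIL import Image
from transformers import pipeline

# Loads Salesforce/blip-vqa-base, the same checkpoint this app uses.
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

image = Image.open("demo.jpg")  # placeholder: any local JPEG/PNG
result = vqa(image=image, question="How many people are in the picture?")
print(result)  # e.g. [{'score': 0.9, 'answer': '...'}, ...]
```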
---

## ⚙️ Tech Stack
- **Python**
- **Streamlit**
- **Hugging Face Transformers**
- **PyTorch**
- **PIL (Pillow)**

---

## 🔋 Features
👉 **AI-Powered Image Question Answering**: Upload an image and ask any question related to its content.
👉 **Predefined Questions for Instant Insights**: Select from a list of commonly asked questions for a quick analysis.
👉 **Custom Question Input**: Type your own question to get AI-generated responses tailored to your query.
👉 **Real-Time Processing with AI Feedback**: Experience **instantaneous results** with **dynamic loading indicators** while the AI processes your request.
👉 **Seamless Streamlit UI**: Intuitive **drag-and-drop** image upload and interactive **question submission** for a smooth user experience.
👉 **Optimized for GPU Acceleration**: The model runs efficiently on CUDA-enabled devices for **faster inference** (an optional half-precision sketch follows this list).
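The snippets further below load the model in full precision. On a CUDA device, half precision is a common optional speed-up; this is a hedged sketch using the standard `torch_dtype` argument of `from_pretrained`, not necessarily what this app does:

```python
import torch
from transformers import BlipForQuestionAnswering, BlipProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Use float16 on GPU for faster inference; stay in float32 on CPU.
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained(
    "Salesforce/blip-vqa-base", torch_dtype=DTYPE
).to(DEVICE)

# When calling the model, cast the image tensor to the model dtype, e.g.:
# inputs = processor(image, question, return_tensors="pt").to(DEVICE)
# inputs["pixel_values"] = inputs["pixel_values"].to(DTYPE)
```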
---

## 🤸 Quick Start
Follow these steps to set up and run the project on your local machine.
### **Prerequisites**
Ensure you have the following installed:
- [Python 3.8+](https://www.python.org/downloads/)
- [pip](https://pip.pypa.io/en/stable/installation/)

### **Cloning the Repository**
```bash
git clone https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git
cd Visual-Answering-Transformers-Model
```

### **Installation**
Install all required dependencies using:
```bash
pip install -r requirements.txt
```

### **Running the Project**
```bash
streamlit run streamlit_app.py
```

Open **[http://localhost:8501](http://localhost:8501)** in your browser to interact with the application.
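If inference feels slow, the app may be running on CPU. As a quick sanity check, a hypothetical helper script (`check_env.py` is not part of the repository) can confirm that PyTorch sees a GPU and that the BLIP weights are reachable:

```python
# check_env.py — hypothetical helper, not part of this repository.
import torch
from transformers import BlipProcessor

print("CUDA available:", torch.cuda.is_available())

# Downloads (or reuses a cached copy of) the processor config and tokenizer.
BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
print("BLIP processor loaded successfully")
```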
---

## 🕸️ Code Snippets
### **1️⃣ Loading the BLIP Model**
```python
from transformers import BlipProcessor, BlipForQuestionAnswering
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(DEVICE)
```

### **2️⃣ Processing Images and Questions**
```python
def get_answer(image, question):
    """Processes an image and question to return an AI-generated response."""
    inputs = processor(image, question, return_tensors="pt").to(DEVICE)
    output = model.generate(**inputs)
    return processor.decode(output[0], skip_special_tokens=True)
```

### **3️⃣ Implementing the Streamlit UI**
```python
import streamlit as st
from PIL import Image

st.title("📸 AI Image Question Answering")
uploaded_image = st.file_uploader("Upload an image", type=["jpg", "png"])
if uploaded_image is not None:
    image = Image.open(uploaded_image)
    st.image(image, caption="Uploaded Image", use_column_width=True)

    question = st.text_input("Ask a question about the image:")

    if st.button("Get Answer"):
        answer = get_answer(image, question)
        st.success(f"**Q:** {question}")
        st.write(f"**A:** {answer}")
```
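One refinement worth noting (a sketch, not code from this repo): Streamlit reruns the entire script on every widget interaction, so wrapping the model load in `st.cache_resource` keeps BLIP in memory across reruns instead of reloading it for each question:

```python
import streamlit as st
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

@st.cache_resource  # load once per process, reuse across reruns and sessions
def load_blip():
    processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
    model = BlipForQuestionAnswering.from_pretrained(
        "Salesforce/blip-vqa-base"
    ).to(DEVICE)
    return processor, model

processor, model = load_blip()
```

---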
## 🔗 Links
- 🌐 **Live Demo**: [Try it here](https://theoddysey-visual-answering-transformers-m-streamlit-app-vimg1u.streamlit.app/)
- 📂 **Project Repository**: [GitHub](https://github.com/TheODDYSEY/Visual-Answering-Transformers-Model.git)
- 📖 **BLIP Model**: [Hugging Face](https://huggingface.co/Salesforce/blip-vqa-base)

---
## 🚀 More
🔹 **Future Enhancements**
- ✅ Improve model inference speed for real-time responses.
- ✅ Extend support for **OCR-based text recognition**.
- ✅ Enhance UI with **interactive visualization** options.

## **License**

This project is **open-source** under the **MIT License**.