https://github.com/hk-kumawat/visual-qna-system
🔍 An AI tool for image-based Q&A and captioning that lets users upload an image and receive a concise answer to their question.
blip image-captioning vilt visual-question-answering
- Host: GitHub
- URL: https://github.com/hk-kumawat/visual-qna-system
- Owner: hk-kumawat
- License: mit
- Created: 2024-11-09T18:11:02.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T18:38:56.000Z (about 1 year ago)
- Last Synced: 2025-07-22T13:27:33.492Z (6 months ago)
- Topics: blip, image-captioning, vilt, visual-question-answering
- Language: Python
- Homepage: https://ask-visual.streamlit.app/
- Size: 77.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🔍 **Visual QnA System** 🖼️

## Overview
The **Visual QnA System** lets users upload an image and ask specific questions about its content. It uses **VILT** for **Visual Question Answering** and **BLIP** for **image captioning**, so it can both describe an image and answer questions about it. Possible applications include AI-powered chatbots, image understanding, and automated image analysis.
## Live Demo
Try out the Visual QnA System! 👉🏻 [ask-visual.streamlit.app](https://ask-visual.streamlit.app/)
_Below is a preview of the Visual QnA System in action. Upload an image and ask questions! 👇🏻_
## Table of Contents
1. [Features](#features)
2. [Models](#models)
3. [Installation](#installation)
4. [Usage](#usage)
5. [Technologies Used](#technologies-used)
6. [Results](#results)
7. [Conclusion](#conclusion)
8. [Future Enhancements](#future-enhancements)
9. [License](#license)
10. [Contact](#contact)
## Features🌟
- Upload an image and receive a generated caption.
- Choose from suggested questions or ask your own.
- Get answers to questions based on the image content.
- Built with **Streamlit** for an interactive and easy-to-use interface.
## Models🧠
### **VILT (Vision-and-Language Transformer)**
- A model used for Visual Question Answering.
- Processes image patches and question tokens jointly in a single transformer, then selects the answer from a fixed answer vocabulary.
### **BLIP (Bootstrapping Language-Image Pretraining)**
- A model for generating captions from images.
- The captions are used to generate possible questions for the user to ask.
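The README does not say how a caption is turned into suggested questions. As a rough illustration only, the sketch below assumes a simple keyword heuristic: the `suggest_questions` helper, the keyword table, and the fallback questions are all hypothetical and are not taken from the repository.

```python
# Hypothetical keyword-to-question table; the real app may work differently.
KEYWORD_QUESTIONS = {
    "man": "What is the man doing?",
    "woman": "What is the woman doing?",
    "dog": "What color is the dog?",
    "ball": "What sport is being played?",
    "table": "What is on the table?",
}

# Generic fallbacks that make sense for almost any image.
DEFAULT_QUESTIONS = ["What is in the image?", "Where was this picture taken?"]


def suggest_questions(caption, max_questions=4):
    """Return up to max_questions suggested questions for a caption."""
    # Normalize the caption into lowercase words without basic punctuation.
    words = caption.lower().replace(",", " ").replace(".", " ").split()

    suggestions = []
    for word in words:
        question = KEYWORD_QUESTIONS.get(word)
        if question and question not in suggestions:
            suggestions.append(question)

    # Pad with generic questions so the user always has something to pick.
    for question in DEFAULT_QUESTIONS:
        if question not in suggestions:
            suggestions.append(question)

    return suggestions[:max_questions]


print(suggest_questions("a man kicking a soccer ball on a field"))
```

A keyword table keeps the example short and deterministic; a production system could just as well generate questions with a language model.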
## Installation🛠
1. **Clone the repository**:
```bash
git clone https://github.com/hk-kumawat/Visual-QnA-System.git
cd Visual-QnA-System
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
## Usage🚀
1. **Run the Streamlit App**:
```bash
streamlit run app.py
```
2. **Upload Image**: Choose an image from your local drive.
3. **Select Question**: You can either pick a suggested question or write your own.
4. **Get Answer**: Click the "Predict Answer" button to receive an answer to your question about the image.
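The usage steps above could be wired together in Streamlit roughly as follows. This is a minimal sketch, not the repository's actual `app.py`: `resolve_question`, `main`, and the `answer_fn` and `suggestions` parameters are hypothetical names, and the accepted file types are an assumption.

```python
def resolve_question(suggested, custom):
    """Prefer the user's own question; fall back to the suggested one."""
    custom = custom.strip()
    return custom if custom else suggested


def main(answer_fn, suggestions):
    """Render the upload -> question -> answer flow.

    answer_fn(image, question) and suggestions are placeholders that the
    caller supplies; they stand in for the app's model code.
    """
    # Imported lazily so resolve_question can be used without Streamlit.
    import streamlit as st
    from PIL import Image

    st.title("Visual QnA System")

    # Step 2: upload an image from the local drive.
    uploaded = st.file_uploader("Upload Image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return

    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")

    # Step 3: pick a suggested question or write a custom one.
    suggested = st.selectbox("Suggested questions", suggestions)
    custom = st.text_input("Or write your own question")
    question = resolve_question(suggested, custom)

    # Step 4: answer only when the button is clicked.
    if st.button("Predict Answer"):
        st.write(f"Answer: {answer_fn(image, question)}")
```

To try it, save the block to a script, add a call such as `main(my_answer_fn, ["What is in the image?"])` at the end, where `my_answer_fn` is your own function, and launch the script with `streamlit run`.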
## Technologies Used💻
- **Programming Language**: Python
- **Libraries**:
  - `Streamlit` for the web interface
  - `PIL` for image handling
  - `Transformers` from Hugging Face for pre-trained models
- **Models**:
  - **VILT**: `dandelin/vilt-b32-finetuned-vqa`
  - **BLIP**: `Salesforce/blip-image-captioning-base`
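For orientation, the two checkpoints listed above can be loaded through the Hugging Face `transformers` classes for BLIP and ViLT. The sketch below follows the usual usage pattern for those classes; the function names (`best_label`, `generate_caption`, `answer_question`) and the `max_new_tokens` value are illustrative assumptions, and the repository's own code may be organized differently.

```python
VQA_MODEL = "dandelin/vilt-b32-finetuned-vqa"
CAPTION_MODEL = "Salesforce/blip-image-captioning-base"


def best_label(scores, id2label):
    """Map the highest-scoring class index to its answer string."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best_index]


def generate_caption(image):
    """Caption a PIL image with BLIP (weights are downloaded on first use)."""
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(CAPTION_MODEL)
    model = BlipForConditionalGeneration.from_pretrained(CAPTION_MODEL)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def answer_question(image, question):
    """Answer a question about a PIL image with the ViLT VQA classifier."""
    import torch
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(VQA_MODEL)
    model = ViltForQuestionAnswering.from_pretrained(VQA_MODEL)
    encoding = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    # The VQA head scores a fixed set of answers; pick the most likely one.
    return best_label(logits[0].tolist(), model.config.id2label)
```

Because the ViLT checkpoint is a classifier over a fixed answer set, its replies are short words or phrases rather than free-form sentences. In an interactive app, the processors and models would normally be loaded once and reused rather than reloaded on every call.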
## Results🏆
The **Visual QnA System** generates a caption for an uploaded image, suggests questions based on that caption, and answers questions about the image using the **VILT** model.
For example, when the system was asked **_"What sport is being played?"_** about a sample image, it responded **_"Soccer"_**, which shows that it can pick up the context of a scene.
## Conclusion📚
The **Visual QnA System** is a powerful application of **computer vision** and **natural language processing**. By integrating **image captioning** and **question answering** models, it provides an engaging and intuitive way for users to interact with images. This project demonstrates the potential of **AI-driven image understanding** and its wide range of applications in fields like **AI chatbots**, **image search engines**, and **education and e-learning**.
With the ability to analyze and answer questions about images, it can enhance **customer support**, optimize **image-based search results**, and improve **personalized recommendations** based on visual content. Additionally, it has immense potential in areas like **healthcare for diagnostic imaging**, **security and surveillance**, and even in **autonomous vehicles**, where understanding the visual environment is critical.
## Future Enhancements🚀
While the **Visual QnA System** currently delivers concise, single-line responses, future improvements could enable more detailed, context-aware answers. Here are a few potential upgrades:
- **Extended Answer Generation**: Integrate advanced language models to generate detailed answers that provide in-depth information based on image content.
- **Context Awareness**: Enable the system to consider multiple objects and interactions in an image, enhancing its capability to answer complex questions.
- **Multilingual Support**: Add the ability to understand and answer questions in various languages, broadening accessibility.
- **Enhanced Accuracy with Fine-Tuning**: Train on diverse datasets for specialized fields, such as medical imaging or geographical scenes, to improve precision and expand application areas.
## License📝
This project is licensed under the **MIT License** - see the [LICENSE](./LICENSE) file for details.
## Contact
### 📬 Get in Touch!
I’d love to connect and discuss further:
- [GitHub](https://github.com/hk-kumawat) 💻: Explore my projects and contributions.
- [LinkedIn](https://www.linkedin.com/in/harshal-kumawat/) 🌐: Let’s connect professionally.
- [Email](mailto:harshalkumawat100@gmail.com) 📧: Send me an email for discussions and queries.
---
## Thanks for exploring the **Visual QnA System**! 🙌👁️ I hope it sparked your curiosity and imagination!
> "Empowering machines to see, think, and answer – the future of visual intelligence!" - Anonymous