https://github.com/prachipatel15/ai-image-captioning
An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.
computer-vision image-processing streamlit transformers vit-gpt2 yolov8
- Host: GitHub
- URL: https://github.com/prachipatel15/ai-image-captioning
- Owner: PrachiPatel15
- Created: 2025-02-21T06:43:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-21T06:44:32.000Z (over 1 year ago)
- Last Synced: 2025-02-21T07:30:53.803Z (over 1 year ago)
- Topics: computer-vision, image-processing, streamlit, transformers, vit-gpt2, yolov8
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AI-Image-Captioning
An **AI-powered image captioning app** built with **Streamlit**, using **ViT-GPT2** for caption generation and **YOLOv8** for object detection. The app enhances captions by integrating detected objects into the generated text.
## 🔥 Features
- **AI-powered image captioning** using **ViT-GPT2**.
- **Object detection** with **YOLOv8** to enhance captions.
- **Dark-themed UI** with **Streamlit**.
- **Interactive settings** for enabling/disabling object detection.
- **GPU-accelerated inference** via CUDA.
## 🚀 Demo
### 1️⃣ **Upload an Image**

### 2️⃣ **Enable Object Detection and Generate Captions**

### 3️⃣ **View Enhanced Caption and Detected Objects**

## 📂 Installation & Setup
### 1️⃣ **Clone the Repository**
```bash
git clone https://github.com/PrachiPatel15/AI-Image-Captioning.git
cd AI-Image-Captioning
```
### 2️⃣ **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Activate the environment with the command for your platform:
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows
```
### 3️⃣ **Install Dependencies**
```bash
pip install -r requirements.txt
```
### 4️⃣ **Run the Application**
```bash
streamlit run app.py
```
## 🧠 Models Used
### **1️⃣ ViT-GPT2** (Image Captioning)
- **Pretrained Model**: `nlpconnect/vit-gpt2-image-captioning`
- **Task**: Generates textual descriptions for input images.
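A minimal sketch of how this checkpoint can be loaded and run with the Hugging Face `transformers` library. The helper names `load_captioner` and `generate_caption`, the dictionary layout, and the decoding settings (`max_length=16`, `num_beams=4`) are assumptions for illustration and are not taken from `app.py`.

```python
def load_captioner(model_id: str = "nlpconnect/vit-gpt2-image-captioning") -> dict:
    """Load model, image processor, and tokenizer once (hypothetical helper)."""
    # Heavy imports are deferred so importing this module stays cheap.
    import torch
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    # Use the GPU when CUDA is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {
        "model": VisionEncoderDecoderModel.from_pretrained(model_id).to(device),
        "processor": ViTImageProcessor.from_pretrained(model_id),
        "tokenizer": AutoTokenizer.from_pretrained(model_id),
        "device": device,
    }


def generate_caption(image, captioner: dict, max_length: int = 16, num_beams: int = 4) -> str:
    """Return a caption for a PIL image (decoding settings are assumed defaults)."""
    # The processor resizes and normalizes the image into a pixel tensor.
    inputs = captioner["processor"](images=image.convert("RGB"), return_tensors="pt")
    pixel_values = inputs.pixel_values.to(captioner["device"])
    # Beam search over the GPT-2 decoder produces the caption token ids.
    output_ids = captioner["model"].generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    text = captioner["tokenizer"].batch_decode(output_ids, skip_special_tokens=True)[0]
    return text.strip()
```

Typical usage would be `generate_caption(Image.open("photo.jpg"), load_captioner())`, where `photo.jpg` is a placeholder path. In a Streamlit app, the loader is usually wrapped in a caching decorator so the weights are not reloaded on every rerun.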
### **2️⃣ YOLOv8** (Object Detection)
- **Pretrained Model**: `yolov8n.pt`
- **Task**: Detects objects in the image to enhance captions.
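A hedged sketch of running these weights through the `ultralytics` package and reducing the result to label counts. The helper names (`load_detector`, `count_detections`, `detect_objects`) and the `min_conf=0.5` threshold are assumptions for illustration; the app itself may filter or format detections differently.

```python
from collections import Counter


def load_detector(weights: str = "yolov8n.pt"):
    """Load YOLOv8 weights (hypothetical helper; downloads them on first use)."""
    from ultralytics import YOLO  # deferred: heavy import

    return YOLO(weights)


def count_detections(result, min_conf: float = 0.5) -> Counter:
    """Count detected class names in one result, skipping low-confidence boxes."""
    names = result.names                  # maps class id -> label, e.g. {0: "person"}
    class_ids = result.boxes.cls.tolist()  # one class id per detected box
    scores = result.boxes.conf.tolist()    # one confidence score per detected box
    return Counter(
        names[int(class_id)]
        for class_id, score in zip(class_ids, scores)
        if score >= min_conf
    )


def detect_objects(image, detector, min_conf: float = 0.5) -> Counter:
    """Run the detector on a single image and return label counts."""
    results = detector(image, verbose=False)  # one result object per input image
    return count_detections(results[0], min_conf)
```

Returning a `Counter` rather than a flat list keeps repeated objects (for example, two dogs) available to whatever step builds the enhanced caption.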
## ⚙️ Project Structure
```text
AI-Image-Captioning/
├── app.py            # Main Streamlit application
├── requirements.txt  # Required dependencies
├── README.md         # Documentation
└── assest/           # Images and screenshots
```
## 🛠️ Usage Instructions
1. **Upload an image** in the app.
2. **Choose whether to enable object detection**.
3. **Click 'Analyze Image'** to generate a caption.
4. **View enhanced captions** and object detection results.
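This README does not spell out how detected objects are merged into the caption, so the following is only one plausible approach, not the app's actual logic: append any detected labels that the base caption does not already mention. The function name `enhance_caption` and the "Also detected" wording are hypothetical.

```python
from typing import Mapping


def enhance_caption(caption: str, detected: Mapping[str, int]) -> str:
    """Append detected labels that the caption does not already mention."""
    base = caption.strip().rstrip(".")
    lowered = base.lower()
    extras = [
        f"{count} x {label}" if count > 1 else label
        for label, count in sorted(detected.items())
        if label.lower() not in lowered  # skip objects the caption already names
    ]
    if not extras:
        return base + "."
    return f"{base}. Also detected: {', '.join(extras)}."


print(enhance_caption("a man riding a wave on a surfboard", {"person": 1, "surfboard": 1}))
# prints: a man riding a wave on a surfboard. Also detected: person.
```

The substring check is deliberately simple; it would miss synonyms (the caption says "man", the detector says "person"), which is why "person" is still appended in the example.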
## 💡 Future Improvements
- [ ] Add multilingual captioning support.
- [ ] Optimize object detection performance.
- [ ] Implement additional caption refinement techniques.
## 🤝 Contributing
Contributions are welcome! Feel free to **fork** this repository and create a **pull request** with your improvements.