https://github.com/adi2334/image-caption-generator
This project implements an image captioning model using a CNN-LSTM architecture. The model takes an image as input and generates a descriptive caption using natural language processing techniques.
- Host: GitHub
- URL: https://github.com/adi2334/image-caption-generator
- Owner: Adi2334
- Created: 2025-03-16T16:04:32.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-18T13:14:21.000Z (2 months ago)
- Last Synced: 2025-03-18T13:45:59.233Z (2 months ago)
- Topics: cnn, computer-vision, deep-learning, imagecaptioning, lstm, machine-learning, neural-network, tensorflow
- Language: Python
- Size: 37.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# **Image Caption Generator using Deep Learning**
## **Overview**
This project implements an **image captioning model** using a **CNN-LSTM architecture**. The model takes an image as input and generates a descriptive caption using natural language processing techniques. It is trained on a dataset containing images and their corresponding textual descriptions.

## **Dataset**
- The model is trained on the **Flickr8k** dataset.
- It consists of **8,000** images with five reference captions per image (a caption-preprocessing sketch follows this list).
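
Captions in Flickr8k ship as a tab-separated token file. Here is a minimal sketch of typical preprocessing for it; this is an assumption about what `utils/preprocess.py` does, and the file path is hypothetical:

```python
# Sketch of typical Flickr8k caption preprocessing (an assumption about
# what utils/preprocess.py does): lowercase, strip punctuation, and wrap
# each caption in startseq/endseq markers. The token-file path is assumed.
import string

def load_clean_captions(path='data/Flickr8k.token.txt'):
    table = str.maketrans('', '', string.punctuation)
    captions = {}
    with open(path) as f:
        for line in f:
            img_id, caption = line.strip().split('\t', 1)
            img_id = img_id.split('#')[0]        # drop the '#0'..'#4' suffix
            words = caption.lower().translate(table).split()
            captions.setdefault(img_id, []).append(
                'startseq ' + ' '.join(words) + ' endseq')
    return captions
```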
#### **Data Augmentation**
To improve model performance, images were **horizontally flipped**.
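
A horizontal flip is a one-line operation in TensorFlow; a minimal sketch (the repo's actual augmentation code may differ):

```python
# Sketch: horizontal-flip augmentation with TensorFlow, matching the
# flip described above.
import tensorflow as tf

def flip_horizontal(image):
    """Mirror an image tensor of shape (H, W, C) along its width."""
    return tf.image.flip_left_right(image)
```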
## **Model Architecture**
The model consists of three main components (a code sketch follows the list):
1. **Image Feature Extractor (CNN)**
   - Uses **Xception** to extract features from images.
2. **Sequence Processor (LSTM)**
- An **embedding layer** processes input text sequences.
- An **LSTM network** learns dependencies between words in a sentence.
3. **Decoder (Dense Layer with Softmax)**
- Combines image features and text sequences.
- Generates the next word in the caption.
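
Below is a minimal Keras sketch of this merge-style CNN-LSTM decoder. `vocab_size`, `max_length`, and the 256-unit layer sizes are illustrative assumptions, not the repo's exact hyperparameters:

```python
# Minimal Keras sketch of the merge-style CNN-LSTM captioner described above.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 7500   # assumed tokenizer vocabulary size
max_length = 34     # assumed maximum caption length in tokens

# 1. Image feature extractor: Xception's pooled 2048-d features, projected
inputs1 = Input(shape=(2048,))
fe = Dense(256, activation='relu')(Dropout(0.5)(inputs1))

# 2. Sequence processor: embed the partial caption, then run an LSTM
inputs2 = Input(shape=(max_length,))
se = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se = LSTM(256)(Dropout(0.5)(se))

# 3. Decoder: merge both representations and predict the next word
merged = Dense(256, activation='relu')(add([fe, se]))
outputs = Dense(vocab_size, activation='softmax')(merged)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```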
To view the model architecture in detail, you can open the saved model in [Netron](https://netron.app/).

## **Evaluation Metrics**
The model is evaluated using the following metrics:
📌 **BLEU-1:** 0.6131
📌 **BLEU-2:** 0.5453
📌 **BLEU-3:** 0.4483
📌 **BLEU-4:** 0.3635
📌 **ROUGE-L:** 0.3314
📌 **CIDEr:** 0.0497
📌 **SPICE:** 0.0451

## **How to Use**
#### **1. Clone the Repository**
```bash
git clone https://github.com/adi2334/image-caption-generator.git
cd image-caption-generator
```

#### **2. Install Dependencies**
```bash
pip install -r requirements.txt
```
#### **3. Extract Features**
```bash
mkdir data
python utils/preprocess.py
python utils/feature_extract.py
python utils/data_loader.py
```
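
For reference, pre-computing pooled Xception features usually looks something like the sketch below; the `data/images` and `data/features.pkl` paths are assumptions, and `utils/feature_extract.py` may differ in detail:

```python
# Sketch: pre-compute one 2048-d Xception feature vector per image.
import os
import pickle
import numpy as np
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

cnn = Xception(include_top=False, pooling='avg')   # global-average-pooled output

features = {}
for name in os.listdir('data/images'):             # assumed image directory
    img = load_img(os.path.join('data/images', name), target_size=(299, 299))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    features[name] = cnn.predict(x, verbose=0)[0]  # shape (2048,)

with open('data/features.pkl', 'wb') as f:         # assumed output path
    pickle.dump(features, f)
```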
#### **4. Training**
You can also use the pretrained weights instead of training from scratch.
```bash
python train.py
```

#### **5. Run the Model**
To test the model with your own images:
```bash
python test.py --image_path path/to/image.jpg
```
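
Internally, caption generation is typically a greedy word-by-word loop; a sketch, assuming the common `startseq`/`endseq` markers and that `model`, `tokenizer`, `photo`, and `max_length` come from the earlier steps:

```python
# Sketch of greedy decoding: feed the image features plus the caption so
# far, take the argmax word, and stop at 'endseq' or max_length.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo, max_length):
    text = 'startseq'
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = int(np.argmax(model.predict([photo, seq], verbose=0)))
        word = tokenizer.index_word.get(yhat)
        if word is None or word == 'endseq':
            break
        text += ' ' + word
    return text.replace('startseq', '').strip()
```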
#### **6. Streamlit Web App**
Run the **Streamlit** interface for uploading images and generating captions:
```bash
streamlit run Streamlit.py
```
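
A Streamlit front end for this kind of model can be very small; a sketch in which `extract_features` and `generate_caption` are hypothetical helpers standing in for the repo's actual utilities:

```python
# Sketch of a minimal Streamlit front end. extract_features() and
# generate_caption() are hypothetical helpers, not the repo's real API.
import streamlit as st
from PIL import Image

st.title("Image Caption Generator")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, use_column_width=True)
    photo = extract_features(image)        # hypothetical: CNN feature vector
    caption = generate_caption(photo)      # hypothetical: CNN-LSTM decoding
    st.write(f"**Caption:** {caption}")
```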
#### **7. Evaluation of Model**
Evaluate the model with **NLP** metrics commonly used for image captioning:
```bash
python evaluation/test_cap.py
python evaluation/evaluation.py
```
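
For reference, corpus-level BLEU scores like those reported above can be computed with NLTK (a sketch; `evaluation/evaluation.py` may use a different toolkit):

```python
# Sketch: corpus-level BLEU-1..4 with NLTK. references is a list (one
# entry per image) of lists of tokenized reference captions; hypotheses
# is the matching list of tokenized generated captions.
from nltk.translate.bleu_score import corpus_bleu

def bleu_report(references, hypotheses):
    weights = {
        'BLEU-1': (1.0, 0, 0, 0),
        'BLEU-2': (0.5, 0.5, 0, 0),
        'BLEU-3': (1 / 3, 1 / 3, 1 / 3, 0),
        'BLEU-4': (0.25, 0.25, 0.25, 0.25),
    }
    return {name: corpus_bleu(references, hypotheses, weights=w)
            for name, w in weights.items()}
```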
## **Results**
Example output from the model:

| **Input Image** | *(example image)* |
|---------------|-------------------|
| **Generated Caption** | *"man in the water"* |

## **Future Improvements**
🔹 Train on a larger dataset for improved generalization.
🔹 Experiment with **Transformer-based models** (e.g., ViT + GPT-2, BLIP).
🔹 Implement **beam search** for better caption generation (see the sketch after this list).
🔹 Optimize the model using **reinforcement learning (CIDEr optimization)**.
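
As a starting point for the beam-search item above, here is a simplified sketch (an assumed extension, not code from the repo): it keeps the `k` most probable partial captions at every step instead of only the argmax, and omits early `endseq` termination for brevity.

```python
# Simplified beam search sketch: expand each of the k best partial
# captions with its k most probable next words, then keep the k best.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def beam_search(model, tokenizer, photo, max_length, k=3):
    start = tokenizer.texts_to_sequences(['startseq'])[0]
    beams = [(start, 0.0)]                      # (token ids, log-probability)
    for _ in range(max_length):
        candidates = []
        for seq, score in beams:
            padded = pad_sequences([seq], maxlen=max_length)
            probs = model.predict([photo, padded], verbose=0)[0]
            for idx in np.argsort(probs)[-k:]:  # k most probable next words
                candidates.append((seq + [int(idx)],
                                   score + float(np.log(probs[idx] + 1e-12))))
        beams = sorted(candidates, key=lambda c: c[1])[-k:]
    words = (tokenizer.index_word.get(i, '') for i in beams[-1][0])
    return ' '.join(w for w in words if w not in ('startseq', 'endseq', ''))
```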
## **Contributor**
👤 **[Aditya Nikam](https://www.linkedin.com/in/aditya-nikam-4885bb232/)**, a student at IIT Kanpur
- Contact: [email protected] / [email protected]
----------------------------------------------------------------