https://github.com/tejas-130704/image-captioning
Image Captioning: This project utilizes deep learning for image captioning, combining a CNN-based feature extractor with an LSTM-based sequence generator.
- Host: GitHub
- URL: https://github.com/tejas-130704/image-captioning
- Owner: tejas-130704
- Created: 2025-02-23T14:50:53.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-23T15:36:14.000Z (3 months ago)
- Last Synced: 2025-02-23T16:20:08.528Z (3 months ago)
- Topics: computer-vision, deep-neural-networks, image-caption-generator, lstm-neural-networks, nlp, resnet-50
- Language: Jupyter Notebook
- Homepage:
- Size: 643 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Image Captioning using ResNet50 and LSTM
This project implements an Image Captioning model using **ResNet50** for feature extraction and **LSTM** for sequence generation. The model is trained on 8K images, which results in decent but not highly accurate captions due to the limited dataset size.
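The encoder-decoder loop described above can be sketched in a few lines. This is a hedged illustration, not the repository's actual code: `predict_next` stands in for the trained ResNet50+LSTM model (which really operates on token ids and softmax scores via the tokenizer), and `toy_predictor` is a hard-coded stand-in so the loop runs anywhere.

```python
def greedy_decode(predict_next, image_features, max_len=20):
    """Greedy caption decoding: start from "startseq", repeatedly ask the
    model for the most likely next word, stop at "endseq" or max_len."""
    words = ["startseq"]
    for _ in range(max_len):
        word = predict_next(image_features, words)
        if word == "endseq":
            break
        words.append(word)
    return " ".join(words[1:])  # drop the "startseq" marker

# Hard-coded stand-in for the trained model (illustration only).
def toy_predictor(features, words):
    script = ["a", "dog", "runs", "endseq"]
    return script[len(words) - 1]

image_features = [0.0] * 2048  # length of ResNet50's pooled feature vector
caption = greedy_decode(toy_predictor, image_features)
print(caption)  # -> a dog runs
```

Beam search, or the attention-equipped decoder mentioned under Limitations, slots into the same loop shape.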
## Preview



## Features
- Uses **ResNet50** as the feature extractor
- LSTM-based caption generation
- Trained on **8K images** with a limited vocabulary
- **Streamlit application** provided for easy testing

## Model Download & Usage
The trained model is **too large to push to GitHub**. Instead, you can train your own model using the provided Jupyter Notebook, or test an existing model through the Streamlit application by supplying just the `model.h5` and `tokenizer.json` files.

### Steps to Use the Streamlit App
1. Clone this repository:
```bash
git clone https://github.com/tejas-130704/Image-Captioning.git
cd Image-Captioning
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Place your trained **`model.h5`** file in the `models` folder and the **`tokenizer.json`** file in the project directory.
4. Run the Streamlit application:
```bash
streamlit run app.py
```
5. Upload an image and get captions!

## Limitations & Future Improvements
- **Limited dataset size (8K images)** results in **less accurate captions**.
- Accuracy can be significantly improved by training on **larger datasets** such as **COCO** or **Flickr30k**.
- Future work can involve adding an **Attention Mechanism** for better caption quality.

## Dataset
The model is trained on the [Flickr 8k Dataset](https://www.kaggle.com/datasets/adityajn105/flickr8k).

## Contributions
Feel free to fork this repository and contribute improvements. Pull requests are welcome!
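A closing note for anyone wiring up the Streamlit app: the `tokenizer.json` mentioned in the setup steps stores the model's vocabulary. Below is a standard-library-only sketch of reading the word index from a Keras-style export; the nested layout (config fields serialized as JSON strings) is an assumption based on Keras's `Tokenizer.to_json()` format, so adjust the keys if your file differs.

```python
import json

def load_word_index(tokenizer_json: str) -> dict:
    """Extract the word -> id mapping from a Keras-style tokenizer export."""
    data = json.loads(tokenizer_json)
    # word_index itself is serialized as a JSON string inside "config"
    # (an assumption about the Tokenizer.to_json() layout).
    return json.loads(data["config"]["word_index"])

# Tiny inline example standing in for a real tokenizer.json file.
sample = json.dumps({
    "class_name": "Tokenizer",
    "config": {"word_index": json.dumps({"startseq": 1, "dog": 2, "endseq": 3})},
})
word_index = load_word_index(sample)                # word -> id
index_word = {i: w for w, i in word_index.items()}  # id -> word, for decoding
```

The inverse `index_word` mapping is what a decoder uses to turn the model's predicted ids back into words.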