https://github.com/hgschandeepa/image-caption-generator
This project aims to generate captions for images using a Convolutional Neural Network (CNN) for feature extraction and a Long Short-Term Memory (LSTM) network for sequence modeling. The project utilizes the Flickr8k dataset for training and evaluation.
- Host: GitHub
- URL: https://github.com/hgschandeepa/image-caption-generator
- Owner: HGSChandeepa
- Created: 2024-06-23T10:24:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-23T10:37:47.000Z (over 1 year ago)
- Last Synced: 2025-01-11T23:45:57.339Z (9 months ago)
- Language: Jupyter Notebook
- Size: 2.77 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Image Caption Generator with CNN & LSTM
This project aims to generate captions for images using a Convolutional Neural Network (CNN) for feature extraction and a Long Short-Term Memory (LSTM) network for sequence modeling. The project utilizes the Flickr8k dataset for training and evaluation.

## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Installation](#installation)
- [Data Preprocessing](#data-preprocessing)
- [Model Architecture](#model-architecture)
- [Training the Model](#training-the-model)
- [Generating Captions](#generating-captions)
- [Evaluation](#evaluation)
- [Results](#results)
- [Conclusion](#conclusion)
- [Acknowledgements](#acknowledgements)

## Introduction
Image captioning involves generating textual descriptions for given images. This project combines the power of CNNs and LSTMs to create a model that can produce meaningful captions for images.
## Dataset
The dataset used in this project is the [Flickr8k dataset](https://www.kaggle.com/datasets/adityajn105/flickr8k/data). It consists of 8,000 images, each paired with five different captions.
## Installation
1. Clone this repository:
```bash
git clone https://github.com/HGSChandeepa/Image-Caption-Generator.git
```
2. Navigate to the project directory:
```bash
cd Image-Caption-Generator
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```

## Data Preprocessing
### Extract Features from Images
This block of code extracts features from images using a pre-trained model. It loads each image from a specified directory, preprocesses it into the required input format, and uses the pre-trained model to generate a feature vector for the image. These features are then stored in a dictionary keyed by image ID.
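A minimal sketch of this step, assuming a VGG16 backbone from Keras as the pre-trained model (the README does not name the exact network) and a hypothetical `flickr8k/Images` directory:

```python
import os
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

IMAGE_DIR = "flickr8k/Images"  # hypothetical path to the Flickr8k images

# Drop VGG16's classification head and use the penultimate (fc2) layer
# as a 4096-dimensional feature extractor.
vgg = VGG16()
feature_extractor = Model(inputs=vgg.inputs, outputs=vgg.layers[-2].output)

features = {}
for name in os.listdir(IMAGE_DIR):
    # Load each image at the network's expected input size and preprocess it.
    image = load_img(os.path.join(IMAGE_DIR, name), target_size=(224, 224))
    image = img_to_array(image).reshape((1, 224, 224, 3))
    image = preprocess_input(image)
    # Key the feature vector by the image ID (file name without extension).
    image_id = os.path.splitext(name)[0]
    features[image_id] = feature_extractor.predict(image, verbose=0)
```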
### Store Features in Pickle
This block of code saves the extracted image features to a pickle file. The features dictionary, which contains the feature vectors for each image, is serialized to a file named `features.pkl` in the specified working directory, so the preprocessed features can be easily loaded and reused in later stages of the project.
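For example (the `working` directory name and file layout here are assumptions):

```python
import os
import pickle

WORKING_DIR = "working"  # hypothetical working directory

# Serialize the features dictionary so the CNN pass need not be repeated.
with open(os.path.join(WORKING_DIR, "features.pkl"), "wb") as f:
    pickle.dump(features, f)

# Later stages can simply reload the precomputed features.
with open(os.path.join(WORKING_DIR, "features.pkl"), "rb") as f:
    features = pickle.load(f)
```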
### Tokenize the Text
This block of code tokenizes the text captions. It creates a `Tokenizer` object from the Keras library, which vectorizes a text corpus by turning each text into a sequence of integers. The tokenizer is fitted on all the captions, building the word index from word frequencies. The vocabulary size is the number of unique tokens found, i.e. the length of the tokenizer's word index plus one (to account for the reserved index used for padding).
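A sketch along those lines, assuming the cleaned captions have already been collected into a flat list named `all_captions`:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# `all_captions` is assumed to be a flat list of cleaned caption strings.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(all_captions)

# +1 accounts for the reserved index 0 used for padding.
vocab_size = len(tokenizer.word_index) + 1
# Longest caption length, used later for padding input sequences.
max_length = max(len(caption.split()) for caption in all_captions)
```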
### Split Data into Training and Testing Sets
This block of code splits the dataset into training and testing sets. It first creates a list of image IDs from the keys of the `mapping` dictionary, which maps each image ID to its corresponding captions. It then calculates a split index, here 90% for training and 10% for testing, and slices the list of image IDs into two lists: `train` for training data and `test` for testing data.
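For example, assuming `mapping` is a dictionary of image ID to caption list:

```python
# `mapping` is assumed to be a dict of image ID -> list of captions.
image_ids = list(mapping.keys())

# Use 90% of the images for training and the remaining 10% for testing.
split = int(len(image_ids) * 0.90)
train = image_ids[:split]
test = image_ids[split:]
```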
## Model Architecture
The model combines a CNN for image feature extraction and an LSTM for generating captions.
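One common way to wire this merge-style architecture up in Keras is sketched below; the layer sizes (4096-dim image features, 256-unit embedding and LSTM) are assumptions, not necessarily the exact values used in the notebook:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

# vocab_size and max_length come from the tokenization step above.

# Image branch: a 4096-dim CNN feature vector projected to 256 dims.
inputs1 = Input(shape=(4096,))
fe1 = Dropout(0.4)(inputs1)
fe2 = Dense(256, activation="relu")(fe1)

# Text branch: the partial caption, embedded and run through an LSTM.
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.4)(se1)
se3 = LSTM(256)(se2)

# Merge both branches and predict the next word of the caption.
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation="relu")(decoder1)
outputs = Dense(vocab_size, activation="softmax")(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```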
## Training the Model
Train the model using the training dataset. Specify the loss function and optimizer, then run the training loop.
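A sketch of that loop, assuming the `model` compiled above and a hypothetical `data_generator` helper that yields `([image_features, input_sequence], next_word)` batches:

```python
# Hypothetical data_generator: yields ([image_features, input_sequence], next_word)
# batches built from the training image IDs, the caption mapping, the extracted
# features, and the fitted tokenizer.
epochs = 20
batch_size = 32
steps = len(train) // batch_size

for epoch in range(epochs):
    generator = data_generator(train, mapping, features, tokenizer,
                               max_length, vocab_size, batch_size)
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
```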
## Generating Captions
Use the trained model to generate captions for new images.
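A greedy-decoding sketch, assuming captions were wrapped in `startseq`/`endseq` tokens during preprocessing (a common convention, not stated explicitly in this README):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def idx_to_word(index, tokenizer):
    # Reverse lookup: integer index -> word.
    for word, idx in tokenizer.word_index.items():
        if idx == index:
            return word
    return None

def predict_caption(model, image_feature, tokenizer, max_length):
    # Greedy decoding: start from the start token and repeatedly append the
    # most likely next word until the end token or max_length is reached.
    in_text = "startseq"
    for _ in range(max_length):
        sequence = tokenizer.texts_to_sequences([in_text])[0]
        sequence = pad_sequences([sequence], maxlen=max_length)
        yhat = model.predict([image_feature, sequence], verbose=0)
        word = idx_to_word(int(np.argmax(yhat)), tokenizer)
        if word is None or word == "endseq":
            break
        in_text += " " + word
    return in_text
```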
## Evaluation
Evaluate the model's performance using metrics like BLEU score and visualize the results with example images and their predicted captions.
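For example, corpus-level BLEU can be computed with NLTK over the test split (a sketch reusing the hypothetical `predict_caption` helper above):

```python
from nltk.translate.bleu_score import corpus_bleu

# Collect reference captions and model predictions for every test image.
actual, predicted = [], []
for key in test:
    references = [caption.split() for caption in mapping[key]]
    y_pred = predict_caption(model, features[key], tokenizer, max_length)
    actual.append(references)
    predicted.append(y_pred.split())

print("BLEU-1: %f" % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
print("BLEU-2: %f" % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
```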
## Results
Showcase some example images along with their generated captions and the BLEU scores.
## Conclusion
Summarize the project, discuss the results, and suggest potential improvements or future work.
## Acknowledgements
- The dataset used in this project is from Kaggle: [Flickr8k dataset](https://www.kaggle.com/datasets/adityajn105/flickr8k/data).
- This project was created with the help of various open-source libraries and pre-trained models.