https://github.com/roysti10/image_captioning

Image Captioning using Encoder Decoder network , Pretrained models given
Host: GitHub
URL: https://github.com/roysti10/image_captioning
Owner: roysti10
License: mit
Created: 2020-05-01T12:09:11.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-12-27T05:32:29.000Z (over 4 years ago)
Last Synced: 2025-01-29T13:14:59.902Z (5 months ago)
Topics: checkpoints, encoder-decoder-model, flickr8k, image-captioning, tensorflow
Language: Python
Homepage:
Size: 10.8 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Image Captioning

## Dataset Preparation
* Clone this repository:
```bash
git clone https://github.com/lucasace/Image_Captioning.git
```
* Download the Flickr8k image and text datasets from [here](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip) and [here](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip), respectively.
* Unzip both archives and place their contents inside the repository folder.
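
The download and unzip steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The helper names `download`, `extract_all`, and `prepare_flickr8k` are illustrative assumptions and are not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# Archive URLs taken from the links above.
FLICKR8K_URLS = [
    "https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip",
    "https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip",
]


def download(url, dest_dir):
    """Download url into dest_dir, skipping the download if the file already exists."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def extract_all(zip_paths, dest_dir):
    """Extract every archive in zip_paths into dest_dir and return the member names."""
    names = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)
            names.extend(archive.namelist())
    return names


def prepare_flickr8k(repo_dir="."):
    """Hypothetical convenience wrapper: fetch both archives and unpack them in repo_dir."""
    archives = [download(url, repo_dir) for url in FLICKR8K_URLS]
    return extract_all(archives, repo_dir)
```

Calling `prepare_flickr8k()` from the repository folder would place the images and text files next to `main.py`; the image archive is large, so the first run may take a while.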

## I want to train the model
To train the model, run:
```bash
python3 main.py --type train --checkpoint_dir CHECKPOINT_DIR --cnnmodel CNNMODEL --image_folder IMAGE_FOLDER --caption_file CAPTION_FILE --feature_extraction FEATURE_EXTRACTION
```
* `checkpoint_dir` is the directory where your model checkpoints are saved.
* `cnnmodel` is either `inception` or `vgg16`; the default is `inception`.
* `image_folder` is the location of the folder containing all the images.
* `caption_file` is the location of `Flickr8k.token.txt`.
* `feature_extraction` is `True` or `False`; the default is `True`.
  * `True` if you haven't extracted the image features yet.
  * `False` if you have already extracted the image features. This saves time and memory when training again.
* `batch_size` is the batch size used for training and validation; the default is 128.
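
The `feature_extraction` switch amounts to a compute-once cache: extract the image features on the first run, then reload them on later runs. A minimal sketch of that idea follows, assuming the features are stored in a single pickle file. The helper `load_or_extract_features`, the cache path, and the `extract_fn` stand-in are hypothetical and do not reflect how this repository actually stores features.

```python
import pickle
from pathlib import Path


def load_or_extract_features(image_names, extract_fn, cache_path, feature_extraction=True):
    """Return {image_name: features}, recomputing only when feature_extraction is True.

    extract_fn is a stand-in for the CNN encoder (Inception or VGG16 in this README).
    """
    cache_path = Path(cache_path)
    if feature_extraction:
        # First run: compute features for every image and store them on disk.
        features = {name: extract_fn(name) for name in image_names}
        with cache_path.open("wb") as handle:
            pickle.dump(features, handle)
        return features
    # Later runs: skip the CNN forward passes and reload the stored features.
    with cache_path.open("rb") as handle:
        return pickle.load(handle)
```

With `feature_extraction=False` the encoder is never called, which is where the time and memory savings come from.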

## Testing the model
```bash
python3 main.py --type test --checkpoint_dir CHECKPOINT_DIR --cnnmodel CNNMODEL --image_folder IMAGE_FOLDER --caption_file CAPTION_FILE --feature_extraction FEATURE_EXTRACTION
```
* Download the checkpoints from [here](https://drive.google.com/drive/u/1/folders/1-VJXewV_Da9TNLrNpwORY5EY0_slxT1g) if your `cnnmodel` is `inception`, or from [here](https://drive.google.com/drive/u/1/folders/1o020lkAFADNs_4vGJKAxGl_-NP41VHyN) if it is `vgg16`. You can also use your own trained checkpoints.
* All arguments are the same as for training.
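
Because training and testing take the same arguments, a single parser can serve both modes. The sketch below shows how such a parser might be declared with `argparse`. It is built only from the flags and defaults listed in this README and is not the repository's actual `main.py`; the `--batch_size` spelling and the `str_to_bool` helper are assumptions.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' strings; argparse's type=bool would treat any non-empty string as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Illustrative parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Image captioning (illustrative parser)")
    parser.add_argument("--type", choices=["train", "test", "caption"], required=True)
    parser.add_argument("--checkpoint_dir", required=True)
    parser.add_argument("--cnnmodel", choices=["inception", "vgg16"], default="inception")
    parser.add_argument("--image_folder")
    parser.add_argument("--caption_file", required=True)
    parser.add_argument("--feature_extraction", type=str_to_bool, default=True)
    # The flag spelling below is assumed; the README only names the option batch_size.
    parser.add_argument("--batch_size", type=int, default=128)
    return parser
```

For example, `build_parser().parse_args(["--type", "test", "--checkpoint_dir", "checkpoints", "--caption_file", "Flickr8k.token.txt"])` yields the documented defaults for the remaining options.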

## I just want to caption

```bash
python3 main.py --type caption --checkpoint_dir CHECKPOINT_DIR --cnnmodel CNNMODEL --caption_file CAPTION_FILE --to_caption TO_CAPTION
```
* Download the checkpoints from [here](https://drive.google.com/drive/u/1/folders/1-VJXewV_Da9TNLrNpwORY5EY0_slxT1g).
* Note that these are Inception checkpoints; for VGG16, download from [here](https://drive.google.com/drive/u/1/folders/1o020lkAFADNs_4vGJKAxGl_-NP41VHyN).
* `caption_file` is required to build the vocabulary.
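
To illustrate why the caption file is needed even when only captioning, here is a minimal sketch of building a vocabulary from it. It assumes the usual `Flickr8k.token.txt` layout of one `image.jpg#index<TAB>caption` pair per line. The lowercasing, the `min_count` threshold, and the special token names are assumptions and may differ from what this repository does.

```python
from collections import Counter


def parse_caption_lines(lines):
    """Group tokenized captions by image name from 'image.jpg#0<TAB>caption' lines."""
    captions = {}
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue  # skip blank or malformed lines
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#", 1)[0]
        captions.setdefault(image_name, []).append(caption.lower().split())
    return captions


def build_vocabulary(captions, min_count=1):
    """Map each word seen at least min_count times to an integer id."""
    counts = Counter(word for caps in captions.values() for cap in caps for word in cap)
    # Special tokens (hypothetical names) take the first ids.
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in sorted(counts.items()):
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab
```

The decoder emits word ids, so the same word-to-id mapping used during training has to be rebuilt before a generated sequence can be turned back into text.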

## Custom dataset
If you want to train the model on a custom dataset, modify `dataset.py` to suit your dataset.
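
As one possible starting point, custom annotations can first be converted into the same `image#index<TAB>caption` layout that `Flickr8k.token.txt` uses, which may reduce the changes needed in the loader. The sketch below assumes a headerless two-column CSV of image name and caption. The function names and the CSV layout are hypothetical, and whether the loader accepts the result without further changes has not been verified here.

```python
import csv
from collections import defaultdict


def to_flickr8k_token_lines(rows):
    """Convert (image_name, caption) pairs to 'image#index<TAB>caption' lines."""
    next_index = defaultdict(int)  # running caption index per image
    lines = []
    for image_name, caption in rows:
        index = next_index[image_name]
        next_index[image_name] += 1
        lines.append(f"{image_name}#{index}\t{caption.strip()}")
    return lines


def convert_csv(csv_path, output_path):
    """Read a headerless two-column CSV (image name, caption) and write a token file."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = [(row[0], row[1]) for row in csv.reader(src) if len(row) >= 2]
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(to_flickr8k_token_lines(rows)) + "\n")
```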

## Results
|Model Type|CNN Model|BLEU-1|BLEU-2|BLEU-3|BLEU-4|METEOR|
| --- | --- | --- | --- | --- | --- | --- |
|Encoder-Decoder|Inception_V3|60.12|51.1|48.13|39.5|25.8|
|Encoder-Decoder|VGG16|58.46|49.87|47.50|39.37|26.32|

Here are some of the results:
* ![1](results/baseball.png)
* ![2](results/index.png)
* ![3](results/dogfrisbee.png)
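
For readers unfamiliar with the BLEU columns in the table above, the sketch below computes a sentence-level cumulative BLEU score in plain Python: clipped n-gram precision for orders 1 to `max_n`, combined by a geometric mean and scaled by a brevity penalty. It is an illustration of the metric only, not the evaluation code that produced the reported numbers; corpus-level aggregation and smoothing are omitted, and all function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often as in any one reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties resolve toward the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Cumulative BLEU with uniform weights over orders 1..max_n, in the range 0 to 1."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)
```

Here `bleu(candidate, references, max_n=1)` corresponds to BLEU-1 and `max_n=4` to BLEU-4. The function returns values between 0 and 1, whereas the table appears to report scores on a 0 to 100 scale.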

## Things to Do
- [ ] Beam search
- [ ] Image Captioning using Soft and Hard Attention
- [ ] Image Captioning using Adversarial Training

## Contributions

Contributions are welcome.

If there is any issue with the model or errors in the program, feel free to raise an issue or open a PR.

## References
* O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and tell: A neural image caption generator," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3156-3164, doi: 10.1109/CVPR.2015.7298935.
* TensorFlow documentation on image captioning
* [Machine Learning Mastery](https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/) for the dataset
* NLTK documentation for the METEOR score
* [RNN lecture by Stanford University](https://www.youtube.com/watch?v=6niqTuYFZLQ&t=1731s)
