### ImageCaptions
A base model for image captioning

### Config
- python 2.7
- tensorflow 1.8.0
- python packages
  * nltk
  * PIL
  * json
  * numpy

These are all common toolkits, so I don't give their links.
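If any are missing, they can typically be installed with pip (note that PIL is provided by the Pillow package, and json ships with the Python standard library):
```shell
$ pip install nltk Pillow numpy
```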

### DataDownload
- coco image dataset
  * download [train2017.zip](http://images.cocodataset.org/zips/train2017.zip)
  * unzip it to the dir 'data/train2017/'
- coco image annotations
  * download [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
  * unzip it and copy 'captions_train2017.json' to the dir 'data/coco_annotations/'
- pretrained inception model
  * download [inception_v3.ckpt](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz) (a tar.gz archive) and extract it to the dir 'data/inception/'
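One way to script the downloads above (a sketch; the COCO zips unpack into top-level 'train2017/' and 'annotations/' folders, and the Inception archive contains 'inception_v3.ckpt'):
```shell
$ mkdir -p data/coco_annotations data/inception
$ wget http://images.cocodataset.org/zips/train2017.zip
$ unzip train2017.zip -d data/                        # creates data/train2017/
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
$ unzip annotations_trainval2017.zip                  # creates annotations/
$ cp annotations/captions_train2017.json data/coco_annotations/
$ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
$ tar -xzf inception_v3_2016_08_28.tar.gz -C data/inception/
```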

## Train
#### First, get the preprocessed data
- get 'data/captions.json', 'data/captions_gt.json'
```shell
$ cd preproccess
$ python data_entry.py
```
- get 'data/image_id_train.json', 'data/image_id_val.json', 'data/image_id_test.json'
```shell
$ cd preproccess
$ python image_id_split.py
```
- get 'data/vocabulary.json'
```shell
$ cd preproccess
$ python vocabulary.py
```
#### Second, get TFRecord files
Because the dataset is large, we convert it to TFRecord files to improve speed and CPU/GPU efficiency.
Converting the data to 'data/tfrecord/train-xx.tfrecord' takes about 30 minutes; I split the training data into 40 TFRecord files.
* get 'data/tfrecord/train-00.tfrecord' - 'data/tfrecord/train-39.tfrecord'
```shell
$ python datasets.py
```
* you also need 'data/tfrecord_name_train.json', a list of the shard paths used to build the TensorFlow filename queue; it is easy to generate (see the sketch below)
* the val and test datasets are handled the same way
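A minimal way to produce that filename list (assuming the shard naming above; this is a sketch, not the repository's script):
```python
# Collect the TFRecord shard paths and save them as a JSON list so the
# training code can build its filename queue from 'data/tfrecord_name_train.json'.
import glob
import json

shards = sorted(glob.glob('data/tfrecord/train-*.tfrecord'))
with open('data/tfrecord_name_train.json', 'w') as f:
    json.dump(shards, f)
```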

#### Third, let's go train
```shell
$ python main.py
```
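For orientation, here is a minimal sketch of the kind of Show-and-Tell style CNN+RNN training graph that `main.py` trains. All names, shapes, and hyperparameters below are illustrative assumptions, not the repository's actual code; it assumes pre-extracted Inception V3 image features rather than running the CNN in-graph.
```python
# Illustrative Show-and-Tell style graph (TensorFlow 1.x), NOT this repo's code.
# Assumes 2048-d image features and captions as padded integer id sequences of length T.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, LSTM_UNITS, T = 14643, 512, 512, 20  # assumed sizes

image_feat = tf.placeholder(tf.float32, [None, 2048], name='image_feat')
captions = tf.placeholder(tf.int32, [None, T], name='captions')

# Project the image feature and the word ids into the same embedding space.
img_embed = tf.layers.dense(image_feat, EMBED_DIM, name='img_embed')
word_embedding = tf.get_variable('word_embedding', [VOCAB_SIZE, EMBED_DIM])
word_embeds = tf.nn.embedding_lookup(word_embedding, captions[:, :-1])

# Feed the image as the first "word", then the caption prefix; predict each next word.
inputs = tf.concat([tf.expand_dims(img_embed, 1), word_embeds], axis=1)
cell = tf.nn.rnn_cell.BasicLSTMCell(LSTM_UNITS)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

logits = tf.layers.dense(outputs, VOCAB_SIZE, name='logits')
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=captions, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```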

## Experiments
Train/Val/Test split: 82783/5000/5000 images, vocabulary size = 14643 (no words are filtered out). We use greedy search rather than beam search.
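For reference, greedy search simply takes the arg-max word at every step instead of keeping multiple beam hypotheses. A small illustration (the `step` function here is a hypothetical stand-in for one decoder step, not code from this repository):
```python
# Greedy decoding: at each step keep only the single most probable word.
import numpy as np

def greedy_decode(step, start_id, end_id, state, max_len=20):
    """`step(word_id, state)` -> (log-probs over the vocabulary, new state)."""
    word, caption = start_id, []
    for _ in range(max_len):
        log_probs, state = step(word, state)
        word = int(np.argmax(log_probs))  # beam search would keep the top-k instead
        if word == end_id:
            break
        caption.append(word)
    return caption
```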
#### CNN+RNN
| | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Train Dataset | 0.7051 | 0.5322 | 0.3832 | 0.2682 | 0.2283 | 0.5128 | 0.7968 |
| Val Dataset | 0.6667 | 0.4866 | 0.3405 | 0.2337 | 0.2096 | 0.4831 | 0.7024 |
| Test Dataset | 0.6687 | 0.4879 | 0.3421 | 0.2364 | 0.2096 | 0.4838 | 0.6972 |
| Paper | 0.666 | 0.461 | 0.329 | 0.246 | - | - | - |

Paper: Show and Tell: A Neural Image Caption Generator, CVPR 2015 ([pdf](https://arxiv.org/pdf/1411.4555.pdf))

#### CNN+RNN+Soft-Attention
| | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Val Dataset | 0.6467 | 0.4615 | 0.3180 | 0.2177 | 0.2014 | 0.4684 | 0.6310 |
| Test Dataset | 0.6482 | 0.4638 | 0.3210 | 0.2217 | 0.2013 | 0.4633 | 0.6245 |
| Paper | 0.707 | 0.492 | 0.344 | 0.243 | 0.2390 | - | - |

Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015 ([pdf](https://arxiv.org/pdf/1502.03044.pdf))
## Example
![examples](data/examples/example1.png)

## Summary
The model is very (very!) simple and I never tuned the hyperparameters, so feel free to do that if you want.

## References
- [TensorFlow's released im2txt model](https://github.com/tensorflow/models/tree/master/research/im2txt)
- [A TensorFlow implementation by Guoming Wang](https://github.com/DeepRNN/image_captioning)
- [MS COCO Caption Evaluation Toolkit](https://github.com/tylin/coco-caption)
- Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
- Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.