https://github.com/markdtw/soft-attention-image-captioning
TensorFlow implementation of Show, Attend and Tell (ICML'15)
- Host: GitHub
- URL: https://github.com/markdtw/soft-attention-image-captioning
- Owner: markdtw
- Created: 2017-01-13T05:52:02.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-22T06:04:21.000Z (almost 9 years ago)
- Last Synced: 2025-05-05T17:32:39.093Z (about 1 year ago)
- Topics: image-captioning, tensorflow
- Language: Python
- Size: 639 KB
- Stars: 19
- Watchers: 5
- Forks: 11
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Soft Attention Image Captioning
TensorFlow implementation of [Show, Attend and Tell](https://arxiv.org/abs/1502.03044), presented at ICML'15.
This version is a major refactor of the previous release and is compatible with TensorFlow r1.0 and later.
## Prerequisites
- Python 2.7+
- [NumPy](http://www.numpy.org/)
- [Tensorflow r1.0+](https://www.tensorflow.org/install/)
- [Scikit-image](http://scikit-image.org/)
- [tqdm](https://pypi.python.org/pypi/tqdm)
## Data
- Training: [Microsoft COCO: Common Objects in Context](http://mscoco.org/dataset/#download) training and validation sets
## Preparation
1. Clone this repo, create `data/` and `log/` folders:
```bash
git clone https://github.com/markdtw/soft-attention-image-captioning.git
cd soft-attention-image-captioning
mkdir data
mkdir log
```
2. Download and extract the pre-trained `Inception V4` and `VGG 19` checkpoints [from tf.slim](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models) for feature extraction.
Save the checkpoint files in `cnns/` as `inception_v4_imagenet.ckpt` and `vgg_19_imagenet.ckpt`.
3. We need the following files in our `data/` folder:
- `coco_raw.json`
- `coco_processed.json`
- `coco_dictionary.pkl`
- `coco_final.json`
- `train2014_vgg.npy` and `val2014_vgg.npy` (or `train2014_inception.npy` and `val2014_inception.npy` when using Inception V4 features)
These files can be generated with `utils.py`; read through it before running it.
4. If you cannot extract the features yourself, they can be downloaded [here](https://drive.google.com/open?id=0B5j6QKJb0ztbRXRQWW12ME9uSGs).
- The download may take a long time.
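Before training, it can help to confirm that `data/` is complete, whether the features were generated with `utils.py` or downloaded. The helper below is a hypothetical sketch, not part of this repo: `required_files` and `missing_data_files` are illustrative names, and reading `vgg(inception)` as "either `vgg` or `inception`" is an assumption.

```python
import os

# Files listed in step 3 of the Preparation section (assumed names).
# "vgg(inception)" is read here as "vgg or inception", chosen via `features`.
BASE_FILES = [
    "coco_raw.json",
    "coco_processed.json",
    "coco_dictionary.pkl",
    "coco_final.json",
]


def required_files(features="vgg"):
    """Return the file names expected in data/ for the given feature type."""
    return BASE_FILES + [
        "train2014_{}.npy".format(features),
        "val2014_{}.npy".format(features),
    ]


def missing_data_files(data_dir="data", features="vgg"):
    """Return the expected files that are not present in data_dir."""
    return [name for name in required_files(features)
            if not os.path.isfile(os.path.join(data_dir, name))]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/: " + ", ".join(missing))
    else:
        print("All data files found.")
```

Pass `features="inception"` if you extracted Inception V4 features instead of VGG 19 ones.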
## Train
Train from scratch with default settings:
```bash
python main.py --train
```
Resume training from the checkpoint saved at epoch X:
```bash
python main.py --train --model_path=log/model.ckpt-X
```
List the tunable arguments:
```bash
python main.py
```
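The authoritative list of arguments comes from `main.py` itself. Purely to show how the flags in the commands above fit together, here is a hypothetical `argparse` equivalent that covers only the flags this README mentions. The defaults, help texts, and the `str2bool` helper are assumptions, and the real script may define its flags differently.

```python
import argparse


def str2bool(value):
    """Parse values such as --use_gru=True or --use_gru=false."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Hypothetical parser covering only the flags used in this README."""
    parser = argparse.ArgumentParser(description="Soft attention image captioning")
    parser.add_argument("--train", action="store_true",
                        help="train a model")
    parser.add_argument("--generate", action="store_true",
                        help="generate a caption for --img_path")
    parser.add_argument("--model_path", default=None,
                        help="checkpoint prefix such as log/model.ckpt-X")
    parser.add_argument("--img_path", default=None,
                        help="path of the image to caption")
    # Assumed default: the GRU cell is opt-in, as the README implies.
    parser.add_argument("--use_gru", type=str2bool, default=False,
                        help="use the GRU cell when training")
    return parser
```

For example, `build_parser().parse_args(["--train", "--use_gru=True"])` yields `train=True` and `use_gru=True`, matching the GRU training command described under Others.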
## Generate a caption
Using the default (latest) model:
```bash
python main.py --generate --img_path=/path/to/image.jpg
```
Using the model from epoch X:
```bash
python main.py --generate --img_path=/path/to/image.jpg --model_path=log/model.ckpt-X
```
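How `main.py` picks the "latest" model is not described here. In TensorFlow r1.x, `tf.train.latest_checkpoint("log")` returns the most recent checkpoint prefix recorded in the `checkpoint` file that `tf.train.Saver` maintains. If you just want the highest-numbered `model.ckpt-X` to pass as `--model_path`, a framework-free sketch follows; `latest_checkpoint_prefix` is a hypothetical helper, and it assumes checkpoints are saved in TensorFlow's V2 format as `log/model.ckpt-X.index` plus companion files.

```python
import os
import re

# TensorFlow r1.x's default (V2) checkpoint format writes several files per
# save: "<prefix>.index", "<prefix>.meta" and "<prefix>.data-*". Matching
# only the ".index" file counts each checkpoint once.
CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_prefix(filenames, log_dir="log"):
    """Return e.g. 'log/model.ckpt-10' for the highest N found, or None."""
    numbers = []
    for name in filenames:
        match = CKPT_INDEX.match(name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return None
    # Compare as integers so that model.ckpt-10 beats model.ckpt-9.
    return os.path.join(log_dir, "model.ckpt-%d" % max(numbers))
```

Typical use is `latest_checkpoint_prefix(os.listdir("log"))`. Comparing the numbers as integers matters: a plain string sort would rank `model.ckpt-9` above `model.ckpt-10`.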
## Others
- The extracted features take up around 16 GB + 8 GB (about 24 GB in total). Make sure you have enough CPU memory (RAM) to load the data.
- GPU memory usage with `batch_size` 128 is around 8 GB.
- The RNN is implemented with `tf.while_loop`, and features are extracted with the `tf.slim` models from their [GitHub page](https://github.com/tensorflow/models/tree/master/slim).
- A GRU cell is also implemented; enable it by setting `--use_gru=True` when training.
- Features can also be extracted with [Inception V4](https://arxiv.org/abs/1602.07261). In that case, `model.ctx_dim` in `model.py` must be set to `(64, 1536)`, and further modifications are needed.
- Issues are welcome!
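`ctx_dim` is the shape (L, D) of the annotation vectors the decoder attends over; for Inception V4 that is 64 locations (its 8 × 8 final feature map) of 1536-dimensional features. The sketch below illustrates the deterministic soft attention step from the paper in plain Python, not this repo's TensorFlow code. Each location gets a score from a small network on the annotation vector and the previous hidden state. The scores are softmax-normalised into weights alpha, and the context vector z is the alpha-weighted sum of the annotation vectors. The single tanh hidden layer (no biases) and the weight names `W_a`, `W_h`, `w` are assumptions for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(context, hidden, W_a, W_h, w):
    """One step of deterministic soft attention (assumed scoring form).

    context: L annotation vectors, each of length D (ctx_dim = (L, D))
    hidden:  previous decoder state h_{t-1}, length H
    W_a:     K x D projection applied to each annotation vector
    W_h:     K x H projection applied to the hidden state
    w:       length-K scoring vector
    Returns (alpha, z): L attention weights and the D-dim context vector.
    """
    projected_h = matvec(W_h, hidden)
    scores = []
    for a_i in context:
        projected_a = matvec(W_a, a_i)
        # e_i = w . tanh(W_a a_i + W_h h_{t-1})
        layer = [math.tanh(pa + ph) for pa, ph in zip(projected_a, projected_h)]
        scores.append(sum(wk * lk for wk, lk in zip(w, layer)))
    alpha = softmax(scores)
    # z = sum_i alpha_i * a_i, a convex combination of the annotation vectors
    dim = len(context[0])
    z = [sum(alpha[i] * context[i][d] for i in range(len(context)))
         for d in range(dim)]
    return alpha, z
```

In the paper, z is fed to the LSTM at every time step together with the embedding of the previous word and h_{t-1}; in this repo that per-step loop runs inside `tf.while_loop`. With an all-zero `w`, every score is equal, so alpha is uniform and z is simply the mean annotation vector.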
## Resources
- [Show, attend and tell slides](http://www.slideshare.net/eunjileee/show-attend-and-tell-neural-image-caption-generation-with-visual-attention)
- [Attention Mechanism Blog Post](https://blog.heuritech.com/2016/01/20/attention-mechanism/)