https://github.com/markdtw/soft-attention-image-captioning

tensorflow implementation of show, attend and tell (ICML'15)
image-captioning tensorflow

Host: GitHub
URL: https://github.com/markdtw/soft-attention-image-captioning
Owner: markdtw
Created: 2017-01-13T05:52:02.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-06-22T06:04:21.000Z (about 9 years ago)
Last Synced: 2025-05-05T17:32:39.093Z (about 1 year ago)
Topics: image-captioning, tensorflow
Language: Python
Homepage:
Size: 639 KB
Stars: 19
Watchers: 5
Forks: 11
Open Issues: 2
Metadata Files:
- Readme: README.md

# Soft Attention Image Captioning
TensorFlow implementation of [Show, Attend and Tell](https://arxiv.org/abs/1502.03044), presented at ICML'15.

Heavily refactored since the last update; now compatible with TensorFlow >= r1.0.

## Prerequisites
- Python 2.7+
- [NumPy](http://www.numpy.org/)
- [Tensorflow r1.0+](https://www.tensorflow.org/install/)
- [Scikit-image](http://scikit-image.org/)
- [tqdm](https://pypi.python.org/pypi/tqdm)

## Data
- Training: [Microsoft COCO: Common Objects in Context](http://mscoco.org/dataset/#download) training and validation sets

## Preparation
1. Clone this repo, create `data/` and `log/` folders:
```bash
git clone https://github.com/markdtw/soft-attention-image-captioning.git
cd soft-attention-image-captioning
mkdir data
mkdir log
```
2. Download and extract pre-trained `Inception V4` and `VGG 19` [from tf.slim](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models) for feature extraction.
Save the ckpt files in `cnns/` as `inception_v4_imagenet.ckpt` and `vgg_19_imagenet.ckpt`.

3. We need the following files in our `data/` folder:
- `coco_raw.json`
- `coco_processed.json`
- `coco_dictionary.pkl`
- `coco_final.json`
- `train2014_vgg(inception).npy` and `val2014_vgg(inception).npy`

These files can be generated with `utils.py`; read through it before running it.

4. If you are not able to extract the features yourself, you can download them [here](https://drive.google.com/open?id=0B5j6QKJb0ztbRXRQWW12ME9uSGs).
- The download may take a long time.
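
The `.npy` feature files are large (around 16 + 8 GB according to the notes under Others), so loading them eagerly needs a lot of RAM. The sketch below shows one possible alternative, memory-mapping the file with NumPy and copying one batch at a time. It is not taken from this repository: the file name `features_demo.npy`, the helper `iter_batches`, and the array layout `(num_images, 64, 1536)` are assumptions for illustration, and it only works if the file holds a plain numeric array.

```python
import os
import tempfile

import numpy as np

# Small stand-in for a real feature file (hypothetical name and layout).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "features_demo.npy")
np.save(path, np.zeros((8, 64, 1536), dtype=np.float32))

# mmap_mode="r" maps the file read-only instead of reading it all into RAM.
feats = np.load(path, mmap_mode="r")


def iter_batches(features, batch_size):
    """Yield consecutive batches, copying only one slice into memory at a time."""
    for start in range(0, features.shape[0], batch_size):
        # np.array makes an in-memory copy of just this slice.
        yield np.array(features[start:start + batch_size])


shapes = [batch.shape for batch in iter_batches(feats, 3)]
print(shapes)
```

With 8 stand-in images and a batch size of 3, the last batch is smaller than the others, so training code that assumes a fixed batch size would need to drop or pad it.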

## Train
Train from scratch with default settings:
```bash
python main.py --train
```
Resume training from the checkpoint saved at epoch X:
```bash
python main.py --train --model_path=log/model.ckpt-X
```
Check out tunable arguments:
```bash
python main.py
```

## Generate a caption
Using the default (latest) model:
```bash
python main.py --generate --img_path=/path/to/image.jpg
```
Using the model from epoch X:
```bash
python main.py --generate --img_path=/path/to/image.jpg --model_path=log/model.ckpt-X
```

## Others
- The extracted features take up around 16 + 8 GB. Make sure you have enough host (CPU) memory when loading the data.
- GPU memory usage for `batch_size` 128 is around 8 GB.
- The RNN is implemented with `tf.while_loop`; feature extraction uses `tf.slim` from its [GitHub page](https://github.com/tensorflow/models/tree/master/slim).
- A GRU cell is implemented; enable it by setting `--use_gru=True` when training.
- Features can also be extracted with [Inception V4](https://arxiv.org/abs/1602.07261). In that case, `model.ctx_dim` in `model.py` needs to be set to (64, 1536), and other modifications are needed as well.
- Issues are welcome!
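
To make the role of `ctx_dim` concrete, here is a minimal NumPy sketch of a single soft-attention step as described in the Show, Attend and Tell paper: score each of the L locations against the decoder state, take a softmax over locations, and use the weighted average of the annotation vectors as the context. It is an illustration under stated assumptions, not the code in `model.py`: the function `soft_attention`, the weight names, and the sizes `H` and `A` are hypothetical, and only the (64, 1536) context shape comes from the note above.

```python
import numpy as np


def soft_attention(ctx, h, W_ctx, W_h, w_att):
    """One soft-attention step (illustrative, not the repository's code).

    ctx:   (L, D) annotation vectors, e.g. (64, 1536) for Inception V4
    h:     (H,)   previous decoder hidden state
    W_ctx: (D, A), W_h: (H, A), w_att: (A,)  hypothetical projection weights
    """
    # Score every location from its features and the current hidden state.
    scores = np.dot(np.tanh(np.dot(ctx, W_ctx) + np.dot(h, W_h)), w_att)  # (L,)
    # Softmax over locations, shifted by the max for numerical stability.
    scores = scores - scores.max()
    alpha = np.exp(scores) / np.exp(scores).sum()  # (L,), sums to 1
    # Soft attention: the context is the expectation of ctx under alpha.
    z = np.dot(alpha, ctx)  # (D,)
    return z, alpha


rng = np.random.RandomState(0)
L, D = 64, 1536  # ctx_dim for Inception V4 features
H, A = 256, 128  # arbitrary sizes chosen for this illustration
ctx = rng.standard_normal((L, D))
h = rng.standard_normal(H)
z, alpha = soft_attention(
    ctx,
    h,
    0.01 * rng.standard_normal((D, A)),
    0.01 * rng.standard_normal((H, A)),
    rng.standard_normal(A),
)
print("context: %s, weights: %s" % (z.shape, alpha.shape))
```

Because the context vector `z` has length D and the weights `alpha` have length L, switching feature extractors changes both, which is why `ctx_dim` has to match the extracted features.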

## Resources
- [Show, attend and tell slides](http://www.slideshare.net/eunjileee/show-attend-and-tell-neural-image-caption-generation-with-visual-attention)
- [Attention Mechanism Blog Post](https://blog.heuritech.com/2016/01/20/attention-mechanism/)
