Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
mxnet image caption(NIC)
https://github.com/saicoco/mxnet_image_caption
- Host: GitHub
- URL: https://github.com/saicoco/mxnet_image_caption
- Owner: saicoco
- License: mit
- Created: 2017-03-05T13:13:32.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-17T14:32:53.000Z (over 7 years ago)
- Last Synced: 2024-08-01T22:41:40.427Z (5 months ago)
- Language: Python
- Size: 182 KB
- Stars: 9
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-MXNet - Neural Image Caption
README
## image caption generation
This is a simple implementation of the paper Neural Image Caption (NIC)[^1] based on MXNet.
Some of the code is adapted from [where-image](https://github.com/mtanti/where-image).

### Usage
1. Prepare the datasets and pre-trained parameters in the directories `datasets` and `pre_train`. Here the pre-trained model is VGG-16 and the dataset is Flickr8k; you can replace them with your own dataset and pre-trained model. Flickr8k contains images and captions; the captions are stored in `dataset.json` and look like the following:
```
{"images":
[
{"sentids": [0, 1, 2, 3, 4],
"imgid": 0,
"sentences": [
{"tokens": ["a", "black", "dog", "is", "running", "after", "a", "white", "dog", "in", "the", "snow"], "raw": "A black dog is running after a white dog in the snow .", "imgid": 0, "sentid": 0},
{"tokens": ["black", "dog", "chasing", "brown", "dog", "through", "snow"], "raw": "Black dog chasing brown dog through snow", "imgid": 0, "sentid": 1},
{"tokens": ["two", "dogs", "chase", "each", "other", "across", "the", "snowy", "ground"], "raw": "Two dogs chase each other across the snowy ground .", "imgid": 0, "sentid": 2},
{"tokens": ["two", "dogs", "play", "together", "in", "the", "snow"], "raw": "Two dogs play together in the snow .", "imgid": 0, "sentid": 3},
{"tokens": ["two", "dogs", "running", "through", "a", "low", "lying", "body", "of", "water"], "raw": "Two dogs running through a low lying body of water .", "imgid": 0, "sentid": 4}
],
"split": "train", "filename": "2513260012_03d33305cf.jpg"}, ...
],
"datasets":Flickr8k}
```
Alternatively, you can download pre-processed data from [here](http://cs.stanford.edu/people/karpathy/deepimagesent/), in which the images are already encoded as 4096-dim VGG features, and unzip it into the `datasets` directory. Then copy the files from the `old` directory into the root directory and run them; this is an older version of NIC.
2. After the data download completes, run:
```
python 1_preprocess_data.py
```
When it finishes, there will be a directory named `processed_data` containing the train, val and test sets, split according to the `"split"` key in `dataset.json` (see the loading sketch after this list).
3. Run `python 2_train_val.py` to train the model on your dataset and save the trained model.
4. There is still a problem with the test stage (prediction): the symbol has to handle variable-length sequences, and I think `mx.mod.BucketingModule` should be used (a rough sketch follows after this list). I am still working on it; if you find a solution, feel free to open an issue.
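The following is a rough, untested sketch of how `mx.mod.BucketingModule` could be bound with one unrolled decoder symbol per caption length. It is not code from this repository; the bucket list, layer sizes and variable names are placeholders.

```
# Hypothetical sketch: one unrolled symbol per caption-length bucket.
import mxnet as mx

BUCKETS = [5, 10, 15, 20]   # candidate caption lengths (made up)
BATCH_SIZE = 1
NUM_HIDDEN = 512            # guessed size, not this repo's configuration
VOCAB_SIZE = 10000          # guessed vocabulary size

def sym_gen(seq_len):
    # Build a caption decoder unrolled to `seq_len` steps.
    data = mx.sym.Variable('data')               # word ids, shape (batch, seq_len)
    label = mx.sym.Variable('softmax_label')
    embed = mx.sym.Embedding(data=data, input_dim=VOCAB_SIZE, output_dim=NUM_HIDDEN)
    cell = mx.rnn.LSTMCell(num_hidden=NUM_HIDDEN)
    outputs, _ = cell.unroll(seq_len, inputs=embed, merge_outputs=True)
    pred = mx.sym.Reshape(outputs, shape=(-1, NUM_HIDDEN))
    pred = mx.sym.FullyConnected(data=pred, num_hidden=VOCAB_SIZE)
    out = mx.sym.SoftmaxOutput(data=pred, label=mx.sym.Reshape(label, shape=(-1,)),
                               name='softmax')
    return out, ('data',), ('softmax_label',)

mod = mx.mod.BucketingModule(sym_gen,
                             default_bucket_key=max(BUCKETS),
                             context=mx.cpu())
mod.bind(data_shapes=[('data', (BATCH_SIZE, max(BUCKETS)))],
         label_shapes=[('softmax_label', (BATCH_SIZE, max(BUCKETS)))])
mod.init_params()
# At prediction time, switch to the bucket matching the current caption length
# (mod.switch_bucket) before calling forward.
```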
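And here is a minimal sketch, assuming the Karpathy-style `dataset.json` layout shown in step 1, of how the records can be grouped by their `"split"` field; the path and names are illustrative, not the repository's actual preprocessing code.

```
import json
from collections import defaultdict

# Path follows step 1; adjust if your dataset.json lives elsewhere.
with open('datasets/dataset.json') as f:
    data = json.load(f)

# Group image records by their "split" field (train / val / test).
splits = defaultdict(list)
for img in data['images']:
    splits[img['split']].append(img)

for name, imgs in splits.items():
    print(name, len(imgs), 'images')

# Each record keeps its captions as token lists, e.g.:
# splits['train'][0]['sentences'][0]['tokens']
```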
### Reference

[^1]: Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3156-3164.