https://github.com/donydchen/image-caption-cpp
A data driven query expansion approach for image caption, implemented in cpp
https://github.com/donydchen/image-caption-cpp
bleu cpp image-captioning
Last synced: 1 day ago
JSON representation
A data driven query expansion approach for image caption, implemented in cpp
- Host: GitHub
- URL: https://github.com/donydchen/image-caption-cpp
- Owner: donydchen
- Created: 2016-03-07T10:10:20.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2017-10-07T09:56:11.000Z (over 8 years ago)
- Last Synced: 2025-02-23T03:25:26.991Z (over 1 year ago)
- Topics: bleu, cpp, image-captioning
- Language: C++
- Homepage:
- Size: 670 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Image Caption
Automatically generating describing information for images is a fundamental problem for artificial intelligence connected with computer vision and natural language processing. The research is with great practical significance, for instance, to help visually impaired people to have a better understanding of the content of the images on web.
This project is the implementation of a data driven approach, used to automatically generate English captions for given images. The core algorithm of this project is mainly implemented by C++, while the image features are extracted by Caffe and the sentence vectors are extracted by Word2Vec.

## Algorithm Details
This project implements a data driven query expansion approach. And it can be roughly desribed by the following three steps:
* It begins with **nearest neighbor algorithm**, which is to find a set of nearest images in the training set that contains images as well as related captions.
* **Sentence vectors** are extracted from the captions of the nearest images by employing Word2Vec, and then averaged to generate a sentence vector for the input image.
* The generated vector is later used to **rerank** the captions of the K nearest neighbors and finally the nearest caption is **borrowed** as the generated caption for the given image.
The main process is illustrated as follow:

## Auxiliary Module
This project also provide two auxiliary modules, which should be useful for some other similar image caption projects. They are both self-contained and can be used independently.
* **BLEU**: It is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Check out the subfolder `bleu/` for more details.
* **Webpage Autogenerator**: It is a tool that can be used to generate a beautiful webpage based on the given images, captions and BLEU scores. Check out the subfolder `result/` for more details.
## Results
### Quantized Data

### Presentation Webpage

## How to Run
Run the following commands
```sh
bash setup.sh # download some required files
bash run.sh # compile the project
./demo # auto generate captions, run BLEU test and generate presentation webpage.
```
After it all finishes, the final result should be presented in `result/index.html`
## References
1. V. Ordonez, G. Kulkarni, and T. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011. 1, 2, 3.
2. Devlin Jacob, Gupta Saurabh, Girshick Ross, Mitchell Margaret, and Zitnick C Lawrence. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015b.
3. Mao Junhua, Xu Wei, Yang Yi, Wang Jiang, and Yuille Alan. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632 [cs.CV], 2014. 12.