https://github.com/li-xirong/flickr8kcn
A bilingual dataset for image captioning
- Host: GitHub
- URL: https://github.com/li-xirong/flickr8kcn
- Owner: li-xirong
- Created: 2018-05-09T15:31:39.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-28T13:52:03.000Z (about 4 years ago)
- Last Synced: 2024-08-01T02:24:40.003Z (6 months ago)
- Size: 5.03 MB
- Stars: 17
- Watchers: 2
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-self-supervised-multimodal-learning (Summary of Common Multimodal Datasets / Image-Text Datasets)
README
# Flickr8K-CN
Flickr8K-CN is a bilingual (English-to-Chinese) extension of the popular [Flickr8K](http://nlp.cs.illinois.edu/HockenmaierGroup/Framing_Image_Description/KCCA.html) set, used for evaluating image captioning in a cross-lingual setting.
| Chinese sentences | Flickr8k-train | Flickr8k-val | Flickr8k-test |
| -----:| -----:| -----:| -----:|
| human written | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| human translation | :x: | :x: | :white_check_mark: |
| machine translation (Baidu) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| machine translation (Google) | :white_check_mark: | :white_check_mark: | :white_check_mark: |

## Data
### Sentences
1. [Original English sentences](data/flickr8kenc.caption.txt)
2. [Chinese sentences written by native Chinese speakers](data/flickr8kzhc.caption.txt)
3. Chinese sentences generated by Baidu translation ([icmr2016 version](data/flickr8kzhb.caption.txt), [version 20160815](data/flickr8kzhb.caption.txt.v20160815))
4. Chinese sentences generated by Google translation ([icmr2016 version](data/flickr8kzhg.caption.txt), [version 20160816](data/flickr8kzhg.caption.txt.v20160816))
5. [Chinese sentences generated by human translation](data/flickr8kzhmtest.captions.txt) (only the test set is covered)

### Dataset split
* Image IDs of the [6K training images](data/flickr8ktrain.txt), [1K validation images](data/flickr8kval.txt), and [1K test images](data/flickr8ktest.txt)
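
The split files and caption files can be combined to get the captions of one subset. The sketch below is only an illustration and rests on two assumptions that this README does not state: that each split file holds one image ID per line, and that each caption line has the form `<sentence_id> <caption>`, where the sentence ID starts with the image ID followed by `#`. The function names are hypothetical; adjust the parsing if the files use a different layout.

```python
from collections import defaultdict
from pathlib import Path


def load_image_ids(split_path):
    """Read image IDs, assuming one ID per non-empty line."""
    text = Path(split_path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_captions(caption_path):
    """Map image ID -> list of captions.

    Assumes each line is '<sentence_id><whitespace><caption>' and that the
    sentence ID starts with the image ID followed by '#'.
    """
    captions = defaultdict(list)
    with open(caption_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue  # skip blank lines and lines without a caption
            sent_id, caption = parts
            image_id = sent_id.split("#", 1)[0]
            captions[image_id].append(caption)
    return dict(captions)


def captions_for_split(caption_path, split_path):
    """Keep only the captions whose image ID appears in the split file."""
    ids = load_image_ids(split_path)
    return {k: v for k, v in load_captions(caption_path).items() if k in ids}


# Example call (paths taken from the links in this README):
# test_caps = captions_for_split("data/flickr8kzhc.caption.txt",
#                                "data/flickr8ktest.txt")
```

Splitting on the first run of whitespace only keeps any spaces inside a caption intact, which matters if the Chinese sentences are stored pre-tokenized.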
### Image features
1. [1,024-dim GoogLeNet pool5 features](http://lixirong.net/data/icmr2016/flickr8k-pygooglenet-pool5_7x7_s1.tar.gz), readable with [bigfile.py](https://github.com/li-xirong/jingwei/blob/master/util/simpleknn/bigfile.py)
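
For readers who want to inspect the features without the linked `bigfile.py`, the following is a minimal standard-library sketch. It is not that script's API. It assumes the extracted archive contains a directory with three files: `shape.txt` (number of images and feature dimension on one line), `id.txt` (whitespace-separated image IDs), and `feature.bin` (row-major little-endian float32 values, one row per ID). This layout is an assumption and should be checked against the downloaded data. `read_features` is a hypothetical helper name.

```python
import os
import struct


def read_features(feat_dir, wanted_ids):
    """Return {image_id: [float, ...]} for the requested IDs found in feat_dir.

    Assumed layout (not confirmed by this README):
      shape.txt   -> "<n_images> <n_dims>"
      id.txt      -> whitespace-separated image IDs, in row order
      feature.bin -> n_images * n_dims little-endian float32 values
    """
    with open(os.path.join(feat_dir, "shape.txt")) as f:
        n_images, n_dims = (int(x) for x in f.readline().split())
    with open(os.path.join(feat_dir, "id.txt")) as f:
        ids = f.read().split()
    if len(ids) != n_images:
        raise ValueError("id.txt and shape.txt disagree on the number of images")

    row_of = {image_id: row for row, image_id in enumerate(ids)}
    row_bytes = 4 * n_dims  # 4 bytes per float32 value
    features = {}
    with open(os.path.join(feat_dir, "feature.bin"), "rb") as f:
        for image_id in wanted_ids:
            row = row_of.get(image_id)
            if row is None:
                continue  # silently skip IDs that are not in the feature file
            f.seek(row * row_bytes)
            values = struct.unpack("<%df" % n_dims, f.read(row_bytes))
            features[image_id] = list(values)
    return features
```

Seeking to each requested row avoids loading the whole binary file into memory when only a handful of images are needed.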
## Citations
1. Xirong Li, Weiyu Lan, Jianfeng Dong, Hailong Liu, [Adding Chinese Captions to Images](icmr2016_chisent.pdf), ACM ICMR 2016