### ImageCaptions
A base model for image captioning.

## Config
- python 2.7
- tensorflow 1.8.0
- python packages
    * nltk
    * PIL
    * json
    * numpy

These are all common toolkits, so their download links are omitted.

## Data Download
- COCO images
    * download [train2017.zip](http://images.cocodataset.org/zips/train2017.zip)
    * unzip it into the directory 'data/train2017/'
- COCO annotations
    * download [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
    * unzip it, then copy 'captions_train2017.json' into the directory 'data/coco_annotations/'
- pretrained Inception model
    * download [inception_v3.ckpt](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz) into the directory 'data/inception/'

## Train
#### First, preprocess the data
- get 'data/captions.json' and 'data/captions_gt.json'
    ```shell
    $ cd preproccess
    $ python data_entry.py
    ```
- get 'data/image_id_train.json', 'data/image_id_val.json', and 'data/image_id_test.json'
    ```shell
    $ cd preproccess
    $ python image_id_split.py
    ```
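The internals of `preproccess/image_id_split.py` are not shown here. Below is a minimal sketch of what such a split presumably does, under the assumption that it shuffles the COCO image ids and carves off 5,000 each for val and test; the function names and the fixed `seed` are illustrative, not taken from the repository.

```python
import json
import random

def split_image_ids(image_ids, n_val=5000, n_test=5000, seed=42):
    """Shuffle image ids and split them into train/val/test lists."""
    ids = sorted(image_ids)           # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test

def write_splits(captions_path="data/captions.json", out_dir="data"):
    """Read captions.json and write image_id_{train,val,test}.json."""
    with open(captions_path) as f:
        captions = json.load(f)       # assumed to be keyed by image id
    train, val, test = split_image_ids(captions.keys())
    for name, split in (("train", train), ("val", val), ("test", test)):
        with open("%s/image_id_%s.json" % (out_dir, name), "w") as f:
            json.dump(split, f)
```

With the 2017 COCO training set this yields the 82,783/5,000/5,000 split reported in the Experiments section below.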
- get 'data/vocabulary.json'
    ```shell
    $ cd preproccess
    $ python vocabulary.py
    ```
#### Second, generate the TFRecord files
Because the dataset is large, converting it to TFRecord files improves I/O throughput and CPU/GPU efficiency. Converting the training data takes about 30 minutes and produces 40 TFRecord files.
* get 'data/tfrecord/train-00.tfrecord' through 'data/tfrecord/train-39.tfrecord'
    ```shell
    $ python datasets.py
    ```
* you also need 'data/tfrecord_name_train.json', which feeds the TensorFlow filename queue; it is easy to generate
* the val and test datasets are converted the same way

#### Third, train the model
```shell
$ python main.py
```

## Experiments
Train/Val/Test split: 82,783/5,000/5,000 images, vocabulary size = 14,643 (no words filtered out). Decoding uses greedy search rather than beam search.
#### CNN+RNN
|  | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Train Dataset | 0.7051 | 0.5322 | 0.3832 | 0.2682 | 0.2283 | 0.5128 | 0.7968 |
| Val Dataset | 0.6667 | 0.4866 | 0.3405 | 0.2337 | 0.2096 | 0.4831 | 0.7024 |
| Test Dataset | 0.6687 | 0.4879 | 0.3421 | 0.2364 | 0.2096 | 0.4838 | 0.6972 |
| Paper | 0.666 | 0.461 | 0.329 | 0.246 | - | - | - |

Paper: Show and Tell: A Neural Image Caption Generator, CVPR 2015 ([pdf](https://arxiv.org/pdf/1411.4555.pdf))

#### CNN+RNN+Soft-Attention
|  | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE | CIDEr |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Val Dataset | 0.6467 | 0.4615 | 0.3180 | 0.2177 | 0.2014 | 0.4684 | 0.6310 |
| Test Dataset | 0.6482 | 0.4638 | 0.3210 | 0.2217 | 0.2013 | 0.4633 | 0.6245 |
| Paper | 0.707 | 0.492 | 0.344 | 0.243 | 0.2390 | - | - |

Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015 ([pdf](https://arxiv.org/pdf/1502.03044.pdf))
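The scores above come from the MS COCO caption evaluation toolkit. As a quick sanity check during training, corpus-level BLEU_1 through BLEU_4 can be approximated with nltk (already a dependency); this sketch will not exactly reproduce the toolkit's numbers, since tokenization and smoothing differ.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_1_to_4(references, hypotheses):
    """Corpus-level BLEU_1..BLEU_4 over tokenized captions.

    references: one list per image, each holding one or more reference
                captions as token lists
    hypotheses: one predicted caption (token list) per image
    """
    smooth = SmoothingFunction().method1  # avoid zero scores on short captions
    scores = []
    for n in range(1, 5):
        weights = tuple(1.0 / n for _ in range(n))  # uniform up to n-grams
        scores.append(corpus_bleu(references, hypotheses,
                                  weights=weights,
                                  smoothing_function=smooth))
    return scores
```

A hypothesis that exactly matches its reference scores 1.0 on all four metrics; partial n-gram overlap degrades BLEU_4 fastest, which is why the tables above drop from BLEU_1 to BLEU_4.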
## Example
![examples](data/examples/example1.png)

## Summary
The model is deliberately very simple and the hyperparameters were never tuned, so there is plenty of room to improve on these numbers.

## References
- [TensorFlow im2txt model](https://github.com/tensorflow/models/tree/master/research/im2txt)
- [A TensorFlow implementation by Guoming Wang](https://github.com/DeepRNN/image_captioning)
- [MS COCO Caption Evaluation Toolkit](https://github.com/tylin/coco-caption)
- Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning. 2015.