{"id":24784504,"url":"https://github.com/roysti10/image_captioning","last_synced_at":"2025-09-01T08:10:26.889Z","repository":{"id":67609038,"uuid":"260450340","full_name":"roysti10/Image_Captioning","owner":"roysti10","description":"Image Captioning using Encoder Decoder network , Pretrained models given","archived":false,"fork":false,"pushed_at":"2020-12-27T05:32:29.000Z","size":11295,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T07:55:46.023Z","etag":null,"topics":["checkpoints","encoder-decoder-model","flickr8k","image-captioning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roysti10.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-05-01T12:09:11.000Z","updated_at":"2020-12-27T05:32:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"eb010ecd-4fac-469d-a558-4f80bbc52dd7","html_url":"https://github.com/roysti10/Image_Captioning","commit_stats":null,"previous_names":["roysti10/image_captioning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/roysti10/Image_Captioning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roysti10%2FImage_Captioning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roysti10%2FImage_Captioning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roysti10%2FImage_Captioning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roysti10%2FImage_Captioning/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roysti10","download_url":"https://codeload.github.com/roysti10/Image_Captioning/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roysti10%2FImage_Captioning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273093531,"owners_count":25044437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["checkpoints","encoder-decoder-model","flickr8k","image-captioning","tensorflow"],"created_at":"2025-01-29T13:15:04.408Z","updated_at":"2025-09-01T08:10:26.860Z","avatar_url":"https://github.com/roysti10.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Image Captioning\n\n## Dataset Preparation\n* Clone this repository using \n  ```bash \n  git clone https://github.com/lucasace/Image_Captioning.git \n  ```\n* Download the Flickr8k image and text datasets from [here](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip) and [here](https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip) respectively\n* Unzip both the dataset and text files and place them inside the repository folder\n\n## I want to train the model\nTo train the model simply 
run\n```bash\npython3 main.py --type train --checkpoint_dir \u003ccheckpointdir\u003e --cnnmodel \u003ccnnmodel\u003e --image_folder \u003cimagefolder location\u003e --caption_file \u003clocation to token.txt\u003e --feature_extraction \u003cTrue or False\u003e\n```\n* checkpoint_dir is the directory where your model checkpoints will be saved.\n* cnnmodel is either inception or vgg16; the default is inception\n* image_folder is the location of the folder with all the images\n* caption_file is the location of 'Flickr8k.token.txt'\n* feature_extraction is True or False; the default is True\n  * True if you haven't extracted the image features yet\n  * False if you have already extracted the image features\n  Setting it to False saves time and memory when training again \n * batch_size is the batch size for training and validation; the default is 128\n \n ## Testing the model\n ```bash\npython3 main.py --type test --checkpoint_dir \u003ccheckpointdir\u003e --cnnmodel \u003ccnnmodel\u003e --image_folder \u003cimagefolder location\u003e --caption_file \u003clocation to token.txt\u003e --feature_extraction \u003cTrue or False\u003e\n```\n* Download the checkpoints from [here](https://drive.google.com/drive/u/1/folders/1-VJXewV_Da9TNLrNpwORY5EY0_slxT1g) if your cnnmodel is inception, or from [here](https://drive.google.com/drive/u/1/folders/1o020lkAFADNs_4vGJKAxGl_-NP41VHyN) if your cnnmodel is vgg16; you can also use your own trained checkpoints\n* All arguments are the same as for training\n \n ## I just want to caption\n \n ```bash\n python3 main.py --type caption --checkpoint_dir \u003ccheckpointdir\u003e --cnnmodel \u003ccnnmodel\u003e --caption_file \u003clocation to token.txt\u003e --to_caption \u003cimage file path to caption\u003e\n ```\n * Download the checkpoints from [here](https://drive.google.com/drive/u/1/folders/1-VJXewV_Da9TNLrNpwORY5EY0_slxT1g)\n    * Note: these are inception checkpoints; for vgg16, download from 
[here](https://drive.google.com/drive/u/1/folders/1o020lkAFADNs_4vGJKAxGl_-NP41VHyN) \n * caption_file is required to build the vocabulary\n \n ## Custom dataset\n  If you want to train on a custom dataset, modify the dataset.py file to suit your dataset\n  \n ## Results\n |Model Type|CNN_Model|Bleu_1|Bleu_2|Bleu_3|Bleu_4|Meteor|\n | --- | --- | --- | --- | --- | --- | --- |\n |Encoder-Decoder|Inception_V3|60.12|51.1|48.13|39.5|25.8|\n | |VGG16|58.46|49.87|47.50|39.37|26.32|\n \n Here are some of the results:\n * ![1](results/baseball.png)\n * ![2](results/index.png)\n * ![3](results/dogfrisbee.png)\n \n ## Things to Do\n - [ ] Beam search\n - [ ] Image Captioning using Soft and Hard Attention\n - [ ] Image Captioning using Adversarial Training\n \n ## Contributions\n\n Any contributions are welcome\n \n If there is any issue with the model or errors in the program, feel free to raise an issue or open a PR.\n \n ## References\n * O. Vinyals, A. Toshev, S. Bengio and D. Erhan, \"Show and tell: A neural image caption generator,\" 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3156-3164, doi: 10.1109/CVPR.2015.7298935.\n * TensorFlow documentation on Image Captioning\n * [Machine Learning Mastery](https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/) for dataset\n * NLTK documentation for METEOR score\n * [RNN lecture by Stanford University](https://www.youtube.com/watch?v=6niqTuYFZLQ\u0026t=1731s)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froysti10%2Fimage_captioning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froysti10%2Fimage_captioning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froysti10%2Fimage_captioning/lists"}