{"id":13807169,"url":"https://github.com/neural-nuts/image-caption-generator","last_synced_at":"2025-05-14T00:31:06.110Z","repository":{"id":215185959,"uuid":"72290258","full_name":"neural-nuts/image-caption-generator","owner":"neural-nuts","description":"[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow","archived":true,"fork":false,"pushed_at":"2019-12-01T22:30:20.000Z","size":10112,"stargazers_count":148,"open_issues_count":12,"forks_count":57,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-18T23:52:13.735Z","etag":null,"topics":["artificial-intelligence","captioning-images","computer-vision","convolutional-neural-networks","image-captioning","lstm","lstm-neural-networks","natural-language-generation","neural-network","recurrent-neural-networks","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neural-nuts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-10-29T14:41:47.000Z","updated_at":"2024-11-12T08:42:44.000Z","dependencies_parsed_at":"2024-01-07T10:52:51.517Z","dependency_job_id":null,"html_url":"https://github.com/neural-nuts/image-caption-generator","commit_stats":null,"previous_names":["neural-nuts/image-caption-generator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neural-nuts%2Fimage-caption-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neural-nuts%2Fimage-caption-generator/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neural-nuts%2Fimage-caption-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neural-nuts%2Fimage-caption-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neural-nuts","download_url":"https://codeload.github.com/neural-nuts/image-caption-generator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254046251,"owners_count":22005563,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","captioning-images","computer-vision","convolutional-neural-networks","image-captioning","lstm","lstm-neural-networks","natural-language-generation","neural-network","recurrent-neural-networks","tensorflow"],"created_at":"2024-08-04T01:01:21.935Z","updated_at":"2025-05-14T00:31:01.096Z","avatar_url":"https://github.com/neural-nuts.png","language":"Jupyter Notebook","funding_links":[],"categories":["Neural Natural Language Generation"],"sub_categories":[],"readme":"# [Deprecated] Image Caption Generator\n\n**Notice: This project uses an older version of TensorFlow, and is no longer supported. 
Please consider using newer alternatives.**\n\nA Neural Network based generative model for captioning images.\n\n## Check out the Android app made using this image-captioning model: [Cam2Caption](https://github.com/neural-nuts/Cam2Caption) and [the associated paper](http://ieeexplore.ieee.org/document/8272660/).\n\n### Work in Progress\n\n###### Updates(Jan 14, 2018):\n1. Some Code Refactoring.\n2. Added MSCOCO dataset support.\n\n###### Updates(Mar 12, 2017):\n1. Added Dropout Layer for LSTM, Xavier Glorot Initializer for Weights\n2. Significant Optimizations for Caption Generation, i.e. the Decode Routine; computation time reduced from 3 seconds to 0.2 seconds\n3. Functionality to Freeze Graphs and Merge them.\n4. Direct Serving(Dual Graph and Single Graph) Routines in /utils/\n5. Explored and chose the fastest and most efficient Image Preprocessing Method.\n6. Ported code to TensorFlow r1.0\n\n###### Updates(Feb 27, 2017):\n1. Added BLEU evaluation metric and batch processing of images to produce batches of captions.\n\n###### Updates(Feb 25, 2017):\n1. Added optimizations and one-time pre-processing of Flickr30K data\n2. Changed to a faster Image Preprocessing method using OpenCV\n\n###### To-Do(Open for Contribution):\n1. FIFO-queues in training\n2. Attention-Model\n3. Trained Models for Distribution.\n\n## Pre-Requisites:\n1. Tensorflow r1.0\n2. NLTK\n3. pandas\n4. Download Flickr30K OR MSCOCO images and captions.\n5. Download Pre-Trained InceptionV4 Tensorflow graph from DeepDetect available [here](https://deepdetect.com/models/tf/inception_v4.pb)\n\n## Procedure to Train and Generate Captions:\n1. Clone the Repository to preserve Directory Structure\n2. For Flickr30K put results_20130124.token and Flickr30K images in flickr30k-images folder OR for MSCOCO put captions_val2014.json and MSCOCO images in COCO-images folder.\n3. Put inception_v4.pb in ConvNets folder\n4. 
Generate features(features.npy) corresponding to the images in the dataset folder by running-\n    - For Flickr30K: `python convfeatures.py --data_path Dataset/flickr30k-images --inception_path ConvNets/inception_v4.pb`\n    - For MSCOCO: `python convfeatures.py --data_path Dataset/COCO-images --inception_path ConvNets/inception_v4.pb`\n5. To Train the model run-\n    - For Flickr30K: `python main.py --mode train --caption_path ./Dataset/results_20130124.token --feature_path ./Dataset/features.npy --resume`\n    - For MSCOCO: `python main.py --mode train --caption_path ./Dataset/captions_val2014.json --feature_path ./Dataset/features.npy --data_is_coco --resume`\n6. To Generate Captions for an Image run-\n    - `python main.py --mode test --image_path VALID_PATH`\n7. For usage as a Python library see [Demo.ipynb](https://github.com/neural-nuts/image-caption-generator/blob/master/Demo.ipynb)\n\n(see `python main.py -h` for more)\n\n## Miscellaneous Notes:\n\n### Freezing the encoder and decoder Graphs\n1. Both the encoder and decoder graphs must be saved while running test. This one-time run is required before freezing the encoder/decoder.\n    - `python main.py --mode test --image_path ANY_TEST_IMAGE.jpg/png --saveencoder --savedecoder`\n2. In the project root directory use - `python utils/save_graph.py --mode encoder --model_folder model/Encoder/`. Additionally, you may want to use `--read_file` if you want to freeze the encoder for directly generating a caption for an image file(path). Similarly, for the decoder use - `python utils/save_graph.py --mode decoder --model_folder model/Decoder/`; the `--read_file` argument is not necessary for the decoder.\n3. To use frozen encoder and decoder models as dual blackbox: [Serve-DualProtoBuf.ipynb](https://github.com/neural-nuts/image-caption-generator/blob/master/utils/Serve-DualProtoBuf.ipynb). 
Note: You must freeze encoder graph with --read_file to run this notebook\n\n(see `python utils/save_graph.py -h` for more)\n\n### Merging the encoder and decoder graphs for serving the model as a blackbox:\n1. It's necessary to freeze the encoder and decoder as mentioned above.\n2. In the project root directory run-\n    - `python utils/merge_graphs.py --encpb ./model/Trained_Graphs/encoder_frozen_model.pb --decpb ./model/Trained_Graphs/decoder_frozen_model.pb` additionally you may want to use `--read_file` if you want to freeze the encoder for directly generating caption for an image file(path).\n3. To use merged encoder and decoder models as single frozen blackbox: [Serve-SingleProtoBuf.ipynb](https://github.com/neural-nuts/image-caption-generator/blob/master/utils/Serve-SingleProtoBuf.ipynb). Note: You must freeze and merge encoder graph with --read_file to run this notebook\n\n(see `python utils/merge_graphs.py -h` for more)\n\n### Training Steps vs Loss Graph in Tensorboard:\n1. `tensorboard --logdir model/log_dir`\n2. 
Navigate to `localhost:6006`\n\n## Citation:\n\nIf you use our model or code in your research, please cite the paper:\n\n```\n@article{Mathur2017,\n  title={Camera2Caption: A Real-time Image Caption Generator},\n  author={Pranay Mathur and Aman Gill and Aayush Yadav and Anurag Mishra and Nand Kumar Bansode},\n  journal={IEEE Conference Publication},\n  year={2017}\n}\n```\n\n## Reference:\nShow and Tell: A Neural Image Caption Generator\n\n-Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan\n\n## License:\nProtected Under BSD-3 Clause License.\n\n## Some Examples:\n\n![Alt text](/Images/gen_3126981064.jpg)\n![Alt text](/Images/gen_7148046575.jpg)\n![Alt text](/Images/gen_suitselfie.png)\n![Alt text](/Images/gen_6.png)\n![Alt text](/Images/gen_7526599338.jpg)\n![Alt text](/Images/gen_4013421575.jpg)\n![Alt text](/Images/gen_football.png)\n![Alt text](/Images/gen_plane.png)\n![Alt text](/Images/gen_comp.png)\n![Alt text](/Images/gen_womanbeach.png)\n![Alt text](/Images/gen_102617084.jpg)\n![Alt text](/Images/gen_2230458748.jpg)\n![Alt text](/Images/gen_7125476937.jpg)\n![Alt text](/Images/gen_4752984291.jpg)\n![Alt text](/Images/gen_cat2.png)\n![Alt text](/Images/gen_283252248.jpg)\n![Alt text](/Images/gen_3920626767.jpg)\n![Alt text](/Images/gen_manlaptop.png)\n![Alt text](/Images/gen_2461372011.jpg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneural-nuts%2Fimage-caption-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneural-nuts%2Fimage-caption-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneural-nuts%2Fimage-caption-generator/lists"}