{"id":15663016,"url":"https://github.com/peri044/stt","last_synced_at":"2026-03-15T00:42:05.445Z","repository":{"id":202560998,"uuid":"163038945","full_name":"peri044/STT","owner":"peri044","description":"A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.","archived":false,"fork":false,"pushed_at":"2019-11-21T07:16:48.000Z","size":105,"stargazers_count":18,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-31T01:13:27.993Z","etag":null,"topics":["common-vector-space","cross-modal-retrieval","deep-learning","image-captioning","sentence-paraphrasing","sequence-to-sequence"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peri044.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-25T02:39:39.000Z","updated_at":"2024-02-23T02:06:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"754d8795-82b1-4a95-ac47-d56592750d34","html_url":"https://github.com/peri044/STT","commit_stats":null,"previous_names":["peri044/stt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peri044%2FSTT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peri044%2FSTT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peri044%2FSTT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peri044%2FSTT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/peri044","download_url":"htt
ps://codeload.github.com/peri044/STT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252611435,"owners_count":21776177,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["common-vector-space","cross-modal-retrieval","deep-learning","image-captioning","sentence-paraphrasing","sequence-to-sequence"],"created_at":"2024-10-03T13:35:14.693Z","updated_at":"2026-03-15T00:42:00.409Z","avatar_url":"https://github.com/peri044.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Show, Translate and Tell\r\n\r\nThis repo contains code for training and evaluating a multi-task model that performs image captioning, cross-modal retrieval and sentence paraphrasing.\r\nThe paper and results can be found at \u003ca href=\"https://arxiv.org/abs/1903.06275\"\u003e Show, Translate and Tell\u003c/a\u003e. This work has been accepted at \u003ca href=\"http://2019.ieeeicip.org/index.php\"\u003e ICIP 2019\u003c/a\u003e.\r\nThe proposed architecture is shown in the figure below.\r\n![Alt text](figures/stt.PNG?raw=true)\r\n\r\n## Generate Data\r\nIn the data folder, you can find scripts for generating TF-records for the MSCOCO dataset.\r\nUpdate (11/20/2019): `prepare_mscoco_pairs.py` is updated in the repo. It uses the `captions_train2014.json` annotations of MSCOCO to build paraphrases. 
This script should be used to generate `train_enc.txt` and `train_dec.txt`, which contain the paired paraphrases.\r\nUsing 5 captions, it creates 20 ordered pairs (permutations) of paraphrases and writes them to the TF-record (using `coco_data_loader.py`) along with the associated image. This script has not been cleaned up and should be used only for reference (it might not be the final script that we used).\r\nCheck out the command-line arguments in the scripts for setting paths.\r\n* To generate TF-records for MSCOCO\r\n```\r\npython -m data.coco_data_loader --num 10000\r\n```\r\nArgs:\r\n* `--num`: Number of images to be written to the TF-record. Do not specify this unless you want to generate a subset of the entire dataset.\r\n\r\n* To generate TF-records with image, predicted caption and ground-truth caption\r\n```\r\npython -m data.coco_data_loader --precompute \\\r\n                                --record_path para_att_pred.tfrecord \\\r\n                                --feature_path coco_precomp/testall_ims.npy \\\r\n                                --captions_path \u003cpath_to_coco_captions\u003e\r\n```\r\n\r\n* `--feature_path`: This should be used if you have already extracted features for all the images. 
\r\nIn this case, a sample in the TF-record looks like (feature (a 1x2048-dim vector for an image), caption A, caption B), where captions A and B are paraphrases.\r\n\r\n## Training on COCO dataset\r\n\r\n```\r\nsh scripts/train_coco.sh\r\n```\r\n\r\n## Evaluation on COCO dataset\r\n\r\n```\r\nsh scripts/eval_coco.sh\r\n```\r\n\r\nIf you find this research or codebase useful in your experiments, please consider citing:\r\n\r\n```\r\n@article{stt2019,\r\n  title={Show, Translate and Tell},\r\n  author={Peri, Dheeraj and Sah, Shagan and Ptucha, Raymond},\r\n  journal={arXiv preprint arXiv:1903.06275},\r\n  year={2019}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperi044%2Fstt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperi044%2Fstt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperi044%2Fstt/lists"}