{"id":13626885,"url":"https://github.com/layumi/Image-Text-Embedding","last_synced_at":"2025-04-16T19:30:53.283Z","repository":{"id":40625910,"uuid":"111048530","full_name":"layumi/Image-Text-Embedding","owner":"layumi","description":"TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss  :feet:  https://arxiv.org/abs/1711.05535","archived":false,"fork":false,"pushed_at":"2025-01-12T05:59:17.000Z","size":6317,"stargazers_count":290,"open_issues_count":11,"forks_count":73,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-12T08:34:33.293Z","etag":null,"topics":["bidirectional-retrieval","cross-modal-retrieval","cross-modality","image-retrieval","image-search","language-retrieval","matconvnet","matlab","person-reidentification","visual-semantic"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/layumi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-17T02:39:22.000Z","updated_at":"2025-04-07T07:29:33.000Z","dependencies_parsed_at":"2022-09-21T06:06:52.594Z","dependency_job_id":"5e35644c-8698-4ec7-bdfe-7c1f3e630543","html_url":"https://github.com/layumi/Image-Text-Embedding","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/layumi%2FImage-Text-Embedding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/layumi%2FImage-Text-Embedding/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/layumi%2FImage-Text-Embedding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/layumi%2FImage-Text-Embedding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/layumi","download_url":"https://codeload.github.com/layumi/Image-Text-Embedding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249268547,"owners_count":21240940,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bidirectional-retrieval","cross-modal-retrieval","cross-modality","image-retrieval","image-search","language-retrieval","matconvnet","matlab","person-reidentification","visual-semantic"],"created_at":"2024-08-01T22:00:24.236Z","updated_at":"2025-04-16T19:30:53.276Z","avatar_url":"https://github.com/layumi.png","language":"MATLAB","readme":"# Dual-Path Convolutional Image-Text Embedding with Instance Loss\n\n[[Paper]](https://arxiv.org/abs/1711.05535) [[Slide]](http://zdzheng.xyz/files/ZhedongZheng_CA_Talk_DualPath.pdf) :arrow_left: **I recommend checking this slide first.** :arrow_left:\n\nThis repository contains the code for our paper [Dual-Path Convolutional Image-Text Embedding](https://arxiv.org/abs/1711.05535). Thank you for your kind attention. 
\n\n### Some News\n- Instance Loss (Pytorch version) is now available at https://github.com/layumi/Person_reID_baseline_pytorch/blob/master/instance_loss.py  \n\n**5 Sep 2021** I love the sentence 'Define yourself by telling how you differ from others' (exemplar SVM), which is also the spirit of the instance loss. \n\n**11 June 2020** People live in the 3D world. We released a new person re-ID codebase, [Person Re-identification in the 3D Space](https://github.com/layumi/person-reid-3d), which conducts representation learning in the 3D space. You are welcome to check it out.\n\n**30 April 2020** We won the [AICity Challenge 2020](https://www.aicitychallenge.org/) at CVPR 2020, yielding the 1st-place submission to the retrieval track :red_car:. Check it out [here](https://github.com/layumi/AICIty-reID-2020).\n\n**01 March 2020** We released a new image retrieval dataset, called [University-1652](https://github.com/layumi/University1652-Baseline), for drone-view target localization and drone navigation :helicopter:. It has a similar setting to person re-ID. You are welcome to check it out.\n\n![](http://zdzheng.xyz/images/fulls/ConvVSE.jpg)\n\n![](https://github.com/layumi/Image-Text-Embedding/blob/master/CUHK-show.jpg)\n\n**What's New**: We updated the paper to the second version, adding more illustration of the mechanism of the proposed instance loss.\n\n# Install Matconvnet\nI have included my Matconvnet in this repo, so you do not need to download it again. You just need to uncomment and modify some lines in gpu_compile.m and run it in Matlab. Try it~ (The code does not support cudnn 6.0. You may turn off Enablecudnn or try cudnn 5.1.)\n\nIf compilation fails, you may refer to http://www.vlfeat.org/matconvnet/install/\n\n# Preprocess Datasets\n1. Extract word2vec weights. Follow the instructions in `./word2vector_matlab`;\n\n2. Preprocess the dataset. Follow the instructions in `./dataset`. 
You can choose one dataset to run.\nThe three datasets need different preprocessing. I wrote instructions for [Flickr30k](https://github.com/layumi/Image-Text-Embedding/tree/master/dataset/Flickr30k-prepare), [MSCOCO](https://github.com/layumi/Image-Text-Embedding/tree/master/dataset/MSCOCO-prepare) and [CUHK-PEDES](https://github.com/layumi/Image-Text-Embedding/tree/master/dataset/CUHK-PEDES-prepare).\n\n3. Download the model pre-trained on ImageNet and put it into `./data`:\n```bash\nwget http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat\n```\nAlternatively, you may try [VGG16](http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-16.mat) or [VGG19](http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat). \n\nYou may have a different split from mine. (Sorry, this is my fault. I used a random split.) Just as a backup, this is the [dictionary archive](https://drive.google.com/open?id=1Yp6B5GKhgQTD9bsmvmVkvxt-SnmHHjVA) used in the paper.\n\n# Trained Model\nYou may download the three trained models from ~~[GoogleDrive](https://drive.google.com/open?id=1QxIdJd3oQIJSVVlAxaIZquOoLQahMrWH)~~ [new GoogleDrive](https://drive.google.com/file/d/165nn3Q07nRPGaLqPCCxPI-K5Vi9yNhYu/view?usp=share_link).\n\n# Train\n* For Flickr30k, run `train_flickr_word2_1_pool.m` for **Stage I** training.\n\nRun `train_flickr_word_Rankloss_shift_hard` for **Stage II** training.\n\n* For MSCOCO, run `train_coco_word2_1_pool.m` for **Stage I** training.\n\nRun `train_coco_Rankloss_shift_hard.m` for **Stage II** training.\n\n* For CUHK-PEDES, run `train_cuhk_word2_1_pool.m` for **Stage I** training.\n\nRun `train_cuhk_word_Rankloss_shift` for **Stage II** training.\n\n# Test\nSelect one model and have fun!\n\n* For Flickr30k, run `test/extract_pic_feature_word2_plus_52.m` to extract the features from image and text. Note that you need to change the model path in the code. 
\n\n* For MSCOCO, run `test_coco/extract_pic_feature_word2_plus.m` to extract the features from image and text. Note that you need to change the model path in the code. \n\n* For CUHK-PEDES, run `test_cuhk/extract_pic_feature_word2_plus_52.m` to extract the features from image and text. Note that you need to change the model path in the code. \n\n\n### CheckList\n- [x] Get word2vec weight\n\n- [x] Data Preparation (Flickr30k)\n- [x] Train on Flickr30k\n- [x] Test on Flickr30k\n\n- [x] Data Preparation (MSCOCO)\n- [x] Train on MSCOCO\n- [x] Test on MSCOCO\n\n- [x] Data Preparation (CUHK-PEDES)\n- [x] Train on CUHK-PEDES\n- [x] Test on CUHK-PEDES\n\n- [ ] Run the code on another machine \n\n### Citation\n```bibtex\n@article{zheng2017dual,\n  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},\n  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},\n  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},\n  doi={10.1145/3383184},\n  note={\\mbox{doi}:\\url{10.1145/3383184}},\n  volume={16},\n  number={2},\n  pages={1--23},\n  year={2020},\n  publisher={ACM New York, NY, USA}\n}\n```\n","funding_links":[],"categories":["Uncategorized","Related Work","Codes"],"sub_categories":["Uncategorized","Show the retrieved Top-10 result"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flayumi%2FImage-Text-Embedding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flayumi%2FImage-Text-Embedding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flayumi%2FImage-Text-Embedding/lists"}