{"id":18888935,"url":"https://github.com/naver/artemis","last_synced_at":"2025-04-14T23:22:42.233Z","repository":{"id":59594475,"uuid":"484346740","full_name":"naver/artemis","owner":"naver","description":"Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)","archived":false,"fork":false,"pushed_at":"2023-02-09T16:23:26.000Z","size":1319,"stargazers_count":48,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-28T11:21:17.311Z","etag":null,"topics":["image-retrieval","multimodal-deep-learning","multimodal-retrieval"],"latest_commit_sha":null,"homepage":"https://europe.naverlabs.com/research/computer-vision/artemis","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-22T07:58:40.000Z","updated_at":"2025-02-16T09:39:16.000Z","dependencies_parsed_at":"2024-11-08T07:46:51.439Z","dependency_job_id":"c2e18313-5403-4cfe-a317-5e997ccabbb3","html_url":"https://github.com/naver/artemis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fartemis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fartemis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naver%2Fartemis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/naver%2Fartemis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/naver","download_url":"https://codeload.github.com/naver/artemis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248975884,"owners_count":21192290,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-retrieval","multimodal-deep-learning","multimodal-retrieval"],"created_at":"2024-11-08T07:46:40.328Z","updated_at":"2025-04-14T23:22:42.192Z","avatar_url":"https://github.com/naver.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ARTEMIS code release\n\nThis repository contains the code release of ARTEMIS, from our paper: \n\n[**ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity**](https://openreview.net/pdf?id=CVfLvQq9gLo) \nGinger Delmas, Rafael Sampaio de Rezende, Gabriela Csurka, Diane Larlus,\nICLR 2022.\n[\\[Project page\\]](https://europe.naverlabs.com/research/computer-vision/artemis)\n\nIf this code and/or paper is useful in your research, please cite:\n\n```bibtex\n@inproceedings{delmas2022artemis,\n  title={ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity},\n  author={Delmas, Ginger and Rezende, Rafael S and Csurka, Gabriela and Larlus, Diane},\n  booktitle={International Conference on Learning Representations},\n  year={2022}\n}\n```\n\n## License\nThis code is distributed under the CC BY-NC-SA 4.0 License. 
See [LICENSE](LICENSE) for more information.\n\n## The task\nWe address the problem of *image search with free-form text modifiers*, which consists in ranking a collection of images by relevance with respect to a bi-modal query: a *reference image* and a *modifier text*.\n\n![Task](figures/task_illustration.png)\n\n## Our method\n\nCurrent approaches typically combine the features of the two elements of the query into a single representation, which can then be compared to those of the potential target images \\[1,2,3\\].\n\nDeparting from these strategies, we draw inspiration from two fields related to the task, *cross-modal* and *visual search*, and we advocate for a combination of the two components of the query that takes into account their relationships with the target image. Each field is represented by an independent module in ARTEMIS. Our **Explicit Matching (EM)** module measures the compatibility of potential target images with the textual requirements, and our **Implicit Similarity (IS)** module considers the relevance of the target images with respect to the properties of the reference image implied by the textual modifier. Both modules are trained jointly with a contrastive loss.\n\n![Artemis](figures/Artemis_model.png)\n\n## Preparations\n\n### Environment\n\nCreate the environment for running our code as follows:\n\n```\nconda create --name artemis python=3.8.12\nconda activate artemis\npip install -r requirements.txt\n```\n\n**Note:** `requirements.txt` assumes CUDA version 10.2; please modify it if you use a different version.\n\n### ⚙️ Configuration\n\nYou need to modify the values of some variables in `config.py` to adapt them to your system and preferences:\n- `MAIN_DIR`: where to store data \u0026 results (default root for vocabulary files, model checkpoints, ranking files, heatmaps...); it should also be defined at the beginning of `release_script.sh`. It defaults to the main directory of this code repository (i.e. 
the parent directory of `config.py`).\n- `TORCH_HOME`, `GLOVE_DIR`: where the weights of the ImageNet-pretrained models (resnet50/resnet18) and the GloVe vectors (`glove.840B.300d.txt.pt`) are stored locally on your machine\n- `\u003cdataset\u003e_IMAGE_DIR` (with `\u003cdataset\u003e` one of `FASHIONIQ`|`SHOES`|`CIRR`|`FASHION200K`): where to find the images of the different datasets (see next section)\n- `\u003cdataset\u003e_ANNOTATION_DIR` (same `\u003cdataset\u003e` values): where to find the annotations (basically triplet \u0026 split information) for the different datasets.\n\n### :open_file_folder: Datasets\n\nAfter downloading the datasets you wish to use to your preferred location, please register the associated paths in `config.py`.\n\n**FashionIQ** [4]: Please refer to the [FashionIQ repo](https://github.com/XiaoxiaoGuo/fashion-iq) for the list of image URLs, train-test-val splits (/image_splits folder) and train/validation candidate-captions-target triplets (/captions folder). \n\nAt the time of writing, many of the image URLs are broken; a link to the FashionIQ images can be found [here](https://github.com/XiaoxiaoGuo/fashion-iq/issues/18) or at the [CosMo repo](https://github.com/postBG/CosMo.pytorch#arrows_counterclockwise-update-dec-8th-2021).\nWe note that we do not have the license to release the FashionIQ test set annotations. Please contact the FashionIQ authors to obtain them.\n\n**Shoes** [5]: Download the images of [6] at their [webpage](http://tamaraberg.com/attributesDataset/index.html) and the annotations of [5] from the corresponding [repo](https://github.com/XiaoxiaoGuo/fashion-retrieval/tree/master/dataset). Our code assumes that the Shoes data is in a format similar to that of FashionIQ. Before running our code with the Shoes data, please run: `python prepare_data/prepare_shoes_data.py`.\n\n**CIRR** [7]: Please check out the [CIRR repo](https://github.com/Cuberick-Orion/CIRR#download-cirr-dataset) for instructions. Note that most of the raw images are no longer available due to broken links. 
We follow the authors' instructions and use the pre-extracted ResNet152 features (trained on ImageNet) as a replacement for the images.\n\nFor evaluation on the CIRR test split, our code produces a .json file compatible with the instructions of the [Test-split Server on CIRR Dataset](https://github.com/Cuberick-Orion/CIRR/blob/main/Test-split_server.md).\n\n**Fashion200K** [8]: Please check out the [Fashion200K repo](https://github.com/xthan/fashion-200k) for instructions on how to download the images and the [TIRG repo](https://github.com/google/tirg#fashion200k-dataset) for downloading their generated test queries.\n\n### 📕 Vocabularies\n\nAfter downloading the annotations of a dataset, you can compute the corresponding vocabulary by running:\n\n```\npython vocab.py --data_name \u003cdataset\u003e\n```\n\nYou should obtain the following vocabulary sizes for the different datasets:\n- FashionIQ: 3775\n- Shoes: 1330\n- CIRR: 7101\n- Fashion200K: 4963\n\n## 📊 Evaluation\n\n```\nsh ./scripts/release_script.sh eval \u003cmodel\u003e \u003cdataset\u003e\n```\n- `\u003cmodel\u003e`: can be selected from `ARTEMIS` | `TIRG` | `cross-modal` | `visual-search` | `late-fusion` | `EM-only` | `IS-only`.\n- `\u003cdataset\u003e`: can be selected from `fashionIQ` | `shoes` | `cirr` | `fashion200k`.\n\nFor CIRR, our evaluation script also produces json files compatible with the dataset's [evaluation server](https://cirr.cecs.anu.edu.au/).\n\n![barplot](figures/bar_plot.png)\n\n## 🚅 Train your own ARTEMIS model\n\nSimilarly to the evaluation code, simply run:\n\n```\nsh ./scripts/release_script.sh train \u003cmodel\u003e \u003cdataset\u003e\n```\n\n## 🔥🗺️ Generate heatmaps \n\nRun the following to generate 3 heatmaps per coefficient of each score (EM \u0026 IS), for 5 data examples yielding good recall results (you can optionally change these settings through global parameters defined at the top of `generate_heatmaps.py`):\n\n```\nsh ./scripts/release_script.sh heatmaps \u003cmodel\u003e 
\u003cdataset\u003e\n```\n\nThis will produce several images and a metadata text file, in a dedicated directory created for each data example considered. Specifically:\n- `EM_coeff_\u003ccoeff_index\u003e_on_trg_heatmap.jpg` corresponds to the heatmap for the `\u003ccoeff_index\u003e`-th EM coefficient, shown on the target image (the EM score does not involve the reference image).\n- `IS_coeff_\u003ccoeff_index\u003e_on_\u003cimage\u003e_heatmap.jpg` corresponds to the heatmap for the `\u003ccoeff_index\u003e`-th IS coefficient, shown on the `\u003cimage\u003e` image (`\u003cimage\u003e` takes each one of `\"trg\"`|`\"src\"`, for the target or the reference image respectively).\n- `metadata.txt` has one row for each of the selected relevant coefficients, with the following format:\n  \n  `\u003cdata_example_index\u003e*\u003cIS|EM\u003e_coeff_\u003ccoeff_index\u003e*\u003cscore_contribution_value\u003e`\n  e.g. `421*IS_coeff_472*0.0815` or `421*EM_coeff_37*0.0226` for data example 421.\n\n  and a last row giving the identifiers for the involved images and the corresponding modifier text:\n  `\u003cdata_example_index\u003e*\u003cmodifier_text\u003e*\u003creference_image_identifier\u003e*\u003ctarget_image_identifier\u003e`\n  e.g. `421*have laces, not a Velcro closure*womens/womens_athletic_shoes/1/img_womens_athletic_shoes_1343.jpg*womens/womens_athletic_shoes/0/img_womens_athletic_shoes_971.jpg`\n\n![heatmaps](figures/heatmaps_example.png)\n\n## 💯 Efficiency study\n\nAdditionally, run:\n```\npip install ptflops\n```\n\nThen:\n```\nsh ./scripts/release_script.sh ptflops \u003cmodel\u003e fashionIQ\n```\n\nThis script will output the number of GMacs (for a given fixed input) and the number of trainable parameters of `\u003cmodel\u003e`, similar to what we present in Table 7 of our ICLR paper.\n\n## 📜 References\n\n[1] Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays. 
[Composing text and image for image retrieval - an empirical odyssey.](https://arxiv.org/abs/1812.07119) CVPR 2019. \n\n[2] Yanbei Chen, Shaogang Gong, and Loris Bazzani. [Image search with text feedback by visiolinguistic attention learning.](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Image_Search_With_Text_Feedback_by_Visiolinguistic_Attention_Learning_CVPR_2020_paper.pdf) CVPR 2020. \n\n[3] Seungmin Lee, Dongwan Kim, and Bohyung Han. [CoSMo: Content-style modulation for image retrieval with text feedback.](https://openaccess.thecvf.com/content/CVPR2021/papers/Lee_CoSMo_Content-Style_Modulation_for_Image_Retrieval_With_Text_Feedback_CVPR_2021_paper.pdf) CVPR 2021. \n\n[4] Hui Wu, Yupeng Gao, Xiaoxiao Guo, Ziad Al-Halah, Steven Rennie, Kristen Grauman, and Rogerio Feris. [Fashion IQ: A new dataset towards retrieving images by natural language feedback.](https://openaccess.thecvf.com/content/CVPR2021/papers/Wu_Fashion_IQ_A_New_Dataset_Towards_Retrieving_Images_by_Natural_CVPR_2021_paper.pdf) CVPR 2021. \n\n[5] Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, and Rogerio Feris. [Dialog-based interactive image retrieval.](https://proceedings.neurips.cc/paper/2018/file/a01a0380ca3c61428c26a231f0e49a09-Paper.pdf) NeurIPS 2018. \n\n[6] Tamara L. Berg, Alexander C. Berg, and Jonathan Shih. [Automatic Attribute Discovery and Characterization from Noisy Web Images.](http://tamaraberg.com/papers/attributediscovery.pdf) ECCV 2010. \n\n[7] Zheyuan Liu, Cristian Rodriguez-Opazo, Damien Teney, and Stephen Gould. [Image retrieval on real-life images with pre-trained vision-and-language models.](https://arxiv.org/abs/2108.04024) ICCV 2021. \n\n[8] Xintong Han, Zuxuan Wu, Phoenix X Huang, Xiao Zhang, Menglong Zhu, Yuan Li, Yang Zhao, and Larry S Davis. 
[Automatic spatially-aware fashion concept discovery.](https://arxiv.org/pdf/1708.01311.pdf) ICCV 2017.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver%2Fartemis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnaver%2Fartemis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaver%2Fartemis/lists"}