{"id":13563187,"url":"https://github.com/miccunifi/SEARLE","last_synced_at":"2025-04-03T19:32:45.375Z","repository":{"id":176715400,"uuid":"618385049","full_name":"miccunifi/SEARLE","owner":"miccunifi","description":"[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion ","archived":false,"fork":false,"pushed_at":"2024-05-07T15:53:29.000Z","size":21076,"stargazers_count":166,"open_issues_count":0,"forks_count":9,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-02-10T12:23:30.701Z","etag":null,"topics":["circo","cirr","clip","composed-image-retrieval","fashion-iq","knowledge-distillation","multimodal-learning","pytorch","textual-inversion"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miccunifi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-24T11:00:11.000Z","updated_at":"2025-01-10T01:49:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"94d6025d-3bbf-4e01-a112-59eca8e34435","html_url":"https://github.com/miccunifi/SEARLE","commit_stats":null,"previous_names":["miccunifi/searle"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miccunifi%2FSEARLE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miccunifi%2FSEARLE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miccunifi%2FSEARLE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miccunifi%
2FSEARLE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miccunifi","download_url":"https://codeload.github.com/miccunifi/SEARLE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247065458,"owners_count":20877784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["circo","cirr","clip","composed-image-retrieval","fashion-iq","knowledge-distillation","multimodal-learning","pytorch","textual-inversion"],"created_at":"2024-08-01T13:01:16.147Z","updated_at":"2025-04-03T19:32:40.355Z","avatar_url":"https://github.com/miccunifi.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# SEARLE (ICCV 2023)\n\n### Zero-shot Composed Image Retrieval With Textual Inversion\n\n[![arXiv](https://img.shields.io/badge/ICCV2023-Paper-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2303.15247)\n[![Generic badge](https://img.shields.io/badge/Demo-Link-blue.svg)](https://circo.micc.unifi.it/demo)\n[![Generic badge](https://img.shields.io/badge/Video-YouTube-red.svg)](https://www.youtube.com/watch?v=qxpNb9qxDQI)\n[![Generic badge](https://img.shields.io/badge/Slides-Link-orange.svg)](/assets/Slides.pptx)\n[![Generic badge](https://img.shields.io/badge/Poster-Link-purple.svg)](/assets/Poster.pdf)\n[![GitHub 
Stars](https://img.shields.io/github/stars/miccunifi/SEARLE?style=social)](https://github.com/miccunifi/SEARLE)\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/isearle-improving-textual-inversion-for-zero/zero-shot-composed-image-retrieval-zs-cir-on-2)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-2?p=isearle-improving-textual-inversion-for-zero)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/isearle-improving-textual-inversion-for-zero/zero-shot-composed-image-retrieval-zs-cir-on-1)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-1?p=isearle-improving-textual-inversion-for-zero)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/isearle-improving-textual-inversion-for-zero/zero-shot-composed-image-retrieval-zs-cir-on)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on?p=isearle-improving-textual-inversion-for-zero)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/isearle-improving-textual-inversion-for-zero/zero-shot-composed-image-retrieval-zs-cir-on-6)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-6?p=isearle-improving-textual-inversion-for-zero)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/isearle-improving-textual-inversion-for-zero/zero-shot-composed-image-retrieval-zs-cir-on-4)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-4?p=isearle-improving-textual-inversion-for-zero)\n\n\n🔥🔥 **[2024/05/07] The extended version of our ICCV 2023 paper is now public: [iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval\n](https://arxiv.org/abs/2405.02951). 
The code will be released upon acceptance.**\n\nThis is the **official repository** of the [**ICCV 2023 paper**](https://arxiv.org/abs/2303.15247) \"*Zero-Shot Composed\nImage Retrieval with Textual Inversion*\" and its [**extended version**](https://arxiv.org/abs/2405.02951) \"*iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval*\".\n \n\u003e You are currently viewing the code and model repository. If you are looking for more information about the\n\u003e newly-proposed dataset **CIRCO** see the [repository](https://github.com/miccunifi/CIRCO).\n\n## Overview\n\n### Abstract\n\nComposed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a\nrelative caption that describes the difference between the two images. The high effort and cost required for labeling\ndatasets for CIR hamper the widespread usage of existing methods, as they rely on supervised learning. In this work, we\npropose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset. Our\napproach, named zero-Shot composEd imAge Retrieval with textuaL invErsion (SEARLE), maps the visual features of the\nreference image into a pseudo-word token in CLIP token embedding space and integrates it with the relative caption. To\nsupport research on ZS-CIR, we introduce an open-domain benchmarking dataset named Composed Image Retrieval on Common\nObjects in context (CIRCO), which is the first dataset for CIR containing multiple ground truths for each query. The\nexperiments show that SEARLE exhibits better performance than the baselines on the two main datasets for CIR tasks,\nFashionIQ and CIRR, and on the proposed CIRCO.\n\n![](assets/intro.png \"Workflow of the method\")\n\nWorkflow of our method. 
*Top*: in the pre-training phase, we generate pseudo-word tokens of unlabeled images with an\noptimization-based textual inversion and then distill their knowledge to a textual inversion network. *Bottom*: at\ninference time on ZS-CIR, we map the reference image to a pseudo-word $S_*$ and concatenate it with the relative\ncaption. Then, we use CLIP text encoder to perform text-to-image retrieval.\n\n## Citation\n```bibtex\n@article{agnolucci2024isearle,\n  title={iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval}, \n  author={Agnolucci, Lorenzo and Baldrati, Alberto and Bertini, Marco and Del Bimbo, Alberto},\n  journal={arXiv preprint arXiv:2405.02951},\n  year={2024},\n}\n```\n\n```bibtex\n@inproceedings{baldrati2023zero,\n  title={Zero-Shot Composed Image Retrieval with Textual Inversion},\n  author={Baldrati, Alberto and Agnolucci, Lorenzo and Bertini, Marco and Del Bimbo, Alberto},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  pages={15338--15347},\n  year={2023}\n}\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch2\u003eGetting Started\u003c/h2\u003e\u003c/summary\u003e\n\nWe recommend using the [**Anaconda**](https://www.anaconda.com/) package manager to avoid dependency/reproducibility\nproblems.\nFor Linux systems, you can find a conda installation\nguide [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html).\n\n### Installation\n\n1. Clone the repository\n\n```sh\ngit clone https://github.com/miccunifi/SEARLE\n```\n\n2. 
Install Python dependencies\n\n```sh\nconda create -n searle -y python=3.8\nconda activate searle\nconda install -y -c pytorch pytorch=1.11.0 torchvision=0.12.0\npip install comet-ml==3.33.6 transformers==4.24.0 tqdm pandas==1.4.2\npip install git+https://github.com/openai/CLIP.git\n```\n\n### Data Preparation\n\n#### FashionIQ\n\nDownload the FashionIQ dataset following the instructions in\nthe [**official repository**](https://github.com/XiaoxiaoGuo/fashion-iq).\n\nAfter downloading the dataset, ensure that the folder structure matches the following:\n\n```\n├── FashionIQ\n│   ├── captions\n|   |   ├── cap.dress.[train | val | test].json\n|   |   ├── cap.toptee.[train | val | test].json\n|   |   ├── cap.shirt.[train | val | test].json\n\n│   ├── image_splits\n|   |   ├── split.dress.[train | val | test].json\n|   |   ├── split.toptee.[train | val | test].json\n|   |   ├── split.shirt.[train | val | test].json\n\n│   ├── images\n|   |   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]\n```\n\n#### CIRR\n\nDownload the CIRR dataset following the instructions in the [**official repository**](https://github.com/Cuberick-Orion/CIRR).\n\nAfter downloading the dataset, ensure that the folder structure matches the following:\n\n```\n├── CIRR\n│   ├── train\n|   |   ├── [0 | 1 | 2 | ...]\n|   |   |   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]\n\n│   ├── dev\n|   |   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]\n\n│   ├── test1\n|   |   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]\n\n│   ├── cirr\n|   |   ├── captions\n|   |   |   ├── cap.rc2.[train | val | test1].json\n|   |   ├── image_splits\n|   |   |   ├── split.rc2.[train | val | test1].json\n```\n\n#### CIRCO\n\nDownload the CIRCO dataset following the instructions in the [**official repository**](https://github.com/miccunifi/CIRCO).\n\nAfter downloading the dataset, ensure that the folder structure matches the following:\n\n```\n├── CIRCO\n│   ├── annotations\n|   |   ├── [val 
| test].json\n\n│   ├── COCO2017_unlabeled\n|   |   ├── annotations\n|   |   |   ├──  image_info_unlabeled2017.json\n|   |   ├── unlabeled2017\n|   |   |   ├── [000000243611.jpg | 000000535009.jpg | ...]\n```\n\n#### ImageNet\n\nDownload ImageNet1K (ILSVRC2012) test set following the instructions in\nthe [**official site**](https://image-net.org/index.php).\n\nAfter downloading the dataset, ensure that the folder structure matches the following:\n\n```\n├── ImageNet1K\n│   ├── test\n|   |   ├── [ILSVRC2012_test_[00000001 | ... | 00100000].JPEG]\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch2\u003eSEARLE Inference with Pre-trained Models\u003c/h2\u003e\u003c/summary\u003e\n\n### Validation\n\nTo compute the metrics on the validation set of FashionIQ, CIRR or CIRCO using the SEARLE pre-trained models, simply run\nthe following command:\n\n```sh\npython src/validate.py --eval-type [searle | searle-xl] --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e\n```\n\n```\n    --eval-type \u003cstr\u003e               if 'searle', uses the pre-trained SEARLE model to predict the pseudo tokens;\n                                    if 'searle-xl', uses the pre-trained SEARLE-XL model to predict the pseudo tokens, \n                                    options: ['searle', 'searle-xl']           \n    --dataset \u003cstr\u003e                 Dataset to use, options: ['fashioniq', 'cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n     \n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\nSince we release the pre-trained models via torch.hub, the models will be automatically downloaded when running the inference script.\n\nThe metrics will be printed on the screen.\n\n### Test\n\nTo generate the predictions file for uploading on the [CIRR Evaluation Server](https://cirr.cecs.anu.edu.au/) or\nthe [CIRCO Evaluation 
Server](https://circo.micc.unifi.it/) using the SEARLE pre-trained models,\nplease execute the following command:\n\n```sh\npython src/generate_test_submission.py --submission-name \u003cstr\u003e  --eval-type [searle | searle-xl] --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e\n```\n\n```\n    --submission-name \u003cstr\u003e         Name of the submission file\n    --eval-type \u003cstr\u003e               if 'searle', uses the pre-trained SEARLE model to predict the pseudo tokens;\n                                    if 'searle-xl', uses the pre-trained SEARLE-XL model to predict the pseudo tokens, \n                                    options: ['searle', 'searle-xl']           \n    --dataset \u003cstr\u003e                 Dataset to use, options: ['cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    \n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\nSince we release the pre-trained models via torch.hub, the models will be automatically downloaded when running the inference script.\n\nThe predictions file will be saved in the `data/test_submissions/{dataset}/` folder.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch2\u003eSEARLE Minimal Working Example\u003c/h2\u003e\u003c/summary\u003e\n\n```python\nimport torch\nimport clip\nfrom PIL import Image\n\n# set device\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\nimage_path = \"path to image to invert\"  # TODO change with your image path\nclip_model_name = \"ViT-B/32\"  # use ViT-L/14 for SEARLE-XL\n\n# load SEARLE model and custom text encoding function\nsearle, encode_with_pseudo_tokens = torch.hub.load(repo_or_dir='miccunifi/SEARLE', source='github', model='searle',\n                                                   backbone=clip_model_name)\nsearle.to(device)\n\n# load CLIP model and preprocessing function\nclip_model, preprocess = 
clip.load(clip_model_name, device=device)\n\n# NOTE: the preprocessing function used to train SEARLE is different from the standard CLIP preprocessing function. Here,\n# we use the standard one for simplicity, but if you want to reproduce the results of the paper you should use the one\n# provided in the SEARLE repository (named targetpad)\n\n# preprocess image and extract image features\nimage = preprocess(Image.open(image_path)).unsqueeze(0).to(device)\nimage_features = clip_model.encode_image(image).float()\n\n# use SEARLE to predict the pseudo tokens\nestimated_tokens = searle(image_features.to(device))\n\n# define a prompt (you can use any prompt you want as long as it contains the $ token)\nprompt = \"a photo of $\"  # The $ is a special token that will be replaced with the pseudo tokens\n\n# encode the prompt with the pseudo tokens\ntokenized_prompt = clip.tokenize([prompt]).to(device)\ntext_features = encode_with_pseudo_tokens(clip_model, tokenized_prompt, estimated_tokens)\n\n# compute similarity\nsimilarity = (100.0 * torch.cosine_similarity(image_features, text_features))\nprint(f\"similarity: {similarity.item():.2f}%\")\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch2\u003eSEARLE\u003c/h2\u003e\u003c/summary\u003e\n\nThis section provides instructions for reproducing the results of the SEARLE method.\nIt covers the steps to train the textual inversion network and perform inference using the trained model.\n\n### 0. GPT phrases generation\n\nBoth the optimization-based textual inversion and the training of the textual inversion network phi require\na set of phrases for each concept in the dictionary. 
The concepts are taken from\nthe [Open Images V7 dataset](https://storage.googleapis.com/openimages/web/index.html).\n\nRun the following command to generate the phrases:\n\n```sh\npython src/gpt_phrases_generation.py\n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the experiment (default=\"GPTNeo27B\")\n    --gpt-model \u003cstr\u003e               GPT model to use (default=\"EleutherAI/gpt-neo-2.7B\")\n    --max-length \u003cint\u003e              Maximum length of the generated phrases (default=35)\n    --num-return-sequences \u003cint\u003e    Number of generated phrases for each concept (default=256)\n    --temperature \u003cfloat\u003e           Temperature of the sampling (default=0.5)\n    --no-repeat-ngram-size \u003cint\u003e    Size of the n-gram to avoid repetitions (default=2)\n    --resume-experiment\u003cstore true\u003e Resume the experiment if it exists (default=false)\n```\n\nSince the phrase generation process can be time-consuming, you can download the pre-generated phrases used in our\nexperiments [**here**](https://github.com/miccunifi/SEARLE/releases/download/weights/GPTNeo27B.zip). After downloading, unzip the file into the `data/GPT_phrases` folder so that the\nfolder structure matches the following: `data/GPT_phrases/GPTNeo27B/concept_to_phrases.pkl`\n\n### 1. 
Image concepts association\n\nWe associate to each image a set of textual concepts taken from\nthe [Open Images V7 dataset](https://storage.googleapis.com/openimages/web/index.html).\n\nRun the following command to associate concepts with the images:\n\n```sh\npython src/image_concepts_association.py --clip-model-name \u003cstr\u003e --dataset imagenet --dataset-path \u003cstr\u003e --split test\n```\n\n```\n    --clip-model-name \u003cstr\u003e        CLIP model to use, e.g 'ViT-B/32', 'ViT-L/14'\n    --dataset-path \u003cstr\u003e           Path to the ImageNet root folder\n    --batch-size \u003cint\u003e             Batch size (default=32)\n    --num-workers \u003cint\u003e            Number of workers (default=8)\n    --preprocess-type \u003cstr\u003e        Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe associations will be saved in a CSV file located in the `data/similar_concept/imagenet/test` folder.\n\n### 2. Optimization-based Textual Inversion (OTI)\n\nPerform the Optimization-based Textual Inversion on the ImageNet test set.\n\nRun the following command to perform OTI:\n\n```sh\npython src/oti_inversion.py --exp-name \u003cstr\u003e --clip-model-name \u003cstr\u003e --dataset imagenet --dataset-path \u003cstr\u003e --split test  \n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the OTI experiment\n    --clip-model-name \u003cstr\u003e         CLIP model to use, e.g 'ViT-B/32', 'ViT-L/14'\n    --dataset-path \u003cstr\u003e            Path to the ImageNet root folder\n    --gpt-exp-name \u003cstr\u003e            Name of the GPT generation phrases experiment (should be the same as --exp-name in step 0)\n                                    (default=GPTNeo27B)\n    --learning-rate \u003cfloat\u003e         Learning rate (default=2e-2)\n    --weight-decay \u003cfloat\u003e          Weight decay (default=0.01)\n    --batch-size \u003cint\u003e              Batch size (default=32)\n    --preprocess-type 
\u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n    --top-k \u003cint\u003e                   Number of concepts associated to each image (default=15)\n    --oti-steps \u003cint\u003e               Number of steps for OTI (default=350)\n    --lambda_gpt \u003cfloat\u003e            Weight of the GPT loss (default=0.5)\n    --lambda_cos \u003cfloat\u003e            Weight of the cosine loss (default=1)\n    --ema-decay \u003cfloat\u003e             Decay for the exponential moving average (default=0.99)\n    --save-frequency \u003cint\u003e          Saving frequency expressed in batches (default=10)\n    --resume-experiment\u003cstore true\u003e Resume the experiment if it exists (default=false)\n    --seed \u003cint\u003e                    Seed for the random number generator (default=42)\n```\n\nThe OTI pre-inverted-tokens will be saved in the `data/oti_pseudo_tokens/imagenet/test/{exp_name}` folder.\n\n### 3. Textual Inversion Network Training\n\nFinally, train the Textual Inversion Network by distilling the knowledge from the OTI pre-inverted-tokens.\n\nIt is recommended to have a properly initialized Comet.ml account to have better logging of the metrics\n(nevertheless, all the metrics will also be logged on a csv file).\n\nTo train the Textual Inversion Network, run the following command:\n\n```sh\npython src/train_phi.py --exp-name \u003cstr\u003e --clip-model-name \u003cstr\u003e --imagenet-dataset-path \u003cstr\u003e --cirr-dataset-path \u003cstr\u003e --oti-exp-name \u003cstr\u003e --save-training \n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the experiment\n    --clip-model-name \u003cstr\u003e         CLIP model to use, e.g 'ViT-B/32', 'ViT-L/14'\n    --imagenet-dataset-path \u003cstr\u003e   Path to the ImageNet dataset root folder\n    --cirr-dataset-path \u003cstr\u003e       Path to the CIRR dataset root folder\n    --oti-exp-name \u003cstr\u003e            Name of the ImageNet 
OTI tokens experiment (should be the same as --exp-name in step 2)\n    --gpt-exp-name \u003cstr\u003e            Name of the GPT generation phrases experiment (should be the same as --exp-name in step 0)\n                                    (default=GPTNeo27B)\n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n    --phi-dropout \u003cfloat\u003e           Dropout for the Textual Inversion Network (default=0.5)\n    --batch-size \u003cint\u003e              Phi training batch size (default=256)\n    --num-workers \u003cint\u003e             Number of workers (default=10)\n    --learning-rate \u003cfloat\u003e         Learning rate (default=1e-4)\n    --weight-decay \u003cfloat\u003e          Weight decay (default=0.01)\n    --num-epochs \u003cint\u003e              Number of epochs (default=100)\n    --lambda-distil \u003cfloat\u003e         Weight of the distillation loss (default=1)\n    --lambda-gpt \u003cfloat\u003e            Weight of the GPT loss (default=0.75)\n    --temperature \u003cfloat\u003e           Temperature for the distillation loss (default=0.25)\n    --validation-frequency \u003cint\u003e    Validation frequency expressed in epochs (default=1)\n    --save-frequency \u003cint\u003e          Saving frequency expressed in epochs (default=5)\n    --save-training \u003cstore_true\u003e    Whether to save the model checkpoints\n    --top-k-concepts \u003cint\u003e          Number of concepts associated with each image (default=150)\n    --api-key \u003cstr\u003e                 API key for Comet (default=None)\n    --workspace \u003cstr\u003e               Workspace for Comet (default=None)\n    --seed \u003cint\u003e                    Seed for the random number generator (default=42)\n```\n\nThe Textual Inversion Network checkpoints will be saved in the `data/phi_models/{exp_name}` folder.\n\n### 4a. 
Val Set Evaluation\n\nTo evaluate the Textual Inversion Network on the validation sets, run the following command:\n\n```sh\npython src/validate.py --exp-name \u003cstr\u003e --eval-type phi --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e --phi-checkpoint-name \u003cstr\u003e\n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the experiment (should be the same as --exp-name in step 3)\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['fashioniq', 'cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --phi-checkpoint-name \u003cstr\u003e     Name of the Textual Inversion Network checkpoint, e.g. 'phi_20.pt'   \n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe metrics will be printed on the screen.\n\n### 4b. Test Set Evaluation\n\nTo generate the predictions file to be uploaded on the [CIRR Evaluation Server](https://cirr.cecs.anu.edu.au/) or on the\n[CIRCO Evaluation Server](https://circo.micc.unifi.it/) run the following command:\n\n```sh\npython src/generate_test_submission.py --submission-name \u003cstr\u003e --exp-name \u003cstr\u003e --eval-type phi --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e --phi-checkpoint-name \u003cstr\u003e \n```\n\n```\n    --submission-name \u003cstr\u003e         Name of the submission file\n    --exp-name \u003cstr\u003e                Name of the experiment (should be the same as --exp-name in step 3)\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --phi-checkpoint-name \u003cstr\u003e     Name of the Textual Inversion Network checkpoint, e.g. 
'phi_20.pt'   \n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe predictions file will be saved in the `data/test_submissions/{dataset}/` folder.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch2\u003eSEARLE-OTI\u003c/h2\u003e\u003c/summary\u003e\n\nThis section provides instructions on reproducing the SEARLE-OTI experiments, which involve performing\nOptimization-based Textual Inversion (OTI) directly on the benchmark datasets.\n\n### 0. GPT phrases generation\n\nPlease refer to [step 0](https://github.com/miccunifi/SEARLE#0-gpt-phrases-generation) of the SEARLE section for the instructions on how to generate the GPT phrases.\n\n### 1. Image concepts association\n\nWe associate each image with a set of textual concepts taken from\nthe [Open Images V7 dataset](https://storage.googleapis.com/openimages/web/index.html).\n\nRun the following command to associate concepts with the images:\n\n```sh\npython src/image_concepts_association.py --clip-model-name \u003cstr\u003e --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e --split \u003cstr\u003e --dataset-mode relative\n```\n\n```\n    --clip-model-name \u003cstr\u003e         CLIP model to use, e.g. 'ViT-B/32', 'ViT-L/14'\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['fashioniq', 'cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --split \u003cstr\u003e                   Dataset split to use, options: ['val', 'test']\n    --batch-size \u003cint\u003e              Batch size (default=32)\n    --num-workers \u003cint\u003e             Number of workers (default=8)\n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe associations will be saved in a CSV file located in the `data/similar_concept/{dataset}/{split}` folder.\n\n### 2. 
Optimization-based Textual Inversion (OTI)\n\nPerform Optimization-based Textual Inversion on the benchmark datasets.\n\nRun the following command to perform OTI:\n\n```sh\npython src/oti_inversion.py --exp-name \u003cstr\u003e --clip-model-name \u003cstr\u003e --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e --split \u003cstr\u003e   \n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the OTI experiment\n    --clip-model-name \u003cstr\u003e         CLIP model to use, e.g. 'ViT-B/32', 'ViT-L/14'\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['fashioniq', 'cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --split \u003cstr\u003e                   Dataset split to use, options: ['val', 'test']\n    --gpt-exp-name \u003cstr\u003e            Name of the GPT generation phrases experiment (should be the same as --exp-name in step 0)\n                                    (default=GPTNeo27B)\n    --learning-rate \u003cfloat\u003e         Learning rate (default=2e-2)\n    --weight-decay \u003cfloat\u003e          Weight decay (default=0.01)\n    --batch-size \u003cint\u003e              Batch size (default=32)\n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n    --top-k \u003cint\u003e                   Number of concepts associated with each image (default=15)\n    --oti-steps \u003cint\u003e               Number of steps for OTI (default=350)\n    --lambda_gpt \u003cfloat\u003e            Weight of the GPT loss (default=0.5)\n    --lambda_cos \u003cfloat\u003e            Weight of the cosine loss (default=1)\n    --ema-decay \u003cfloat\u003e             Decay for the exponential moving average (default=0.99)\n    --save-frequency \u003cint\u003e          Saving frequency expressed in batches (default=10)\n    --resume-experiment\u003cstore true\u003e Resume the experiment if it exists (default=false)\n   
 --seed \u003cint\u003e                    Seed for the random number generator (default=42)\n```\n\nThe OTI pre-inverted-tokens will be saved in the `data/oti_pseudo_tokens/{dataset}/{split}/{exp_name}` folder.\n\n### 3a. Validation Set Evaluation (split=val)\n\nTo evaluate the performance of the OTI pre-inverted tokens on the validation set, run the following command:\n\n```sh\npython src/validate.py --exp-name \u003cstr\u003e --eval-type oti --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e \n```\n\n```\n    --exp-name \u003cstr\u003e                Name of the experiment (should be the same as --exp-name in step 2)\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['fashioniq', 'cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe metrics will be printed on the screen.\n\n### 3b. Test Set Evaluation (split=test)\n\nTo generate the predictions file for uploading to the [CIRR Evaluation Server](https://cirr.cecs.anu.edu.au/) or\nthe [CIRCO Evaluation Server](https://circo.micc.unifi.it/) using the OTI inverted tokens,\nplease execute the following command:\n\n```sh\npython src/generate_test_submission.py --submission-name \u003cstr\u003e --exp-name \u003cstr\u003e --eval-type oti --dataset \u003cstr\u003e --dataset-path \u003cstr\u003e\n```\n\n```\n    --submission-name \u003cstr\u003e         Name of the submission file\n    --exp-name \u003cstr\u003e                Name of the experiment (should be the same as --exp-name in step 2)\n    --dataset \u003cstr\u003e                 Dataset to use, options: ['cirr', 'circo']\n    --dataset-path \u003cstr\u003e            Path to the dataset root folder\n    --preprocess-type \u003cstr\u003e         Preprocessing type, options: ['clip', 'targetpad'] (default=targetpad)\n```\n\nThe predictions file will be saved in the 
`data/test_submissions/{dataset}/` folder.\n\u003c/details\u003e\n\n## Authors\n\n* [**Alberto Baldrati**](https://scholar.google.com/citations?hl=en\u0026user=I1jaZecAAAAJ)**\\***\n* [**Lorenzo Agnolucci**](https://scholar.google.com/citations?user=hsCt4ZAAAAAJ\u0026hl=en)**\\***\n* [**Marco Bertini**](https://scholar.google.com/citations?user=SBm9ZpYAAAAJ\u0026hl=en)\n* [**Alberto Del Bimbo**](https://scholar.google.com/citations?user=bf2ZrFcAAAAJ\u0026hl=en)\n\n**\\*** Equal contribution. Author ordering was determined by coin flip.\n\n## Acknowledgements\n\nThis work was partially supported by the European Commission under European Horizon 2020 Programme, grant number\n101004545 - ReInHerit.\n\n## LICENSE\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc/4.0/88x31.png\" /\u003e\u003c/a\u003e\u003cbr /\u003eAll material is made available under [Creative Commons BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). You can **use, redistribute, and adapt** the material for **non-commercial purposes**, as long as you give appropriate credit by **citing our paper** and **indicate any changes** that you've made.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiccunifi%2FSEARLE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiccunifi%2FSEARLE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiccunifi%2FSEARLE/lists"}