{"id":13429640,"url":"https://github.com/KaiyangZhou/CoOp","last_synced_at":"2025-03-16T03:32:02.339Z","repository":{"id":37494663,"uuid":"402092616","full_name":"KaiyangZhou/CoOp","owner":"KaiyangZhou","description":"Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)","archived":false,"fork":false,"pushed_at":"2024-05-20T16:58:40.000Z","size":1442,"stargazers_count":1885,"open_issues_count":62,"forks_count":210,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-03-08T08:58:53.067Z","etag":null,"topics":["foundation-models","multimodal-learning","prompt-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KaiyangZhou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-01T14:33:35.000Z","updated_at":"2025-03-07T03:51:22.000Z","dependencies_parsed_at":"2024-11-20T11:19:30.205Z","dependency_job_id":"b7d32ee7-4d78-473e-8547-b24e5e059727","html_url":"https://github.com/KaiyangZhou/CoOp","commit_stats":{"total_commits":63,"total_committers":4,"mean_commits":15.75,"dds":0.2698412698412699,"last_synced_commit":"ff61507c790454bce7c5052c3ac39e60772f1f89"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KaiyangZhou%2FCoOp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KaiyangZhou%2FCoOp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KaiyangZhou%2FCoOp/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KaiyangZhou%2FCoOp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KaiyangZhou","download_url":"https://codeload.github.com/KaiyangZhou/CoOp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243822309,"owners_count":20353496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foundation-models","multimodal-learning","prompt-learning"],"created_at":"2024-07-31T02:00:42.966Z","updated_at":"2025-03-16T03:32:01.843Z","avatar_url":"https://github.com/KaiyangZhou.png","language":"Python","funding_links":[],"categories":["2 Foundation Models","Python","其他_机器视觉","\u003cspan id=\"head1\"\u003e *Keywords* \u003c/span\u003e"],"sub_categories":["2.3 Multimodal Foundation Models","网络服务_其他","[Prompt PEFT](#content)"],"readme":"# Prompt Learning for Vision-Language Models\n\nThis repo contains the codebase of a series of research projects focused on adapting vision-language models like [CLIP](https://arxiv.org/abs/2103.00020) to downstream datasets via *prompt learning*:\n\n* [Conditional Prompt Learning for Vision-Language Models](https://arxiv.org/abs/2203.05557), in CVPR, 2022.\n* [Learning to Prompt for Vision-Language Models](https://arxiv.org/abs/2109.01134), IJCV, 2022.\n\n## Updates\n\n- **07.10.2022**: Just added to both [CoOp](https://arxiv.org/abs/2109.01134) and [CoCoOp](https://arxiv.org/abs/2203.05557) (in their appendices) the results on the newly proposed DOSCO (DOmain Shift in COntext) benchmark, which focuses on contextual domain 
shift and covers a diverse set of classification problems. (The paper about DOSCO is [here](https://arxiv.org/abs/2209.07521) and the code for running CoOp/CoCoOp on DOSCO is [here](https://github.com/KaiyangZhou/on-device-dg).)\n\n- **17.09.2022**: [Call for Papers](https://kaiyangzhou.github.io/assets/cfp_ijcv_lvms.html): IJCV Special Issue on *The Promises and Dangers of Large Vision Models*.\n\n- **16.07.2022**: CoOp has been accepted to IJCV for publication!\n\n- **10.06.2022**: Our latest work, [Neural Prompt Search](https://arxiv.org/abs/2206.04673), has just been released on arxiv. It provides a novel perspective for fine-tuning large vision models like [ViT](https://arxiv.org/abs/2010.11929), so please check it out if you're interested in parameter-efficient fine-tuning/transfer learning. The code is also made public [here](https://github.com/Davidzhangyuanhan/NOAH).\n\n- **08.06.2022**: If you're looking for the code to draw the few-shot performance curves (like the ones we show in the CoOp's paper), see `draw_curves.py`.\n\n- **09.04.2022**: The pre-trained weights of CoOp on ImageNet are released [here](#pre-trained-models).\n\n- **11.03.2022**: The code of our CVPR'22 paper, \"[Conditional Prompt Learning for Vision-Language Models](https://arxiv.org/abs/2203.05557),\" is released.\n\n- **15.10.2021**: We find that the `best_val` model and the `last_step` model achieve similar performance, so we set `TEST.FINAL_MODEL = \"last_step\"` for all datasets to save training time. Why we used `best_val`: the ([tiny](https://github.com/KaiyangZhou/CoOp/blob/main/datasets/oxford_pets.py#L32)) validation set was designed for the linear probe approach, which requires extensive tuning for its hyperparameters, so we used the `best_val` model for CoOp as well for fair comparison (in this way, both approaches have access to the validation set).\n\n- **09.10.2021**: Important changes are made to Dassl's transforms.py. 
Please pull the latest commits from https://github.com/KaiyangZhou/Dassl.pytorch and this repo to make sure the code works properly. In particular, 1) `center_crop` now becomes a default transform in testing (applied after resizing the smaller edge to a certain size to keep the image aspect ratio), and 2) for training, `Resize(cfg.INPUT.SIZE)` is deactivated when `random_crop` or `random_resized_crop` is used. Please read this [issue](https://github.com/KaiyangZhou/CoOp/issues/8) on how these changes might affect the performance.\n\n- **18.09.2021**: We have fixed an error in Dassl which could cause a training data loader to have zero length (so no training will be performed) when the dataset size is smaller than the batch size (due to `drop_last=True`). Please pull the latest commit for Dassl (\u003e= `8eecc3c`). This error led to lower results for CoOp in EuroSAT's 1- and 2-shot settings (others are all correct). We will update the paper on arxiv to fix this error.\n\n## How to Install\nThis code is built on top of the awesome toolbox [Dassl.pytorch](https://github.com/KaiyangZhou/Dassl.pytorch) so you need to install the `dassl` environment first. Simply follow the instructions described [here](https://github.com/KaiyangZhou/Dassl.pytorch#installation) to install `dassl` as well as PyTorch. After that, run `pip install -r requirements.txt` under `CoOp/` to install a few more packages required by [CLIP](https://github.com/openai/CLIP) (this should be done when `dassl` is activated). 
Then, you are ready to go.\n\nFollow [DATASETS.md](DATASETS.md) to install the datasets.\n\n## How to Run\n\nClick a paper below to see the detailed instructions on how to run the code to reproduce the results.\n\n* [Learning to Prompt for Vision-Language Models](COOP.md)\n* [Conditional Prompt Learning for Vision-Language Models](COCOOP.md)\n\n## Models and Results\n\n- The pre-trained weights of CoOp (both M=16 \u0026 M=4) on ImageNet based on RN50, RN101, ViT-B/16 and ViT-B/32 can be downloaded all together via this [link](https://drive.google.com/file/d/18ypxfd82RR0pizc5MM1ZWDYDk4j0BtPF/view?usp=sharing). The weights can be used to reproduce the results in Table 1 of CoOp's paper (i.e., the results on ImageNet and its four variants with domain shift). To load the weights and run the evaluation code, you will need to specify `--model-dir` and `--load-epoch` (see this [script](https://github.com/KaiyangZhou/CoOp/blob/main/scripts/eval.sh) for example).\n- The raw numerical results can be found at this [google drive link](https://docs.google.com/spreadsheets/d/12_kaFdD0nct9aUIrDoreY0qDunQ9q9tv/edit?usp=sharing\u0026ouid=100312610418109826457\u0026rtpof=true\u0026sd=true).\n\n## Citation\nIf you use this code in your research, please kindly cite the following papers:\n\n```bibtex\n@inproceedings{zhou2022cocoop,\n    title={Conditional Prompt Learning for Vision-Language Models},\n    author={Zhou, Kaiyang and Yang, Jingkang and Loy, Chen Change and Liu, Ziwei},\n    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    year={2022}\n}\n\n@article{zhou2022coop,\n    title={Learning to Prompt for Vision-Language Models},\n    author={Zhou, Kaiyang and Yang, Jingkang and Loy, Chen Change and Liu, Ziwei},\n    journal={International Journal of Computer Vision (IJCV)},\n    
year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKaiyangZhou%2FCoOp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKaiyangZhou%2FCoOp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKaiyangZhou%2FCoOp/lists"}