{"id":13442334,"url":"https://github.com/OpenGVLab/CaFo","last_synced_at":"2025-03-20T13:33:28.452Z","repository":{"id":108412923,"uuid":"608812881","full_name":"OpenGVLab/CaFo","owner":"OpenGVLab","description":"[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners","archived":false,"fork":false,"pushed_at":"2023-06-01T20:27:41.000Z","size":7512,"stargazers_count":345,"open_issues_count":10,"forks_count":18,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-10-28T05:13:07.149Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGVLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-02T19:40:39.000Z","updated_at":"2024-10-12T01:30:36.000Z","dependencies_parsed_at":"2024-01-16T02:46:33.286Z","dependency_job_id":"897056cd-2100-4de8-be10-ec8d29045025","html_url":"https://github.com/OpenGVLab/CaFo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FCaFo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FCaFo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FCaFo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FCaFo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGVLab","download_url":"https://codeload.github.com/OpenGVLab/CaFo/tar.gz/
refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244619285,"owners_count":20482393,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:44.478Z","updated_at":"2025-03-20T13:33:27.400Z","avatar_url":"https://github.com/OpenGVLab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Prompt, Generate, then Cache\n\nOfficial implementation of ['Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners'](https://arxiv.org/pdf/2303.02151.pdf).\n\nThe paper has been accepted by **CVPR 2023** 🔥.\n\n## News\n* Please check our latest work ['Point-NN, Parameter is Not All You Need'](https://arxiv.org/pdf/2303.08134.pdf) with [code](https://github.com/ZrrSkywalker/Point-NN), accepted by **CVPR 2023** 🔥, which conducts 3D understanding without any parameters or training.\n* CaFo cascaded with [ChatGPT](https://openai.com/blog/chatgpt) and [Stable Diffusion](https://github.com/CompVis/stable-diffusion) on the Caltech-101 dataset has been released 📌.\n* The code of CaFo has been released.\n* The CaFo model is developed based on [Tip-Adapter](https://arxiv.org/pdf/2207.09519), accepted by **ECCV 2022** and [open-sourced](https://github.com/gaopengcuhk/Tip-Adapter).\n\n## Introduction\nWe propose **CaFo**, a **Ca**scade of **Fo**undation models that incorporates diverse prior knowledge of various pre-training paradigms for better few-shot learning, including CLIP, DINO, DALL-E, and GPT-3. 
Specifically, CaFo works by **'Prompt, Generate, then Cache'**. We leverage GPT-3 to prompt CLIP with rich linguistic semantics and generate synthetic images via DALL-E to expand the few-shot training data. Then, we introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through this collaboration, CaFo can fully unleash the potential of different pre-training methods and unify them to achieve *state-of-the-art* performance on few-shot classification.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"CaFo.png\"/\u003e\n\u003c/div\u003e\n\n## Requirements\n\n### Installation\nCreate a conda environment and install dependencies:\n```bash\ngit clone https://github.com/ZrrSkywalker/CaFo.git\ncd CaFo\n\nconda create -n cafo python=3.7\nconda activate cafo\n\npip install -r requirements.txt\n\n# Install the matching versions of torch and torchvision\nconda install pytorch torchvision cudatoolkit\n```\n\n### Dataset\nPlease follow [DATASET.md](https://github.com/gaopengcuhk/Tip-Adapter/blob/main/DATASET.md) to download the official ImageNet and the other 10 datasets.\n\n### Foundation Models\n* The pre-trained weights of **CLIP** are downloaded automatically on the first run.\n* The prompts produced by **GPT-3** have been stored at `gpt_file/`.\n* Please download **DINO's** pre-trained ResNet-50 from [here](https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth), and put it under `dino/`.\n* Please download **DALL-E's** generated images from [here](https://drive.google.com/drive/folders/1e249OgUFCmpfEDPsxCVR-nNb6Q1VaZVW?usp=sharing), and organize them with the official datasets like\n```\n$DATA/\n|–– imagenet/\n|–– caltech-101/\n|–– oxford_pets/\n|–– ...\n|–– dalle_imagenet/\n|–– dalle_caltech-101/\n|–– dalle_oxford_pets/\n|–– ...\n|–– sd_caltech-101/\n```\n* For the Caltech-101 dataset, we also provide **Stable Diffusion's** images from [here](https://drive.google.com/drive/folders/1e249OgUFCmpfEDPsxCVR-nNb6Q1VaZVW?usp=sharing), 
and **ChatGPT's** prompts in `gpt_file/`.\n\n## Get Started\n### Configs\nThe running configurations for different `[dataset]` with `[k]` shots can be modified in `configs/[dataset]/[k]shot.yaml`, including visual encoders and hyperparameters. We have provided the configurations for reproducing the results in the paper. You can edit `search_scale`, `search_step`, `init_beta` and `init_alpha` for fine-grained tuning and better results.\n\nNote that `load_cache` and `load_pre_feat` default to `False` for the first run, which stores the cache model and val/test features in `configs/dataset/`. For later runs, they can be set to `True` for faster hyperparameter tuning.\n\nFor the Caltech101 dataset, the configs for Stable Diffusion's images and ChatGPT's prompts are in `configs/sd_caltech101` and `configs/chat_caltech101`, respectively.\n\n### Running\nFor the 16-shot ImageNet dataset:\n```bash\nCUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml\n```\nFor the other 10 datasets:\n```bash\nCUDA_VISIBLE_DEVICES=0 python main.py --config configs/dataset/16shot.yaml\n```\n\n### Numerical Results\n\nWe provide CaFo's numerical results on 11 datasets from 1 to 16 shots at [exp_Cafo.log](https://github.com/ZrrSkywalker/CaFo/blob/main/exp.log).\nThe results for Tip-Adapter and Tip-Adapter-F are at [exp_Tip.log](https://github.com/gaopengcuhk/Tip-Adapter/blob/main/exp.log).\n\n\n## Acknowledgement\nThis repo benefits from [Tip-Adapter](https://github.com/gaopengcuhk/Tip-Adapter), [CLIP](https://github.com/openai/CLIP), [DINO](https://github.com/facebookresearch/dino), [DALL-E](https://github.com/borisdayma/dalle-mini) and [CuPL](https://github.com/sarahpratt/CuPL). 
Thanks for their wonderful work.\n\n\n## Citation\n```bibtex\n@article{zhang2023prompt,\n  title={Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners},\n  author={Renrui Zhang and Xiangfei Hu and Bohao Li and Siyuan Huang and Hanqiu Deng and Hongsheng Li and Yu Qiao and Peng Gao},\n  journal={arXiv preprint arXiv:2303.02151},\n  year={2023}\n}\n```\n\n## Contributors\n[Renrui Zhang](https://github.com/ZrrSkywalker), [Xiangfei Hu](https://github.com/hxf42), [Bohao Li](https://github.com/Bohao-Lee)\n\n## Contact\nIf you have any questions about this project, please feel free to contact zhangrenrui@pjlab.org.cn and sjtuhxf@sjtu.edu.cn.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGVLab%2FCaFo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenGVLab%2FCaFo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGVLab%2FCaFo/lists"}