{"id":13958414,"url":"https://github.com/raoyongming/DenseCLIP","last_synced_at":"2025-07-20T23:31:36.387Z","repository":{"id":40634569,"uuid":"434300016","full_name":"raoyongming/DenseCLIP","owner":"raoyongming","description":"[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting","archived":false,"fork":false,"pushed_at":"2023-09-15T03:17:33.000Z","size":16127,"stargazers_count":504,"open_issues_count":10,"forks_count":37,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-09T13:18:46.945Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raoyongming.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-02T16:43:45.000Z","updated_at":"2024-08-08T03:34:46.000Z","dependencies_parsed_at":"2023-01-17T19:00:20.753Z","dependency_job_id":null,"html_url":"https://github.com/raoyongming/DenseCLIP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raoyongming%2FDenseCLIP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raoyongming%2FDenseCLIP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raoyongming%2FDenseCLIP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raoyongming%2FDenseCLIP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raoyongming","download_url":"https://codeload.github.com/raoyongming/DenseCLIP/tar.gz/refs/heads/master","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":226845021,"owners_count":17691143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:33.052Z","updated_at":"2024-11-28T01:32:01.970Z","avatar_url":"https://github.com/raoyongming.png","language":"Python","funding_links":[],"categories":["Object Detection, Segmentation"],"sub_categories":["Network Services_Other"],"readme":"# DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting\n\nCreated by [Yongming Rao](https://raoyongming.github.io/)\\*, [Wenliang Zhao](https://wl-zhao.github.io/)\\*, [Guangyi Chen](https://chengy12.github.io/), [Yansong Tang](https://andytang15.github.io/), [Zheng Zhu](http://www.zhengzhu.net/), Guan Huang, [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ\u0026hl=en\u0026authuser=1), and [Jiwen Lu](https://scholar.google.com/citations?user=TN8uDQoAAAAJ\u0026hl=en\u0026authuser=1).\n\nThis repository contains the PyTorch implementation of DenseCLIP (CVPR 2022).\n\nDenseCLIP is a new framework for dense prediction that implicitly and explicitly leverages the pre-trained knowledge from\nCLIP. Specifically, we convert the original image-text matching\nproblem in CLIP to a pixel-text matching problem and\nuse the pixel-text score maps to guide the learning of dense\nprediction models. By further using the contextual information\nfrom the image to prompt the language model, we\nhelp our model better exploit the pre-trained\nknowledge. 
Our method is model-agnostic: it can be\napplied to arbitrary dense prediction systems and various\npre-trained visual backbones, including both CLIP models\nand ImageNet pre-trained models.\n\n![intro](framework.gif)\n\nOur code is based on mmsegmentation and mmdetection.\n\n[[Project Page]](https://denseclip.ivg-research.xyz/) [[arXiv]](https://arxiv.org/abs/2112.01518)\n\n## Usage\n\n### Requirements\n\n- torch\u003e=1.8.0\n- torchvision\n- timm\n- mmcv-full==1.3.17\n- mmseg==0.19.0\n- mmdet==2.17.0\n- regex\n- ftfy\n- fvcore\n\nTo use our code, first install `mmcv-full` and `mmseg`/`mmdet` following the official guidelines ([`mmseg`](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md), [`mmdet`](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md)), then prepare the datasets accordingly.\n\n### Pre-trained CLIP Models\n\nDownload the pre-trained CLIP models (`RN50.pt`, `RN101.pt`, `VIT-B-16.pt`) and save them to the `pretrained` folder. 
The download links can be found in [the official CLIP repo](https://github.com/openai/CLIP/blob/a1d071733d7111c9c014f024669f959182114e33/clip/clip.py#L30).\n\n### Segmentation\n\n#### Model Zoo\nWe provide DenseCLIP models for the Semantic FPN framework.\n\n| Model | FLOPs (G) | Params (M) | mIoU (SS) | mIoU (MS) | config | url |\n|-------|-----------|------------|--------|--------|--------|-----| \n|RN50-CLIP|248.8|31.0|39.6|41.6|[config](segmentation/configs/fpn_clipres50_512x512_80k.py)|-| \n|RN50-DenseCLIP|269.2|50.3|43.5|44.7|[config](segmentation/configs/denseclip_fpn_res50_512x512_80k.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/8636d4a95c60418ba63c/?dl=1)| \n|RN101-CLIP|326.6|50.0|42.7|44.3|[config](segmentation/configs/fpn_clipres101_512x512_80k.py)|-| \n|RN101-DenseCLIP|346.3|67.8|45.1|46.5|[config](segmentation/configs/denseclip_fpn_res101_512x512_80k.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/87387f3625ac42a68da5/?dl=1)| \n|ViT-B-CLIP|1037.4|100.8|49.4|50.3|[config](segmentation/configs/fpn_clipvit-b_640x640_80k.py)|-| \n|ViT-B-DenseCLIP|1043.1|105.3|50.6|51.3|[config](segmentation/configs/denseclip_fpn_vit-b_640x640_80k.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/d056a1a943214d479cd1/?dl=1)| \n\n#### Training \u0026 Evaluation on ADE20K\n\nTo train the DenseCLIP model based on CLIP ResNet-50, run:\n\n```\nbash dist_train.sh configs/denseclip_fpn_res50_512x512_80k.py 8\n```\n\nTo evaluate performance with multi-scale testing, run:\n\n```\nbash dist_test.sh configs/denseclip_fpn_res50_512x512_80k.py /path/to/checkpoint 8 --eval mIoU --aug-test\n```\n\nTo better measure the complexity of the models, we provide a tool based on `fvcore` to accurately compute the FLOPs of `torch.einsum` and other operations:\n```\npython get_flops.py /path/to/config --fvcore\n```\nYou can also remove the `--fvcore` flag to obtain the FLOPs measured by `mmcv` for comparison.\n\n### Detection\n\n#### Model Zoo\nWe provide models for both RetinaNet 
and Mask R-CNN frameworks.\n\n##### RetinaNet\n| Model | FLOPs (G) | Params (M) | box AP | config | url |\n|-------|-----------|------------|--------|--------|-----| \n|RN50-CLIP|265|38|36.9|[config](detection/configs/retinanet_clip_r50_fpn_1x_coco.py)|-| \n|RN50-DenseCLIP|285|60|37.8|[config](detection/configs/retinanet_denseclip_r50_fpn_1x_coco.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/bfb64768d2124e99b79c/?dl=1)| \n|RN101-CLIP|341|57|40.5|[config](detection/configs/retinanet_clip_r101_fpn_1x_coco.py)|-| \n|RN101-DenseCLIP|360|78|41.1|[config](detection/configs/retinanet_denseclip_r101_fpn_1x_coco.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/cfb8cdf85dfb453eb786/?dl=1)| \n\n##### Mask R-CNN\n| Model | FLOPs (G) | Params (M) | box AP | mask AP | config | url |\n|-------|-----------|------------|--------|---------|--------|-----| \n|RN50-CLIP|301|44|39.3|36.8|[config](detection/configs/mask_rcnn_clip_r50_fpn_1x_coco.py)|-| \n|RN50-DenseCLIP|327|67|40.2|37.6|[config](detection/configs/mask_rcnn_denseclip_r50_fpn_1x_coco.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/4adf197e693e4480bf26/?dl=1)| \n|RN101-CLIP|377|63|42.2|38.9|[config](detection/configs/mask_rcnn_clip_r101_fpn_1x_coco.py)|-| \n|RN101-DenseCLIP|399|84|42.6|39.6|[config](detection/configs/mask_rcnn_denseclip_r101_fpn_1x_coco.py)|[Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/ca072b19676942c3be82/?dl=1)| \n\n#### Training \u0026 Evaluation on COCO\nTo train RN50-DenseCLIP with the RetinaNet framework, run:\n```bash\nbash dist_train.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py 8\n```\n\nTo evaluate the box AP of RN50-DenseCLIP (RetinaNet), run:\n```bash\nbash dist_test.sh configs/retinanet_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox\n```\nTo evaluate both the box AP and the mask AP of RN50-DenseCLIP (Mask R-CNN), run:\n```bash\nbash dist_test.sh configs/mask_rcnn_denseclip_r50_fpn_1x_coco.py /path/to/checkpoint 8 --eval bbox segm\n```\n\n## License\nMIT 
License\n\n## Citation\nIf you find our work useful in your research, please consider citing:\n```\n@inproceedings{rao2021denseclip,\n  title={DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting},\n  author={Rao, Yongming and Zhao, Wenliang and Chen, Guangyi and Tang, Yansong and Zhu, Zheng and Huang, Guan and Zhou, Jie and Lu, Jiwen},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraoyongming%2FDenseCLIP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraoyongming%2FDenseCLIP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraoyongming%2FDenseCLIP/lists"}