{"id":13711772,"url":"https://github.com/RockeyCoss/Prompt-Segment-Anything","last_synced_at":"2025-05-06T21:32:16.471Z","repository":{"id":152324557,"uuid":"625729642","full_name":"RockeyCoss/Prompt-Segment-Anything","owner":"RockeyCoss","description":"This is an implementation of zero-shot instance segmentation using Segment Anything.","archived":false,"fork":false,"pushed_at":"2023-04-14T11:09:44.000Z","size":2104,"stargazers_count":299,"open_issues_count":6,"forks_count":15,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-11-13T22:35:04.355Z","etag":null,"topics":["instance-segmentation","mmdetection","segment-anything"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RockeyCoss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-10T01:22:08.000Z","updated_at":"2024-10-30T15:02:20.000Z","dependencies_parsed_at":"2024-04-18T23:33:47.333Z","dependency_job_id":null,"html_url":"https://github.com/RockeyCoss/Prompt-Segment-Anything","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RockeyCoss%2FPrompt-Segment-Anything","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RockeyCoss%2FPrompt-Segment-Anything/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RockeyCoss%2FPrompt-Segment-Anything/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/RockeyCoss%2FPrompt-Segment-Anything/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RockeyCoss","download_url":"https://codeload.github.com/RockeyCoss/Prompt-Segment-Anything/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252772138,"owners_count":21801861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instance-segmentation","mmdetection","segment-anything"],"created_at":"2024-08-02T23:01:11.443Z","updated_at":"2025-05-06T21:32:13.551Z","avatar_url":"https://github.com/RockeyCoss.png","language":"Python","funding_links":[],"categories":["Recent Works","Open Source Projects","Application","🎨 \"Anything\" Projects Ecosystem"],"sub_categories":["Follow-up Papers","Image Detection/Segmentation","🎯 2023 Classic Projects"],"readme":"# Prompt-Segment-Anything\nThis is an implementation of zero-shot instance segmentation using [Segment Anything](https://github.com/facebookresearch/segment-anything). Thanks to the authors of Segment Anything for their wonderful work! \n\nThis repository is based on [MMDetection](https://github.com/open-mmlab/mmdetection) and includes some code from [H-Deformable-DETR](https://github.com/HDETR/H-Deformable-DETR) and [FocalNet-DINO](https://github.com/FocalNet/FocalNet-DINO).\n\n![example1](assets/example1.jpg)\n\n## News\n\n**2023.04.12** Multimask output mode and cascade prompt mode are available now.\n\n**2023.04.11** Our [demo](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo) is available now. 
Please feel free to check it out.\n\n**2023.04.11** [Swin-L+H-Deformable-DETR + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py)/[FocalNet-L+DINO + SAM](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) achieves strong COCO instance segmentation results: mask AP=46.8/49.1 by simply prompting SAM with boxes predicted by Swin-L+H-Deformable-DETR/FocalNet-L+DINO. (mask AP=46.5 based on ViTDet)🍺\n\n## Catalog\n\n- [x] Support Swin-L+H-Deformable-DETR+SAM\n- [x] Support FocalNet-L+DINO+SAM\n- [x] Support R50+H-Deformable-DETR+SAM/Swin-T+H-Deformable-DETR\n- [x] Support HuggingFace gradio demo\n- [x] Support cascade prompts (box prompt + mask prompt)\n\n## Box-as-Prompt Results\n\n|         Detector         |    SAM    |    multimask output    | Detector's Box AP | Mask AP |                            Config                            |\n| :--------------------- | :-------: | :---------------: | :-----: | :----------------------------------------------------------: | ----------------------- |\n|  R50+H-Deformable-DETR   | sam-vit-b | :x: |       50.0        |  38.2   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b.py) |\n| R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: | 50.0 | 39.9 | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi.py) |\n|  R50+H-Deformable-DETR   | sam-vit-l | :x: |       50.0        |  41.5   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-l.py) |\n| Swin-T+H-Deformable-DETR | sam-vit-b | :x: |       53.2        |  40.0   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-b.py) |\n| Swin-T+H-Deformable-DETR 
| sam-vit-l | :x: |       53.2        |  43.5   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-t-hdetr_sam-vit-l.py) |\n| Swin-L+H-Deformable-DETR | sam-vit-b | :x: |       58.0        |  42.5   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |\n| Swin-L+H-Deformable-DETR | sam-vit-l | :x: |       58.0        |  46.3   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |\n| Swin-L+H-Deformable-DETR | sam-vit-h | :x: |       58.0        |  46.8   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |\n|     FocalNet-L+DINO      | sam-vit-b | :x: |       63.2        |  44.5   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-b.py) |\n|     FocalNet-L+DINO      | sam-vit-l | :x: |       63.2        |  48.6   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-l.py) |\n|     FocalNet-L+DINO      | sam-vit-h | :x: |       63.2        |  49.1   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h.py) |\n\n## Cascade-Prompt Results\n\n|       Detector        |    SAM    |  multimask output  | Detector's Box AP | Mask AP | Config                                                       |\n| :------------------- | :-------: | :----------------: | :---------------: | :-----: | ------------------------------------------------------------ |\n| R50+H-Deformable-DETR | sam-vit-b |        :x:         |       50.0        |  38.8   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_cascade.py) |\n| 
R50+H-Deformable-DETR | sam-vit-b | :heavy_check_mark: |       50.0        |  40.5   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/r50-hdetr_sam-vit-b_best-in-multi_cascade.py) |\n| Swin-L+H-Deformable-DETR | sam-vit-h | :heavy_check_mark: |       58.0        |  47.3   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h_best-in-multi_cascade.py) |\n|     FocalNet-L+DINO      | sam-vit-h | :heavy_check_mark: |       63.2        |  49.6   | [config](https://github.com/RockeyCoss/Instance-Segment-Anything/blob/master/projects/configs/hdetr/swin-l-hdetr_sam-vit-h_best-in-multi_cascade.py) |\n\n***Note***\n\n**multimask output**: If multimask output is :heavy_check_mark:, SAM will predict three masks for each prompt, and the segmentation result will be the one with the highest predicted IoU. Otherwise, if multimask output is :x:, SAM will return only one mask for each prompt, which will be used as the segmentation result.\n\n**cascade-prompt**: In the cascade-prompt setting, the segmentation process involves two stages. In the first stage, a coarse mask is predicted with a bounding box prompt. The second stage then utilizes both the bounding box and the coarse mask as prompts to predict the final segmentation result. Note that if multimask output is :heavy_check_mark:, the first stage will predict three coarse masks, and the second stage will use the mask with the highest predicted IoU as the prompt.\n\n## Installation\n\n🍺🍺🍺 Add dockerhub environment \n\n```\ndocker pull kxqt/prompt-sam-torch1.12-cuda11.6:20230410\nnvidia-docker run -it --shm-size=4096m -v {your_path}:{path_in_docker} kxqt/prompt-sam-torch1.12-cuda11.6:20230410\n```\n\nWe test the models under `python=3.7.10,pytorch=1.10.2,cuda=10.2`. Other versions may work as well.\n\n1. 
Clone this repository\n\n```\ngit clone https://github.com/RockeyCoss/Instance-Segment-Anything\ncd Instance-Segment-Anything\n```\n\n2. Install PyTorch\n\n```bash\n# an example\npip install torch torchvision\n```\n\n3. Install MMCV\n\n```\npip install -U openmim\nmim install \"mmcv-full\u003c2.0.0\"\n```\n\n4. Install MMDetection's requirements\n\n```\npip install -r requirements.txt\n```\n\n5. Compile CUDA operators\n\n```bash\ncd projects/instance_segment_anything/ops\npython setup.py build install\ncd ../../..\n```\n\nPlease note that the ``mmdet`` package does not need to be installed. If your environment already has the ``mmdet`` package installed, you can run the following command before executing other scripts:\n\n```bash\nexport PYTHONPATH=$(pwd)\n```\n\n## Prepare COCO Dataset\n\nPlease refer to [data preparation](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).\n\n## Prepare Checkpoints\n\n1. Install wget\n\n```\npip install wget\n```\n\n2. SAM checkpoints\n\n```bash\nmkdir ckpt\ncd ckpt\npython -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth\npython -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth\npython -m wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\ncd ..\n```\n\n3. Here are the checkpoints for the detection models. 
You can download only the checkpoints you need.\n\n```bash\n# R50+H-Deformable-DETR\ncd ckpt\npython -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/r50_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o r50_hdetr.pth\ncd ..\npython tools/convert_ckpt.py ckpt/r50_hdetr.pth ckpt/r50_hdetr.pth\n\n# Swin-T+H-Deformable-DETR\ncd ckpt\npython -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/swin_tiny_hybrid_branch_lambda1_group6_t1500_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_t_hdetr.pth\ncd ..\npython tools/convert_ckpt.py ckpt/swin_t_hdetr.pth ckpt/swin_t_hdetr.pth\n\n# Swin-L+H-Deformable-DETR\ncd ckpt\npython -m wget https://github.com/HDETR/H-Deformable-DETR/releases/download/v0.1/decay0.05_drop_path0.5_swin_large_hybrid_branch_lambda1_group6_t1500_n900_dp0_mqs_lft_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage_36eps.pth -o swin_l_hdetr.pth\ncd ..\npython tools/convert_ckpt.py ckpt/swin_l_hdetr.pth ckpt/swin_l_hdetr.pth\n\n# FocalNet-L+DINO\ncd ckpt\npython -m wget https://projects4jw.blob.core.windows.net/focalnet/release/detection/focalnet_large_fl4_o365_finetuned_on_coco.pth -o focalnet_l_dino.pth\ncd ..\npython tools/convert_ckpt.py ckpt/focalnet_l_dino.pth ckpt/focalnet_l_dino.pth\n```\n\n## Run Evaluation\n\n1. Evaluate Metrics\n\n```bash\n# single GPU\npython tools/test.py path/to/the/config/file --eval segm\n# multiple GPUs\nbash tools/dist_test.sh path/to/the/config/file num_gpus --eval segm\n```\n\n2. Visualize Segmentation Results\n\n```bash\npython tools/test.py path/to/the/config/file --show-dir path/to/the/visualization/results\n```\n## Gradio Demo\n\nWe also provide a UI for displaying the segmentation results that is built with gradio. 
To launch the demo, simply run the following command in a terminal:\n\n```bash\npip install gradio\npython app.py\n```\n\nThis demo is also hosted on HuggingFace [here](https://huggingface.co/spaces/rockeycoss/Prompt-Segment-Anything-Demo).\n\n## More Segmentation Examples\n\n![example2](assets/example2.jpg)\n![example3](assets/example3.jpg)\n![example4](assets/example4.jpg)\n![example5](assets/example5.jpg)\n\n## Citation\n\n**Segment Anything**\n\n```latex\n@article{kirillov2023segany,\n  title={Segment Anything}, \n  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\\'a}r, Piotr and Girshick, Ross},\n  journal={arXiv:2304.02643},\n  year={2023}\n}\n```\n**H-Deformable-DETR**\n\n```latex\n@article{jia2022detrs,\n  title={DETRs with Hybrid Matching},\n  author={Jia, Ding and Yuan, Yuhui and He, Haodi and Wu, Xiaopei and Yu, Haojun and Lin, Weihong and Sun, Lei and Zhang, Chao and Hu, Han},\n  journal={arXiv preprint arXiv:2207.13080},\n  year={2022}\n}\n```\n**Swin Transformer**\n\n```latex\n@inproceedings{liu2021Swin,\n  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},\n  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},\n  year={2021}\n}\n```\n**DINO**\n\n```latex\n@misc{zhang2022dino,\n      title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection}, \n      author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. 
Ni and Heung-Yeung Shum},\n      year={2022},\n      eprint={2203.03605},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n**FocalNet**\n\n```latex\n@misc{yang2022focalnet,  \n  author = {Yang, Jianwei and Li, Chunyuan and Dai, Xiyang and Yuan, Lu and Gao, Jianfeng},\n  title = {Focal Modulation Networks},\n  publisher = {arXiv},\n  year = {2022},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRockeyCoss%2FPrompt-Segment-Anything","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRockeyCoss%2FPrompt-Segment-Anything","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRockeyCoss%2FPrompt-Segment-Anything/lists"}