{"id":15903104,"url":"https://github.com/frank-xwang/UnSAM","last_synced_at":"2025-10-18T06:30:40.952Z","repository":{"id":246845976,"uuid":"821510908","full_name":"frank-xwang/UnSAM","owner":"frank-xwang","description":"[NeurIPS 2024] Code release for \"Segment Anything without Supervision\"","archived":false,"fork":false,"pushed_at":"2024-07-09T09:08:19.000Z","size":22426,"stargazers_count":360,"open_issues_count":7,"forks_count":25,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-06T12:01:58.514Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frank-xwang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-28T17:52:35.000Z","updated_at":"2024-10-06T08:25:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"44755d20-7222-49b7-8ac0-6dab25265bdb","html_url":"https://github.com/frank-xwang/UnSAM","commit_stats":null,"previous_names":["frank-xwang/unsam"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frank-xwang%2FUnSAM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frank-xwang%2FUnSAM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frank-xwang%2FUnSAM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frank-xwang%2FUnSAM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frank-xwang","download_url":"https://c
odeload.github.com/frank-xwang/UnSAM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236907712,"owners_count":19223639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T12:00:59.552Z","updated_at":"2025-10-18T06:30:40.946Z","avatar_url":"https://github.com/frank-xwang.png","language":"Jupyter Notebook","readme":"# Segment Anything without Supervision\n\nUnsupervised SAM (UnSAM) is a \"segment anything\" model for promptable and automatic whole-image segmentation which does not require human annotations. \n\n\u003cp align=\"center\"\u003e \n  \u003cimg width=\"1301\" alt=\"teaser_unsam\" src=\"https://github.com/frank-xwang/UnSAM/assets/58996472/0c53071c-bdc8-4424-9e9e-40b8c8c31a18\" align=\"center\" \u003e\n\u003c/p\u003e\n\n\n\u003e [**Segment Anything without Supervision**](http://arxiv.org/abs/2406.20081)            \n\u003e [XuDong Wang](https://frank-xwang.github.io/), [Jingfeng Yang](https://jingfeng0705.github.io/), [Trevor Darrell](https://people.eecs.berkeley.edu/~trevor/)      \n\u003e UC Berkeley            \n\u003e NeurIPS 2024            \n\n[[`project page`](https://people.eecs.berkeley.edu/~xdwang/projects/UnSAM/)] [[`arxiv`](http://arxiv.org/abs/2406.20081)] [[`colab (UnSAM)`](https://drive.google.com/file/d/1KyxbFb2JC76RZ1jg7F8Ee4TEmOlpYMe7/view?usp=sharing)] [[`colab (pseudo-label)`](https://drive.google.com/file/d/1aFObIt-xlQmCKk3G7dD8KQxaWhM_RTEd/view?usp=sharing)] [[`bibtex`](#citation)]             \n\n\n## Updates\n- 07/01/2024 Initial commit\n\n\n## Features\n- 
The performance gap between unsupervised segmentation models and SAM can be significantly reduced. UnSAM not only advances the state of the art in unsupervised segmentation by 10% but also achieves performance comparable to the labor-intensive, fully-supervised SAM.\n- The supervised SAM can also benefit from our self-supervised labels. By training UnSAM with only 1% of SA-1B images, a lightly semi-supervised UnSAM can often segment entities overlooked by supervised SAM, exceeding SAM’s AR by over 6.7% and AP by 3.9% on SA-1B. \n\n\n## Installation\nSee [installation instructions](INSTALL.md).\n\n## Dataset Preparation\nSee [Preparing Datasets for UnSAM](datasets/README.md).\n\n## Method Overview\n\nUnSAM has two major stages: 1) generating pseudo-masks with divide-and-conquer and 2) learning unsupervised segmentation models from pseudo-masks of unlabeled data.\n\n### 1. Multi-granular Pseudo-mask Generation with Divide-and-Conquer\n\nOur Divide-and-Conquer approach can be used to provide multi-granular masks without human supervision.\n\n### Divide-and-Conquer Demo\n\nTry out the demo using Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11K2mHhISA7RYY8pKgyeyHO9gnExn-EXl)\n\nIf you want to run Divide-and-Conquer locally, we provide `demo_dico.py`, which visualizes the pseudo-masks.\nPlease download CutLER's checkpoint from [here](http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth), and then run (note that post-processing requires a GPU):\n```\ncd divide_and_conquer\npython demo_dico.py \\\n    --input /path/to/input/image \\\n    --output /path/to/save/output \\\n    --preprocess true \\\n    --postprocess true \\\n    --opts MODEL.WEIGHTS /path/to/cutler_checkpoint \\\n    MODEL.DEVICE cuda\n```\nWe give a few demo images in docs/demos/. 
Below are some visualizations of the pseudo-masks on the demo images.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/frank-xwang/UnSAM/assets/58996472/6ea40b0a-7fd3-436b-9b3f-37acbc122fc3\" width=100%\u003e\n\u003c/p\u003e\n\n\n### 2. Segment Anything without Supervision\n\n### Inference Demo for UnSAM with Pre-trained Models (whole image segmentation)\nTry out the UnSAM demo using Colab (no GPU needed): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ZHdql8SVHYqQG0BSkpgCPYkfWdiLeor6)\n\nIf you want to run UnSAM or UnSAM+ demos locally, we provide `demo_whole_image.py`, which demos the built-in configs. \nPlease download UnSAM/UnSAM+'s checkpoints from the [model zoo](#model-zoo). \nRun it with:\n```\ncd whole_image_segmentation\npython demo_whole_image.py \\\n    --input /path/to/input/image \\\n    --output /path/to/save/output \\\n    --opts \\\n    MODEL.WEIGHTS /path/to/UnSAM_checkpoint \\\n    MODEL.DEVICE cpu\n```\nThe configs are made for training, so for evaluation you need to set `MODEL.WEIGHTS` to a checkpoint from the model zoo.\nThis command runs inference and saves the results to the specified output path.\n* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.\n* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.\n\nBelow are some visualizations of the model predictions on the demo images.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/frank-xwang/UnSAM/assets/58996472/83f9d9ee-0c2e-4b65-83f7-77852d169d2d\" width=100%\u003e\n\u003c/p\u003e\n\n\n### Gradio Demo for UnSAM with Pre-trained Models (promptable image segmentation)\n\nThe following command will print a Gradio link in the terminal, through which users can interact with our model. \nPlease download UnSAM/UnSAM+'s checkpoints from the [model zoo](#model-zoo). \nFor details of the command line arguments, see `demo_promptable.py -h` or look at its source code\nto understand its behavior.\n* To run __on cpu__, use `--device cpu`.\n```\npython demo_promptable.py \\\n    --ckpt /path/to/UnSAM_checkpoint \\\n    --conf_files configs/semantic_sam_only_sa-1b_swinT.yaml \\\n    --device gpu\n```\n\nBelow are some visualizations of the model predictions on the demo images.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/frank-xwang/UnSAM/assets/58996472/1b7eb492-2c3d-426f-9f90-bc117ea322eb\" width=100%\u003e\n\u003c/p\u003e\n\n\n### Model Evaluation\nTo evaluate a model's performance on 7 different datasets, please refer to [datasets/README.md](datasets/README.md) for \ninstructions on preparing the datasets. Next, select a model from the model zoo, specify the \"model_weights\", \"config_file\" \nand the path to \"DETECTRON2_DATASETS\" in the corresponding script under `tools/`, then run it.\n```\nbash tools/{promptable, whole_image}_eval.sh\n```\n\n### Model Zoo\n\n#### Whole image segmentation\nUnSAM achieves state-of-the-art results on unsupervised image segmentation, using a ResNet50 backbone and training \nwith only 1% of SA-1B data. 
We show zero-shot unsupervised image segmentation performance on 7 different datasets, \nincluding COCO, LVIS, ADE20K, Entity, SA-1B, Part-ImageNet and PACO.   \n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eMethods\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eModels\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003e# of Train Images\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eAvg.\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eCOCO\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eLVIS\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eADE20K\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eEntity\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eSA-1B\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePtIn\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePACO\u003c/th\u003e\n\u003c!-- TABLE BODY --\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003ePrev. Unsup. 
SOTA\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e-\u003c/td\u003e\n\u003ctd valign=\"center\"\u003eViT-Base\u003c/th\u003e\n\u003ctd align=\"center\"\u003e0.2M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e30.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e30.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e29.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e31.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e33.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e33.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e36.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e17.1\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e-\u003c/td\u003e\n\u003ctd valign=\"center\"\u003eResNet50\u003c/th\u003e\n\u003ctd align=\"center\"\u003e0.1M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e39.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e40.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e37.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e35.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e39.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e41.9\u003c/td\u003e\n\u003ctd align=\"center\"\u003e51.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e27.5\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/12DvjnXIQsOtBSAAEicd9uhW0TCpnMFyZ/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n\u003ctd valign=\"center\"\u003eResNet50\u003c/th\u003e\n\u003ctd align=\"center\"\u003e0.4M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e41.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e42.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e40.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e37.5\u003c/td\u003e\n\u003ctd align=\"center\"\u003e41.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e44.5\u003c/td\u003e\n\u003ctd 
align=\"center\"\u003e52.7\u003c/td\u003e\n\u003ctd align=\"center\"\u003e29.7\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\u003c/table\u003e\n\nUnSAM+ outperforms SAM on most of the evaluated benchmarks (including SA-1B) when UnSAM is trained on 1% of SA-1B with both \nground-truth masks and our unsupervised labels. This demonstrates that the supervised SAM can also benefit from our self-supervised labels.\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eMethods\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eModels\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003e# of Train Images\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eAvg.\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eCOCO\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eLVIS\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eADE20K\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eEntity\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eSA-1B\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePtIn\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePACO\u003c/th\u003e\n\u003c!-- TABLE BODY --\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eSAM\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e-\u003c/td\u003e\n\u003ctd valign=\"center\"\u003eViT-Base\u003c/td\u003e\n\u003ctd align=\"center\"\u003e11M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e42.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e49.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e46.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e45.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e45.9\u003c/td\u003e\n\u003ctd align=\"center\"\u003e60.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e28.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e18.1\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM+ (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e\u003ca 
href=\"https://drive.google.com/file/d/1MaCoMLIR6-baaP7p_WriZVhuJoozxTn8/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n\u003ctd valign=\"center\"\u003eResNet50\u003c/td\u003e\n\u003ctd align=\"center\"\u003e0.1M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e48.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52.2\u003c/td\u003e\n\u003ctd align=\"center\"\u003e50.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e45.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e49.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e64.8\u003c/td\u003e\n\u003ctd align=\"center\"\u003e46.0\u003c/td\u003e\n\u003ctd align=\"center\"\u003e32.3\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\u003c/table\u003e\n\n#### Promptable image segmentation\nDespite using a backbone that is 3× smaller and being trained on only 1% of SA-1B, our lightly semi-supervised UnSAM+ surpasses the fully-supervised SAM in the promptable segmentation task on COCO.\n\u003ctable\u003e\u003ctbody\u003e\n\u003c!-- START TABLE --\u003e\n\u003c!-- TABLE HEADER --\u003e\n\u003cth valign=\"bottom\"\u003eMethods\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eModels\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003eBackbone\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003e# of Train Images\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePoint (Max)\u003c/th\u003e\n\u003cth valign=\"bottom\"\u003ePoint (Oracle)\u003c/th\u003e\n\u003c!-- TABLE BODY --\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eSAM\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e-\u003c/td\u003e\n\u003ctd align=\"center\"\u003eViT-B/8 (85M)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e11M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52.1\u003c/td\u003e\n\u003ctd align=\"center\"\u003e68.2\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e\u003ca 
href=\"https://drive.google.com/file/d/18IilJNw170sKsKBhyIvjfUZ7cZwKyBx7/view?usp=drive_link\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-Tiny (25M)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e0.1M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e37.6\u003c/td\u003e\n\u003ctd align=\"center\"\u003e57.9\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1x5tXWV-HKwQ8dJRjbPPweuHEgsJxN0JF/view?usp=drive_link\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-Tiny (25M)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e0.4M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e41.3\u003c/td\u003e\n\u003ctd align=\"center\"\u003e59.1\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd align=\"center\"\u003eUnSAM+ (ours)\u003c/td\u003e\n\u003ctd valign=\"center\"\u003e\u003ca href=\"https://drive.google.com/file/d/1M3lOnSOutQRK4IqBkc3e4vGZ-u2oTkeW/view?usp=sharing\"\u003edownload\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003eSwin-Tiny (25M)\u003c/td\u003e\n\u003ctd align=\"center\"\u003e0.1M\u003c/td\u003e\n\u003ctd align=\"center\"\u003e52.4\u003c/td\u003e\n\u003ctd align=\"center\"\u003e69.5\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\u003c/table\u003e\n\n## License\nThe majority of UnSAM, CutLER, Detectron2 and DINO are licensed under the [CC-BY-NC license](LICENSE), however portions of the project are available under separate license terms: Mask2Former, Semantic-SAM, CascadePSP, Bilateral Solver and CRF are licensed under the MIT license; If you later add other third party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.\n\n## Acknowledgement\nThis codebase is based on CutLER, SAM, Mask2Former, Semantic-SAM, CascadePSP, BFS, CRF, DINO and Detectron2. 
We appreciate the authors for open-sourcing their code. \n\n## Ethical Considerations\nUnSAM's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition methods.\nBecause input images can contain arbitrary instances, their content may affect the model's output.\n\n## How to get support from us?\nIf you have any general questions, feel free to email us at [XuDong Wang](mailto:xdwang@eecs.berkeley.edu). If you have code or implementation-related questions, please feel free to email us or open an issue in this codebase (we recommend opening an issue, because your questions may help others). \n\n## Citation\nIf you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.\n```\n@article{wang2024segment,\n  title={Segment Anything without Supervision},\n  author={Wang, XuDong and Yang, Jingfeng and Darrell, Trevor},\n  journal={arXiv preprint arXiv:2406.20081},\n  year={2024}\n}\n```\n\n","funding_links":[],"categories":["Jupyter Notebook","Paper List"],"sub_categories":["Follow-up Papers"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrank-xwang%2FUnSAM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrank-xwang%2FUnSAM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrank-xwang%2FUnSAM/lists"}