{"id":13711990,"url":"https://github.com/MaybeShewill-CV/segment-anything-u-specify","last_synced_at":"2025-05-06T21:33:17.746Z","repository":{"id":155080965,"uuid":"625938966","full_name":"MaybeShewill-CV/segment-anything-u-specify","owner":"MaybeShewill-CV","description":"using clip and sam to segment any instance you specify with text prompt of any instance names","archived":false,"fork":false,"pushed_at":"2023-07-04T02:42:42.000Z","size":31679,"stargazers_count":161,"open_issues_count":1,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-08-03T23:23:59.892Z","etag":null,"topics":["deep-learning","instance-segmentation","object-detection","sam-model","segment-anything"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaybeShewill-CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-10T12:55:33.000Z","updated_at":"2024-07-31T19:05:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"909cf54b-c9c3-466d-adab-f0948bb27faa","html_url":"https://github.com/MaybeShewill-CV/segment-anything-u-specify","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaybeShewill-CV%2Fsegment-anything-u-specify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaybeShewill-CV%2Fsegment-anything-u-specify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaybeShewill-CV%2Fsegm
ent-anything-u-specify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaybeShewill-CV%2Fsegment-anything-u-specify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaybeShewill-CV","download_url":"https://codeload.github.com/MaybeShewill-CV/segment-anything-u-specify/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224535728,"owners_count":17327582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","instance-segmentation","object-detection","sam-model","segment-anything"],"created_at":"2024-08-02T23:01:13.731Z","updated_at":"2025-05-06T21:33:17.739Z","avatar_url":"https://github.com/MaybeShewill-CV.png","language":"Python","funding_links":[],"categories":["Recent Works","Open Source Projects"],"sub_categories":["Follow-up Papers"],"readme":"# Segment-Anything-U-Specify\nUse the SAM and CLIP models to segment the specific instances you want.\nYou can use this repo to segment any instance in a picture with\ntext prompts.\n\nThe main network architectures are as follows:\n\n`CLIP Model Architecture`\n![CLIP_MODEL](./data/resources/clip_model.png)\n\n`SAM Model Architecture`\n![SAM](./data/resources/sam_model.png)\n\n## Installation\n\nInstall the Python packages with:\n```\npip3 install -r requirements.txt\n```\nDownload the pretrained model weights:\n```\ncd PROJECT_ROOT_DIR\nbash scripts/download_pretrained_ckpt.sh\n```\n\n## Instance Segmentation With Text Prompts\nThe instance segmentor first uses the SAM model to get masks for all objects in the input 
image. Second, the CLIP model classifies each mask using both\nimage features and your text prompt features.\n\n```\ncd PROJECT_ROOT_DIR\nexport PYTHONPATH=$PWD:$PYTHONPATH\npython tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_bear.jpg --text bear\n```\n\n`Bear Instance Segmentation Result, Text Prompt: bear`\n![bear_insseg_result](./data/resources/test_bear_insseg_result.jpg)\n\n`Athlete Instance Segmentation Result, Text Prompt: athlete`\n![athlete_insseg_result](./data/resources/test_baseball_insseg_result.jpg)\n\n`Horse Instance Segmentation Result, Text Prompt: horse`\n![horse_insseg_result](./data/resources/test_horse_insseg_result_after.jpg)\n\n`Dog Instance Segmentation Result, Text Prompt: dog`\n![dog_insseg_result](./data/resources/test_dog_insseg_result.jpg)\n\n`Fish Instance Segmentation Result, Text Prompt: fish`\n![fish_insseg_result](./data/resources/test_fish_insseg_result.jpg)\n\n`Strawberry Instance Segmentation Result, Text Prompt: strawberry`\n![strawberry_insseg_result](./data/resources/test_strawberry_insseg_result.jpg)\n\n`Glasses Instance Segmentation Result, Text Prompt: glasses`\n![glasses_insseg_result](./data/resources/test_glasses_insseg_result.jpg)\n\n`TV Instance Segmentation Result, Text Prompt: television`\n![tv_insseg_result](./data/resources/test_tv_insseg_result.jpg)\n\n`Shoes Instance Segmentation Result, Text Prompt: shoe`\n![shoes_insseg_result](./data/resources/test_shoes_insseg_result.jpg)\n\n`Bridge Instance Segmentation Result, Text Prompt: bridge`\n![bridge_insseg_result](./data/resources/test_bridge_insseg_result.jpg)\n\n`Airplane Instance Segmentation Result, Text Prompt: airplane`\n![airplane_insseg_result](./data/resources/test_airplane_insseg_result.jpg)\n\n### Support Multiple Classes Segmentation All At Once ---- YOSO ---- You Only Segment Once\n```\ncd PROJECT_ROOT_DIR\nexport PYTHONPATH=$PWD:$PYTHONPATH\npython tools/sam_clip_text_seg.py --input_image_path ./data/test_images/test_horse.jpg 
--text \"horse,mountain,grass,sky,clouds,tree\" --cls_score_thresh 0.5 --use_text_prefix\n```\n\n`Horse Instance Segmentation Result, Text Prompt: horse,mountain,grass,sky,clouds,tree`\n![horse_insseg_result](./data/resources/test_horse_insseg_result_muti_label.jpg)\n`TV Instance Segmentation Result, Text Prompt: television,audio system,tape recorder,box`\n![tv_insseg_result](./data/resources/test_tv_insseg_result_multi_label.jpg)\n`Strawberry Instance Segmentation Result, Text Prompt: strawberry,grapefruit,spoon,wolfberry,oatmeal`\n![strawberry_insseg_result](./data/resources/test_strawberry_insseg_result_multi_label.jpg)\n`Frog Instance Segmentation Result, Text Prompt: frog,turtle,snail,eye`\n![frog_insseg_result](./data/resources/test_frog_insseg_result_multi_label.jpg)\n\n#### Instance Segmentation Improvement\n\n##### 2023-04-21 Improve background segmentation\n\n`Before Optimization`\n![before](./data/resources/test_horse_insseg_result.jpg)\n`After Optimization`\n![after](./data/resources/test_horse_insseg_result_after.jpg)\n\n## Unsupervised Cluster Semantic Objects From SAM Model\nThe clustering pipeline first uses the SAM model to get masks for all objects in the input image. Second, it uses the CLIP model to extract image features for each object. Third, it computes the feature distance between every pair of objects. Finally, it applies a similarity threshold to cluster the source objects.\n\nTo test the clustering, simply run:\n\n```\ncd PROJECT_ROOT_DIR\nexport PYTHONPATH=$PWD:$PYTHONPATH\npython tools/cluster_sam.py --input_image_path ./data/test_images/test_bear.jpg --simi_thresh 0.82\n```\n\n`Bear Cluster Result`\n![bear_cluster_result](./data/resources/test_bear_result.jpg)\n\n`Horse Cluster Result`\n![horse_cluster_result](./data/resources/test_horse_result.jpg)\n\nEach row shows, in order: `source image`, `sam origin mask`, `ori masked image`, `clustered mask`, `cluster masked image`\n\n## UPDATES\n\n### 2023-07-04 Integrate MobileSAM\n\nMobileSAM is integrated into the pipeline for lightweight, faster inference. 
To use MobileSAM to segment your\nimage, edit the `./config/sam.yaml` file: set the model name field to `vit_t` and set the \nmodel weight file path to `./pretrained/sam/mobile_sam.pt`.\n\n## TODO\n- [x] Test different kinds of clustering methods\n- [x] Use the cluster result as input prompts to re-segment the image via the SAM model\n- [ ] Merge embedding features of the global image and the masked image\n\n## Acknowledgement\n\nMost of the repo's code borrows from OpenAI's CLIP repo and Facebook's segment-anything repo:\n\n- [CLIP](https://github.com/openai/CLIP)\n- [segment-anything](https://github.com/facebookresearch/segment-anything)\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=MaybeShewill-CV/segment-anything-u-specify\u0026type=Date)](https://star-history.com/#MaybeShewill-CV/segment-anything-u-specify\u0026Date)\n\n## Visitor Count\n\n![Visitor Count](https://profile-counter.glitch.me/15725187_sam_clip/count.svg)\n\n## Contact\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaybeShewill-CV%2Fsegment-anything-u-specify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMaybeShewill-CV%2Fsegment-anything-u-specify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaybeShewill-CV%2Fsegment-anything-u-specify/lists"}