{"id":13441748,"url":"https://github.com/Jumpat/SegmentAnythingin3D","last_synced_at":"2025-03-20T12:32:44.095Z","repository":{"id":154165249,"uuid":"631867864","full_name":"Jumpat/SegmentAnythingin3D","owner":"Jumpat","description":"Segment Anything in 3D with NeRFs (NeurIPS 2023)","archived":false,"fork":false,"pushed_at":"2024-03-04T06:56:24.000Z","size":119632,"stargazers_count":760,"open_issues_count":24,"forks_count":43,"subscribers_count":19,"default_branch":"main","last_synced_at":"2024-03-12T22:32:03.011Z","etag":null,"topics":["3d","3d-segmentation","computer-vision","deep-learning","nerf","segment-anything","segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jumpat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-24T08:24:39.000Z","updated_at":"2024-04-15T07:59:36.676Z","dependencies_parsed_at":"2024-01-16T02:45:53.093Z","dependency_job_id":"df01dd49-d645-4c61-97fb-a5f10926b597","html_url":"https://github.com/Jumpat/SegmentAnythingin3D","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jumpat%2FSegmentAnythingin3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jumpat%2FSegmentAnythingin3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jumpat%2FSegmentAnythingin3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jumpat%2FSegmentAnythingin3D/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jumpat","download_url":"https://codeload.github.com/Jumpat/SegmentAnythingin3D/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221760148,"owners_count":16876362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","3d-segmentation","computer-vision","deep-learning","nerf","segment-anything","segmentation"],"created_at":"2024-07-31T03:01:37.628Z","updated_at":"2025-03-20T12:32:44.078Z","avatar_url":"https://github.com/Jumpat.png","language":"Python","funding_links":[],"categories":["Paper List","Python"],"sub_categories":["Follow-up Papers"],"readme":"# Segment Anything🤖️ in 3D with NeRFs (SA3D)\n### [Project Page](https://jumpat.github.io/SA3D/) | [Arxiv Paper](https://arxiv.org/abs/2304.12308)\n\n[Segment Anything in 3D with NeRFs](https://arxiv.org/abs/2304.12308)  \n[Jiazhong Cen](https://github.com/Jumpat)\u003csup\u003e1\\*\u003c/sup\u003e, [Zanwei Zhou](https://github.com/Zanue)\u003csup\u003e1\\*\u003c/sup\u003e, [Jiemin Fang](https://jaminfong.cn/)\u003csup\u003e2,3†\u003c/sup\u003e, [Chen Yang](https://github.com/chensjtu)\u003csup\u003e1\u003c/sup\u003e, [Wei Shen](https://shenwei1231.github.io/)\u003csup\u003e1✉\u003c/sup\u003e, [Lingxi Xie](http://lingxixie.com/)\u003csup\u003e2\u003c/sup\u003e, [Dongsheng Jiang](https://sites.google.com/site/dongshengjiangbme/)\u003csup\u003e2\u003c/sup\u003e, [Xiaopeng Zhang](https://sites.google.com/site/zxphistory/)\u003csup\u003e2\u003c/sup\u003e, [Qi 
Tian](https://scholar.google.com/citations?hl=en\u0026user=61b6eYkAAAAJ)\u003csup\u003e2\u003c/sup\u003e   \n\u003csup\u003e1\u003c/sup\u003eAI Institute, SJTU \u0026emsp; \u003csup\u003e2\u003c/sup\u003eHuawei Inc \u0026emsp; \u003csup\u003e3\u003c/sup\u003eSchool of EIC, HUST.  \n\\*denotes equal contribution  \n†denotes project lead.\n\n*Given a NeRF, just input prompts from **one single view** and then get your 3D model.*   \n\u003cimg src=\"imgs/SA3D.gif\" width=\"800\"\u003e\n\nWe propose a novel framework to Segment Anything in 3D, named \u003cb\u003eSA3D\u003c/b\u003e. Given a neural radiance field (NeRF) model, SA3D allows users to obtain the 3D segmentation result of any target object via only \u003cb\u003eone-shot\u003c/b\u003e manual prompting in a single rendered view. The entire process for obtaining the target 3D model can be completed in approximately 2 minutes, even without any engineering optimization. Our experiments demonstrate the effectiveness of SA3D in different scenes, highlighting the potential of SAM in 3D scene perception. \n\n## Update\n* **2024/04/16**: We release the [3D-GS](https://github.com/graphdeco-inria/gaussian-splatting) version of SA3D ([here](https://github.com/Jumpat/SegmentAnythingin3D/tree/nerfstudio-version)). Now 3D segmentation can be achieved within seconds!\n* **2023/11/11**: We release the [nerfstudio](https://docs.nerf.studio) version of SA3D ([here](https://github.com/Jumpat/SegmentAnythingin3D/tree/nerfstudio-version))! Currently it only supports the text prompt as input.\n* **2023/06/29**: We now support [MobileSAM](https://github.com/ChaoningZhang/MobileSAM) as the segmentation network. Follow the installation instructions in [MobileSAM](https://github.com/ChaoningZhang/MobileSAM), and then download *mobile_sam.pt* into the folder ``./dependencies/sam_ckpt``. 
You can use `--mobile_sam` to switch to MobileSAM.\n\n## Overall Pipeline\n\n![SA3D_pipeline](https://github.com/Jumpat/SegmentAnythingin3D/assets/58475180/6135f473-3239-4721-9a79-15f7a7d11347)\n\nWith input prompts, SAM cuts out the target object from the corresponding view. The obtained 2D segmentation mask is projected onto 3D mask grids via density-guided inverse rendering. 2D masks are then rendered from other views; these are mostly incomplete, but they serve as cross-view self-prompts that are fed into SAM again. The resulting complete masks are projected onto the mask grids. This procedure is repeated iteratively so that accurate 3D masks are eventually learned. SA3D adapts effectively to various radiance fields without any additional redesign.\n\n\n## Installation\n\n```\ngit clone https://github.com/Jumpat/SegmentAnythingin3D.git\ncd SegmentAnythingin3D\n\nconda create -n sa3d python=3.10\nconda activate sa3d\npip install -r requirements.txt\n```\n\n### SAM and Grounding-DINO:\n\n```\n# Installing SAM\nmkdir dependencies; cd dependencies \nmkdir sam_ckpt; cd sam_ckpt\nwget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\ngit clone git@github.com:facebookresearch/segment-anything.git \ncd segment-anything; pip install -e .\n\n# Installing Grounding-DINO\ngit clone https://github.com/IDEA-Research/GroundingDINO.git\ncd GroundingDINO/; pip install -e .\nmkdir weights; cd weights\nwget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth\n```\n\n## Download Data\nWe now release the configs on these datasets:\n* *Forward-facing:* [LLFF](https://drive.google.com/drive/folders/14boI-o5hGO9srnWaaogTU5_ji7wkX2S7) \n* *Inward-facing:* [mip-NeRF360](https://jonbarron.info/mipnerf360/), [LERF](https://www.lerf.io/)\n\n### Data structure:  \n\u003cdetails\u003e\n  \u003csummary\u003e (click to expand) \u003c/summary\u003e\n\n    data\n    ├── 360_v2             # Link: 
https://jonbarron.info/mipnerf360/\n    │   └── [bicycle|bonsai|counter|garden|kitchen|room|stump]\n    │       ├── poses_bounds.npy\n    │       └── [images|images_2|images_4|images_8]\n    │\n    ├── nerf_llff_data     # Link: https://drive.google.com/drive/folders/14boI-o5hGO9srnWaaogTU5_ji7wkX2S7\n    │   └── [fern|flower|fortress|horns|leaves|orchids|room|trex]\n    │       ├── poses_bounds.npy\n    │       └── [images|images_2|images_4|images_8]\n    │\n    └── lerf_data               # Link: https://drive.google.com/drive/folders/1vh0mSl7v29yaGsxleadcj-LCZOE_WEWB\n        └── [book_store|bouquet|donuts|...]\n            ├── transforms.json\n            └── [images|images_2|images_4|images_8]\n\u003c/details\u003e\n\n## Usage\n- Train NeRF\n  ```bash\n  python run.py --config=configs/llff/fern.py --stop_at=20000 --render_video --i_weights=10000\n  ```\n- Run SA3D in GUI\n  ```bash\n  python run_seg_gui.py --config=configs/llff/seg/seg_fern.py --segment \\\n  --sp_name=_gui --num_prompts=20 \\\n  --render_opt=train --save_ckpt\n  ```\n- Render and Save Fly-through Videos\n  ```bash\n  python run_seg_gui.py --config=configs/llff/seg/seg_fern.py --segment \\\n  --sp_name=_gui --num_prompts=20 \\\n  --render_only --render_opt=video --dump_images \\\n  --seg_type seg_img seg_density\n  ```\n\nSome tips for running SA3D:\n- Increase `--num_prompts` when the target object is extremely irregular, as in the LLFF scenes *Fern* and *Trex*;\n- Use `--seg_poses` to specify the camera pose sequence used for training the 3D mask, `default='train', choices=['train', 'video']`.\n\nUsing our [Dash](https://github.com/plotly/dash.git)-based GUI:\n\n- Select the type of prompt to use; currently supported: *Point Prompt* and *Text Prompt*;\n  - *Point Prompt:* select `Points` in the drop-down; click the original image to add a point prompt, then SAM will produce candidate masks; click `Clear Points` to clear out the previous inputs;\n    \n    
https://github.com/Jumpat/SegmentAnythingin3D/assets/58475180/9ae39cb2-6a1f-40a7-b7df-6b149e75358f\n    \n    \n  - *Text Prompt:* select `Text` in the drop-down; input your text prompt and click `Generate` to get candidate masks; note that unreasonable text input may cause errors.\n    \n    https://github.com/Jumpat/SegmentAnythingin3D/assets/58475180/ba934e0c-dc8a-472a-958c-2b6c4d6ee644\n    \n    \n- Select your target mask;\n- Press `Start Training` to run SA3D; we visualize rendered masks and SAM predictions produced by our cross-view self-prompting strategy;\n  \n  https://github.com/Jumpat/SegmentAnythingin3D/assets/58475180/c5cc947e-8966-4ec5-9531-434a7b27eed5\n  \n  \n- Wait a few minutes to see the final rendering results.\n  \n  \n  https://github.com/Jumpat/SegmentAnythingin3D/assets/58475180/9578ea7a-0947-4105-a65c-1f8de12d0bb5\n\n\n## TODO List\n- [ ] Refine the GUI, *e.g.*, start from any train view, add more training hyper-parameter options, etc.;\n- [ ] Support the two-pass stage in GUI; currently it may have some bugs.\n\n## Some Visualization Samples\n\nSA3D can handle various scenes for 3D segmentation. 
Find more demos on our [project page](https://jumpat.github.io/SA3D/).\n\n| Forward facing | 360° | Multi-objects |\n| :---: | :---:| :---:|\n|\u003cimg src=\"imgs/horns.gif\" width=\"200\"\u003e | \u003cimg src=\"imgs/lego.gif\" width=\"200\"\u003e | \u003cimg src=\"imgs/orchid_multi.gif\" width=\"200\"\u003e\n\n## Acknowledgements\nThanks to the following projects for their valuable contributions:\n- [Segment Anything](https://github.com/facebookresearch/segment-anything)\n- [DVGO](https://github.com/sunset1995/DirectVoxGO)\n- [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO.git)\n\n## Citation\nIf you find this project helpful for your research, please consider citing the report and giving a ⭐.\n```BibTex\n@inproceedings{cen2023segment,\n      title={Segment Anything in 3D with NeRFs}, \n      author={Jiazhong Cen and Zanwei Zhou and Jiemin Fang and Chen Yang and Wei Shen and Lingxi Xie and Dongsheng Jiang and Xiaopeng Zhang and Qi Tian},\n      booktitle    = {NeurIPS},\n      year         = {2023},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJumpat%2FSegmentAnythingin3D","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJumpat%2FSegmentAnythingin3D","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJumpat%2FSegmentAnythingin3D/lists"}