{"id":28759974,"url":"https://github.com/ucdvision/gen2seg","last_synced_at":"2025-08-23T14:14:06.300Z","repository":{"id":294611590,"uuid":"986775492","full_name":"UCDvision/gen2seg","owner":"UCDvision","description":"Code for \"gen2seg: Generative Models Enable Generalizable Instance Segmentation\"","archived":false,"fork":false,"pushed_at":"2025-07-12T00:54:16.000Z","size":1157,"stargazers_count":57,"open_issues_count":0,"forks_count":2,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-12T03:15:41.905Z","etag":null,"topics":["computer-vision","generative-ai","machine-learning","segmentation","self-supervised-learning","stable-diffusion"],"latest_commit_sha":null,"homepage":"https://reachomk.github.io/gen2seg/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UCDvision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-20T05:36:22.000Z","updated_at":"2025-07-12T00:54:19.000Z","dependencies_parsed_at":"2025-06-17T04:21:54.982Z","dependency_job_id":"bcb26c7b-509c-4777-9e53-ef25e3c9eee6","html_url":"https://github.com/UCDvision/gen2seg","commit_stats":null,"previous_names":["ucdvision/gen2seg"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/UCDvision/gen2seg","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCDvision%2Fgen2seg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCDvision%2Fgen2seg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCDvision%2Fgen2seg/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCDvision%2Fgen2seg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UCDvision","download_url":"https://codeload.github.com/UCDvision/gen2seg/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCDvision%2Fgen2seg/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271751925,"owners_count":24814707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","generative-ai","machine-learning","segmentation","self-supervised-learning","stable-diffusion"],"created_at":"2025-06-17T06:00:32.840Z","updated_at":"2025-08-23T14:14:06.292Z","avatar_url":"https://github.com/UCDvision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gen2seg: Generative Models Enable Generalizable Instance Segmentation\n[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/reachomk/gen2seg)\n\n### [Project Page](https://reachomk.github.io/gen2seg) | [Paper](https://arxiv.org/abs/2505.15263)\n\n[**gen2seg: Generative Models Enable Generalizable Instance Segmentation**](https://reachomk.github.io/gen2seg)  \n [Om 
Khangaonkar](https://reachomk.github.io),\n [Hamed Pirsiavash](https://web.cs.ucdavis.edu/~hpirsiav/)\u003cbr\u003e\n UC Davis \u003cbr\u003e\n\u003cimg src='assets/teaser.png'/\u003e\n\n## Pretrained Models\nStable Diffusion 2 (SD): https://huggingface.co/reachomk/gen2seg-sd\n\nImageNet-1K-pretrained Masked Autoencoder-Huge (MAE-H):  https://huggingface.co/reachomk/gen2seg-mae-h\n\nIf you want any of our other models, send me an email. If there is sufficient demand, I will also release them publicly. \n\n## Getting Started\nPlease set up the environment by running\n```\nconda env create -f environment.yml\n```\nand then\n```\nconda activate gen2seg\n```\n##  Inference\nCurrently, we have released inference code for our SD and MAE models. To run one, edit the `image_path` variable (for your input image) in the corresponding file, then run `python inference_sd.py` or `python inference_mae.py`.  \n\nYou will need to have `transformers` and `diffusers` installed, along with standard machine learning packages such as `pytorch` and `numpy`.  More details on our specific environment will be released with the training code. \n\nWe have also released code for prompting. Please run `pip install opencv-contrib-python` before running this file if you didn't start from our conda environment. \n\nHere is how you run it:\n```\npython prompting.py \\\n    --feature_image /path/to/your/feature_image.png \\\n    --prompt_x [prompt pixel x] \\\n    --prompt_y [prompt pixel y]\n```\nThe feature image is the one generated by our model, NOT the original image. \n\n\nWe also have the additional optional arguments:\n```\n--output_mask /path/to/save/output_mask.png\n--sigma [value between 0 and 1]\n--threshold [value between 0 and 255]\n```\n\nThreshold and sigma allow you to control the mask threshold and the amount of averaging for the query vector, respectively. By default they are 0.01 and 3. See our paper for more details. 
\n\nWe have also provided our inference script for SAM, to enable qualitative comparison. Please make sure you download the checkpoint and set its path in the script. You should also edit the `image_path` variable (for your input image). \n\n## Training our models\nYou will probably need a 48 GB GPU to train our SD model, but MAE will work on 24 GB.   \n\n### Data\nWe use two datasets, Hypersim and Virtual Kitti 2.\n\nYou can download Virtual Kitti 2 directly from this link: https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/\n\nPlease download the rgb and instanceSegmentation tars. To work off-the-shelf with our current dataloader, please extract them into the same directory. This way, for a given scene, the RGB and segmentation will be under `frames/rgb` and `frames/instanceSegmentation` respectively. You can see the `VirtualKITTI2._find_pairs` function in `training/dataloaders/load.py` for more details. \n\nFor Hypersim, I recommend downloading using this script: https://github.com/apple/ml-hypersim/tree/main/contrib/99991\n\nAssuming you have a root folder `root`, you should download the RGB frames (`scene_cam_00_final_preview/*.color.jpg`) into `root/rgb`. You will also need to download the segmentation annotations (`scene_cam_03_geometry_hdf5/*.semantic_instance.hdf5`). You will need to convert these annotations to RGB images by assigning the background as black and each mask a unique color (that is not black or white). Please delete all frames that do not have any annotations; keeping them will degrade performance. I also found that deleting scenes with fewer than 10 annotated objects helped.  Please place the colored annotations into `root/instance-rgb`. \n\nYou will need to specify the path to each dataset at line 360 in `training/train.py`, or line 274 in `training/train_mae_full.py`.  \n\n### Training\nBefore beginning, please set the `num_processes` variable in `training/scripts/multi_gpu.yaml` to the number of GPUs you want to parallelize over. 
\n\nTo train our models, please run the following scripts. Descriptions of the arguments are available in the respective training scripts. \n\nStable Diffusion:\n```./training/scripts/train_stable_diffusion_e2e_ft_instance.sh```\n\nMAE:\n```./training/scripts/train_mae_full_e2e_ft_instance.sh```\n\nPlease let me know if you want more details or have any questions. \n\n##  Citation\nPlease cite our paper if it was helpful or you liked it. \n```\n@article{khangaonkar2025gen2seg,\n      title={gen2seg: Generative Models Enable Generalizable Instance Segmentation}, \n      author={Om Khangaonkar and Hamed Pirsiavash},\n      year={2025},\n      journal={arXiv preprint arXiv:2505.15263}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucdvision%2Fgen2seg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fucdvision%2Fgen2seg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucdvision%2Fgen2seg/lists"}