https://github.com/ucdvision/gen2seg

Code for "gen2seg: Generative Models Enable Generalizable Instance Segmentation"

computer-vision generative-ai machine-learning segmentation self-supervised-learning stable-diffusion

Host: GitHub
URL: https://github.com/ucdvision/gen2seg
Owner: UCDvision
Created: 2025-05-20T05:36:22.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-07-12T00:54:16.000Z (12 months ago)
Last Synced: 2025-07-12T03:15:41.905Z (12 months ago)
Topics: computer-vision, generative-ai, machine-learning, segmentation, self-supervised-learning, stable-diffusion
Language: Python
Homepage: https://reachomk.github.io/gen2seg/
Size: 1.1 MB
Stars: 57
Watchers: 6
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

# gen2seg: Generative Models Enable Generalizable Instance Segmentation
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/reachomk/gen2seg)

### [Project Page](https://reachomk.github.io/gen2seg) | [Paper](https://arxiv.org/abs/2505.15263)

[**gen2seg: Generative Models Enable Generalizable Instance Segmentation**](https://reachomk.github.io/gen2seg)
[Om Khangaonkar](https://reachomk.github.io),
[Hamed Pirsiavash](https://web.cs.ucdavis.edu/~hpirsiav/)

UC Davis

## Pretrained Models
Stable Diffusion 2 (SD): https://huggingface.co/reachomk/gen2seg-sd

ImageNet-1K-pretrained Masked Autoencoder-Huge (MAE-H): https://huggingface.co/reachomk/gen2seg-mae-h

If you want any of our other models, send me an email. If there is sufficient demand, I will also release them publicly.

## Getting Started
Please set up the environment by running
```
conda env create -f environment.yml
```
and then
```
conda activate gen2seg
```
## Inference
We have released inference code for our SD and MAE models. To run either one, set the `image_path` variable in the script to your input image, then run `python inference_sd.py` or `python inference_mae.py`.

You will need to have `transformers` and `diffusers` installed, along with standard machine learning packages such as `pytorch` and `numpy`. More details on our specific environment will be released with the training code.
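If you are not sure whether your environment already has these packages, a quick check like the one below can help. This is only a minimal sketch: the module list is our reading of the packages named above (PyTorch is imported as `torch`), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Import names for the packages named above (assumption: PyTorch -> `torch`).
REQUIRED_MODULES = ["torch", "numpy", "transformers", "diffusers"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If you plan to use the prompting script outside our conda environment, you could add `cv2` (the import name of `opencv-contrib-python`) to the list.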

We have also released code for prompting. If you did not start from our conda environment, please run `pip install opencv-contrib-python` before running `prompting.py`.

Here is how you run it:
```
python prompting.py \
--feature_image /path/to/your/feature_image.png \
--prompt_x [prompt pixel x] \
--prompt_y [prompt pixel y]
```
The feature image is the one generated by our model, NOT the original image.

We also have the additional optional arguments:
```
--output_mask /path/to/save/output_mask.png
--sigma [value between 0 and 1]
--threshold [value between 0 and 255]
```

Threshold and sigma allow you to control the mask threshold and the amount of averaging for the query vector, respectively. By default they are 0.01 and 3. See our paper for more details.
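To give a rough intuition for what these two parameters control, here is a minimal sketch of point prompting on a feature image. It is not the code in `prompting.py`: it assumes the query vector is a Gaussian-weighted average of the feature-image colors around the prompt pixel and that the mask keeps pixels whose color is close to that query. The names `prompt_mask`, `sigma_px`, and `threshold`, along with their units and defaults, are hypothetical.

```python
import numpy as np


def prompt_mask(feature_image, prompt_x, prompt_y, sigma_px=3.0, threshold=30.0):
    """Return a boolean (H, W) mask of pixels whose color is close to the prompt.

    feature_image: (H, W, 3) array, the model output (not the original photo).
    sigma_px and threshold are hypothetical parameters for this sketch; their
    units and defaults are not taken from prompting.py.
    """
    feats = np.asarray(feature_image, dtype=np.float64)
    height, width = feats.shape[:2]

    # Gaussian weights centred on the prompt pixel (x is the column, y the row).
    ys, xs = np.mgrid[0:height, 0:width]
    sq_dist = (xs - prompt_x) ** 2 + (ys - prompt_y) ** 2
    weights = np.exp(-sq_dist / (2.0 * sigma_px ** 2))
    weights /= weights.sum()

    # Query vector: weighted average color around the prompt.
    query = (feats * weights[..., None]).sum(axis=(0, 1))

    # Keep pixels whose Euclidean color distance to the query is small.
    color_dist = np.linalg.norm(feats - query, axis=-1)
    return color_dist <= threshold
```

In this sketch, a larger `sigma_px` averages over a wider neighborhood, which makes the query less sensitive to a single noisy pixel, and a larger `threshold` grows the mask.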

We have also provided our inference script for SAM, to enable qualitative comparison. Please download the SAM checkpoint and set its path in the script. You should also edit the `image_path` variable (for your input image).

## Training our models
You will probably need a 48 GB GPU to train our SD model, but MAE will work on 24 GB.

### Data
We use two datasets, Hypersim and Virtual KITTI 2.

You can download Virtual KITTI 2 directly from this link: https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/

Please download the rgb and instanceSegmentation tars. To work off-the-shelf with our current dataloader, please extract them into the same directory. This way, for a given scene, the RGB and segmentation will be under `frames/rgb` and `frames/instanceSegmentation` respectively. You can see the `VirtualKITTI2._find_pairs` function in `training/dataloaders/load.py` for more details.
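To check that the extracted folders line up before training, you can list the RGB and segmentation pairs yourself. The sketch below is not the repository's `VirtualKITTI2._find_pairs`. It assumes file names of the form `frames/rgb/Camera_0/rgb_00001.jpg` and `frames/instanceSegmentation/Camera_0/instancegt_00001.png`, so please verify these against your extracted tars. `find_vkitti2_pairs` is a hypothetical helper.

```python
from pathlib import Path


def find_vkitti2_pairs(root):
    """Return sorted (rgb_path, instance_path) pairs found under root.

    Assumed layout (check it against your extracted tars):
      .../frames/rgb/<camera>/rgb_<frame>.jpg
      .../frames/instanceSegmentation/<camera>/instancegt_<frame>.png
    """
    pairs = []
    for rgb_path in sorted(Path(root).rglob("rgb_*.jpg")):
        camera_dir = rgb_path.parent
        if camera_dir.parent.name != "rgb":
            continue  # not inside a frames/rgb/<camera> folder
        frames_dir = camera_dir.parent.parent
        frame_id = rgb_path.stem.split("_")[-1]
        seg_path = (
            frames_dir
            / "instanceSegmentation"
            / camera_dir.name
            / f"instancegt_{frame_id}.png"
        )
        # Only keep frames that have both an image and an annotation.
        if seg_path.is_file():
            pairs.append((rgb_path, seg_path))
    return pairs
```

If this returns an empty list, the two tars were probably extracted into different directories.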

For Hypersim, I recommend downloading it with this script: https://github.com/apple/ml-hypersim/tree/main/contrib/99991

Assuming you have a root folder `root`, you should download the RGB frames (`scene_cam_00_final_preview/*.color.jpg`) into `root/rgb`. You will also need to download the segmentation annotations (`scene_cam_03_geometry_hdf5/*.semantic_instance.hdf5`). You will need to convert these annotations to RGB images by assigning the background as black and each mask a unique color (that is not black or white). Please delete all frames that do not have any annotations; keeping them will degrade performance. I also found that deleting scenes with fewer than 10 annotated objects helped. Please place the colored annotations into `root/instance-rgb`.
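The conversion described above can be sketched as follows. This is a minimal example, not our preprocessing script: it assumes the annotation has already been loaded as a 2D integer array and that negative IDs mean "no instance". `instance_ids_to_rgb` and its `seed` argument are hypothetical names.

```python
import numpy as np


def instance_ids_to_rgb(instance_ids, seed=0):
    """Color an (H, W) instance-ID map; return (rgb_image, instance_count).

    Assumption: negative IDs mean "no instance" and are painted black.
    """
    ids = np.asarray(instance_ids)
    rgb = np.zeros(ids.shape + (3,), dtype=np.uint8)  # background stays black
    rng = np.random.default_rng(seed)
    used = {(0, 0, 0), (255, 255, 255)}  # black and white are reserved
    labels = [int(v) for v in np.unique(ids) if v >= 0]
    for label in labels:
        color = (0, 0, 0)
        # Redraw until the color is neither reserved nor already assigned.
        while color in used:
            color = tuple(int(c) for c in rng.integers(0, 256, size=3))
        used.add(color)
        rgb[ids == label] = color
    return rgb, len(labels)
```

The returned count can be used for the filtering steps: a frame with a count of 0 has no annotations. Reading the `.hdf5` file itself is left out here. With `h5py`, something like `h5py.File(path, "r")["dataset"][:]` should work if the array is stored under the key `dataset`, which is an assumption to check against your download.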

You will need to specify the path to each dataset at line 360 in `training/train.py`, or line 274 in `training/train_mae_full.py`.

### Training
Before beginning, please modify the `num_processes` variable in `training/scripts/multi_gpu.yaml` with the number of GPUs you want to parallelize over.
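If you would rather script this change than edit the file by hand, a small text substitution is enough. The sketch below assumes the YAML file has exactly one top-level line of the form `num_processes: <value>`. `set_num_processes` is a hypothetical helper, not part of this repository.

```python
import re


def set_num_processes(config_text, num_gpus):
    """Return config_text with its top-level `num_processes` value replaced."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    new_text, count = re.subn(
        r"^(num_processes:[ \t]*)\S+",
        rf"\g<1>{num_gpus}",
        config_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one num_processes entry, found {count}")
    return new_text
```

You would read `training/scripts/multi_gpu.yaml`, pass its contents through this function, and write the result back. To use every visible GPU, `torch.cuda.device_count()` gives a value you can pass as `num_gpus`.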

To train our models, please run the following scripts. Descriptions of the arguments are available in the respective training scripts.

Stable Diffusion:
```
./training/scripts/train_stable_diffusion_e2e_ft_instance.sh
```

MAE:
```
./training/scripts/train_mae_full_e2e_ft_instance.sh
```

Please let me know if you want more details or have any questions.

## Citation
Please cite our paper if it was helpful or you liked it.
```
@article{khangaonkar2025gen2seg,
title={gen2seg: Generative Models Enable Generalizable Instance Segmentation},
author={Om Khangaonkar and Hamed Pirsiavash},
year={2025},
journal={arXiv preprint arXiv:2505.15263}
}
```
