https://github.com/Jiahao000/MosaicFusion

[IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
diffusion-models instance-segmentation long-tailed object-detection open-vocabulary pytorch

Host: GitHub
URL: https://github.com/Jiahao000/MosaicFusion
Owner: Jiahao000
License: other
Created: 2023-09-24T14:50:09.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-10-08T09:59:30.000Z (9 months ago)
Last Synced: 2024-10-31T00:40:05.434Z (8 months ago)
Topics: diffusion-models, instance-segmentation, long-tailed, object-detection, open-vocabulary, pytorch
Language: Python
Homepage:
Size: 28.5 MB
Stars: 112
Watchers: 3
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

Jiahao Xie¹
Wei Li¹
Xiangtai Li¹
Ziwei Liu¹
Yew Soon Ong²
Chen Change Loy¹

¹S-Lab, ²Nanyang Technological University

🚩 Accepted to IJCV 2024

• [arXiv] •

We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.

## 🤩 Key Properties

- Training-free
- Directly generates multiple objects
- Agnostic to detection architectures
- Requires no extra detectors or segmentors

---

## 😎 Method

MosaicFusion is a training-free, diffusion-based data augmentation pipeline that produces image and mask pairs containing multiple objects simultaneously, using off-the-shelf text-to-image diffusion models. The overall pipeline consists of two components: image generation and mask generation.

## 🥰 Qualitative Examples

Given only the names of the categories of interest, MosaicFusion can generate high-quality multi-object images and masks simultaneously by conditioning each region on its own text prompt.
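
To make the idea of per-region prompts concrete, here is a minimal sketch that splits a canvas into four regions and pairs each with its own prompt. It is illustrative only: the `Region` class, the `build_region_prompts` helper, the fixed 2x2 split, and the way the category name is substituted into the prompt template are all assumptions, not the repository's actual API or layout logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A rectangular canvas region and the text prompt that conditions it."""
    x0: int
    y0: int
    x1: int
    y1: int
    prompt: str


def build_region_prompts(categories: List[str], width: int, height: int,
                         template: str = "a photo of a single {}") -> List[Region]:
    """Split the canvas into a 2x2 grid and pair each cell with one prompt.

    Hypothetical helper: the fixed 2x2 split and the template substitution
    are assumptions made for illustration.
    """
    if len(categories) != 4:
        raise ValueError("this sketch expects exactly four category names")
    cx, cy = width // 2, height // 2
    # Top-left, top-right, bottom-left, bottom-right cells.
    boxes = [(0, 0, cx, cy), (cx, 0, width, cy),
             (0, cy, cx, height), (cx, cy, width, height)]
    return [Region(*box, prompt=template.format(name))
            for box, name in zip(boxes, categories)]


regions = build_region_prompts(["cat", "teapot", "kite", "zebra"], 512, 512)
for r in regions:
    print((r.x0, r.y0, r.x1, r.y1), r.prompt)
```

Each `Region` would then condition the diffusion model on its own prompt within its own part of the canvas; how the regions are actually chosen and blended is defined by the repository's code.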

## 🛠️ Usage

### Installation

- Clone our [repo](https://github.com/Jiahao000/MosaicFusion) from GitHub:
```shell
git clone https://github.com/Jiahao000/MosaicFusion.git
cd MosaicFusion
```
- Create the `conda` environment:
```shell
conda env create -f environment.yml
```
- Download [lvis_v1_train.json](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip), unzip it, and place it under a directory, e.g., `data/lvis/meta/lvis_v1_train.json`.
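
As a quick sanity check that the annotation file is in place, the following sketch loads it and lists the categories that LVIS marks as rare. The helper name `load_rare_categories` is hypothetical and not part of this repository; the sketch assumes the file follows the LVIS v1 layout, where each entry in `categories` has a `name` and a `frequency` field with `"r"` denoting rare.

```python
import json
from pathlib import Path
from typing import List


def load_rare_categories(ann_path: str) -> List[str]:
    """Return the sorted names of categories whose frequency is "r" (rare)."""
    with Path(ann_path).open() as f:
        ann = json.load(f)
    return sorted(c["name"] for c in ann["categories"] if c.get("frequency") == "r")


if __name__ == "__main__":
    path = "data/lvis/meta/lvis_v1_train.json"  # example location from the step above
    if Path(path).exists():
        rare = load_rare_categories(path)
        print(f"{len(rare)} rare categories, e.g. {rare[:5]}")
    else:
        print(f"{path} not found; download and unzip it first")
```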

### Data Generation

1. Generate images and masks with MosaicFusion:
```shell
bash scripts/dist_text2seg.sh "a photo of a single category" output/text2seg Generation_log
```
Alternatively, if you run `MosaicFusion` on a cluster managed with [Slurm](https://slurm.schedmd.com/):
```shell
bash scripts/slurm_text2seg.sh Dummy Generation_job "a photo of a single category" output/text2seg Generation_log
```
2. Convert generated images and masks to the required data format:
```shell
bash scripts/run_seg2ann.sh output/text2seg output/seg2ann
```
3. Merge MosaicFusion annotations into LVIS annotations:
```shell
bash scripts/run_merge_ann.sh data/lvis/meta/lvis_v1_train.json output/seg2ann/annotations/lvis_v1_train_mosaicfusion.json output/seg2ann/annotations/lvis_v1_train_merged.json
```
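
For readers unfamiliar with what merging two annotation files involves, here is a rough sketch of the general idea: append the synthetic images and annotations to the base file while re-numbering their ids so they cannot clash. This is not the logic of `scripts/run_merge_ann.sh`; the function `merge_annotations` is hypothetical, and it assumes both files use COCO/LVIS-style `images` and `annotations` lists and share the same category ids.

```python
from typing import Any, Dict

Dataset = Dict[str, Any]


def merge_annotations(base: Dataset, extra: Dataset) -> Dataset:
    """Append images and annotations from `extra` to `base` with fresh ids."""
    img_offset = max((im["id"] for im in base["images"]), default=0)
    ann_offset = max((a["id"] for a in base["annotations"]), default=0)

    # Shallow copy keeps categories and other top-level keys from `base`.
    merged = dict(base)
    merged["images"] = list(base["images"])
    merged["annotations"] = list(base["annotations"])

    # Map each old image id in `extra` to a new id beyond the base range.
    id_map = {}
    for i, im in enumerate(extra["images"], start=1):
        id_map[im["id"]] = img_offset + i
        merged["images"].append({**im, "id": img_offset + i})

    # Re-number annotations and point them at the remapped image ids.
    for j, a in enumerate(extra["annotations"], start=1):
        merged["annotations"].append(
            {**a, "id": ann_offset + j, "image_id": id_map[a["image_id"]]})
    return merged


# Toy example with deliberately clashing ids.
base = {"categories": [{"id": 1, "name": "cat"}],
        "images": [{"id": 5}],
        "annotations": [{"id": 9, "image_id": 5, "category_id": 1}]}
extra = {"images": [{"id": 5}],
         "annotations": [{"id": 9, "image_id": 5, "category_id": 1}]}
merged = merge_annotations(base, extra)
print([im["id"] for im in merged["images"]])
print(merged["annotations"][-1])
```

The inputs are left unmodified, so the original LVIS annotations can still be used on their own for a baseline run.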

### Training Downstream Detectors or Segmentors

Please refer to [TRAIN.md](TRAIN.md) for training details.

## 👨‍💻 Todo
- [x] Data generation code for MosaicFusion
- [ ] Third-party training code with MosaicFusion data

## 🤟 Citation
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{xie2024mosaicfusion,
  author  = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
  title   = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
  journal = {International Journal of Computer Vision},
  year    = {2024}
}
```

## 🗞️ License

Distributed under the S-Lab License. See [LICENSE](./LICENSE) for more information.
