https://github.com/Jiahao000/MosaicFusion
[IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
- Host: GitHub
- URL: https://github.com/Jiahao000/MosaicFusion
- Owner: Jiahao000
- License: other
- Created: 2023-09-24T14:50:09.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-10-08T09:59:30.000Z (9 months ago)
- Last Synced: 2024-10-31T00:40:05.434Z (8 months ago)
- Topics: diffusion-models, instance-segmentation, long-tailed, object-detection, open-vocabulary, pytorch
- Language: Python
- Size: 28.5 MB
- Stars: 112
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized
README
# MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
¹S-Lab, ²Nanyang Technological University
🚩 Accepted to IJCV 2024

We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.
## 🤩 Key Properties
- Training-free
- Generates multiple objects directly
- Agnostic to detection architectures
- No extra detectors or segmentors needed
---
## 😎 Method
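MosaicFusion is a training-free, diffusion-based dataset augmentation pipeline that uses off-the-shelf text-to-image diffusion models to produce image and mask pairs containing multiple objects simultaneously. The pipeline consists of two components: image generation, which divides an image canvas into regions and conditions each region on its own text prompt, and mask generation, which derives each object's mask from the diffusion model's cross-attention maps for that object's prompt.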
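For the image-generation component, each region of a single canvas gets its own text prompt. The sketch below illustrates only that layout bookkeeping. Several pieces are assumptions for illustration and are not MosaicFusion's actual code, which may split the canvas differently:

- the `Region` class and the `mosaic_layout` helper (both hypothetical);
- the uniform non-overlapping grid;
- the `"a photo of a single {}"` template;
- the 8× pixel-to-latent factor, which is typical of Stable Diffusion's VAE.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """One rectangular cell of the mosaic canvas (end-exclusive coordinates)."""
    prompt: str
    top: int
    left: int
    bottom: int
    right: int

    def to_latent(self, factor: int = 8) -> "Region":
        """Map pixel coordinates to latent coordinates (assumed 8x downsampling)."""
        return Region(self.prompt, self.top // factor, self.left // factor,
                      self.bottom // factor, self.right // factor)


def mosaic_layout(height: int, width: int, categories: List[str],
                  rows: int, cols: int,
                  template: str = "a photo of a single {}") -> List[Region]:
    """Split a height x width canvas into a rows x cols grid, one prompt per cell.

    Hypothetical helper: assumes a uniform, non-overlapping grid.
    """
    if len(categories) != rows * cols:
        raise ValueError("need exactly one category per grid cell")
    cell_h, cell_w = height // rows, width // cols
    regions = []
    for idx, name in enumerate(categories):
        r, c = divmod(idx, cols)
        # The last row/column absorbs any remainder so the grid covers the canvas.
        bottom = height if r == rows - 1 else (r + 1) * cell_h
        right = width if c == cols - 1 else (c + 1) * cell_w
        regions.append(Region(template.format(name), r * cell_h, c * cell_w, bottom, right))
    return regions


if __name__ == "__main__":
    for region in mosaic_layout(512, 1024, ["cat", "teapot"], rows=1, cols=2):
        print(region.prompt, region.to_latent())
```

A fixed grid keeps regions disjoint, so each generated object has a known spatial slot to pair with its mask later.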
## 🥰 Qualitative Examples
Given only the names of the categories of interest, MosaicFusion can generate high-quality multi-object images and their masks simultaneously by conditioning each region of the image on its own text prompt.
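The mask-generation component works on cross-attention maps. As the MosaicFusion paper describes it, each instance mask comes from the cross-attention maps tied to that region's object prompt. These maps are aggregated across layers and diffusion time steps, then thresholded and refined with an edge-aware step.

The NumPy sketch below shows only the aggregate, normalize and threshold idea on made-up arrays, and leaves out the refinement. The following are assumptions, not the repository's settings:

- the function names;
- plain averaging;
- nearest-neighbour upsampling;
- the 0.5 threshold.

```python
import numpy as np


def attention_to_mask(attn_maps, out_size, threshold=0.5):
    """Turn cross-attention maps for one object prompt into a binary mask.

    attn_maps: array of shape (num_maps, h, w), one map per (layer, time step),
    assumed already resized to a common h x w resolution.
    out_size: (H, W) of the region in pixels; must be multiples of (h, w).
    """
    attn = np.asarray(attn_maps, dtype=np.float64)
    agg = attn.mean(axis=0)  # average over layers and time steps
    lo, hi = agg.min(), agg.max()
    # Min-max normalize to [0, 1]; a constant map carries no signal.
    agg = (agg - lo) / (hi - lo) if hi > lo else np.zeros_like(agg)
    H, W = out_size
    h, w = agg.shape
    if H % h or W % w:
        raise ValueError("out_size must be a multiple of the attention-map size")
    # Nearest-neighbour upsampling to pixel resolution via block repetition.
    up = np.kron(agg, np.ones((H // h, W // w)))
    return up >= threshold


def mask_to_box(mask):
    """Return a COCO-style [x, y, width, height] box and the pixel area of a mask."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None, 0
    x0, y0 = int(xs.min()), int(ys.min())
    return [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1], int(mask.sum())
```

For example, two 4×4 maps that are 1.0 on their central 2×2 block yield, at 8×8, a mask whose box is `[2, 2, 4, 4]` with area 16.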
## 🛠️ Usage
### Installation
- Clone our [repo](https://github.com/Jiahao000/MosaicFusion) from GitHub:
```shell
git clone https://github.com/Jiahao000/MosaicFusion.git
cd MosaicFusion
```
- Create the `conda` environment:
```shell
conda env create -f environment.yml
```
- Download [lvis_v1_train.json](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip), unzip it, and place it in a directory, e.g., `data/lvis/meta/lvis_v1_train.json`.

### Data Generation
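MosaicFusion is aimed especially at rare categories. In `lvis_v1_train.json`, every category carries an LVIS `frequency` tag: `"r"` (rare), `"c"` (common) or `"f"` (frequent). The standard-library sketch below groups category names by that tag.

The helper name `categories_by_frequency` is hypothetical. The README does not document how the generation script itself selects categories.

```python
import json
from collections import defaultdict
from typing import Dict, List


def categories_by_frequency(ann_path: str) -> Dict[str, List[str]]:
    """Group LVIS category names by frequency bucket ('r', 'c', 'f'), ordered by id."""
    with open(ann_path, encoding="utf-8") as f:
        lvis = json.load(f)
    groups: Dict[str, List[str]] = defaultdict(list)
    for cat in sorted(lvis["categories"], key=lambda c: c["id"]):
        groups[cat["frequency"]].append(cat["name"])
    return dict(groups)
```

For instance, `categories_by_frequency("data/lvis/meta/lvis_v1_train.json")["r"]` lists the rare categories. Note that LVIS names use underscores (e.g. `aerosol_can`).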
1. Generate images and masks with MosaicFusion:
```shell
bash scripts/dist_text2seg.sh "a photo of a single category" output/text2seg Generation_log
```
Alternatively, if you run `MosaicFusion` on a cluster managed by [Slurm](https://slurm.schedmd.com/), use:
```shell
bash scripts/slurm_text2seg.sh Dummy Generation_job "a photo of a single category" output/text2seg Generation_log
```
2. Convert the generated images and masks into the LVIS annotation format:
```shell
bash scripts/run_seg2ann.sh output/text2seg output/seg2ann
```
3. Merge MosaicFusion annotations into LVIS annotations:
```shell
bash scripts/run_merge_ann.sh data/lvis/meta/lvis_v1_train.json output/seg2ann/annotations/lvis_v1_train_mosaicfusion.json output/seg2ann/annotations/lvis_v1_train_merged.json
```

### Training Downstream Detectors or Segmentors
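Training consumes the merged annotation file written by step 3 of Data Generation (`output/seg2ann/annotations/lvis_v1_train_merged.json`). A quick consistency check before a long training run can catch merge problems early.

The standard-library sketch below assumes the merged file keeps the COCO/LVIS layout: `images`, `annotations` and `categories` lists with integer `id` fields. The function names and the specific checks are illustrative and are not part of the repository.

```python
import json
from collections import Counter
from typing import Dict, List


def check_coco_style(ann: Dict) -> List[str]:
    """Return a list of problems found in a COCO/LVIS-style annotation dict."""
    problems = []
    # Ids must be unique within each top-level list.
    for key in ("images", "annotations", "categories"):
        ids = Counter(item["id"] for item in ann.get(key, []))
        dupes = sorted(i for i, n in ids.items() if n > 1)
        if dupes:
            problems.append(f"duplicate {key} ids: {dupes[:5]}")
    # Every annotation must reference an existing image and category.
    image_ids = {img["id"] for img in ann.get("images", [])}
    cat_ids = {cat["id"] for cat in ann.get("categories", [])}
    for a in ann.get("annotations", []):
        if a["image_id"] not in image_ids:
            problems.append(f"annotation {a['id']} points to missing image {a['image_id']}")
        if a["category_id"] not in cat_ids:
            problems.append(f"annotation {a['id']} uses unknown category {a['category_id']}")
    return problems


def check_file(path: str) -> List[str]:
    """Load an annotation JSON file and run check_coco_style on it."""
    with open(path, encoding="utf-8") as f:
        return check_coco_style(json.load(f))
```

An empty list from `check_file("output/seg2ann/annotations/lvis_v1_train_merged.json")` means that ids are unique and that all image and category references resolve.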
Please refer to [TRAIN.md](TRAIN.md) for training details.
## 👨‍💻 Todo
- [x] Data generation code for MosaicFusion
- [ ] Third-party training code with MosaicFusion data

## 🤟 Citation
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{xie2024mosaicfusion,
author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
journal = {International Journal of Computer Vision},
year = {2024}
}
```

## 🗞️ License
Distributed under the S-Lab License. See [LICENSE](./LICENSE) for more information.