https://github.com/Jiahao000/MosaicFusion

[IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
diffusion-models instance-segmentation long-tailed object-detection open-vocabulary pytorch

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation


Jiahao Xie¹, Wei Li¹, Xiangtai Li¹, Ziwei Liu¹, Yew Soon Ong², Chen Change Loy¹

¹S-Lab, ²Nanyang Technological University


🚩 Accepted to IJCV 2024



[arXiv]


We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.



🤩 Key Properties




  • Training-free
  • Directly generates multiple objects
  • Agnostic to detection architectures
  • Requires no extra detectors or segmentors



    ---

    ## 😎 Method

    MosaicFusion is a training-free, diffusion-based dataset augmentation pipeline that produces image and mask pairs containing multiple objects simultaneously, using off-the-shelf text-to-image diffusion models. The pipeline consists of two components: image generation and mask generation.



    ## 🥰 Qualitative Examples

    Given only the category names of interest, MosaicFusion can generate high-quality multi-object images and masks simultaneously by conditioning each region on its own text prompt.
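
To make the idea of "one text prompt per region" concrete, here is a minimal sketch of how a canvas might be divided and paired with prompts. It is not the repository's implementation: the `Region` type, the `make_regions` helper, the fixed four-quadrant layout, and the prompt template are all illustrative assumptions (the template merely echoes the prompt string used in the Usage commands).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Region:
    """Hypothetical container: one rectangular region and its text prompt."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive
    prompt: str


def make_regions(
    width: int,
    height: int,
    categories: List[str],
    cx: Optional[int] = None,
    cy: Optional[int] = None,
    template: str = "a photo of a single {}",  # assumed template
) -> List[Region]:
    """Split a canvas into four quadrants around (cx, cy), one prompt each."""
    cx = width // 2 if cx is None else cx
    cy = height // 2 if cy is None else cy
    boxes = [
        (0, 0, cx, cy),            # top-left
        (cx, 0, width, cy),        # top-right
        (0, cy, cx, height),       # bottom-left
        (cx, cy, width, height),   # bottom-right
    ]
    if len(categories) != len(boxes):
        raise ValueError("this sketch expects exactly four category names")
    return [Region(box, template.format(name))
            for box, name in zip(boxes, categories)]
```

Calling `make_regions(512, 512, ["cat", "dog", "kite", "vase"])` yields four non-overlapping regions that tile the canvas, each carrying its own prompt. How the actual pipeline chooses region layouts and feeds prompts to the diffusion model is not specified in this README.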



    ## 🛠️ Usage

    ### Installation

    - Clone our [repo](https://github.com/Jiahao000/MosaicFusion) from GitHub:
    ```shell
    git clone https://github.com/Jiahao000/MosaicFusion.git
    cd MosaicFusion
    ```
    - Create the `conda` environment:
    ```shell
    conda env create -f environment.yml
    ```
    - Download [lvis_v1_train.json](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip), unzip it, and place it under a directory, e.g., `data/lvis/meta/lvis_v1_train.json`.
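
Before moving on, it can help to confirm that the annotation file sits where you expect and parses as JSON. The following is an optional sanity check, not part of the repository; `summarize_lvis` is a hypothetical helper, and it relies only on the top-level `images`, `annotations`, and `categories` keys of the LVIS format.

```python
import json
from pathlib import Path

# Top-level keys this sketch expects in an LVIS annotation file.
REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_lvis(path):
    """Load an LVIS-style JSON file and return the entry count per key."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"annotation file not found: {path}")
    with path.open() as f:
        data = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    return {key: len(data[key]) for key in REQUIRED_KEYS}
```

For example, `summarize_lvis("data/lvis/meta/lvis_v1_train.json")` returns a small dictionary of counts. The full training file is large, so loading it may take some time and memory.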

    ### Data Generation

    1. Generate images and masks with MosaicFusion:
    ```shell
    bash scripts/dist_text2seg.sh "a photo of a single category" output/text2seg Generation_log
    ```
    Alternatively, if you run `MosaicFusion` on a cluster managed with [Slurm](https://slurm.schedmd.com/), use:
    ```shell
    bash scripts/slurm_text2seg.sh Dummy Generation_job "a photo of a single category" output/text2seg Generation_log
    ```
    2. Convert generated images and masks to the required data format:
    ```shell
    bash scripts/run_seg2ann.sh output/text2seg output/seg2ann
    ```
    3. Merge MosaicFusion annotations into LVIS annotations:
    ```shell
    bash scripts/run_merge_ann.sh data/lvis/meta/lvis_v1_train.json output/seg2ann/annotations/lvis_v1_train_mosaicfusion.json output/seg2ann/annotations/lvis_v1_train_merged.json
    ```
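
Steps 2 and 3 can be pictured with a small pure-Python sketch: turning a binary mask into a COCO/LVIS-style annotation record, then appending synthetic records to an existing annotation dictionary without ID collisions. This illustrates the general idea only and is not the code in `scripts/run_seg2ann.sh` or `scripts/run_merge_ann.sh`. The helper names, the ID-offset strategy, and the omission of the `segmentation` field are assumptions made for brevity.

```python
import copy


def mask_to_annotation(mask, ann_id, image_id, category_id):
    """Build a minimal annotation from a binary mask (list of 0/1 rows).

    Returns None for an empty mask. The bbox follows the COCO/LVIS
    convention [x, y, width, height]; the segmentation is omitted here.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],
        "area": sum(1 for row in mask for value in row if value),
    }


def merge_annotations(base, synth):
    """Append synthetic images/annotations to a copy of `base`.

    Assumes both dicts share one category list and that all IDs in
    `synth` are positive, so shifting them past the largest base ID
    keeps every ID unique.
    """
    merged = copy.deepcopy(base)
    image_offset = max((img["id"] for img in base["images"]), default=0)
    ann_offset = max((ann["id"] for ann in base["annotations"]), default=0)
    for img in synth["images"]:
        merged["images"].append({**img, "id": img["id"] + image_offset})
    for ann in synth["annotations"]:
        merged["annotations"].append({
            **ann,
            "id": ann["id"] + ann_offset,
            "image_id": ann["image_id"] + image_offset,
        })
    return merged
```

The merged dictionary can then be written out with `json.dump` and used in place of the original training annotations. Whether the repository's scripts renumber IDs in this way is not stated in this README, so treat the offset scheme as one reasonable option rather than a description of their behavior.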

    ### Training Downstream Detectors or Segmentors

    Please refer to [TRAIN.md](TRAIN.md) for training details.

    ## 👨‍💻 Todo
    - [x] Data generation code for MosaicFusion
    - [ ] Third-party training code with MosaicFusion data

    ## 🤟 Citation
    If you find this work useful for your research, please consider citing our paper:
    ```bibtex
    @article{xie2024mosaicfusion,
    author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
    title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
    journal = {International Journal of Computer Vision},
    year = {2024}
    }
    ```

    ## 🗞️ License

    Distributed under the S-Lab License. See [LICENSE](./LICENSE) for more information.