Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/showlab/datasetdm
[NeurIPS2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
- Host: GitHub
- URL: https://github.com/showlab/datasetdm
- Owner: showlab
- Created: 2023-08-11T11:43:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-03T10:56:51.000Z (about 1 year ago)
- Last Synced: 2024-12-20T02:04:26.881Z (14 days ago)
- Language: Python
- Homepage: https://weijiawu.github.io/DatasetDM_page/
- Size: 2.73 MB
- Stars: 309
- Watchers: 16
- Forks: 15
- Open Issues: 18
Metadata Files:
- Readme: README.md
README
# DatasetDM (NeurIPS2023)
Official code for 'DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models'
[Project Website](https://weijiawu.github.io/DatasetDM_page/) | [Paper](https://arxiv.org/abs/2308.06160)
## :fire: News
* **[2023.9.24]** The weights for P-Decoder on VOC2012 and COCO2017 are released.
* **[2023.9.22]** The paper was accepted by NeurIPS 2023.
* **[note]** We will release the code within three months. Please stay tuned.
* **[2023.8.11]** We initialized the repo.
## :paintbrush: DEMO
ToDo

## :notes: Introduction
## :bookmark_tabs: Supported Tasks
- [x] Instance Segmentation (COCO2017)
- [x] Semantic Segmentation (VOC, Cityscapes)
- [x] Depth Estimation
- [x] Open Pose
- [x] DeepFashion Segmentation
- [x] Open Segmentation
- [x] Long-tail Segmentation

## :bulb: Demo
To demonstrate the quality of the synthetic data, we visualize samples from two domains, human-centric and urban scenes:
A large language model, GPT-4, is adopted to enhance the diversity of the generated data:
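For illustration, here is a minimal sketch of how such per-class prompt files could be produced with the OpenAI API; the model name, prompt wording, and output path (matching the ```dataset/Prompts_From_GPT/``` layout described under Dataset Prepare below) are our assumptions, not the repo's actual pipeline:
```python
from pathlib import Path
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

def write_class_prompts(class_name, out_dir="dataset/Prompts_From_GPT/coco", n=50):
    """Ask GPT-4 for n diverse captions featuring `class_name` and store
    them one per line, mirroring the prompt-file layout used by the repo."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Write {n} short, diverse photo captions, "
                              f"each containing a {class_name}. "
                              "One caption per line, no numbering."}],
    )
    lines = resp.choices[0].message.content.strip().splitlines()
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    Path(out_dir, f"{class_name}.txt").write_text("\n".join(lines))

write_class_prompts("toothbrush")
```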
## :bulb: Todo
- [ ] Hugging Face Demo
- [ ] ...

## :hammer_and_wrench: Getting Started
### Installation
```sh
conda create -n DatasetDM python=3.8
conda activate DatasetDM
```

Install torch==1.9.1; please refer to [pytorch](https://pytorch.org/get-started/previous-versions/) for the build matching your CUDA version. For example:
```sh
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
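As a quick, optional sanity check (assuming the cu111 build above), you can verify that torch sees the GPU:
```sh
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```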
Then install other packages:
```sh
python -m pip install -r requirements.txt
```

Download the weights and configuration files of [SD 1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main) and place them in the ```./dataset/ckpts``` directory.
Alternatively, we have uploaded the SD weights used in our experiments to Google Drive (around 4.5 GB):
https://drive.google.com/file/d/12lrOexljsyvFB30-ltbYXnIpQ8oP4lrW/view?usp=sharing
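One way to fetch the Hugging Face weights is via git-lfs; the target folder name below is an assumption (match whatever the configs expect), and you may need to accept the model license on Hugging Face first:
```sh
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 dataset/ckpts/stable-diffusion-v1-4
```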
Download diffusers:
```sh
cd model
git clone https://github.com/huggingface/diffusers.git
```
There may be some errors (such as https://github.com/showlab/DatasetDM/issues/11) due to updates of the diffusers version. We recommend using diffusers 0.3.0 (https://pypi.org/project/diffusers/0.3.0/#files). Alternatively, you can directly use our bundled diffusers, as it contains some modifications in ```./model/diffusers/models/unet_blocks.py```.
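For example, to pin the recommended version:
```sh
pip install diffusers==0.3.0
```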
### Dataset Prepare
- Depth Estimation: please follow [MED](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox) to prepare the datasets under ```./data```
- Segmentation (VOC, Cityscapes, and COCO): please follow [Mask2former](https://github.com/facebookresearch/Mask2Former) to prepare the datasets under ```./data```
The final datasets should be organized as follows:
```
data/
    PascalVOC12/
        JPEGImages
        SegmentationClassAug
        splits/
            train_aug.txt
    COCO2017/
        train2017/
            2011_003261.jpg
            ...
        annotations/
            instances_train2017.json
            person_keypoints_train2017.json
    VirtualKITTI2/
        Depth/
            Scene01
            Scene02
            ...
        Image/
            Scene01
            Scene02
            ...
    nyudepthv2/
        sync/
        official_splits/
            test/
        nyu_class_list.json
        train_list.txt
        test_list.txt
    kitti/
        input/
        gt_depth/
        kitti_eigen_train.txt
    deepfashion-mm/
        images/
        segm/
        captions.json
        train_set.txt
        test_set.txt
```

Besides, you also need to organize the prompt txt files as follows:
```
dataset/
    Prompts_From_GPT/
        deepfashion_mm/
            general.txt
        coco_pose/
            general.txt
        KITTI/
            general.txt
        NYU/
            general.txt
        coco/
            toothbrush.txt
            hair drier.txt
            book.txt
            ...
        cityscapes/
            bicycle.txt
            motorcycle.txt
            bus.txt
            ...
```
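A quick sanity check of the layout can save a failed run; the sketch below (paths taken from the trees above) only reports which folders are missing:
```sh
for d in data/PascalVOC12/JPEGImages data/COCO2017/train2017 \
         dataset/Prompts_From_GPT/coco dataset/Prompts_From_GPT/cityscapes; do
    [ -e "$d" ] && echo "ok: $d" || echo "missing: $d"
done
```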
## :paintbrush: Table of Contents
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Depth Estimation](#Depth-Estimation)
- [Open Pose](#Open-Pose)
- [Zero-Shot Semantic Segmentation](#Zero-Shot-Semantic-Segmentation)
- [Fashion Segmentation](#Fashion-Segmentation)
- [Long-tail Segmentation](#Long-tail-Segmentation)

### Semantic Segmentation
#### [VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/)
Train the P-decoder, or use the [trained weights](https://drive.google.com/file/d/12BzF51jwKpwmUb-jDyB5LHRrlk6DnfxR/view?usp=sharing) obtained with 100 real images.
```sh
# For Segmentation Tasks
sh ./script/train_semantic_VOC.sh
```

```sh
# Generate synthetic data for VOC
sh ./script/data_generation_VOC_semantic.sh
```

```sh
# Visualization of generative data
python ./DataDiffusion/vis_VOC.py
```

#### [Cityscapes](https://www.cityscapes-dataset.com/)
```sh
# For Segmentation Tasks
sh ./script/train_semantic_Cityscapes.sh
```

```sh
# Generate synthetic data for Cityscapes
sh ./script/data_generation_Cityscapes_semantic.sh
```

Before training an existing segmentation model, you should apply the augmentation:
```sh
sh ./script/augmentation_Cityscapes.sh
```

```sh
# Visualization of generative data
python ./DataDiffusion/vis_Cityscapes.py
```

### Instance Segmentation
#### [COCO 2017](https://cocodataset.org/#home)
Train the P-decoder, or use the [trained weights](https://drive.google.com/file/d/12MlNvk_P5k_MYbFvD65v4KiaaGV9Jysx/view?usp=sharing) obtained with 400 real images.

```sh
# For Segmentation Tasks
sh ./script/train_COCO.sh
```

```sh
# Generate synthetic data for COCO
sh ./script/data_generation_coco_instance.sh
```

```sh
# Visualization of generative data
python ./DataDiffusion/vis_COCO.py
```

Data augmentation with image splicing:
```sh
# Augmentation of generative data
sh ./script/augmentation_coco.sh
```
Then train [Mask2former](https://github.com/facebookresearch/Mask2Former) with these synthetic data, enjoy!
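As a hedged sketch, one way to make the synthetic set visible to a detectron2-based Mask2Former config; the dataset name and the JSON/image paths below are placeholders for wherever the generation scripts write their output:
```python
from detectron2.data.datasets import register_coco_instances

# Register the synthetic COCO-style data under a new dataset name.
register_coco_instances(
    "coco_synthetic_train",
    {},
    "DataDiffusion/COCO_synthetic/annotations.json",  # hypothetical path
    "DataDiffusion/COCO_synthetic/images",            # hypothetical path
)
# Then point the Mask2Former config at it, e.g.
# DATASETS.TRAIN: ("coco_synthetic_train",)
```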
### Depth Estimation
#### [KITTI](https://www.cvlibs.net/datasets/kitti/)
```sh
# Training Depth Estimation Tasks on KITTI
sh ./script/train_depth_KITTI.sh
```

If you want to train with Virtual KITTI 2, use the script below:
```sh
# Training Depth Estimation Tasks on Virtual KITTI 2
sh ./script/train_depth_Virtual_KITTI_2.sh
```

```sh
# Generate synthetic data for KITTI
sh ./script/data_generation_KITTI_depth.sh
```

Then train any existing [depth estimation method](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/633580d0ed7395734a034abfebf2ddb6ad579d75) with these synthetic data, enjoy!
In our paper, we adopt [Depthformer](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/633580d0ed7395734a034abfebf2ddb6ad579d75/configs/depthformer) to validate the quality of the generated data.
#### [NYU-Depth-V2](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
```sh
# For Depth Estimation Tasks
sh ./script/train_depth_NYU.sh
```

```sh
# Generate synthetic data for NYU
sh ./script/data_generation_NYU_depth.sh
```

Data augmentation with image splicing:
```sh
# Augmentation of generative data
sh ./script/augmentation_NYU.sh
```

Then train any existing [depth estimation method](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/633580d0ed7395734a034abfebf2ddb6ad579d75) with these synthetic data, enjoy!
In our paper, we adopt [Depthformer](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/633580d0ed7395734a034abfebf2ddb6ad579d75/configs/depthformer) to validate the quality of the generated data.
### Open Pose
#### [COCO 2017](https://cocodataset.org/#home)
```sh
# Training Pose Estimation Tasks on COCO2017
sh ./script/train_pose_coco.sh
```

```sh
# Generate synthetic data for Pose on COCO
sh ./script/data_generation_COCO_Pose.sh
```

Then convert the generated data to the COCO keypoints format and train any existing pose estimation method on it. Here, we adopt [SimCC](https://github.com/leeyegy/SimCC) to validate the quality of the generated data.
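For reference, a minimal sketch of such a conversion; the ```samples``` input format and field choices are our assumptions (the repo's generator may emit something different), while the output keys follow the standard COCO keypoints schema:
```python
import json

def to_coco_keypoints(samples, out_path):
    """samples: iterable of dicts with file_name, width, height and a flat
    keypoints list [x1, y1, v1, x2, y2, v2, ...] (17 joints, COCO order)."""
    coco = {"images": [], "annotations": [],
            "categories": [{"id": 1, "name": "person",
                            "keypoints": [], "skeleton": []}]}
    for i, s in enumerate(samples):
        coco["images"].append({"id": i, "file_name": s["file_name"],
                               "width": s["width"], "height": s["height"]})
        xs, ys = s["keypoints"][0::3], s["keypoints"][1::3]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        coco["annotations"].append({
            "id": i, "image_id": i, "category_id": 1,
            "keypoints": s["keypoints"],
            "num_keypoints": sum(1 for v in s["keypoints"][2::3] if v > 0),
            "bbox": [min(xs), min(ys), w, h], "area": w * h, "iscrowd": 0,
        })
    with open(out_path, "w") as f:
        json.dump(coco, f)
```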
### Zero-Shot Semantic Segmentation
#### [PASCAL VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/)
Download [VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) and organize the dataset as described above.
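For example, using the official mirror (URL current at time of writing; you will still need to rearrange the files into the ```data/PascalVOC12``` layout shown above):
```sh
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_11-May-2012.tar
```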
```sh
# For Zero Shot Segmentation Tasks
sh ./script/train_semantic_VOC_zero_shot.sh
```

```sh
# Generate synthetic data for VOC
sh ./script/data_generation_VOC_semantic.sh
```

Data augmentation with image splicing:
```sh
# Augmentation of generative data
sh ./script/augmentation_VOC.sh
```
Then train [Mask2former](https://github.com/facebookresearch/Mask2Former) with these synthetic data, enjoy!

### Fashion-Segmentation
#### [DeepFashion-MM](https://github.com/yumingj/DeepFashion-MultiModal)
Download [DeepFashion-MM](https://github.com/yumingj/DeepFashion-MultiModal) and organize the dataset as described above.

```sh
# Train DeepFashion Segmentation Tasks
sh ./script/train_semantic_DeepFashion_MM.sh
```

```sh
# Generate synthetic data for DeepFashion-MM
python ./script/parallel_generate_Semantic_DeepFashion.py
```

Then train [Mask2former](https://github.com/facebookresearch/Mask2Former) or other segmentation methods, e.g. [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), with these synthetic data, enjoy!
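If you go the mmsegmentation route, here is a minimal sketch of registering the synthetic masks as a custom dataset (mmsegmentation 0.x API; the class list, palette, and file suffixes below are placeholders, the real label set comes from DeepFashion-MM):
```python
from mmseg.datasets import DATASETS, CustomDataset

@DATASETS.register_module()
class SyntheticDeepFashionDataset(CustomDataset):
    # Placeholder label set; replace with the DeepFashion-MM classes.
    CLASSES = ('background', 'top', 'outer', 'skirt', 'dress', 'pants')
    PALETTE = [[0, 0, 0], [255, 0, 0], [0, 255, 0],
               [0, 0, 255], [255, 255, 0], [0, 255, 255]]

    def __init__(self, **kwargs):
        # Assumes .jpg images and .png integer label maps.
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)
```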
### Long-tail-Segmentation (VOC)
```sh
# For LongTail semantic segmentation
sh ./script/train_semantic_VOC_LongTail.sh
```

```sh
# Generate synthetic data for VOC
sh ./script/data_generation_VOC_semantic.sh
```

Data augmentation with image splicing:
```sh
# Augmentation of generative data
sh ./script/augmentation_VOC.sh
```

### Long-tail-Segmentation (LVIS)
```sh
# For LongTail semantic segmentation
sh ./script/train_instance_LVIS.sh
```

```sh
# Generate synthetic data for LVIS
sh ./script/data_generation_LVIS_instance.sh
```

## Acknowledgements
This work draws inspiration from the following codebases. We extend our gratitude for these remarkable contributions:

- [Grounded-Diffusion](https://github.com/Lipurple/Grounded-Diffusion)
- [Mask2Former](https://github.com/facebookresearch/Mask2Former)
- [GenPromp](https://github.com/callsys/GenPromp)
- [Depthformer](https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/tree/633580d0ed7395734a034abfebf2ddb6ad579d75/configs/depthformer)
- [DeepFashion-MM](https://github.com/yumingj/DeepFashion-MultiModal)

## Citation
```
@inproceedings{wu2023datasetdm,
  title={DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models},
  author={Wu, Weijia and Zhao, Yuzhong and Chen, Hao and Gu, Yuchao and Zhao, Rui and He, Yefei and Zhou, Hong and Shou, Mike Zheng and Shen, Chunhua},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)},
  year={2023}
}
```