# [ICLR2025] What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Former Title: "Diffusion Models Trained with Large Data Are Transferable Visual Models"
[Guangkai Xu](https://github.com/guangkaixu/),
[Yongtao Ge](https://yongtaoge.github.io/),
[Mingyu Liu](https://mingyulau.github.io/),
[Chengxiang Fan](https://leaf1170124460.github.io/),
[Kangyang Xie](https://github.com/felix-ky),
[Zhiyue Zhao](https://github.com/ZhiyueZhau),
[Hao Chen](https://stan-haochen.github.io/),
[Chunhua Shen](https://cshen.github.io/),
Zhejiang University
### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/genpercept-models) | [arXiv](https://arxiv.org/abs/2403.06090)
#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️
## 📢 News
- 2025.1.24: 🎉🎉🎉 GenPercept has been accepted by ICLR 2025. 🎉🎉🎉
- 2024.10.25: Update the GenPercept [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.10.24: Release the latest training and inference code, built with the [accelerate](https://github.com/huggingface/accelerate) library and based on [Marigold](https://github.com/prs-eth/marigold).
- 2024.10.24: Release the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3). We reorganized the paper's structure and added more detailed analysis.
- 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
- 2024.4.7: Add [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.4.6: Release the inference code and depth checkpoint weights of GenPercept in the [GitHub](https://github.com/aim-uofa/GenPercept) repo.
- 2024.3.15: Release [arXiv v2 paper](https://arxiv.org/abs/2403.06090v2), with supplementary material.
- 2024.3.10: Release [arXiv v1 paper](https://arxiv.org/abs/2403.06090v1).
## 📚 Download Resource Summary
- Space-Huggingface demo: https://huggingface.co/spaces/guangkaixu/GenPercept.
- Models-all (including ablation study): https://huggingface.co/guangkaixu/genpercept-exps.
- Models-main-paper: https://huggingface.co/guangkaixu/genpercept-models.
- Models-depth: https://huggingface.co/guangkaixu/genpercept-depth.
- Models-normal: https://huggingface.co/guangkaixu/genpercept-normal.
- Models-dis: https://huggingface.co/guangkaixu/genpercept-dis.
- Models-matting: https://huggingface.co/guangkaixu/genpercept-matting.
- Models-seg: https://huggingface.co/guangkaixu/genpercept-seg.
- Models-disparity: https://huggingface.co/guangkaixu/genpercept-disparity.
- Models-disparity-dpt-head: https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head.
- Datasets-input demo: https://huggingface.co/datasets/guangkaixu/genpercept-input-demo.
- Datasets-evaluation data: https://huggingface.co/datasets/guangkaixu/genpercept_datasets_eval.
- Datasets-evaluation results: https://huggingface.co/datasets/guangkaixu/genpercept-exps-eval.
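If you prefer a Python API to the download scripts below, the checkpoints can also be fetched with the ```huggingface_hub``` library. A minimal sketch (the ```local_dir``` subdirectories here are illustrative assumptions; match them to the paths the inference scripts expect):
```python
# Sketch: download checkpoints via huggingface_hub instead of the shell scripts.
# The local_dir values are illustrative; align them with your setup.
from huggingface_hub import snapshot_download

# Main-paper GenPercept checkpoints -> ./weights/
snapshot_download(
    repo_id="guangkaixu/genpercept-models",
    local_dir="weights/genpercept-models",
)

# Stable Diffusion 2.1 base weights -> ./pretrained_weights/
snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    local_dir="pretrained_weights/stable-diffusion-2-1",
)
```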
## 🖥️ Dependencies
```bash
# Create and activate a conda environment with Python 3.10
conda create -n genpercept python=3.10
conda activate genpercept
# Install dependencies and the genpercept package itself
pip install -r requirements.txt
pip install -e .
```
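Optionally, verify the environment before moving on. A minimal sanity check (it assumes ```requirements.txt``` installs PyTorch and that a CUDA GPU is available):
```python
# Quick sanity check: confirm PyTorch is importable and a CUDA GPU is visible.
# Assumes requirements.txt installs PyTorch; adjust if your setup differs.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```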
## 🚀 Inference
### Using Command-line Scripts
Download [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) and [our trained models](https://huggingface.co/guangkaixu/genpercept-models) from HuggingFace, and put the checkpoints under ```./pretrained_weights/``` and ```./weights/```, respectively. You can download them with the scripts ```script/download_sd21.sh``` and ```script/download_weights.sh```, or download the weights of [depth](https://huggingface.co/guangkaixu/genpercept-depth), [normal](https://huggingface.co/guangkaixu/genpercept-normal), [dichotomous image segmentation](https://huggingface.co/guangkaixu/genpercept-dis), [matting](https://huggingface.co/guangkaixu/genpercept-matting), [segmentation](https://huggingface.co/guangkaixu/genpercept-seg), [disparity](https://huggingface.co/guangkaixu/genpercept-disparity), and [disparity_dpt_head](https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head) separately.
Then, place images in the ```./input/``` directory. We offer demo images on [HuggingFace](https://huggingface.co/datasets/guangkaixu/genpercept-input-demo), which you can also download with the script ```script/download_sample_data.sh```. Finally, run inference with the scripts below.
```bash
# Depth
source script/infer/main_paper/inference_genpercept_depth.sh
# Normal
source script/infer/main_paper/inference_genpercept_normal.sh
# Dis
source script/infer/main_paper/inference_genpercept_dis.sh
# Matting
source script/infer/main_paper/inference_genpercept_matting.sh
# Seg
source script/infer/main_paper/inference_genpercept_seg.sh
# Disparity
source script/infer/main_paper/inference_genpercept_disparity.sh
# Disparity_dpt_head
source script/infer/main_paper/inference_genpercept_disparity_dpt_head.sh
```
To change the input folder path, UNet path, or output path, pass these parameters as follows:
```bash
# Assign values
input_rgb_dir=...
unet=...
output_dir=...
# Take depth as example
source script/infer/main_paper/inference_genpercept_depth.sh $input_rgb_dir $unet $output_dir
```
For a general-purpose inference script, see ```script/infer/inference_general.sh```.
***Thanks to our one-step perception paradigm, inference runs much faster: around 0.4 s per image on an A800 GPU.***
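To run several tasks over the same inputs in one pass, a small wrapper around the scripts above can help. A sketch (not part of the repo; it assumes the main-paper script paths listed earlier):
```python
# Hypothetical batch runner (not part of the repo): invoke the main-paper
# inference scripts listed above, one task after another.
import subprocess

TASKS = ["depth", "normal", "dis", "matting", "seg", "disparity"]

for task in TASKS:
    script = f"script/infer/main_paper/inference_genpercept_{task}.sh"
    # The README uses `source`; running each script with bash instead gives
    # every task its own shell.
    subprocess.run(["bash", script], check=True)
```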
### Using torch.hub
TODO
## 🔥 Train
NOTE: We implement training with the [accelerate](https://github.com/huggingface/accelerate) library, but we observe worse training accuracy with multiple GPUs than with a single GPU, given the same ```effective_batch_size``` and ```max_iter```. Your assistance in resolving this issue would be greatly appreciated. Thank you very much!
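When comparing such runs, keep in mind that with ```accelerate``` the effective batch size is the product of the per-GPU batch size, the number of processes, and the gradient-accumulation steps. A quick sketch of the arithmetic (the variable names here are illustrative, not the repo's config keys):
```python
# Illustrative arithmetic only; the names are not the repo's config keys.
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int) -> int:
    return per_gpu_batch * num_gpus * grad_accum_steps

# The same effective batch size can be reached in different ways:
assert effective_batch_size(per_gpu_batch=8, num_gpus=1, grad_accum_steps=4) == 32
assert effective_batch_size(per_gpu_batch=2, num_gpus=4, grad_accum_steps=4) == 32
```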
### Preparation
Datasets: TODO
Place the training datasets under ```datasets/```.
Download [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) from HuggingFace and put the checkpoints under ```./pretrained_weights/```. You can also download them with the script ```script/download_sd21.sh```.
### Start Training
The training scripts that reproduce the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3) are released in ```script/```, with configs stored in ```config/```. Models with ```max_train_batch_size > 2``` were trained on an H100, and those with ```max_train_batch_size <= 2``` on an RTX 4090. Run a training script:
```bash
# Take depth training of main paper as an example
source script/train_sd21_main_paper/sd21_train_accelerate_genpercept_1card_ensure_depth_bs8_per_accu_pixel_mse_ssi_grad_loss.sh
```
## 🎖️ Eval
### Preparation
1. Download [evaluation datasets](https://huggingface.co/datasets/guangkaixu/genpercept_eval/tree/main) and place them in ```datasets_eval```.
2. Download [our trained models](https://huggingface.co/guangkaixu/genpercept-exps) from the main paper and the ablation study in Section 3 of the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3), and place them in ```weights/genpercept-exps```.
### Start Evaluation
The evaluation scripts are stored in ```script/eval_sd21```.
```bash
# Take "ensemble1 + step1" as an example
source script/eval_sd21/eval_ensemble1_step1/0_infer_eval_all.sh
```
## 📖 Recommended Works
- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. [arXiv](https://arxiv.org/abs/2312.02145), [GitHub](https://github.com/prs-eth/marigold).
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. [arXiv](https://arxiv.org/abs/2403.12013), [GitHub](https://github.com/fuxiao0719/GeoWizard).
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. [arXiv](https://arxiv.org/abs/2308.05733), [GitHub](https://github.com/aim-uofa/FrozenRecon).
## 👍 Results in Paper
### Depth and Surface Normal
### Dichotomous Image Segmentation
### Image Matting
### Image Segmentation
## 🎫 License
For non-commercial academic use, this project is licensed under [the 2-clause BSD License](https://opensource.org/license/bsd-2-clause).
For commercial use, please contact [Chunhua Shen](mailto:chhshen@gmail.com).
## 🎓 Citation
```bibtex
@article{xu2024diffusion,
  title={What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
```