{"id":18936136,"url":"https://github.com/aim-uofa/genpercept","last_synced_at":"2026-03-07T02:33:23.929Z","repository":{"id":231873318,"uuid":"781481275","full_name":"aim-uofa/GenPercept","owner":"aim-uofa","description":"[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models","archived":false,"fork":false,"pushed_at":"2025-01-24T06:32:19.000Z","size":40055,"stargazers_count":220,"open_issues_count":13,"forks_count":8,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-01-26T11:37:15.684Z","etag":null,"topics":["depth-estimation","dichotomous-image-segmentation","human-pose-estimation","iclr2025","image-matting","monocular-depth-estimation","one-step","semantic-segmentation","surface-normals"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/guangkaixu/GenPercept","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aim-uofa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-03T13:17:04.000Z","updated_at":"2026-01-16T14:51:36.000Z","dependencies_parsed_at":"2024-04-09T13:59:18.392Z","dependency_job_id":"77989585-29d7-42ea-a2dd-05d16a624a65","html_url":"https://github.com/aim-uofa/GenPercept","commit_stats":null,"previous_names":["aim-uofa/genpercept"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aim-uofa/GenPercept","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGenPercept","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGenPercept/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGenPercept/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGenPercept/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aim-uofa","download_url":"https://codeload.github.com/aim-uofa/GenPercept/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGenPercept/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30206070,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T19:07:06.838Z","status":"online","status_checked_at":"2026-03-07T02:00:06.765Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["depth-estimation","dichotomous-image-segmentation","human-pose-estimation","iclr2025","image-matting","monocular-depth-estimation","one-step","semantic-segmentation","surface-normals"],"created_at":"2024-
<div align="center">

<h1> [ICLR2025] What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?</h1>

Former Title: "Diffusion Models Trained with Large Data Are Transferable Visual Models"

[Guangkai Xu](https://github.com/guangkaixu/), &nbsp;
[Yongtao Ge](https://yongtaoge.github.io/), &nbsp;
[Mingyu Liu](https://mingyulau.github.io/), &nbsp;
[Chengxiang Fan](https://leaf1170124460.github.io/), &nbsp;<br>
[Kangyang Xie](https://github.com/felix-ky), &nbsp;
[Zhiyue Zhao](https://github.com/ZhiyueZhau), &nbsp;
[Hao Chen](https://stan-haochen.github.io/), &nbsp;
[Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/genpercept-models) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️

</div>

<div align="center">
<img width="800" alt="image" src="figs/pipeline.jpg">
</div>


## 📢 News
- 2025.1.24: 🎉🎉🎉 GenPercept has been accepted by ICLR 2025. 🎉🎉🎉
- 2024.10.25: Update the GenPercept [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.10.24: Release the latest training and inference code, built on the [accelerate](https://github.com/huggingface/accelerate) library and based on [Marigold](https://github.com/prs-eth/marigold).
- 2024.10.24: Release the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3). We reorganize the structure of the paper and offer more detailed analysis.
- 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
- 2024.4.7: Add the [HuggingFace](https://huggingface.co/spaces/guangkaixu/GenPercept) App demo.
- 2024.4.6: Release the inference code and depth checkpoint weights of GenPercept in the [GitHub](https://github.com/aim-uofa/GenPercept) repo.
- 2024.3.15: Release the [arXiv v2 paper](https://arxiv.org/abs/2403.06090v2), with supplementary material.
- 2024.3.10: Release the [arXiv v1 paper](https://arxiv.org/abs/2403.06090v1).


## 📚 Download Resource Summary

- HuggingFace Space demo: https://huggingface.co/spaces/guangkaixu/GenPercept
- Models, all (including ablation study): https://huggingface.co/guangkaixu/genpercept-exps
- Models, main paper: https://huggingface.co/guangkaixu/genpercept-models
- Models, depth: https://huggingface.co/guangkaixu/genpercept-depth
- Models, normal: https://huggingface.co/guangkaixu/genpercept-normal
- Models, dichotomous image segmentation: https://huggingface.co/guangkaixu/genpercept-dis
- Models, matting: https://huggingface.co/guangkaixu/genpercept-matting
- Models, segmentation: https://huggingface.co/guangkaixu/genpercept-seg
- Models, disparity: https://huggingface.co/guangkaixu/genpercept-disparity
- Models, disparity with DPT head: https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head
- Datasets, input demo: https://huggingface.co/datasets/guangkaixu/genpercept-input-demo
- Datasets, evaluation data: https://huggingface.co/datasets/guangkaixu/genpercept_datasets_eval
- Datasets, evaluation results: https://huggingface.co/datasets/guangkaixu/genpercept-exps-eval


## 🖥️ Dependencies

```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```

## 🚀 Inference

### Using Command-line Scripts

Download [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) and [our trained models](https://huggingface.co/guangkaixu/genpercept-models) from HuggingFace and put the checkpoints under ```./pretrained_weights/``` and ```./weights/```, respectively. You can download them with the scripts ```script/download_sd21.sh``` and ```script/download_weights.sh```, or download the weights of [depth](https://huggingface.co/guangkaixu/genpercept-depth), [normal](https://huggingface.co/guangkaixu/genpercept-normal), [dichotomous image segmentation](https://huggingface.co/guangkaixu/genpercept-dis), [matting](https://huggingface.co/guangkaixu/genpercept-matting), [segmentation](https://huggingface.co/guangkaixu/genpercept-seg), [disparity](https://huggingface.co/guangkaixu/genpercept-disparity), and [disparity_dpt_head](https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head) separately.

Then, place images in the ```./input/``` directory. We offer demo images on [HuggingFace](https://huggingface.co/datasets/guangkaixu/genpercept-input-demo), and you can also download them with the script ```script/download_sample_data.sh```.
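If you prefer Python over the shell scripts, a minimal sketch using the ```huggingface_hub``` package (assuming it is available in your environment; the ```local_dir``` targets below are assumptions mirroring the paths this README expects) can fetch the same resources:

```python
# Minimal sketch (not part of the official scripts): download the base SD-2.1
# checkpoint, the GenPercept weights, and the demo inputs via huggingface_hub.
# The local_dir values are assumptions based on the directory layout above.
from huggingface_hub import snapshot_download

# Base Stable Diffusion 2.1 weights -> ./pretrained_weights/
snapshot_download(repo_id="stabilityai/stable-diffusion-2-1",
                  local_dir="pretrained_weights/stable-diffusion-2-1")

# Trained GenPercept checkpoints -> ./weights/
snapshot_download(repo_id="guangkaixu/genpercept-models",
                  local_dir="weights/genpercept-models")

# Demo input images -> ./input/
snapshot_download(repo_id="guangkaixu/genpercept-input-demo",
                  repo_type="dataset", local_dir="input")
```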
Then, run inference with the scripts below.

```bash
# Depth
source script/infer/main_paper/inference_genpercept_depth.sh
# Normal
source script/infer/main_paper/inference_genpercept_normal.sh
# Dichotomous image segmentation
source script/infer/main_paper/inference_genpercept_dis.sh
# Matting
source script/infer/main_paper/inference_genpercept_matting.sh
# Segmentation
source script/infer/main_paper/inference_genpercept_seg.sh
# Disparity
source script/infer/main_paper/inference_genpercept_disparity.sh
# Disparity with DPT head
source script/infer/main_paper/inference_genpercept_disparity_dpt_head.sh
```

If you would like to change the input folder, UNet, or output paths, pass them as positional parameters:

```bash
# Assign values
input_rgb_dir=...
unet=...
output_dir=...
# Take depth as an example
source script/infer/main_paper/inference_genpercept_depth.sh $input_rgb_dir $unet $output_dir
```

For a general inference script, see ```script/infer/inference_general.sh```.

***Thanks to our one-step perception paradigm, inference runs much faster: around 0.4 s per image on an A800 GPU.***


### Using torch.hub

TODO

<!-- GenPercept models can be easily used with torch.hub for quick integration into your Python projects. Here's how to use the models for normal estimation, depth estimation, and segmentation:

#### Normal Estimation
```python
import torch
import cv2

# Load the normal predictor model from torch hub
normal_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Normal", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the normal map from the input image
with torch.inference_mode():
    normal = normal_predictor.infer_cv2(image)

# Save the output normal map to a file
cv2.imwrite("output_normal_map.png", normal)
```

#### Depth Estimation
```python
import torch
import cv2

# Load the depth predictor model from torch hub
depth_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Depth", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the depth map from the input image
with torch.inference_mode():
    depth = depth_predictor.infer_cv2(image)

# Save the output depth map to a file
cv2.imwrite("output_depth_map.png", depth)
```

#### Segmentation
```python
import torch
import cv2

# Load the segmentation predictor model from torch hub
seg_predictor = torch.hub.load("hugoycj/GenPercept-hub", "GenPercept_Segmentation", trust_repo=True)

# Load the input image using OpenCV
image = cv2.imread("path/to/your/image.jpg", cv2.IMREAD_COLOR)

# Use the model to infer the segmentation map from the input image
with torch.inference_mode():
    segmentation = seg_predictor.infer_cv2(image)

# Save the output segmentation map to a file
cv2.imwrite("output_segmentation_map.png", segmentation)
``` -->

## 🔥 Train

NOTE: We implement training with the [accelerate](https://github.com/huggingface/accelerate) library, but observe worse training accuracy with multiple GPUs than with a single GPU, given the same ```effective_batch_size``` and ```max_iter```. Your assistance in resolving this issue would be greatly appreciated. Thank you very much!
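When comparing single- and multi-GPU runs, keep in mind that under accelerate the effective batch size is the product of the per-device batch size, the gradient-accumulation steps, and the number of processes. A quick sanity check (illustrative numbers, not the paper's configuration):

```python
# Sanity check: the effective batch size realized by accelerate is the
# product of three factors. Values here are illustrative placeholders.
per_device_batch_size = 2        # batch size on each GPU
gradient_accumulation_steps = 4
num_processes = 1                # number of GPUs launched via accelerate

effective_batch_size = (per_device_batch_size
                        * gradient_accumulation_steps
                        * num_processes)
print(effective_batch_size)  # 8; when num_processes changes, adjust the
                             # other two factors to keep this value constant
```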
### Preparation

Datasets: TODO

Place the training datasets under ```datasets/```.

Download [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) from HuggingFace and put the checkpoints under ```./pretrained_weights/```. You can also download it with the script ```script/download_sd21.sh```.


### Start Training

The training scripts that reproduce the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3) are released in ```script/```, with their configs stored in ```config/```. Models with ```max_train_batch_size > 2``` are trained on an H100, and those with ```max_train_batch_size <= 2``` on an RTX 4090. Run a training script as follows:

```bash
# Take the depth training of the main paper as an example
source script/train_sd21_main_paper/sd21_train_accelerate_genpercept_1card_ensure_depth_bs8_per_accu_pixel_mse_ssi_grad_loss.sh
```

## 🎖️ Eval

### Preparation

1. Download the [evaluation datasets](https://huggingface.co/datasets/guangkaixu/genpercept_eval/tree/main) and place them in ```datasets_eval```.
2. Download [our trained models](https://huggingface.co/guangkaixu/genpercept-exps) used in the main paper and in the ablation study of Section 3 of the [arXiv v3 paper](https://arxiv.org/abs/2403.06090v3), and place them in ```weights/genpercept-exps```.

### Start Evaluation

The evaluation scripts are stored in ```script/eval_sd21```.

```bash
# Take "ensemble1 + step1" as an example
source script/eval_sd21/eval_ensemble1_step1/0_infer_eval_all.sh
```
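The metrics themselves are computed inside these scripts. For orientation only, depth evaluations in this line of work typically report affine-invariant AbsRel and δ1 after a least-squares scale/shift alignment; an illustrative NumPy version (a common-protocol sketch, not the code shipped in ```script/eval_sd21```) looks like:

```python
# Illustrative affine-invariant depth metrics (AbsRel, delta1). This sketches
# the usual protocol; it is not the repo's evaluation implementation.
import numpy as np

def absrel_delta1(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray):
    """Align pred to gt with a least-squares scale/shift, then score."""
    p, g = pred[mask], gt[mask]                     # valid pixels; gt > 0
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # min ||s*p + t - g||^2
    aligned = s * p + t
    abs_rel = np.mean(np.abs(aligned - g) / g)
    delta1 = np.mean(np.maximum(aligned / g, g / aligned) < 1.25)
    return abs_rel, delta1
```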
## 📖 Recommended Works

- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. [arXiv](https://arxiv.org/abs/2312.02145), [GitHub](https://github.com/prs-eth/marigold).
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. [arXiv](https://arxiv.org/abs/2403.12013), [GitHub](https://github.com/fuxiao0719/GeoWizard).
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. [arXiv](https://arxiv.org/abs/2308.05733), [GitHub](https://github.com/aim-uofa/FrozenRecon).


## 👍 Results in Paper

### Depth and Surface Normal

<div align="center">
<img width="800" alt="image" src="figs/demo_depth_normal_new.jpg">
</div>

### Dichotomous Image Segmentation

<div align="center">
<img width="800" alt="image" src="figs/demo_dis_new.jpg">
</div>

### Image Matting

<div align="center">
<img width="800" alt="image" src="figs/demo_matting.jpg">
</div>

### Image Segmentation

<div align="center">
<img width="800" alt="image" src="figs/demo_seg.jpg">
</div>


## 🎫 License

For non-commercial academic use, this project is licensed under [the 2-clause BSD License](https://opensource.org/license/bsd-2-clause).
For commercial use, please contact [Chunhua Shen](mailto:chhshen@gmail.com).


## 🎓 Citation

```
@article{xu2024diffusion,
  title={What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
```