{"id":13488637,"url":"https://idea-research.github.io/HumanSD/","last_synced_at":"2025-03-28T01:36:55.598Z","repository":{"id":163795985,"uuid":"625541116","full_name":"IDEA-Research/HumanSD","owner":"IDEA-Research","description":"[ICCV 2023] The official implementation of paper \"HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation\"","archived":false,"fork":false,"pushed_at":"2023-10-24T08:18:16.000Z","size":24955,"stargazers_count":266,"open_issues_count":8,"forks_count":17,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-08-01T18:38:44.932Z","etag":null,"topics":["conditional-image-generation","deep-learning","iccv","iccv2023","image-generation","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDEA-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-04-09T12:42:08.000Z","updated_at":"2024-07-30T02:26:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"750e8850-bf68-440b-a0c7-98efaa065adc","html_url":"https://github.com/IDEA-Research/HumanSD","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FHumanSD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FHumanSD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FHumanSD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDEA-Research%2FH
umanSD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDEA-Research","download_url":"https://codeload.github.com/IDEA-Research/HumanSD/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conditional-image-generation","deep-learning","iccv","iccv2023","image-generation","pytorch"],"created_at":"2024-07-31T18:01:19.352Z","updated_at":"2025-03-28T01:36:55.585Z","avatar_url":"https://github.com/IDEA-Research.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":"# HumanSD\n\n---\n\nThis repository contains the implementation of the ICCV2023 paper:\n\u003e **HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation** [[Project Page]](https://idea-research.github.io/HumanSD/) [[Paper]](https://arxiv.org/abs/2304.04269) [[Code]](https://github.com/IDEA-Research/HumanSD) [[Video]](https://drive.google.com/file/d/1Djc2uJS5fmKnKeBnL34FnAAm3YSH20Bb/view?usp=sharing) [[Data]](https://forms.gle/ANxDTjxcE2Ua45oU8) \u003cbr\u003e\n\u003e [Xuan Ju](https://juxuan.space/)\u003csup\u003e∗12\u003c/sup\u003e, [Ailing Zeng](https://ailingzeng.site/)\u003csup\u003e∗1\u003c/sup\u003e, [Chenchen Zhao](https://zcc31415926.github.io/)\u003csup\u003e∗2\u003c/sup\u003e, [Jianan Wang](https://github.com/wendyjnwang/)\u003csup\u003e1\u003c/sup\u003e, [Lei Zhang](https://www.leizhang.org/)\u003csup\u003e1\u003c/sup\u003e, [Qiang 
Xu](https://cure-lab.github.io/)\u003csup\u003e2\u003c/sup\u003e\u003cbr\u003e\n\u003e \u003csup\u003e∗\u003c/sup\u003e Equal contribution \u003csup\u003e1\u003c/sup\u003eInternational Digital Economy Academy \u003csup\u003e2\u003c/sup\u003eThe Chinese University of Hong Kong\n\n\nIn this work, we propose HumanSD, a native skeleton-guided diffusion model for controllable human image generation (HIG). Instead of performing image editing with dual-branch diffusion, we fine-tune the original SD model using a novel heatmap-guided denoising loss. This strategy effectively and efficiently strengthens the given skeleton condition during model training while mitigating catastrophic forgetting. HumanSD is fine-tuned on a combination of three large-scale human-centric datasets with text-image-pose information, two of which are established in this work. \n\n---\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/teaser.png\" width=\"95%\"\u003e\n\u003c/div\u003e\n\n\n\n- (a) a generation by the pre-trained pose-less text-guided [stable diffusion (SD)](https://github.com/Stability-AI/stablediffusion)\n- (b) pose skeleton images as the condition to ControlNet and our proposed HumanSD\n- (c) a generation by [ControlNet](https://github.com/lllyasviel/ControlNet)\n- (d) a generation by HumanSD (ours). ControlNet and HumanSD receive both text and pose conditions. \n\nHumanSD shows advantages in (I) challenging poses, (II) accurate painting styles, (III) pose control capability, (IV) multi-person scenarios, and (V) delicate details. 
\n\n**Table of Contents**\n\n- [HumanSD](#humansd)\n  - [TODO](#todo)\n  - [Model Overview](#model-overview)\n  - [Getting Started](#getting-started)\n    - [Environment Requirement](#environment-requirement)\n    - [Model and Checkpoints](#model-and-checkpoints)\n    - [Quick Demo](#quick-demo)\n    - [Dataset](#dataset)\n  - [Training](#training)\n  - [Quantitative Results](#quantitative-results)\n  - [Qualitative Results](#qualitative-results)\n    - [Natural Scene](#natural-scene)\n    - [Sketch Scene](#sketch-scene)\n    - [Shadow Play Scene](#shadow-play-scene)\n    - [Children Drawing Scene](#children-drawing-scene)\n    - [Oil Painting Scene](#oil-painting-scene)\n    - [Watercolor Scene](#watercolor-scene)\n    - [Digital Art Scene](#digital-art-scene)\n    - [Relief Scene](#relief-scene)\n    - [Sculpture Scene](#sculpture-scene)\n  - [Cite Us](#cite-us)\n  - [Acknowledgement](#acknowledgement)\n\n\n## TODO\n\nNews!! Our paper has been accepted to ICCV 2023! Training code is released.\n\n- [x] Release inference code and pretrained models\n- [x] Release Gradio UI demo\n- [x] Public training data (LAION-Human)\n- [x] Release training code\n\n## Model Overview\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/model.png\" width=\"95%\"\u003e\n\u003c/div\u003e\n\n## Getting Started\n### Environment Requirement\n\nHumanSD has been implemented and tested on PyTorch 1.12.1 with Python 3.9.\n\nClone the repo:\n```bash\ngit clone git@github.com:IDEA-Research/HumanSD.git\n```\n\nWe recommend that you first install `pytorch` following the [official instructions](https://pytorch.org/get-started/previous-versions/). For example:\n\n```bash\n# conda\nconda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch\n```\n\nThen, you can install the required packages through:\n\n```bash\npip install -r requirements.txt\n```\n\nYou also need to install MMPose following [here](https://github.com/open-mmlab/mmpose). 
Note that you only need to install MMPose as a Python package. Because of later MMPose updates, we recommend installing version 0.29.0 of MMPose.\n\n### Model and Checkpoints\n\nDownload the necessary checkpoints of HumanSD, which can be found [here](https://drive.google.com/drive/folders/1NLQAlF7i0zjEpd-XY0EcVw9iXP5bB5BJ?usp=sharing). The directory structure should look like this:\n\n```\n|-- humansd_data\n    |-- checkpoints\n        |-- higherhrnet_w48_humanart_512x512_udp.pth\n        |-- v2-1_512-ema-pruned.ckpt\n        |-- humansd-v1.ckpt\n```\n\nNote that `v2-1_512-ema-pruned.ckpt` should be downloaded from [Stable Diffusion](https://github.com/Stability-AI/stablediffusion).\n\n\n\n### Quick Demo\n\nYou can run the demo either through the command line or through Gradio.\n\nTo run the demo through the command line:\n\n```\npython scripts/pose2img.py --prompt \"oil painting of girls dancing on the stage\" --pose_file assets/pose/demo.npz\n```\n\nTo run the demo with a comparison against ControlNet and T2I-Adapter:\n\n```\npython scripts/pose2img.py --prompt \"oil painting of girls dancing on the stage\" --pose_file assets/pose/demo.npz --controlnet --t2i\n```\n\n\nTo run the Gradio demo:\n\n```\npython scripts/gradio/pose2img.py\n```\n\nWe also provide a comparison with ControlNet and T2I-Adapter, so you can run all of these methods in one demo. To do so, you need to download the corresponding models and checkpoints as follows:\n\u003cdetails\u003e \u003csummary\u003eTo compare with ControlNet and T2I-Adapter results.\u003c/summary\u003e\n(1) You need to initialize ControlNet and T2I-Adapter as submodules using \n\n```\ngit submodule init\ngit submodule update\n```\n(2) Then download checkpoints from: a. [T2I-Adapter](https://huggingface.co/TencentARC/T2I-Adapter/resolve/main/models/t2iadapter_openpose_sd14v1.pth) \nb. [ControlNet](https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth). 
\nPut them into `humansd_data/checkpoints`.\n\nThen, run:\n\n```\npython scripts/gradio/pose2img.py --controlnet --t2i\n```\n\nNote that you may have to modify some imports in T2I-Adapter because of a path conflict. \n\ne.g., use\n\n```\nfrom comparison_models.T2IAdapter.ldm.models.diffusion.ddim import DDIMSampler\n```\n\ninstead of \n\n```\nfrom T2IAdapter.ldm.models.diffusion.ddim import DDIMSampler\n```\n\n\u003c/details\u003e\n\n\n### Dataset\n\nYou may refer to the code [here](ldm/data/humansd.py) for loading the data.\n\n**Laion-Human**\n\n\nYou may apply for access to Laion-Human [here](https://forms.gle/ANxDTjxcE2Ua45oU8). Note that we provide the pose annotations, the images' .parquet files, and the mapping file; please download the images according to the .parquet files. The `key` in each .parquet file is the corresponding image index. For example, the image with `key=338717` in 00033.parquet corresponds to images/00000/000338717.jpg. \n\nAfter downloading the images and pose annotations, extract the zip files so that the directory structure looks like this:\n\n\n```\n|-- humansd_data\n    |-- datasets\n        |-- Laion \n            |-- Aesthetics_Human\n                |-- images\n                    |-- 00000.parquet\n                    |-- 00001.parquet\n                    |-- ...\n                |-- pose\n                    |-- 00000\n                        |-- 000000000.npz\n                        |-- 000000001.npz\n                        |-- ...\n                    |-- 00001\n                    |-- ... 
\n                |-- mapping_file_training.json        \n```\n\nThen, you can use `python utils/download_data.py` to download all images.\n\n\nThe directory structure should then look like this:\n\n```\n|-- humansd_data\n    |-- datasets\n        |-- Laion \n            |-- Aesthetics_Human\n                |-- images\n                    |-- 00000.parquet\n                    |-- 00001.parquet\n                    |-- ...\n                    |-- 00000\n                        |-- 000000000.jpg\n                        |-- 000000001.jpg\n                        |-- ...\n                    |-- 00001\n                    |-- ...\n                |-- pose\n                    |-- 00000\n                        |-- 000000000.npz\n                        |-- 000000001.npz\n                        |-- ...\n                    |-- 00001\n                    |-- ... \n                |-- mapping_file_training.json        \n```\n\n\n\nIf you downloaded LAION-Aesthetics as tar files, which differs from our directory structure, we recommend extracting each tar file with code like the following:\n\n```python\nimport os\nimport tarfile\n\nimport cv2\nimport numpy as np\nfrom PIL import Image, TarIO\n\ntar_name = \"00000.tar\"  # 00000.tar - 00286.tar\npresent_tar_path = f\"xxxxxx/{tar_name}\"  # xxxxxx: placeholder for the directory holding the tar files\nsave_dir = \"humansd_data/datasets/Laion/Aesthetics_Human/images\"\nwith tarfile.open(present_tar_path, \"r\") as tar:\n    for present_file in tar.getmembers():\n        if present_file.name.endswith(\".jpg\"):\n            print(f\"     image:- {present_file.name} -\")\n            # Save under a folder named after the tar file, e.g. images/00000/\n            image_save_path = os.path.join(save_dir, tar_name.replace(\".tar\", \"\"), present_file.name)\n            present_image_fp = TarIO.TarIO(present_tar_path, present_file.name)\n            present_image = Image.open(present_image_fp)\n            present_image_numpy = cv2.cvtColor(np.array(present_image), cv2.COLOR_RGB2BGR)\n            os.makedirs(os.path.dirname(image_save_path), exist_ok=True)\n            cv2.imwrite(image_save_path, present_image_numpy)\n```\n\n**Human-Art**\n\n\nYou may download the [Human-Art](https://idea-research.github.io/HumanArt/) dataset [here](https://forms.gle/UVv1GiNJNQsE4qif7).\n\nThe directory structure should look like this:\n\n```\n|-- humansd_data\n    |-- datasets\n        |-- HumanArt \n            |-- images\n                |-- 2D_virtual_human\n                    |-- cartoon\n                        |-- 000000000007.jpg\n                        |-- 000000000019.jpg\n                        |-- ...\n                    |-- digital_art\n                    |-- ...\n                |-- 3D_virtual_human\n                |-- real_human\n            |-- pose\n                |-- 2D_virtual_human\n                    |-- cartoon\n                        |-- 000000000007.npz\n                        |-- 000000000019.npz\n                        |-- ...\n                    |-- digital_art\n                    |-- ...\n                |-- 3D_virtual_human\n                |-- real_human\n            |-- mapping_file_training.json   \n            |-- mapping_file_validation.json     \n```\n\n## Training\n\nNote that the datasets and checkpoints should be downloaded and prepared before training.\n\nRun the command below to start training:\n\n```\npython main.py --base configs/humansd/humansd-finetune.yaml -t --gpus 0,1 --name finetune_humansd\n```\n\nTo fine-tune without the heatmap-guided diffusion loss (for ablation), run the following command:\n\n```\npython main.py --base configs/humansd/humansd-finetune-originalloss.yaml -t --gpus 0,1 --name finetune_humansd_original_loss\n```\n\n## Quantitative Results\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/quantitative_results.png\" width=\"97%\"\u003e\n\u003c/div\u003e\n\nMetrics can be calculated through:\n\n```\npython scripts/pose2img_metrics.py --outdir outputs/metrics --config utils/metrics/metrics.yaml --ckpt path_to_ckpt\n```\n\n## Qualitative Results\n\n\n- 
(a) a generation by the pre-trained text-guided [stable diffusion (SD)](https://github.com/Stability-AI/stablediffusion)\n- (b) pose skeleton images as the condition to ControlNet, T2I-Adapter and our proposed HumanSD\n- (c) a generation by [ControlNet](https://github.com/lllyasviel/ControlNet)\n- (d) a generation by [T2I-Adapter](https://github.com/TencentARC/T2I-Adapter)\n- (e) a generation by HumanSD (ours). \n\nControlNet, T2I-Adapter, and HumanSD receive both text and pose conditions.\n\n### Natural Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/natural1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/natural3.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/natural2.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/natural4.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/natural5.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Sketch Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/sketch1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/sketch2.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Shadow Play Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/shadowplay1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Children Drawing Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/childrendrawing1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Oil Painting Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/oilpainting1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/oilpainting2.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Watercolor Scene\n\n\u003cdiv  
align=\"center\"\u003e    \n\u003cimg src=\"assets/watercolor1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Digital Art Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/digitalart1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Relief Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/relief1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n### Sculpture Scene\n\n\u003cdiv  align=\"center\"\u003e    \n\u003cimg src=\"assets/sculpture1.png\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n\n## Cite Us\n\n```bibtex\n@inproceedings{ju2023humansd,\n  title={Human{SD}: A Native Skeleton-Guided Diffusion Model for Human Image Generation},\n  author={Ju, Xuan and Zeng, Ailing and Zhao, Chenchen and Wang, Jianan and Zhang, Lei and Xu, Qiang},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  year={2023}\n}\n@inproceedings{ju2023human,\n    title={Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes},\n    author={Ju, Xuan and Zeng, Ailing and Wang, Jianan and Xu, Qiang and Zhang, Lei},\n    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n    year={2023},\n}\n```\n\n\n## Acknowledgement\n\n- Our code is based on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion); thanks to all the contributors!\n- HumanSD would not be possible without [LAION](https://laion.ai/) and their efforts to create open, large-scale datasets.\n- Thanks to [the DeepFloyd team](https://twitter.com/deepfloydai) at Stability AI, for creating the subset of [LAION-5B](https://laion.ai/blog/laion-5b/) dataset used to train HumanSD.\n- HumanSD uses [OpenCLIP](https://laion.ai/blog/large-openclip/), trained by [Romain 
Beaumont](https://github.com/rom1504).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/idea-research.github.io%2FHumanSD%2F","html_url":"https://awesome.ecosyste.ms/projects/idea-research.github.io%2FHumanSD%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/idea-research.github.io%2FHumanSD%2F/lists"}