{"id":13488686,"url":"https://github.com/vislearn/ControlNet-XS","last_synced_at":"2025-03-28T01:37:20.953Z","repository":{"id":195781558,"uuid":"693209193","full_name":"vislearn/ControlNet-XS","owner":"vislearn","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-16T14:10:52.000Z","size":98090,"stargazers_count":434,"open_issues_count":14,"forks_count":12,"subscribers_count":16,"default_branch":"main","last_synced_at":"2024-09-16T16:43:13.448Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vislearn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-18T15:06:06.000Z","updated_at":"2024-09-16T14:10:57.000Z","dependencies_parsed_at":"2024-08-13T16:23:32.652Z","dependency_job_id":null,"html_url":"https://github.com/vislearn/ControlNet-XS","commit_stats":null,"previous_names":["vislearn/controlnet-xs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vislearn%2FControlNet-XS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vislearn%2FControlNet-XS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vislearn%2FControlNet-XS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vislearn%2FControlNet-XS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vislearn","download_url":"https://codeload.github.com/vislearn/ControlNet-XS/ta
r.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:20.085Z","updated_at":"2025-03-28T01:37:20.947Z","avatar_url":"https://github.com/vislearn.png","language":"Python","readme":"# ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems\n\n**ECCV 2024 (Oral)**\n\n\nDenis Zavadski, Johann-Friedrich Feiden, [Carsten Rother](https://hci.iwr.uni-heidelberg.de/vislearn/people/carsten-rother/)\n\n![](./ControlNet-XS_files/teaser_small.gif)\n\nThese are ControlNet-XS weights trained on [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) for edge and depth-map conditioning, respectively. You can find more details and further visual examples on the project page [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/).\n\n## The codebase\nThe code is based on the StableDiffusion framework. 
To use ControlNet-XS, you additionally need the weights for the StableDiffusion version that you want to control.\nWe provide weights with both depth and edge control for StableDiffusion 2.1 and StableDiffusion-XL on Huggingface:\n- [ControlNet-XS](https://huggingface.co/CVL-Heidelberg/ControlNet-XS)\n\nAfter obtaining the weights, you need to complete the config files with the paths to the checkpoints of the StableDiffusion base model and ControlNet-XS.\n\n#### 1. Setting up the virtualenv\n\nThe following assumes you have navigated to the `ControlNet-XS` root directory after cloning it.\n\n\n**PyTorch 1.13**\n\n```shell\n# install required packages from pypi\npython3 -m venv .pt13\nsource .pt13/bin/activate\npip3 install -r requirements/pt13.txt\n```\n\n**PyTorch 2.0**\n\n\n```shell\n# install required packages from pypi\npython3 -m venv .pt2\nsource .pt2/bin/activate\npip3 install -r requirements/pt2.txt\n```\n\n\n#### 2. Install `sgm`\n\n```shell\npip3 install .\n```\n\n\n## Usage\n\n\nExample for StableDiffusion-XL with Canny Edges:\n\n```python\nimport scripts.control_utils as cu\nimport torch\nfrom PIL import Image\n\npath_to_config = 'ControlNet-XS-main/configs/inference/sdxl/sdxl_encD_canny_48m.yaml'\nmodel = cu.create_model(path_to_config).to('cuda')\n\nimage_path = 'PATH/TO/IMAGES/Shoe.png'\n\ncanny_high_th = 250\ncanny_low_th = 100\nsize = 768\nnum_samples = 2\n\nimage = cu.get_image(image_path, size=size)\nedges = cu.get_canny_edges(image, low_th=canny_low_th, high_th=canny_high_th)\n\nsamples, controls = cu.get_sdxl_sample(\n    guidance=edges,\n    ddim_steps=10,\n    num_samples=num_samples,\n    model=model,\n    shape=[4, size // 8, size // 8],\n    control_scale=0.95,\n    prompt='cinematic, shoe in the streets, made from meat, photorealistic shoe, highly detailed',\n    n_prompt='lowres, bad anatomy, worst quality, low 
quality',\n)\n\n\nImage.fromarray(cu.create_image_grid(samples)).save('SDXL_MyShoe.png')\n```\n![images_1](./ControlNet-XS_files/SDXL_MyShoe.png)\n\nExample for StableDiffusion 2.1 with depth maps:\n\n\n```python\nimport scripts.control_utils as cu\nimport torch\nfrom PIL import Image\n\npath_to_config = 'PATH/TO/CONFIG/sd21_encD_depth_14m.yaml'\nmodel = cu.create_model(path_to_config).to('cuda')\n\nsize = 768\nimage_path = 'PATH/TO/IMAGES/Shoe.png'\n\n\nimage = cu.get_image(image_path, size=size)\ndepth = cu.get_midas_depth(image, max_resolution=size)\nnum_samples = 2\n\nsamples, controls = cu.get_sd_sample(\n    guidance=depth,\n    ddim_steps=10,\n    num_samples=num_samples,\n    model=model,\n    shape=[4, size // 8, size // 8],\n    control_scale=0.95,\n    prompt='cinematic, advertising shot, shoe in a city street, photorealistic shoe, colourful, highly detailed',\n    n_prompt='low quality, bad quality, sketches'\n)\n\n\nImage.fromarray(cu.create_image_grid(samples)).save('SD_MyShoe.png')\n```\n![images_2](./ControlNet-XS_files/SD_MyShoe.png)\n\n\n## Training on Custom Data\n\nTo train your own models on custom data, please use `ldm.data.dummy_set.DummyBase` as a reference for the required dataset output; it is an example dataset that works with a directory of images.\nFor neural-network-based control hints, such as MiDaS depths, it is advised to pre-compute the hints and load them as images instead of computing them during training.\n\nTo train, run in bash\n```\npython main.py -t --base /PATH/TO/CONFIG --logdir /PATH/TO/LOGS --name NAME_YOUR_RUN\n```\n\n### SD 1.5 / 2.1\nExample configs for training Stable Diffusion 1.5 with Canny Edges and Stable Diffusion 2.1 with MiDaS depths (computed on the fly) are in `configs/training/sd`. You just need to fill in your paths.\n\n### SDXL\nExample configs for training Stable Diffusion XL with Canny Edges and with MiDaS depths (computed on the fly) are in `configs/training/sdxl`. 
You just need to fill in your paths.\n\n\n## Citation\n\n```bibtex\n@misc{zavadski2024controlnetxs,\n    title={ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems}, \n    author={Denis Zavadski and Johann-Friedrich Feiden and Carsten Rother},\n    year={2024},\n    eprint={2312.06573},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV},\n}","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvislearn%2FControlNet-XS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvislearn%2FControlNet-XS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvislearn%2FControlNet-XS/lists"}