{"id":28957982,"url":"https://github.com/open-mmlab/anycontrol","last_synced_at":"2025-10-07T18:02:55.088Z","repository":{"id":283858733,"uuid":"822405236","full_name":"open-mmlab/AnyControl","owner":"open-mmlab","description":"[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user provided control signals. 一个支持用户自由输入控制信号的图像生成模型，能够根据多种控制生成自然和谐的结果！","archived":false,"fork":false,"pushed_at":"2024-07-05T12:52:26.000Z","size":11622,"stargazers_count":123,"open_issues_count":3,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T16:50:59.841Z","etag":null,"topics":["controllable-generation","multi-control","text-to-image"],"latest_commit_sha":null,"homepage":"https://any-control.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/open-mmlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-01T05:09:30.000Z","updated_at":"2025-03-11T02:42:43.000Z","dependencies_parsed_at":"2025-03-22T17:01:46.779Z","dependency_job_id":null,"html_url":"https://github.com/open-mmlab/AnyControl","commit_stats":null,"previous_names":["open-mmlab/anycontrol"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/open-mmlab/AnyControl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-mmlab%2FAnyControl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-mmlab%2FAnyControl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-mmlab%2F
AnyControl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-mmlab%2FAnyControl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/open-mmlab","download_url":"https://codeload.github.com/open-mmlab/AnyControl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-mmlab%2FAnyControl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261567617,"owners_count":23178178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["controllable-generation","multi-control","text-to-image"],"created_at":"2025-06-23T22:35:50.770Z","updated_at":"2025-10-07T18:02:54.993Z","avatar_url":"https://github.com/open-mmlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [ECCV 2024] AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation \n[![arXiv](https://img.shields.io/badge/arXiv-2406.18958-b31b1b.svg)](https://arxiv.org/abs/2406.18958)\n[![Project Page](https://img.shields.io/badge/Project-Page-Green)](https://any-control.github.io/)\n[![HuggingFace Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/nowsyn/anycontrol)\n\u003ca target=\"_blank\" href=\"https://huggingface.co/spaces/nowsyn/AnyControl\"\u003e\n  \u003cimg src=\"https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg\" alt=\"Online Demo in HF\"/\u003e\n\u003c/a\u003e\n\n\u003e **AnyControl: Create Your Artwork with Versatile Control on 
Text-to-Image Generation**\n\u003e\n\u003e [Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ\u0026hl=en), Yanchen Liu, Yinhao Tang, [Wenjie Pei](https://wenjiepei.github.io/) and [Kai Chen*](https://chenkai.site/)\n\u003e \n\u003e (\\* Corresponding Author)\n\u003e\n\u003e - Presented by Shanghai AI Laboratory\n\u003e - :mailbox_with_mail: Primary contact: [Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ\u0026hl=en) ( now.syn@gmail.com ) \n\n## Highlights \u003ca name=\"highlights\"\u003e\u003c/a\u003e\n\n:star2: **AnyControl** is a controllable image synthesis framework that supports any combination of various forms of control signals. AnyControl enables a holistic understanding of user inputs and produces harmonious, high-quality, high-fidelity results under versatile control signals.\n\n:star2: AnyControl proposes a novel Multi-Control Encoder comprising alternating multi-control fusion blocks and multi-control alignment blocks to achieve a comprehensive understanding of complex multi-modal user inputs. 
\n\n![](./assets/teaser.png \"AnyControl\")\n\n\n## What's New\u003ca name=\"news\"\u003e\u003c/a\u003e\n\n[2024/07/05] Online demo released in [HuggingFace](https://huggingface.co/spaces/nowsyn/AnyControl).\n\n[2024/07/03] :fire: AnyControl accepted by ECCV 2024!\n\n[2024/07/03] COCO-UM released in [HuggingFace](https://huggingface.co/datasets/nowsyn/COCO-UM).\n\n[2024/07/03] AnyControl models released in [HuggingFace](https://huggingface.co/nowsyn/anycontrol).\n\n[2024/07/03] AnyControl training and inference code released.\n\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Inference](#inference) \n- [Training](#training)\n- [Results](#results)\n- [COCO-UM](#coco-um)\n- [License and Citation](#license-and-citation)\n- [Related Resources](#resources)\n\n## Installation \u003ca name=\"installation\"\u003e\u003c/a\u003e\n\n```bash\n# Clone the Repository\ngit clone https://github.com/nowsyn/AnyControl.git\n\n# Navigate to the Repository\ncd AnyControl\n\n# Create Virtual Environment with Conda\nconda create --name AnyControl python=3.10\nconda activate AnyControl\n\n# Install Dependencies\npip install -r requirements.txt\n\n# Install detectron2\npip install git+https://github.com/facebookresearch/detectron2.git@v0.6\n\n# Compile ms_deform_attn op\ncd annotator/entityseg/mask2former/modeling/pixel_decoder/ops\nsh make.sh\n```\n\n## Inference \u003ca name=\"inference\"\u003e\u003c/a\u003e\n\n1. Download `anycontrol_15.ckpt` and third-party models from [HuggingFace](https://huggingface.co/nowsyn/anycontrol).\n\n```bash\nconda install git-lfs\ngit lfs install\n\nmkdir .cache\ngit clone https://huggingface.co/nowsyn/anycontrol .cache/anycontrol\n\nln -s `pwd`/.cache/anycontrol/ckpts ./ckpts\nln -s `pwd`/.cache/anycontrol/annotator/ckpts ./annotator/ckpts\n```\n\n2. Start the gradio demo.\n\n```bash\npython src/inference/gradio_demo.py\n```\n\nWe give a screenshot of the gradio demo. 
You can set the number of conditions dynamically, then upload the condition images and choose the processor for each condition. In total, we provide 4 spatial condition processors: `edge`, `depth`, `seg`, and `pose`. Two additional global control processors, `content` and `color`, are also provided for reference; they are not part of this work.\n\n![](./assets/demo_00.jpg \"AnyControl Gradio Demo\")\n\n\n## Training\n\nWe recommend using 8 A100 GPUs for training. \n\n1. Please refer to [DATASET.md](docs/DATASET.md) to prepare the datasets.\n2. Start multi-GPU training.\n\n```bash\npython -m torch.distributed.launch --nproc_per_node 8 src/train/train.py \\\n    --config-path configs/anycontrol_local.yaml \\\n    --learning-rate 0.00001 \\\n    --batch-size 8 \\\n    --training-steps 90000 \\\n    --log-freq 500\n```\n\n## Results\u003ca name=\"results\"\u003e\u003c/a\u003e\n\n![](./assets/results.png \"results\")\n\n\n## COCO-UM\u003ca name=\"coco-um\"\u003e\u003c/a\u003e\nMost existing methods evaluate multi-control image synthesis on COCO-5K with fully spatially aligned conditions. However, we argue that evaluation on well-aligned multi-control conditions cannot reflect a method's ability to handle multiple overlapping conditions in practical applications, since user-provided conditions are typically collected from diverse sources and are not aligned. \n\nTherefore, we construct an **U**naligned **M**ulti-control benchmark based on COCO-5K, **COCO-UM** for short, for a more effective evaluation of multi-control image synthesis. \n\nYou can access COCO-UM [here](https://huggingface.co/datasets/nowsyn/COCO-UM). 
\n\n\n## License and Citation \u003ca name=\"license-and-citation\"\u003e\u003c/a\u003e\n\nAll assets and code are released under the [MIT license](./LICENSE) unless specified otherwise.\n\nIf this work is helpful for your research, please consider citing the following BibTeX entry.\n\n```bibtex\n@inproceedings{sun2024anycontrol,\n  title={AnyControl: Create your artwork with versatile control on text-to-image generation},\n  author={Sun, Yanan and Liu, Yanchen and Tang, Yinhao and Pei, Wenjie and Chen, Kai},\n  booktitle={ECCV},\n  year={2024},\n}\n```\n\n## Related Resources \u003ca name=\"resources\"\u003e\u003c/a\u003e\n\nWe thank all the open-source contributors to the following projects, which made this work possible:\n\n- [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet) | [UniControl](https://github.com/salesforce/UniControl) | [ControlNet](https://github.com/lllyasviel/ControlNet)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-mmlab%2Fanycontrol","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-mmlab%2Fanycontrol","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-mmlab%2Fanycontrol/lists"}