{"id":13488648,"url":"https://genforce.github.io/ctrl-x/","last_synced_at":"2025-03-28T01:37:03.136Z","repository":{"id":243759129,"uuid":"812919268","full_name":"genforce/ctrl-x","owner":"genforce","description":"Official implementation of \"Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance\" (NeurIPS 2024)","archived":false,"fork":false,"pushed_at":"2024-12-10T00:08:15.000Z","size":49666,"stargazers_count":262,"open_issues_count":2,"forks_count":9,"subscribers_count":22,"default_branch":"main","last_synced_at":"2024-12-10T01:19:04.180Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://genforce.github.io/ctrl-x","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/genforce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T06:47:15.000Z","updated_at":"2024-12-10T00:08:19.000Z","dependencies_parsed_at":"2024-11-23T06:28:11.133Z","dependency_job_id":null,"html_url":"https://github.com/genforce/ctrl-x","commit_stats":null,"previous_names":["genforce/ctrl-x"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genforce%2Fctrl-x","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genforce%2Fctrl-x/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genforce%2Fctrl-x/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genforce%2Fctrl-x/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/genforce","download_url":"https://codeload.github.com/genforce/ctrl-x/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245952888,"owners_count":20699553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:19.505Z","updated_at":"2025-03-28T01:37:03.116Z","avatar_url":"https://github.com/genforce.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":"# Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance (NeurIPS 2024)\n\n\u003ca href=\"https://arxiv.org/abs/2406.07540\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-Paper-red\"\u003e\u003c/a\u003e \n\u003ca href=\"https://genforce.github.io/ctrl-x\"\u003e\u003cimg src=\"https://img.shields.io/badge/Project-Page-yellow\"\u003e\u003c/a\u003e\n[![GitHub](https://img.shields.io/github/stars/genforce/ctrl-x?style=social)](https://github.com/genforce/ctrl-x)\n\n[Kuan Heng Lin](https://kuanhenglin.github.io)\u003csup\u003e1*\u003c/sup\u003e, [Sicheng Mo](https://sichengmo.github.io/)\u003csup\u003e1*\u003c/sup\u003e, [Ben Klingher](https://bklingher.github.io)\u003csup\u003e1\u003c/sup\u003e, [Fangzhou Mu](https://pages.cs.wisc.edu/~fmu/)\u003csup\u003e2\u003c/sup\u003e, [Bolei Zhou](https://boleizhou.github.io/)\u003csup\u003e1\u003c/sup\u003e \u003cbr\u003e\n\u003csup\u003e1\u003c/sup\u003eUCLA\u0026emsp;\u003csup\u003e2\u003c/sup\u003eNVIDIA \u003cbr\u003e\n\u003csup\u003e*\u003c/sup\u003eEqual contribution 
\u003cbr\u003e\n\n![Ctrl-X teaser figure](docs/assets/teaser_github.jpg)\n\n## Getting started\n\n### Environment setup\n\nOur code is built on top of [`diffusers v0.28.0`](https://github.com/huggingface/diffusers). To set up the environment, please run the following.\n```bash\nconda env create -f environment.yaml\nconda activate ctrlx\n```\n\n### Running Ctrl-X\n\n#### Gradio demo\n\nWe provide a user interface for testing our method. Running the following command starts the demo.\n```bash\npython app_ctrlx.py\n```\n\n#### Script\n\nWe also provide a script for running our method. This is equivalent to the Gradio demo.\n```bash\npython run_ctrlx.py \\\n    --structure_image assets/images/horse__point_cloud.jpg \\\n    --appearance_image assets/images/horse.jpg \\\n    --prompt \"a photo of a horse standing on grass\" \\\n    --structure_prompt \"a 3D point cloud of a horse\"\n```\nIf `appearance_image` is not provided, then Ctrl-X does *structure-only* control. If `structure_image` is not provided, then Ctrl-X does *appearance-only* control.\n\n#### Optional arguments\n\nThere are four optional arguments for both `app_ctrlx.py` and `run_ctrlx.py`:\n- `model_offload` (flag): If enabled, offloads each component of both the base model and refiner to the CPU when not in use, reducing memory usage while slightly increasing inference time.\n    - To use `model_offload`, [`accelerate`](https://github.com/huggingface/accelerate) must be installed. 
This must be done manually with `pip install accelerate` as `environment.yaml` does *not* have `accelerate` listed.\n- `sequential_offload` (flag): If enabled, offloads each layer of both the base model and refiner to the CPU when not in use, *significantly* reducing memory usage while *massively* increasing inference time.\n    - Similarly, `accelerate` must be installed to use `sequential_offload`.\n    - If both `model_offload` and `sequential_offload` are enabled, then our code defaults to `sequential_offload`.\n- `disable_refiner` (flag): If enabled, disables the refiner (and does not load it), reducing memory usage.\n- `model` (`str`): When provided a `safetensors` checkpoint path, loads the checkpoint for the base model.\n\nApproximate inference time and GPU VRAM usage for the Gradio demo and script (structure *and* appearance control) on a single NVIDIA RTX A6000 are as follows.\n\n| Flags                                    | Inference time (s) | GPU VRAM usage (GiB) |\n| ---------------------------------------- | ------------------ | -------------------- |\n| None                                     | 28.8               | 18.8                 |\n| `model_offload`                          | 38.3               | 12.6                 |\n| `sequential_offload`                     | 169.3              | 3.8                  |\n| `disable_refiner`                        | 25.5               | 14.5                 |\n| `model_offload` + `disable_refiner`      | 31.7               | 7.4                  |\n| `sequential_offload` + `disable_refiner` | 151.4              | 3.8                  |\n\nHere, VRAM usage is obtained via `torch.cuda.max_memory_reserved()`, which is the closest option in PyTorch to `nvidia-smi` numbers but is probably still an underestimation. You can obtain these numbers on your own hardware by adding the `benchmark` flag for `run_ctrlx.py`.\n\nHave fun playing around with Ctrl-X! :D\n\n## Future plans (a.k.a. 
TODOs)\n\n- [ ] Add dataset for quantitative evaluation.\n- [ ] Add support for arbitrary schedulers besides DDIM, not necessarily with self-recurrence (if not possible).\n- [ ] Add support for DiTs, including SD3 and FLUX.1.\n- [ ] Add support for video generation models, including CogVideoX and Mochi 1.\n\n## Contact\n\nFor any questions, thoughts, discussions, and any other things you want to reach out for, please contact [Jordan Lin](https://kuanhenglin.github.io) (kuanhenglin@ucla.edu).\n\n## Reference\n\nIf you use our code in your research, please cite the following work.\n\n```bibtex\n@inproceedings{lin2024ctrlx,\n    author = {Lin, {Kuan Heng} and Mo, Sicheng and Klingher, Ben and Mu, Fangzhou and Zhou, Bolei},\n    booktitle = {Advances in Neural Information Processing Systems},\n    title = {Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance},\n    year = {2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/genforce.github.io%2Fctrl-x%2F","html_url":"https://awesome.ecosyste.ms/projects/genforce.github.io%2Fctrl-x%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/genforce.github.io%2Fctrl-x%2F/lists"}