# Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance (NeurIPS 2024)



[![GitHub](https://img.shields.io/github/stars/genforce/ctrl-x?style=social)](https://github.com/genforce/ctrl-x)

[Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1</sup>\*, [Sicheng Mo](https://sichengmo.github.io/)<sup>1</sup>\*, [Ben Klingher](https://bklingher.github.io)<sup>1</sup>, [Fangzhou Mu](https://pages.cs.wisc.edu/~fmu/)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io/)<sup>1</sup>

<sup>1</sup>UCLA&emsp;<sup>2</sup>NVIDIA

\*Equal contribution

![Ctrl-X teaser figure](docs/assets/teaser_github.jpg)

## Getting started

### Environment setup

Our code is built on top of [`diffusers v0.28.0`](https://github.com/huggingface/diffusers). To set up the environment, please run the following.
```bash
conda env create -f environment.yaml
conda activate ctrlx
```
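
If you prefer plain `pip` to conda, a minimal setup along the following lines may also work. This is only a sketch: it assumes the core dependencies are PyTorch, `diffusers v0.28.0`, and Gradio (for the demo); `environment.yaml` remains the authoritative dependency list.
```bash
# Hypothetical pip-based alternative; environment.yaml is the source of truth.
pip install torch diffusers==0.28.0 gradio
```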

### Running Ctrl-X

#### Gradio demo

We provide a user interface for testing our method. Running the following command starts the demo.
```bash
python app_ctrlx.py
```
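
By default, Gradio serves the demo locally (typically at `http://localhost:7860`). If you need a different port, Gradio's standard `GRADIO_SERVER_PORT` environment variable applies; this is generic Gradio behavior, not a flag of this repository:
```bash
# Serve the demo on a different port (standard Gradio environment variable).
GRADIO_SERVER_PORT=7861 python app_ctrlx.py
```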

#### Script

We also provide a script that runs our method from the command line; it is functionally equivalent to the Gradio demo.
```bash
python run_ctrlx.py \
    --structure_image assets/images/horse__point_cloud.jpg \
    --appearance_image assets/images/horse.jpg \
    --prompt "a photo of a horse standing on grass" \
    --structure_prompt "a 3D point cloud of a horse"
```
If `appearance_image` is not provided, then Ctrl-X does *structure-only* control. If `structure_image` is not provided, then Ctrl-X does *appearance-only* control.
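
For example, with the same assets as above (prompts are illustrative), omitting one of the two images switches modes:
```bash
# Structure-only control: omit --appearance_image.
python run_ctrlx.py \
    --structure_image assets/images/horse__point_cloud.jpg \
    --prompt "a photo of a horse standing on grass" \
    --structure_prompt "a 3D point cloud of a horse"

# Appearance-only control: omit --structure_image.
python run_ctrlx.py \
    --appearance_image assets/images/horse.jpg \
    --prompt "a photo of a horse standing on grass"
```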

#### Optional arguments

There are four optional arguments for both `app_ctrlx.py` and `run_ctrlx.py` (a combined example follows the list):
- `model_offload` (flag): If enabled, offloads each component of both the base model and refiner to the CPU when not in use, reducing memory usage while slightly increasing inference time.
    - To use `model_offload`, [`accelerate`](https://github.com/huggingface/accelerate) must be installed. This must be done manually with `pip install accelerate`, as `environment.yaml` does *not* list `accelerate`.
- `sequential_offload` (flag): If enabled, offloads each layer of both the base model and refiner to the CPU when not in use, *significantly* reducing memory usage while *massively* increasing inference time.
    - Similarly, `accelerate` must be installed to use `sequential_offload`.
    - If both `model_offload` and `sequential_offload` are enabled, our code defaults to `sequential_offload`.
- `disable_refiner` (flag): If enabled, disables the refiner (and does not load it), reducing memory usage.
- `model` (`str`): When given a `safetensors` checkpoint path, loads that checkpoint for the base model.
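
For instance, a reduced-memory run combining flags (matching the `model_offload` + `disable_refiner` row in the table below, and assuming the flags are plain boolean switches) could look like this:
```bash
# Lower-VRAM run: offload model components to the CPU and skip the refiner.
# Requires `pip install accelerate` for --model_offload.
python run_ctrlx.py --model_offload --disable_refiner \
    --structure_image assets/images/horse__point_cloud.jpg \
    --appearance_image assets/images/horse.jpg \
    --prompt "a photo of a horse standing on grass" \
    --structure_prompt "a 3D point cloud of a horse"
```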

Approximate inference time and GPU VRAM usage for the Gradio demo and script (structure *and* appearance control) on a single NVIDIA RTX A6000 are as follows.

| Flags | Inference time (s) | GPU VRAM usage (GiB) |
| ---------------------------------------- | ------------------ | -------------------- |
| None | 28.8 | 18.8 |
| `model_offload` | 38.3 | 12.6 |
| `sequential_offload` | 169.3 | 3.8 |
| `disable_refiner` | 25.5 | 14.5 |
| `model_offload` + `disable_refiner` | 31.7 | 7.4 |
| `sequential_offload` + `disable_refiner` | 151.4 | 3.8 |

Here, VRAM usage is measured with `torch.cuda.max_memory_reserved()`, the closest option in PyTorch to what `nvidia-smi` reports, though likely still an underestimate. You can reproduce these numbers on your own hardware by adding the `benchmark` flag to `run_ctrlx.py`.
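
As a concrete invocation (reusing the example inputs from above):
```bash
# Reproduce the table numbers on your own hardware with the benchmark flag.
python run_ctrlx.py --benchmark \
    --structure_image assets/images/horse__point_cloud.jpg \
    --appearance_image assets/images/horse.jpg \
    --prompt "a photo of a horse standing on grass" \
    --structure_prompt "a 3D point cloud of a horse"
```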

Have fun playing around with Ctrl-X! :D

## Future plans (a.k.a. TODOs)

- [ ] Add dataset for quantitative evaluation.
- [ ] Add support for arbitrary schedulers besides DDIM, possibly without self-recurrence where self-recurrence is not feasible.
- [ ] Add support for DiTs, including SD3 and FLUX.1.
- [ ] Add support for video generation models, including CogVideoX and Mochi 1.

## Contact

For any questions, thoughts, discussions, and any other things you want to reach out for, please contact [Jordan Lin](https://kuanhenglin.github.io) ([email protected]).

## Reference

If you use our code in your research, please cite the following work.

```bibtex
@inproceedings{lin2024ctrlx,
    author = {Lin, {Kuan Heng} and Mo, Sicheng and Klingher, Ben and Mu, Fangzhou and Zhou, Bolei},
    booktitle = {Advances in Neural Information Processing Systems},
    title = {Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance},
    year = {2024}
}
```