https://github.com/OPPO-Mente-Lab/attention-mask-control

Code for the paper "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"
- Host: GitHub
- URL: https://github.com/OPPO-Mente-Lab/attention-mask-control
- Owner: OPPO-Mente-Lab
- License: MIT
- Created: 2023-06-02T01:34:09.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-21T11:52:14.000Z (about 2 years ago)
- Last Synced: 2024-08-01T18:37:43.031Z (about 1 year ago)
- Language: Python
- Size: 65.4 KB
- Stars: 35
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project:
- awesome-diffusion-categorized
README
### Code for paper: "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"
[[Project Page](https://oppo-mente-lab.github.io/compositional_t2i/)] [[Paper](https://arxiv.org/abs/2305.13921)]

# Requirements
A suitable [conda](https://conda.io/) environment named `AMC` can be created
and activated with:
```
conda env create -f environment.yaml
conda activate AMC
```
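Once the environment is active, a quick import check can confirm the install. This is a minimal sketch assuming the environment provides `torch` and `diffusers` (the repo builds on the diffusers library):

```shell
# Optional sanity check -- assumes torch and diffusers are pinned in environment.yaml
python -c "import torch, diffusers; print(torch.__version__, torch.cuda.is_available())"
```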
# Data Preparing
First, download the COCO dataset from [here](https://cocodataset.org/). We use COCO2014 in the paper.
Then, you can process your data with this script:
```shell
python coco_preprocess.py \
--coco_image_path /YOUR/COCO/PATH/train2014 \
--coco_caption_file /YOUR/COCO/PATH/annotations/captions_train2014.json \
--coco_instance_file /YOUR/COCO/PATH/annotations/instances_train2014.json \
--output_dir /YOUR/DATA/PATH
```
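The `--output_dir` should then contain webdataset-style `.tar` shards (inferred from the `webdataset_base_urls` setting in the training section below; the shard name here is illustrative). You can peek inside one with:

```shell
# List the first few files packed into one shard (the shard name is illustrative)
tar tf /YOUR/DATA/PATH/00000.tar | head
```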
# Training
Before training, you need to change the configs in `train_boxnet.sh`:
- ROOT_DIR: where to save all the results.
- webdataset_base_urls: /YOUR/DATA/PATH/{xxx-xxx}.tar
- model_path: path to the Stable Diffusion v1-5 checkpoint

You can then train the BoxNet with this script:
```shell
sh train_boxnet.sh $NODE_NUM $CURRENT_NODE_RANK $GPUS_PER_NODE
```
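For example, to launch on a single machine with 8 GPUs (one node, rank 0), the call would look like this; the argument values are illustrative:

```shell
# 1 node in total, this node has rank 0, 8 GPUs per node -- illustrative values
sh train_boxnet.sh 1 0 8
```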
# Text-to-Image Synthesis
With a trained BoxNet, you can run text-to-image synthesis with:
```shell
python test_pipeline_onestage.py \
  --stable_model_path /stable-diffusion-v1-5/checkpoint \
  --boxnet_model_path /TRAINED/BOXNET/CKPT \
--output_dir /YOUR/SAVE/DIR
```
All the test prompts are saved in `test_prompts.json`.
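To preview how many prompts will be run, you can count the entries. This one-liner assumes `test_prompts.json` is a flat JSON list, which you should verify against the actual file:

```shell
# Count the entries in test_prompts.json -- assumes a flat JSON list (unverified)
python -c "import json; print(len(json.load(open('test_prompts.json'))))"
```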
# TODOs
- [x] Release data preparation code
- [x] Release inference code
- [x] Release training code
- [ ] Release demo
- [ ] Release checkpoint

# Acknowledgements
This implementation builds on the [diffusers](https://github.com/huggingface/diffusers) library, the [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/finetune_taiyi_stable_diffusion) codebase, and the [DETR](https://github.com/facebookresearch/detr) codebase.