{"id":13488696,"url":"https://github.com/tomtom1103/compose-and-conquer","last_synced_at":"2025-03-28T01:37:22.848Z","repository":{"id":217609708,"uuid":"744336633","full_name":"tomtom1103/compose-and-conquer","owner":"tomtom1103","description":"[ICLR 2024] Official repo. for Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis","archived":false,"fork":false,"pushed_at":"2024-01-18T09:00:05.000Z","size":3929,"stargazers_count":99,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-05-15T23:59:35.990Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomtom1103.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-17T04:53:51.000Z","updated_at":"2024-07-23T03:02:39.674Z","dependencies_parsed_at":null,"dependency_job_id":"e1f4b397-26da-448c-b17d-55034434308d","html_url":"https://github.com/tomtom1103/compose-and-conquer","commit_stats":null,"previous_names":["tomtom1103/compose-and-conquer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom1103%2Fcompose-and-conquer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom1103%2Fcompose-and-conquer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom1103%2Fcompose-and-conquer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom1103%2Fcom
pose-and-conquer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomtom1103","download_url":"https://codeload.github.com/tomtom1103/compose-and-conquer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:20.233Z","updated_at":"2025-03-28T01:37:22.842Z","avatar_url":"https://github.com/tomtom1103.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":" # [ICLR 2024] Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis\n\n[![arXiv](https://img.shields.io/badge/arXiv-2401.09048-red)](https://arxiv.org/abs/2401.09048)\n\n\u003e **Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis**\u003cbr\u003e\n\u003e Jonghyun Lee, Hansam Cho, Youngjoon Yoo, Seoung Bum Kim, Yonghyun Jeong\u003cbr\u003e\n\u003e \n\u003e**Abstract**: \u003cbr\u003e\nAddressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of said attributes extended into the three dimensional plane. 
In this context, we present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics from multiple exemplar images. Specifically, we first introduce depth disentanglement training to leverage the relative depth of objects as an estimator, allowing the model to identify the absolute positions of unseen objects through the use of synthetic image triplets. We also introduce soft guidance, a method for imposing global semantics onto targeted regions without the use of any additional localization cues. Our integrated framework, Compose and Conquer (CnC), unifies these techniques to localize multiple conditions in a disentangled manner. We demonstrate that our approach allows perception of objects at varying depths while offering a versatile framework for composing localized objects with different global semantics.\n \n\n## Description\nOfficial implementation of Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis.\n\n![image](samples/fig.jpg)\n\n## Updates\n- 2024/1/16 Compose and Conquer has been accepted at ICLR 2024! See you in Vienna.\n\n## To-Do\n - [ ] Update canny + depth weights\n - [ ] Diffusers integration\n\n\n## Setup\nFor the time being, we recommend a conda environment:\n```\nconda env create -f environment.yaml\nconda activate cnc\n```\n\n1. Download the [trained weights](https://drive.google.com/drive/folders/1kK1RbLs3bf08Hawwq9LU6jGfB3N7li2y?usp=sharing) and place them in the `trained_weights` folder. Note that we provide separate weights for the full model (`cnc_v1.ckpt`), the local fuser (`local_fuser_v1.ckpt`), and the global fuser (`global_fuser_v1.ckpt`); only the weights for the full model are required to run CnC. Feel free to tinker with the fusers.\n\n2. 
Download the weights for the SOD module [U\u003csup\u003e2\u003c/sup\u003e-Net](https://github.com/xuebinqin/U-2-Net) either from their repository or [this link](https://drive.google.com/file/d/1kO3fbi8bd-EVrcfV0dKdcBTNuiJ8wf9G/view?usp=sharing), and place them in `annotator/u2net/weights`.\n\n## Local Demo\nLaunch the gradio demo:\n```\npython src/test/test_cnc.py\n```\n![image](samples/gradio.jpg)\n\nThe mask for Soft Guidance is automatically extracted from the foreground image via U\u003csup\u003e2\u003c/sup\u003e-Net.\n\n## Training\n\n### Dataset Preparation\nFor the official implementation, we've trained on the [COCO-Stuff Dataset](https://github.com/nightrome/cocostuff) and the [Pick-a-Pic Dataset](https://huggingface.co/datasets/yuvalkirstain/pickapic_v1). Our dataset preparation and training code follows a specific filetree convention:\n```\nbase_dataset_dir/\n├── train_captions.txt\n├── train/\n├── train_mask/\n├── train_foreground/\n├── train_foreground_depthmaps/\n├── train_foreground_embeddings/\n├── train_background/\n├── train_background_depthmaps/\n├── train_background_embeddings/\n```\n\nWe've provided scripts to prepare the masks, foreground/background images, and their primitives in `data`. Here's a step-by-step example to prepare the COCO-Stuff dataset:\n1. 
Set up the COCO-Stuff Dataset through instructions provided by their [repository](https://github.com/nightrome/cocostuff):\n\n```\n# Get this repo\ngit clone https://github.com/nightrome/cocostuff.git\ncd cocostuff\n\n# Download everything\nwget --directory-prefix=downloads http://images.cocodataset.org/zips/train2017.zip\nwget --directory-prefix=downloads http://images.cocodataset.org/zips/val2017.zip\nwget --directory-prefix=downloads http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip\n\n# Unpack everything\nmkdir -p dataset/images\nmkdir -p dataset/annotations\nunzip downloads/train2017.zip -d dataset/images/\nunzip downloads/val2017.zip -d dataset/images/\nunzip downloads/stuffthingmaps_trainval2017.zip -d dataset/annotations/\n```\n2. Download the annotations folder that contains the text captions for COCO-Stuff. Note that the `dataset/annotations` folder contains only the pixel-wise annotation images, not the text captions, hence the additional download.\n\n```\nwget http://images.cocodataset.org/annotations/annotations_trainval2017.zip\nunzip annotations_trainval2017.zip\n```\n\n3. Extract the captions to a .txt file:\n\n```\npython data/parse_cocostuff.py\n```\n\n4. Prepare the synthetic image triplets:\n\n```\npython data/get_mask_cocostuff.py\npython data/get_foreground.py --base_dir \"/path/to/cocostuff/dataset/images\" --placeholder \"train2017\"\npython data/get_background.py --base_dir \"/path/to/cocostuff/dataset/images\" --placeholder \"train2017\"\n```\n\n5. 
Prepare the primitives (depth maps, CLIP image embeddings):\n\n```\npython data/get_primitives.py --base_dir \"/path/to/cocostuff/dataset/images\" --placeholder \"train2017\"\n```\n\nAt this point, the filetree convention is complete, and the cocostuff directory should look something like this:\n\n```\n|-- README.md\n|-- cocostuff.bib\n|-- dataset\n|   |-- annotations\n|   |   |-- train2017\n|   |   |-- val2017\n|   |-- images\n|   |   |-- train2017\n|   |   |-- train2017_captions.txt\n|   |   |-- train2017_mask\n|   |   |-- train2017_background\n|   |   |-- train2017_background_depthmaps\n|   |   |-- train2017_background_embeddings\n|   |   |-- train2017_foreground\n|   |   |-- train2017_foreground_depthmaps\n|   |   |-- train2017_foreground_embeddings\n```\n\n\n\n### Global Fuser training\nTo train your own global fuser, first download the SD weights and place them in `ckpt`.\n\n```\nwget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.ckpt\n```\n\nThen extract weights from SD:\n```\npython utils/prepare_weights.py init_global ckpt/v1-5-pruned.ckpt configs/global_v15_v4_1.yaml ckpt/init_global.ckpt\n```\n\nand run `scripts/train_global_fuser.sh`. Make sure to edit the .yaml files to contain your `base_dir` and `placeholder` data flags. 
For the COCO-Stuff example above, the `--base_dir` would be `/path/to/cocostuff/dataset/images` and `--placeholder` would be `train2017`.\n\n### Local Fuser training\nTo train your own local fuser, first download the [Uni-ControlNet weights](https://github.com/ShihaoZhaoZSH/Uni-ControlNet) and place them in `ckpt`.\n\nThen extract weights from Uni-ControlNet:\n```\npython utils/prepare_weights.py init_local_fromuni ckpt/uni.ckpt configs/local_fuser_v1.yaml ckpt/init_local_fromuni.ckpt\n```\n\nand run `scripts/train_local_fuser.sh`.\n\n### CnC Finetuning\nOnce both local and global fusers are trained, merge and finetune the model via:\n\n```\npython utils/prepare_weights.py integrate /path/to/your/localfuser.ckpt /path/to/your/globalfuser configs/cnc_v1.yaml init_cnc.ckpt\n\nzsh train_cnc.sh\n```\n\n## Acknowledgements🤍\nThis repository is built upon [LDM](https://github.com/CompVis/latent-diffusion), [ControlNet](https://github.com/lllyasviel/ControlNet/tree/main), [Uni-ControlNet](https://github.com/ShihaoZhaoZSH/Uni-ControlNet), and [U\u003csup\u003e2\u003c/sup\u003e-Net](https://github.com/xuebinqin/U-2-Net). ya bois are the real mvps.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomtom1103%2Fcompose-and-conquer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomtom1103%2Fcompose-and-conquer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomtom1103%2Fcompose-and-conquer/lists"}