{"id":13488633,"url":"https://essunny310.github.io/FreestyleNet/","last_synced_at":"2025-03-28T01:36:53.828Z","repository":{"id":149122722,"uuid":"615220578","full_name":"essunny310/FreestyleNet","owner":"essunny310","description":"[CVPR 2023 Highlight] Freestyle Layout-to-Image Synthesis","archived":false,"fork":false,"pushed_at":"2023-04-22T16:54:29.000Z","size":4517,"stargazers_count":138,"open_issues_count":2,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-08-01T18:38:41.041Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://essunny310.github.io/FreestyleNet/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/essunny310.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-17T08:03:06.000Z","updated_at":"2024-07-12T04:23:18.000Z","dependencies_parsed_at":"2023-05-05T02:31:30.506Z","dependency_job_id":null,"html_url":"https://github.com/essunny310/FreestyleNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/essunny310%2FFreestyleNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/essunny310%2FFreestyleNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/essunny310%2FFreestyleNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/essunny310%2FFreestyleNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/essunny310","download_url":"https://codeload.github.c
om/essunny310/FreestyleNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:19.273Z","updated_at":"2025-03-28T01:36:53.821Z","avatar_url":"https://github.com/essunny310.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":"# FreestyleNet\nOfficial PyTorch implementation of [Freestyle Layout-to-Image Synthesis](https://arxiv.org/abs/2303.14412)\n\n[![arXiv](https://img.shields.io/badge/arXiv-2303.14412-b31b1b.svg)](https://arxiv.org/abs/2303.14412)\n[![Project Website](https://img.shields.io/badge/🔗-Project_Website-blue.svg)](https://essunny310.github.io/FreestyleNet/)\n\n## Introduction\n\nFreestyleNet is a diffusion-based network that can generate diverse semantics onto a given layout. 
Compared to prior layout-to-image synthesis methods and text-to-image generation models (e.g., Stable Diffusion), FreestyleNet is armed with more controllability, enabling 1) the generation of semantics beyond the pre-defined semantic categories in the training dataset, and 2) the separate modulation of each class in the layout with text.\n\n![Teaser](./files/teaser.png)\n\nA comparison to [ControlNet](https://github.com/lllyasviel/ControlNet) is provided on our [project website](https://essunny310.github.io/FreestyleNet/).\n\n## Updates\n\n- \\[2023/04/22\\] - Code \u0026 pre-trained models released!\n\n## Requirements\n\nOur code is built upon [Stable Diffusion](https://github.com/CompVis/stable-diffusion). Please clone the repository and set up the environment:\n```\ngit clone https://github.com/essunny310/FreestyleNet.git\ncd FreestyleNet\nconda env create -f environment.yaml\nconda activate freestyle\n```\n\nYou will also need to download the pre-trained Stable Diffusion model (or manually download it from [here](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)):\n```\nmkdir models/ldm/stable-diffusion\nwget -O models/ldm/stable-diffusion/sd-v1-4-full-ema.ckpt https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt\n```\n\n## Data Preparation\n\n**COCO-Stuff**. The dataset can be found [here](https://github.com/nightrome/cocostuff). You will need to download *train2017.zip*, *val2017.zip*, and *stuffthingmaps_trainval2017.zip*. Please unzip them and generate two files: `COCO_train.txt` and `COCO_val.txt`, which contain the absolute path of each image (e.g., \"/path/to/dataset/COCO-Stuff/train_img/000000000009.jpg\"). 
Finally, arrange them in the following directory structure:\n```\nCOCO-Stuff\n    stuffthingmaps_trainval2017/\n        train2017/\n            000000000009.png\n            ...\n        val2017/\n            000000000139.png\n            ...\n    train_img/\n        000000000009.jpg\n        ...\n    val_img/\n        000000000139.jpg\n        ...\n    COCO_train.txt\n    COCO_val.txt\n```\n\n**ADE20K**. The dataset can be downloaded [here](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip). Please unzip it and generate two files, `ADE20K_train.txt` and `ADE20K_val.txt`, in the same way as for COCO-Stuff. The resulting directory structure should look as follows:\n```\nADEChallengeData2016\n    annotations/\n        training/\n            ADE_train_00000001.png\n            ...\n        validation/\n            ADE_val_00000001.png\n            ...\n    images/\n        training/\n            ADE_train_00000001.jpg\n            ...\n        validation/\n            ADE_val_00000001.jpg\n            ...\n    ADE20K_train.txt\n    ADE20K_val.txt\n```\n\n## Training\n\nTo train FreestyleNet, run:\n```\npython main.py --base /path/to/config \\\n               -t \\\n               --actual_resume models/ldm/stable-diffusion/sd-v1-4-full-ema.ckpt \\\n               -n \u003cexp_name\u003e \\\n               --gpus 0, \\\n               --data_root /path/to/dataset \\\n               --train_txt_file /path/to/dataset/with/train.txt \\\n               --val_txt_file /path/to/dataset/with/val.txt\n```\n\nWe provide two training scripts: `train_COCO.sh` and `train_ADE20K.sh`. 
Please modify `--data_root`, `--train_txt_file`, and `--val_txt_file` to match your local paths.\n\n## Pre-trained Models\n\nWe provide two models, trained on COCO-Stuff and ADE20K respectively:\n* [freestyle-sd-v1-4-coco.ckpt](https://drive.google.com/file/d/1bAGXJKBXaOVRrJYd08LakOFNoWHhFiBm/view?usp=sharing)\n* [freestyle-sd-v1-4-ade20k.ckpt](https://drive.google.com/file/d/1PDoMWRI7EVQc5FyLMClz9e6xtM2DbbD1/view?usp=sharing)\n\n## Generation\n\n### Layout-to-Image Synthesis (LIS)\n\nTo generate images under the traditional LIS setting, run:\n```\npython scripts/LIS.py --batch_size 8 \\\n                      --config /path/to/config \\\n                      --ckpt /path/to/trained_model \\\n                      --dataset \u003cdataset name\u003e \\\n                      --outdir /path/to/output \\\n                      --txt_file /path/to/dataset/with/val.txt \\\n                      --data_root /path/to/dataset \\\n                      --plms\n```\nWe provide two sampling scripts: `sample_COCO.sh` and `sample_ADE20K.sh`. 
Please modify `--ckpt`, `--txt_file`, and `--data_root` according to the actual path.\n\n### Freestyle Layout-to-Image Synthesis (FLIS)\n\nTo generate images in a freestyle way, you need to prepare a layout image and a json file that defines the mapping between text and layout.\n```\n{\n  \"text_label_mapping\": {\n    \"book\": 83, # each mapping should be formatted as \u003c\"text\": label_value\u003e\n    \"vase\": 85,\n    \"flower\": 118,\n    \"furniture\": 122,\n    \"paper\": 138,\n    \"plastic\": 142,\n    \"table\": 164,\n    \"concrete wall\": 171\n  },\n  \"layout_path\": \"examples/layout_flower.png\"\n}\n```\n* Binding new attributes/generating unseen objects: Just describe the object with the desired attribute or describe a new object, e.g., change \u003c\"flower\": 118\u003e to \u003c\"sunflower\": 118\u003e.\n* Specifying the style: Just add a description of the desired image style, e.g., add a line to the \"text_label_mapping\" such as \u003c\"drawn by Van Gogh\": -1\u003e. 
Here \"-1\" means that the style is applied globally (i.e., with no layout constraint).\n\n\nWe provide several examples in `examples/`, and you can try them out by running:\n```\npython scripts/FLIS.py --config configs/stable-diffusion/v1-inference_FLIS.yaml \\\n                       --ckpt /path/to/trained_model \\\n                       --json examples/layout_flower.json \\\n                       --outdir outputs/FLIS \\\n                       --plms\n```\nA reference script `sample_FLIS.sh` is provided as well.\n\n## Citation\n\nIf you find FreestyleNet useful for your work, please kindly consider citing our paper:\n\n```bibtex\n@inproceedings{xue2023freestylenet,\n  title = {Freestyle Layout-to-Image Synthesis},\n  author = {Xue, Han and Huang, Zhiwu and Sun, Qianru and Song, Li and Zhang, Wenjun},\n  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, \n  year = {2023},\n}\n```\n\n## Acknowledgments\n\nOur code borrows heavily from [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [Textual Inversion](https://github.com/rinongal/textual_inversion).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/essunny310.github.io%2FFreestyleNet%2F","html_url":"https://awesome.ecosyste.ms/projects/essunny310.github.io%2FFreestyleNet%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/essunny310.github.io%2FFreestyleNet%2F/lists"}