{"id":13488670,"url":"https://github.com/mhh0318/Cocktail","last_synced_at":"2025-03-28T01:37:11.644Z","repository":{"id":170874928,"uuid":"647085176","full_name":"mhh0318/Cocktail","owner":"mhh0318","description":null,"archived":false,"fork":false,"pushed_at":"2023-06-02T12:44:52.000Z","size":37116,"stargazers_count":58,"open_issues_count":2,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-08-01T18:39:09.216Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhh0318.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-05-30T03:06:46.000Z","updated_at":"2024-07-15T08:13:36.000Z","dependencies_parsed_at":"2024-01-16T09:03:06.530Z","dependency_job_id":"5778e06a-4693-441b-bdc1-5114bd3f60c2","html_url":"https://github.com/mhh0318/Cocktail","commit_stats":null,"previous_names":["mhh0318/cocktail"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhh0318%2FCocktail","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhh0318%2FCocktail/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhh0318%2FCocktail/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhh0318%2FCocktail/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mhh0318","download_url":"https://codeload.github.com/mhh0318/Cocktail/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"gith
ub","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:19.820Z","updated_at":"2025-03-28T01:37:11.631Z","avatar_url":"https://github.com/mhh0318.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":"# *Cocktail*🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation\n\n\n\u003c!-- \u003ca href=\"\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2203.10821-b31b1b.svg\" height=22.5\u003e\u003c/a\u003e  --\u003e\n\u003ca href=\"https://mhh0318.github.io/cocktail/\"\u003e\u003cimg src=\"https://img.shields.io/badge/Web-Project Page-brightgreen.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" height=22.5\u003e\u003c/a\u003e \n\u003ca href=\"https://huggingface.co/MichaelHu/cocktail\"\u003e\u003cimg src=\"https://img.shields.io/badge/HuggingFace-Checkpoint-yellow.svg\" height=22.5\u003e\u003c/a\u003e \n\n![img](samples/results/3_sample_0_mark.png)\n\n*James Bond is drinking Cocktail🍸.*\n\n\n\nhttps://github.com/mhh0318/Cocktail/assets/42776955/e2a93a6d-3e36-4e54-8462-b359fa8946fa\n\n\n  \n![img](readme/cktl.png)\n\nOur approach requires only **[one generalized model]**, unlike previous approaches, which needed multiple models to mix multiple modalities. 
\n\nUnlike existing schemes, ours does not require modifying the modal prior of the base model as in \u003cstrong\u003eFig.(a)\u003c/strong\u003e, which significantly reduces cost. Nor do we need multiple models to handle multiple modalities, as in \u003cstrong\u003eFig.(b)\u003c/strong\u003e. Cocktail🍸 fuses the information from multiple modalities, as shown in \u003cstrong\u003eFig.(c)\u003c/strong\u003e.\n\n## Abstract\n\n![img](readme/teaser.jpg)\n\nWe propose Cocktail, a pipeline to mix various modalities into one embedding, amalgamated with a generalized ControlNet (gControlNet), a controllable normalisation (ControlNorm), and a spatial guidance sampling method, to actualize multi-modal and spatially-refined control for text-conditional diffusion models.\n\n## Pipeline\n\n![img](readme/ppl.png)\n\nThe parameters in the yellow sections come from the pre-trained model and stay frozen, while only those in the blue sections are updated during training, with the gradient back-propagated along the blue arrows. The light grey dashed sections mark additional operations that occur only at inference: storing the attention maps derived from the gControlNet for the sampling stage.\n\n## Results\n\n### [Examples] Cocktail for Multi-modality\n\n![img](readme/fig1.png)\n\n### [Examples] Cocktail for free-modality\n\n![img](readme/free.png)\n\n### [Comparisons] single-modality\n\n![img](readme/fig3.png)\n\n### [Comparisons] multi-modality\n\nHere, the \"cross\" symbol ❌ and the checkmark symbol ✅ denote the unmatched and matched modalities, respectively. 
Note that our model accurately captures all modalities.\n\n![img](readme/fig5.png)\n![img](readme/fig4.png)\n\n\n\n\n\n## TODO\n\n- [x] Release Gradio Demo\n- [ ] Release sampling code\n- [x] Release inference code\n- [x] Release pre-trained models\n\n## Setup\n\n### Installation Requirements\n\nYou can create an Anaconda environment called `cocktail` with the required dependencies by running:\n\n```\ngit clone https://github.com/mhh0318/cocktail.git\ncd cocktail\nconda env create -f environment.yaml\n```\n\n### Download Pretrained Weights\n\nDownload the pretrained models from [here](https://huggingface.co/MichaelHu/cocktail) and save them to the repository root directory.\n\n### Gradio Demo\nThe Gradio demo can be launched with:\n```bash\npython gradio_demo.py [--share]\n```\n![img](readme/gradio_demo.png) \n\n### Annotations\nWe use HED, SAN, and OpenPose to extract the sketch map, segmentation map, and human pose map from the image.\n- Extract sketch map:\n```bash\npython annotator/hed.py {/path/to/image.png} {/path/to/sketch.png}\n```\n- Extract segmentation map:\n```bash\npython annotator/SAN/run.py {/path/to/image.png} {/path/to/seg.png}\n```\n- Extract human pose map:\n```bash\npython annotator/openpose/run.py {/path/to/image.png} {/path/to/openpose.png}\n```\n\n### Quick Inference\n\nFor the simultaneous vision-language generation, please run:\n\n```bash\npython ./inference {args}\n```\nHere `args` is the integer 0 or 1, selecting one of the two provided example conditions.\n\n\nIf the environment is set up correctly, this command should run and write its results to `./samples/results/{args}_sample_{batch}.png`.\n\n![img](samples/results/0_sample_0.png)\n![img](samples/results/0_sample_1.png)\n![img](samples/results/1_sample_0.png)\n![img](samples/results/1_sample_1.png)\n\n## Comments \n\nOur codebase for the diffusion models builds heavily on [ControlNet](https://github.com/lllyasviel/ControlNet) and [Stable 
Diffusion](https://github.com/CompVis/stable-diffusion).\n\nThanks to their authors for open-sourcing the code!\n\n## Citation\nIf you use this code for your research, please cite our paper.\n```\n@article{hu2023cocktail,\n  title = {Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation},\n  author = {Hu, Minghui and Zheng, Jianbin and Liu, Daqing and Zheng, Chuanxia and Wang, Chaoyue and Tao, Dacheng and Cham, Tat-Jen},\n  journal = {arXiv},\n  year = {2023},\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhh0318%2FCocktail","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhh0318%2FCocktail","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhh0318%2FCocktail/lists"}