{"id":13488368,"url":"https://github.com/Nithin-GK/MaxFusion","last_synced_at":"2025-03-28T00:33:45.548Z","repository":{"id":233339534,"uuid":"773856333","full_name":"Nithin-GK/MaxFusion","owner":"Nithin-GK","description":"[ECCV'24] MaxFusion: Plug \u0026 Play multimodal generation in text to image diffusion models","archived":false,"fork":false,"pushed_at":"2024-07-23T17:10:21.000Z","size":9119,"stargazers_count":18,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-31T00:39:54.467Z","etag":null,"topics":["diffusion-models","model-merging","multimodal","plug-and-play","txt2img"],"latest_commit_sha":null,"homepage":"https://nithin-gk.github.io/maxfusion.github.io/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nithin-GK.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-18T14:17:13.000Z","updated_at":"2024-10-30T08:00:38.000Z","dependencies_parsed_at":"2024-07-23T19:53:58.789Z","dependency_job_id":"00650d26-de8f-4c7e-ab17-2cbd432c7232","html_url":"https://github.com/Nithin-GK/MaxFusion","commit_stats":null,"previous_names":["nithin-gk/maxfusion"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nithin-GK%2FMaxFusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nithin-GK%2FMaxFusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nithin-GK%2FMaxFusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nithin-GK%2FMaxFusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nithin-GK","download_url":"https://codeload.github.com/Nithin-GK/MaxFusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245949278,"owners_count":20698913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","model-merging","multimodal","plug-and-play","txt2img"],"created_at":"2024-07-31T18:01:14.506Z","updated_at":"2025-03-28T00:33:40.524Z","avatar_url":"https://github.com/Nithin-GK.png","language":"Jupyter Notebook","funding_links":[],"categories":["T2I Diffusion Model augmentation"],"sub_categories":[],"readme":"\n\n\u003ch2 align=\"center\"\u003e \u003ca href=\"\"\u003eMaxFusion: Plug \u0026 Play multimodal generation in text to image diffusion models\u003c/a\u003e\u003c/h2\u003e\n\n\u003ch5 align=\"center\"\u003e If you like our project, please give us a star ⭐ on GitHub for the latest updates.\u003c/h5\u003e\n\n\n\n\u003ch5 align=\"center\"\u003e\n    \n[![hf_space](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)]()\n[![project page](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://nithin-gk.github.io/maxfusion.github.io/)\n[![arXiv](https://img.shields.io/badge/Arxiv-2404.09977-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2404.09977) \u003cbr\u003e\n\u003c/h5\u003e\n\n\n\n## Applications\n\u003cimg src=\"./assets/img1.png\" width=\"100%\"\u003e\n\n\u003cimg src=\"./assets/img2.png\" width=\"100%\"\u003e\n\n\nKeywords: Multimodal Generation, Text-to-Image Generation, Plug and Play\n\n\nWe propose **MaxFusion**, a plug-and-play framework for multimodal generation using text-to-image diffusion models.\n    *(a) Multimodal generation*. We address the problem of conflicting spatial conditioning in text-to-image models.\n    *(b) Saliency in variance maps*. We find that the variance maps of different feature layers express the strength of the conditioning.\n\n\u003cbr\u003e\n\n\n### Contributions:\n\n- We tackle the need for paired training data in multi-task conditioning of diffusion models.\n- We propose a novel variance-based feature merging strategy for diffusion models.\n- Our method uses the combined information from multiple conditions to influence the output, unlike individual models that are limited to a single condition.\n- Unlike previous solutions, our approach is easily scalable and can be added on top of off-the-shelf models.\n\n\u003c!-- \u003cp align=\"center\"\u003e\n  \u003cimg src=\"./utils/intropng.png\" alt=\"Centered Image\" style=\"width: 50%;\"\u003e\n\u003c/p\u003e --\u003e\n\n## Environment setup\n\n\n```\nconda env create -f environment.yml\n```\n\n\n## Code demo:\n\nA notebook demonstrating different conditions is provided in `demo.ipynb`.\n\n\n## Testing on custom datasets\n\nWill be released shortly.\n\n## Instructions for Interactive Demo\n\nAn interactive demo can be run locally with:\n\n```\npython gradio_maxfusion.py\n```\n\n\n\nThis code relies on [prompt-to-prompt](https://github.com/google/prompt-to-prompt/).\n\n## Citation\nIf you use our work, please cite it as follows:\n\n```\n@article{nair2024maxfusion,\n  title={MaxFusion: Plug\\\u0026Play Multi-Modal Generation in Text-to-Image Diffusion Models},\n  author={Nair, Nithin Gopalakrishnan and Valanarasu, Jeya Maria Jose and Patel, Vishal M},\n  journal={arXiv preprint arXiv:2404.09977},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNithin-GK%2FMaxFusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNithin-GK%2FMaxFusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNithin-GK%2FMaxFusion/lists"}