Awesome-Controllable-Diffusion

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.
https://github.com/atfortes/Awesome-Controllable-Diffusion

Last synced: 3 days ago
JSON representation

2024
- Cross-Image Attention for Zero-Shot Appearance Transfer. - image-attention)] [[paper](https://arxiv.org/abs/2311.03335)] [[code](https://github.com/garibida/cross-image-attention)]
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models. - chosen-one/)] [[paper](https://arxiv.org/abs/2311.10093)] [[code](https://github.com/ZichengDuan/TheChosenOne)]
- MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion.
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding. - maker.github.io/)] [[paper](https://arxiv.org/abs/2312.04461)] [[code](https://github.com/TencentARC/PhotoMaker)]
- InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning.
- Context Diffusion: In-Context Aware Image Generation.
- PALP: Prompt Aligned Personalization of Text-to-Image Models. - aligned.github.io/)] [[paper](https://arxiv.org/abs/2401.06105)]
- Training-Free Consistent Text-to-Image Generation.
- InstanceDiffusion: Instance-level Control for Image Generation. - xwang/InstanceDiffusion)]
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback. - ai.github.io/ControlNet_Plus_Plus/)] [[paper](https://arxiv.org/abs/2404.07987)] [[code](https://github.com/liming-ai/ControlNet_Plus_Plus)]
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model. - adapter.github.io/)] [[paper](https://arxiv.org/abs/2404.09967)] [[code](https://github.com/HL-hanlin/Ctrl-Adapter)]
- Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding.
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs. - DiffusionMaster)]
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models - G)]
- Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition.
- Visual Style Prompting with Swapping Self-Attention. - ai/Visual-Style-Prompting)]
- RealCompo: Dynamic Equilibrium between Realism and Composition Improves Text-to-Image Diffusion Models.
- IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation. - hah/IFAdapter)]
- CSGO: Content-Style Composition in Text-to-Image Generation. - gen.github.io/)] [[paper](https://arxiv.org/abs/2408.16766)] [[code](https://github.com/instantX-research/CSGO)]
- Generative Photomontage.
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches.
- - -Driven-orange?style=flat-square)
- IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts. - research.github.io/IP-Adapter-Instruct.github.io/)] [[paper](https://arxiv.org/abs/2408.03209)] [[code](https://github.com/unity-research/IP-Adapter-Instruct)]
- ViPer: Visual Personalization of Generative Models via Individual Preference Learning. - VILAB/ViPer)]
- Training-free Composite Scene Generation for Layout-to-Image Synthesis. - F/csg)]
- SEED-Story: Multimodal Long Story Generation with Large Language Model. - Story)]
- Sketch-Guided Scene Image Generation.
- Instant 3D Human Avatar Generation using Image Diffusion Models.
- Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance. - x/)] [[paper](https://arxiv.org/abs/2406.07540)] [[code](https://github.com/genforce/ctrl-x)]
- Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis. - AI-Research/Zero-Painter)]
- pOps: Photo-Inspired Diffusion Operators.
- Personalized Residuals for Concept-Driven Text-to-Image Generation. - residuals/)] [[paper](https://arxiv.org/abs/2405.12978)]
- StyleBooth: Image Style Editing with Multimodal Instruction. - vilab.github.io/stylebooth-page/)] [[paper](https://arxiv.org/abs/2404.12154)] [[code](https://github.com/modelscope/scepter)]
- MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. - research.github.io/mixture-of-attention/)] [[paper](https://arxiv.org/abs/2404.11565)] [[code](https://github.com/snap-research/mixture-of-attention)]
- Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. - control/)] [[paper](https://arxiv.org/abs/2403.17064)] [[code](https://github.com/CompVis/attribute-control)]
- Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation. - Your-3D/)] [[paper](https://arxiv.org/abs/2403.09625)] [[code](https://github.com/liuff19/Make-Your-3D)]
- FeedFace: Efficient Inference-based Face Personalization via Diffusion Models. - cd/FeedFace)]
- Multi-LoRA Composition for Image Generation. - LoRA-Composition/)] [[paper](https://arxiv.org/abs/2402.16843)] [[code](https://github.com/maszhongming/Multi-LoRA-Composition)]
- Direct Consistency Optimization for Compositional Text-to-Image Personalization. - t2i.github.io/)] [[paper](https://arxiv.org/abs/2402.12004)] [[code](https://github.com/kyungmnlee/dco)]
- UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion. - ptm.github.io/)] [[paper](https://arxiv.org/abs/2401.13388)]
- Style Aligned Image Generation via Shared Attention. - aligned-gen.github.io/)] [[paper](https://arxiv.org/abs/2312.02133)] [[code](https://github.com/google/style-aligned/)]
- Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models.
- MultiBooth: Towards Generating All Your Concepts in an Image from Text.
- InstantID: Zero-shot Identity-Preserving Generation in Seconds.
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation. - attention/)] [[paper](https://arxiv.org/abs/2403.16990)] [[code](https://github.com/omer11a/bounded-attention)]
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation. - page/)] [[paper](https://arxiv.org/abs/2403.17008)] [[code](https://github.com/ali-vilab/FlashFace)]
- Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models.
- Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models. - t2i.github.io/)] [[paper](https://arxiv.org/abs/2404.04243)]
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models. - gk.github.io/maxfusion.github.io/)] [[paper](https://arxiv.org/abs/2404.09977)] [[code](https://github.com/Nithin-GK/MaxFusion)]
- - a50b5e?style=flat-square)
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation.
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation.
- Customizing Text-to-Image Models with a Single Image Pair.
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation. - adapter.github.io/)] [[paper](https://arxiv.org/abs/2404.05674)] [[code](https://github.com/bytedance/MoMA/tree/main)]
- PuLID: Pure and Lightning ID Customization via Contrastive Alignment.
- Compositional Text-to-Image Generation with Dense Blob Representations. - 2d.github.io/)] [[paper](https://arxiv.org/abs/2405.08246)]
- FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition. - uofa.github.io/FreeCustom/)] [[paper](https://arxiv.org/abs/2405.13870)] [[code](https://github.com/aim-uofa/FreeCustom)]
- RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control. - modulation.github.io/)] [[paper](https://arxiv.org/abs/2405.17401)] [[code](https://github.com/google/RB-Modulation)]
Consistency Models
- CCM: Adding Conditional Controls to Text-to-Image Consistency Models - blue?style=flat-square)![](https://img.shields.io/badge/Layout-a50b5e?style=flat-square)
- PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models - blue?style=flat-square)![](https://img.shields.io/badge/Layout-a50b5e?style=flat-square)
Diffusion Models
2023
- ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.
- Subject-driven Text-to-Image Generation via Apprenticeship Learning.
- StyleDrop: Text-to-Image Generation in Any Style.
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. - Diffusion-website/)] [[paper](https://arxiv.org/abs/2305.14720)] [[code](https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion)]
- Adding Conditional Control to Text-to-Image Diffusion Models.
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. - Adapter)]
- GLIGEN: Open-Set Grounded Text-to-Image Generation.
- Awesome-LLM-Reasoning
- Awesome-Controllable-T2I-Diffusion-Models - to-image diffusion models.
- Multi-Concept Customization of Text-to-Image Diffusion. - diffusion/)] [[paper](https://arxiv.org/abs/2212.04488)] [[code](https://github.com/adobe-research/custom-diffusion)]
- Regional Prompter
- Face0: Instantaneously Conditioning a Text-to-Image Model on a Face.
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning.
- Zero-shot spatial layout conditioning for text-to-image diffusion models.
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. - adapter.github.io/)] [[paper](https://arxiv.org/abs/2308.06721)] [[code]()]
- - -Driven-orange?style=flat-square)
Benchmark
- VQAv2
- TAG
- Bongard-HOI - object interactions (HOIs) from natural images.
- ARC - like form of general fluid intelligence.
- SCIENCEQA
- ARO
- OK-VQA
- A-OKVQA - based visual question answering benchmark.
- NExT-QA
- GQA - world images.
- VQA
2025
- Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models. - diffusion/)] [[paper](https://arxiv.org/abs/2503.08434)] [[code](https://github.com/atfortes/BokehDiffusion)]
Technique
- [Project
- [Blog
- [Project - liu/LLaVA)], 2023.10
- [Project - CAIR/MiniGPT-4)], 2023.4
- [Paper
- [Paper
- [Paper
- [Paper - gpt.github.io/)], 2023.7
- [Paper
- [Paper - stanford/med-flamingo)], 2023.7
- [Paper - VL)], 2023.8
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Project - research/google-research/tree/master/socraticmodels)], 2022.4
- [Paper - science/mm-cot)], 2023.2
- [Paper - chatgpt)], 2023.3
- [Project - REACT)] [[Demo](https://huggingface.co/spaces/microsoft-cognitive-service/mm-react)], 2023.3
- [Paper
- [Paper - portal/Link-Context-Learning)], 20233.8
- [Paper
- [Project - iep)], 2017.5
- [Project - vqa)], 2018.10
- [Project
- [Project - columbia/viper)], 2023.3
- [Paper
- [Project - llm)], 2023.4
- [Paper
- [Project - vid.github.io/#video-demos)], 2023.10
- [Paper
- [Blog
- [Paper
- [Paper
- [Project
Star History
- ![Star History Chart - history.com/#atfortes/Awesome-Controllable-Generation&Timeline)
- ![Star History Chart - history.com/#atfortes/Awesome-Controllable-Generation&Timeline)

Programming Languages

Categories

2024 58 Technique 36 2023 17 Diffusion Models 15 Benchmark 11 Consistency Models 2 Star History 2 2025 1

Sub Categories

Keywords

awesome 2 chain-of-thought 1 chatgpt 1 cot 1 deepseek 1 deepseek-r1 1 gpt 1 gpt-4o 1 in-context-learning 1 language-models 1 mllm 1 multimodal 1 openai-o1 1 papers 1 prompt 1 prompt-engineering 1 reasoning 1 strawberry 1 awesome-list 1 controllable-generation 1 diffusion-models 1 multi-concept 1 personalization 1 spatial-controls 1 text-to-image 1