https://github.com/tencentarc/blobctrl
[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
- Host: GitHub
- URL: https://github.com/tencentarc/blobctrl
- Owner: TencentARC
- License: other
- Created: 2025-03-17T07:29:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-20T05:20:02.000Z (over 1 year ago)
- Last Synced: 2025-04-13T00:53:04.816Z (about 1 year ago)
- Topics: aigc, image-editing
- Language: Python
- Homepage: https://liyaowei-stu.github.io/project/BlobCtrl/
- Size: 51.2 MB
- Stars: 79
- Watchers: 9
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# BlobCtrl
This repository contains the implementation of "BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing".
Keywords: Image Generation, Image Editing, Diffusion Models, Element-level
> TL;DR: BlobCtrl enables precise, user-friendly multi-round element-level visual manipulation.
> Main Features: Element-level Add/Remove/Move/Replace/Enlarge/Shrink.
> [Yaowei Li](https://github.com/liyaowei-stu) 1, [Lingen Li](https://lg-li.github.io/) 3, [Zhaoyang Zhang](https://zzyfd.github.io/#/) 2‡, [Xiaoyu Li](https://github.com/zhuang2002) 2, [Guangzhi Wang](http://gzwang.xyz/) 2, [Hongxiang Li](https://lihxxx.github.io/) 1, [Xiaodong Cun](https://vinthony.github.io/academic/) 2, [Ying Shan](https://www.linkedin.com/in/YingShanProfile/) 2, [Yuexian Zou](https://www.ece.pku.edu.cn/info/1046/2146.htm) 1†
> 1Peking University 2ARC Lab, Tencent PCG 3The Chinese University of Hong Kong ‡Project Lead †Corresponding Author
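[Project Page](https://liyaowei-stu.github.io/project/BlobCtrl/) | [Arxiv](https://arxiv.org/abs/2503.13434) | Video | Hugging Face Demo | [Hugging Face Model](https://huggingface.co/Yw22/BlobCtrl) | Hugging Face Data (TBD) | Hugging Face Benchmark (TBD)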
https://github.com/user-attachments/assets/ec5fab3c-fa84-4f5d-baf9-1e744f577515
YouTube introduction video: [YouTube](https://youtu.be/rdR4QRR-mbE).
**Table of Contents**
- [BlobCtrl](#blobctrl)
- [Update Logs](#update-logs)
- [Method Overview](#method-overview)
- [Getting Started](#getting-started)
- [Running Scripts](#running-scripts)
- [Cite Us](#cite-us)
- [Acknowledgement](#acknowledgement)
- [Contact](#contact)
- [Star History](#star-history)
## Update Logs
- [TBD] Release the data preprocessing code.
- [TBD] Release the BlobData and BlobBench.
- [TBD] Release the training code.
- [X] [20/03/2025] Release the inference code.
- [X] [17/03/2025] Release the paper, webpage and gradio demo.
## Method Overview
We introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks, offering a practical solution for precise and flexible visual content creation.
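To make the blob idea concrete, the sketch below treats a blob as a 2D Gaussian described by a center, two axis scales, and a rotation angle, and evaluates a soft opacity at any pixel. The parameterization, the `Blob` class, and the `move` and `scale` helpers are illustrative assumptions, not this repository's API; the actual BlobCtrl representation may differ. It only shows how one set of parameters can keep location and shape separate, so that Move, Enlarge, and Shrink become simple parameter edits.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blob:
    """Hypothetical blob: an ellipse-shaped 2D Gaussian (not the repo's class)."""
    cx: float      # center x, in pixels
    cy: float      # center y, in pixels
    sx: float      # standard deviation along the blob's first axis
    sy: float      # standard deviation along the blob's second axis
    theta: float   # rotation of the first axis, in radians

    def opacity(self, x: float, y: float) -> float:
        """Unnormalized Gaussian value at (x, y): 1.0 at the center, decaying outward."""
        dx, dy = x - self.cx, y - self.cy
        # Rotate the offset into the blob's own axes.
        u = math.cos(self.theta) * dx + math.sin(self.theta) * dy
        v = -math.sin(self.theta) * dx + math.cos(self.theta) * dy
        # Squared Mahalanobis distance for a diagonal covariance in blob axes.
        d2 = (u / self.sx) ** 2 + (v / self.sy) ** 2
        return math.exp(-0.5 * d2)


def move(blob: Blob, dx: float, dy: float) -> Blob:
    """'Move' edit: shift the center, keep shape and orientation."""
    return replace(blob, cx=blob.cx + dx, cy=blob.cy + dy)


def scale(blob: Blob, factor: float) -> Blob:
    """'Enlarge' (factor > 1) or 'Shrink' (factor < 1) edit: scale both axes."""
    return replace(blob, sx=blob.sx * factor, sy=blob.sy * factor)


blob = Blob(cx=64.0, cy=64.0, sx=20.0, sy=10.0, theta=0.0)
moved = move(blob, 30.0, 0.0)
bigger = scale(blob, 2.0)

print(blob.opacity(64.0, 64.0))    # 1.0 at the center
print(moved.opacity(94.0, 64.0))   # 1.0: the center followed the move
print(bigger.opacity(84.0, 64.0) > blob.opacity(84.0, 64.0))  # True: the enlarged blob covers more
```

Because the edits return new `Blob` values instead of mutating the original, a multi-round editing session can keep the history of blob states and revert to any of them.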
## Getting Started
Environment Requirement
BlobCtrl has been implemented and tested with CUDA 12.1, PyTorch 2.2.0, and Python 3.10.15.
Clone the repo:
```
git clone git@github.com:TencentARC/BlobCtrl.git
cd BlobCtrl
```
We recommend first creating a virtual environment with `conda` and then installing the required libraries. For example:
```
conda create -n blobctrl python=3.10.15 -y
conda activate blobctrl
python -m pip install --upgrade pip
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
Then install the `diffusers` version included in this repository in editable mode:
```
pip install -e .
```
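After installation, it can help to confirm that the environment matches the tested configuration. The following is a minimal sketch using only the standard library; the helper names `python_matches` and `package_report` are hypothetical and not part of this repository, and a mismatch does not necessarily mean BlobCtrl will fail to run.

```python
import sys
from importlib import metadata

# Versions this README reports as tested; other versions may also work.
TESTED_PYTHON = (3, 10)
TESTED_PACKAGES = {"torch": "2.2.0", "torchvision": "0.17.0"}


def python_matches(version_info=sys.version_info) -> bool:
    """True if the interpreter's major.minor equals the tested 3.10."""
    return tuple(version_info[:2]) == TESTED_PYTHON


def package_report(expected=TESTED_PACKAGES) -> dict:
    """Map each package to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the cu121 index carry a local suffix such as "+cu121".
        base = installed.split("+")[0] if installed else None
        report[name] = (installed, base == wanted)
    return report


if __name__ == "__main__":
    print("python ok:", python_matches())
    for name, (installed, ok) in package_report().items():
        print(f"{name}: installed={installed} ok={ok}")
```

Reading versions through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and still works when the package is missing.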
Download Model Checkpoints
Download the BlobCtrl checkpoints:
```
sh examples/blobctrl/scripts/download_models.sh
```
**The ckpt folder contains**
- Our provided [BlobCtrl](https://huggingface.co/Yw22/BlobCtrl) checkpoints (`UNet LoRA` + `BlobNet`).
- Pretrained [SD-v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint.
- Pretrained [DINOv2](https://huggingface.co/facebook/dinov2-large) checkpoint.
- Pretrained [SAM](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) checkpoint.
The checkpoint directory should be structured as follows:
```
|-- models
    |-- blobnet
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- dinov2-large
        |-- config.json
        |-- model.safetensors
        ...
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- unet_lora
        |-- pytorch_lora_weights.safetensors
```
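A quick way to confirm the download succeeded is to check for the files named in the listing above. The sketch below assumes each file sits inside the directory listed just before it and that the script runs from the directory containing `models`; the helper `missing_checkpoints` is hypothetical and not part of this repository. Entries hidden behind the `...` in the listing are not checked.

```python
from pathlib import Path

# Files named in the checkpoint listing, relative to the `models` directory.
# The nesting is an assumption inferred from the file names.
EXPECTED_FILES = [
    "blobnet/config.json",
    "blobnet/diffusion_pytorch_model.safetensors",
    "dinov2-large/config.json",
    "dinov2-large/model.safetensors",
    "sam/sam_vit_h_4b8939.pth",
    "unet_lora/pytorch_lora_weights.safetensors",
]


def missing_checkpoints(models_dir) -> list:
    """Return the expected files that are absent under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: run from the directory that contains `models`.
    missing = missing_checkpoints("models")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed checkpoint files are present.")
```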
## Running Scripts
BlobCtrl Demo
You can run the demo using the script:
```
sh examples/blobctrl/scripts/run_app.sh
```
BlobCtrl Inference
You can run the inference using the script:
```
sh examples/blobctrl/scripts/inference.sh
```
## Cite Us
```
@misc{li2025blobctrl,
title={BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing},
author={Yaowei Li and Lingen Li and Zhaoyang Zhang and Xiaoyu Li and Guangzhi Wang and Hongxiang Li and Xiaodong Cun and Ying Shan and Yuexian Zou},
year={2025},
eprint={2503.13434},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Acknowledgement
Our implementation builds upon the [diffusers](https://github.com/huggingface/diffusers) library. We extend our sincere gratitude to all the contributors of the diffusers project!
We also acknowledge the [BlobGAN](https://github.com/dave-epstein/blobgan) project for providing valuable insights and inspiration for our blob-based representation approach.
## Contact
For any questions, feel free to email `liyaowei01@gmail.com`.
## Star History