[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
https://github.com/tencentarc/blobctrl

# BlobCtrl

😃 This repository contains the implementation of "BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing".

Keywords: Image Generation, Image Editing, Diffusion Models, Element-level

> TL;DR: BlobCtrl enables precise, user-friendly multi-round element-level visual manipulation.

> Main Features: 🦉 Element-level Add/Remove/Move/Replace/Enlarge/Shrink.

> [Yaowei Li](https://github.com/liyaowei-stu)<sup>1</sup>, [Lingen Li](https://lg-li.github.io/)<sup>3</sup>, [Zhaoyang Zhang](https://zzyfd.github.io/#/)<sup>2‡</sup>, [Xiaoyu Li](https://github.com/zhuang2002)<sup>2</sup>, [Guangzhi Wang](http://gzwang.xyz/)<sup>2</sup>, [Hongxiang Li](https://lihxxx.github.io/)<sup>1</sup>, [Xiaodong Cun](https://vinthony.github.io/academic/)<sup>2</sup>, [Ying Shan](https://www.linkedin.com/in/YingShanProfile/)<sup>2</sup>, [Yuexian Zou](https://www.ece.pku.edu.cn/info/1046/2146.htm)<sup>1✉</sup>

> <sup>1</sup>Peking University <sup>2</sup>ARC Lab, Tencent PCG <sup>3</sup>The Chinese University of Hong Kong <sup>‡</sup>Project Lead <sup>✉</sup>Corresponding Author


🌐Project Page |
📜arXiv |
📹Video |
🤗Hugging Face Demo |
🤗Hugging Face Model

🤗Hugging Face Data (TBD) |
🤗Hugging Face Benchmark (TBD)

https://github.com/user-attachments/assets/ec5fab3c-fa84-4f5d-baf9-1e744f577515

YouTube introduction video: [YouTube](https://youtu.be/rdR4QRR-mbE).

**📖 Table of Contents**

- [BlobCtrl](#blobctrl)
- [🔥 Update Logs](#-update-logs)
- [🛠️ Method Overview](#️-method-overview)
- [🚀 Getting Started](#-getting-started)
- [🏃🏼 Running Scripts](#-running-scripts)
- [🤝🏼 Cite Us](#-cite-us)
- [💖 Acknowledgement](#-acknowledgement)
- [❓ Contact](#-contact)
- [🌟 Star History](#-star-history)

## 🔥 Update Logs

- [TBD] Release the data preprocessing code.
- [TBD] Release the BlobData and BlobBench.
- [TBD] Release the training code.
- [X] [20/03/2025] Release the inference code.
- [X] [17/03/2025] Release the paper, webpage and gradio demo.

## 🛠️ Method Overview

We introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks, offering a practical solution for precise and flexible visual content creation.
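To make the blob idea more concrete, here is a minimal sketch that treats a blob as a rotated 2D Gaussian and expresses Move, Enlarge, and Shrink as changes to its parameters. The `Blob` class, its fields, and the `move` and `scale` helpers are hypothetical names chosen for illustration. They are not taken from the BlobCtrl code, and the exact parameterization used in the paper may differ.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Blob:
    """Hypothetical blob: a rotated 2D Gaussian (not the BlobCtrl API)."""
    cx: float            # center x, in pixels
    cy: float            # center y, in pixels
    sx: float            # spread along the blob's first axis
    sy: float            # spread along the blob's second axis
    theta: float = 0.0   # rotation angle in radians

    def density(self, x: float, y: float) -> float:
        """Unnormalized Gaussian value in (0, 1]; equals 1.0 at the center."""
        dx, dy = x - self.cx, y - self.cy
        c, s = math.cos(self.theta), math.sin(self.theta)
        # Rotate the offset into the blob's own axes, then normalize by spread.
        u = (c * dx + s * dy) / self.sx
        v = (-s * dx + c * dy) / self.sy
        return math.exp(-0.5 * (u * u + v * v))


def move(blob: Blob, dx: float, dy: float) -> Blob:
    """Element-level 'Move': shift the center, keep shape and orientation."""
    return replace(blob, cx=blob.cx + dx, cy=blob.cy + dy)


def scale(blob: Blob, factor: float) -> Blob:
    """Element-level 'Enlarge' (factor > 1) or 'Shrink' (factor < 1)."""
    return replace(blob, sx=blob.sx * factor, sy=blob.sy * factor)


if __name__ == "__main__":
    blob = Blob(cx=10.0, cy=10.0, sx=2.0, sy=4.0)
    print(move(blob, 5.0, 0.0))
    print(scale(blob, 2.0))
```

In this sketch, position and size live in a handful of numbers that are independent of appearance, which is the kind of decoupling between spatial location and content that the overview describes.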



## 🚀 Getting Started

**Environment Requirement 🌍**

BlobCtrl has been implemented and tested with CUDA 12.1, PyTorch 2.2.0, and Python 3.10.15.

Clone the repo:

```
git clone git@github.com:TencentARC/BlobCtrl.git
```

We recommend first using `conda` to create a virtual environment and then installing the required libraries. For example:

```
conda create -n blobctrl python=3.10.15 -y
conda activate blobctrl
python -m pip install --upgrade pip
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```

Then, you can install diffusers (implemented in this repo) with:

```
pip install -e .
```
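Once the installation finishes, a quick sanity check can confirm that the pinned versions are the ones actually installed. The following is a minimal sketch, not part of the repository: `EXPECTED`, `base_version`, and `check_packages` are hypothetical names, and the version pins are copied from the install commands above. It reads package metadata only, so it does not need a GPU.

```python
import sys
from importlib import metadata

# Version pins taken from the pip commands above.
EXPECTED = {"torch": "2.2.0", "torchvision": "0.17.0", "torchaudio": "2.2.0"}


def base_version(version: str) -> str:
    """Strip a local build tag such as '+cu121' from a version string."""
    return version.split("+", 1)[0]


def check_packages(expected, lookup=metadata.version):
    """Return {package: problem} for each missing or mismatched package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if base_version(found) != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    issues = check_packages(EXPECTED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned packages match.")
```

Run it inside the activated `blobctrl` environment. An empty result means the three pinned packages match the versions listed above.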

**Download Model Checkpoints 💾**


Download the corresponding checkpoints of BlobCtrl.

```
sh examples/blobctrl/scripts/download_models.sh
```

**The ckpt folder contains**

- Our provided [BlobCtrl](https://huggingface.co/Yw22/BlobCtrl) checkpoints (`UNet LoRA` + `BlobNet`).
- Pretrained [SD-v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint.
- Pretrained [DINOv2](https://huggingface.co/facebook/dinov2-large) checkpoint.
- Pretrained [SAM](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) checkpoint.

The checkpoint directory should be structured as follows:

```
|-- models
    |-- blobnet
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- dinov2-large
        |-- config.json
        |-- model.safetensors
        ...
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- unet_lora
        |-- pytorch_lora_weights.safetensors
```
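If the demo or inference script later fails to find a model, it can help to verify the layout first. Below is a minimal sketch that checks for the files named in the tree above, assuming the subfolders are nested under `models`. `EXPECTED_FILES` and `missing_checkpoints` are hypothetical names, and the location of `models` relative to the repository root is an assumption, so pass the path that matches your setup.

```python
from pathlib import Path

# Files named in the checkpoint tree above, relative to the `models` folder.
EXPECTED_FILES = [
    "blobnet/config.json",
    "blobnet/diffusion_pytorch_model.safetensors",
    "dinov2-large/config.json",
    "dinov2-large/model.safetensors",
    "sam/sam_vit_h_4b8939.pth",
    "unet_lora/pytorch_lora_weights.safetensors",
]


def missing_checkpoints(models_dir):
    """Return the expected files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust if the download script writes elsewhere.
    missing = missing_checkpoints("models")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected checkpoint files were found.")
```

The list covers only the files shown explicitly in the tree. Entries hidden behind the `...` are not checked.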

## ๐Ÿƒ๐Ÿผ Running Scripts

**BlobCtrl Demo 🤗**


You can run the demo using the script:

```
sh examples/blobctrl/scripts/run_app.sh
```

**BlobCtrl Inference 🌠**


You can run the inference using the script:

```
sh examples/blobctrl/scripts/inference.sh
```

## 🤝🏼 Cite Us

```
@misc{li2024brushedit,
  title={BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing},
  author={Yaowei Li and Lingen Li and Zhaoyang Zhang and Xiaoyu Li and Guangzhi Wang and Hongxiang Li and Xiaodong Cun and Ying Shan and Yuexian Zou},
  year={2025},
  eprint={2503.13434},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## 💖 Acknowledgement

Our implementation builds upon the [diffusers](https://github.com/huggingface/diffusers) library. We extend our sincere gratitude to all the contributors of the diffusers project!

We also acknowledge the [BlobGAN](https://github.com/dave-epstein/blobgan) project for providing valuable insights and inspiration for our blob-based representation approach.

## โ“ Contact

For any question, feel free to email `liyaowei01@gmail.com`.

## 🌟 Star History



Star History Chart