https://github.com/tencentarc/brushedit

[TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"
https://github.com/tencentarc/brushedit

diffusion-models image-editing image-inpainting

Last synced: about 1 year ago
JSON representation

[TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"

Host: GitHub
URL: https://github.com/tencentarc/brushedit
Owner: TencentARC
License: other
Created: 2024-12-16T07:17:15.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-26T07:48:55.000Z (over 1 year ago)
Last Synced: 2025-04-11T16:50:41.768Z (about 1 year ago)
Topics: diffusion-models, image-editing, image-inpainting
Language: Python
Homepage: https://liyaowei-stu.github.io/project/BrushEdit/
Size: 53.3 MB
Stars: 547
Watchers: 6
Forks: 26
Open Issues: 10
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff

Awesome Lists containing this project

README

# BrushEdit

😃 This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".

Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-basd Editing

> TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.

> Main Elements: 🛠️ Fully automated / 🤠 Interactive editing.

>[Yaowei Li](https://github.com/liyaowei-stu)^1*, [Yuxuan Bian](https://yxbian23.github.io/)^3*, [Xuan Ju](https://github.com/juxuan27)^3*, [Zhaoyang Zhang](https://zzyfd.github.io/#/)^2‡, [Junhao Zhuang](https://github.com/zhuang2002)⁴, [Ying Shan](https://www.linkedin.com/in/YingShanProfile/)^2✉, [Yuexian Zou](https://www.ece.pku.edu.cn/info/1046/2146.htm)^1✉
, [Qiang Xu](https://cure-lab.github.io/)^3✉

>¹Peking University ²ARC Lab, Tencent PCG ³The Chinese University of Hong Kong ⁴Tsinghua University
^*Equal Contribution ^‡Project Lead ^✉Corresponding Author

https://github.com/user-attachments/assets/fde82f21-8b36-4584-8460-c109c195e614

4K HD Introduction Video: [Youtube](https://www.youtube.com/watch?v=nDB7un9Rbdk).

**📖 Table of Contents**

- [BrushEdit](#brushedit)
- [TODO](#todo)
- [🛠️ Pipeline Overview](#️-pipeline-overview)
- [🚀 Getting Started](#-getting-started)
- [Environment Requirement 🌍](#environment-requirement-)
- [Download Checkpoints 💾](#download-checkpoints-)
- [🏃🏼 Running Scripts](#-running-scripts)
- [🤗 BrushEidt demo](#-brusheidt-demo)
- [👻 Demo Features](#-demo-features)
- [🤝🏼 Cite Us](#-cite-us)
- [💖 Acknowledgement](#-acknowledgement)
- [❓ Contact](#-contact)

## TODO

- [X] Release the code of BrushEdit. (MLLM-dirven Agent for Image Editing and Inpainting)
- [X] Release the paper and webpage. More info: [BrushEdit](https://liyaowei-stu.github.io/project/BrushEdit/)
- [X] Release the BrushNetX checkpoint(a more powerful BrushNet).
- [X] Release gradio demo.

## 🛠️ Pipeline Overview

BrushEdit consists of four main steps: (i) Editing category classification: determine the type of editing required. (ii) Identification of the primary editing object: Identify the main object to be edited. (iii) Acquisition of the editing mask and target Caption: Generate the editing mask and corresponding target caption. (iv) Image inpainting: Perform the actual image editing. Steps (i) to (iii) utilize pre-trained MLLMs and detection models to ascertain the editing type, target object, editing masks, and target caption. Step (iv) involves image editing using the dual-branch inpainting model improved BrushNet. This model inpaints the target areas based on the target caption and editing masks, leveraging the generative potential and background preservation capabilities of inpainting models.

![teaser](assets/brushedit_teaser.png)

## 🚀 Getting Started

### Environment Requirement 🌍

BrushEdit has been implemented and tested on CUDA118, Pytorch 2.0.1, python 3.10.6.

Clone the repo:

```
git clone https://github.com/TencentARC/BrushEdit.git
```

We recommend you first use `conda` to create virtual environment, and install `pytorch` following [official instructions](https://pytorch.org/). For example:

```
conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```

Then, you can install diffusers (implemented in this repo) with:

```
pip install -e .
```

After that, you can install required packages thourgh:

```
pip install -r app/requirements.txt
```

### Download Checkpoints 💾

Checkpoints of BrushEdit can be downloaded using the following command.

```
sh app/down_load_brushedit.sh
```

**The ckpt folder contains**

- BrushNetX pretrained checkpoints for Stable Diffusion v1.5 (`brushnetX`)
- Pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1_v51VAE from [Civitai](https://civitai.com/)). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai.
- Pretrained GroundingDINO checkpoint from [offical](https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth).
- Pretrained SAM checkpoint from [offical](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).

The checkpoint structure should be like:

```
|-- models
|-- base_model
|-- realisticVisionV60B1_v51VAE
|-- model_index.json
|-- vae
|-- ...
|-- dreamshaper_8
|-- ...
|-- epicrealism_naturalSinRC1VAE
|-- ...
|-- meinamix_meinaV11
|-- ...
|-- ...
|-- brushnetX
|-- config.json
|-- diffusion_pytorch_model.safetensors
|-- grounding_dino
|-- groundingdino_swint_ogc.pth
|-- sam
|-- sam_vit_h_4b8939.pth
|-- vlm
|-- llava-v1.6-mistral-7b-hf
|-- ...
|-- llava-v1.6-vicuna-13b-hf
|-- ...
|-- Qwen2-VL-7B-Instruct
|-- ...
|-- ...

```

We provide five base diffusion models, including:

- Dreamshapre_8 is a versatile model that can generate impressive portraits and landscape images.
- Epicrealism_naturalSinRC1VAE is a realistic style model that excels at generating portraits
- HenmixReal_v5c is a model that specializes in generating realistic images of women.
- Meinamix_meinaV11 is a model that excels at generating images in an animated style.
- RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.

The BrushNetX checkpoint represents an enhanced version of BrushNet, having been trained on a more diverse dataset to improve its editing capabilities, such as deletion and replacement.

We provide two VLM models, including Qwen2-VL-7B-Instruct and LLama3-LLaa-next-8b-hf. **We strongly recommend using GPT-4o for reasoning.** After selecting the VLM model as gpt4-o, enter the API KEY and click the Submit and Verify button. If the output is success, you can use gpt4-o normally. Secondarily, we recommend using the Qwen2VL model.

And you can download more prefromhuggingface_hubimporthf_hub_download, snapshot_downloadtrained VLMs model from [QwenVL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) and [LLaVA-Next](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf).

## 🏃🏼 Running Scripts

### 🤗 BrushEidt demo

You can run the demo using the script:

```
sh app/run_app.sh
```

### 👻 Demo Features

demo_vis

💡 Fundamental Features:

🎨 Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.

🎨 VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.

🎨 Generate Mask: According to the input instructions, generate a mask for the area that may need to be edited.

🎨 Square/Circle Mask: Based on the existing mask, generate masks for squares and circles. (The coarse-grained mask provides more editing imagination.)

🎨 Invert Mask: Invert the mask to generate a new mask.

🎨 Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more areas.

🎨 Move Mask: Move the mask to a new position.

🎨 Generate Target Prompt: Generate a target prompt based on the input instructions.

🎨 Target Prompt: Description for masking area, manual input or modification can be made when the content generated by VLM does not meet expectations.

🎨 Blending: Blending brushnet's output and the original input, ensuring the original image details in the unedited areas. (turn off is beeter when removing.)

🎨 Control length: The intensity of editing and inpainting.

💡 Advanced Features:

🎨 Base Model: We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.

🎨 Blending: Blending brushnet's output and the original input, ensuring the original image details in the unedited areas. (turn off is beeter when removing.)

🎨 Control length: The intensity of editing and inpainting.

🎨 Num samples: The number of samples to generate.

🎨 Negative prompt: The negative prompt for the classifier-free guidance.

🎨 Guidance scale: The guidance scale for the classifier-free guidance.

## 🤝🏼 Cite Us

```
@misc{li2024brushedit,
title={BrushEdit: All-In-One Image Inpainting and Editing},
author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
year={2024},
eprint={2412.10316},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

```

## 💖 Acknowledgement
Our code is modified based on [diffusers](https://github.com/huggingface/diffusers) and [BrushNet](https://github.com/TencentARC/BrushNet) here, thanks to all the contributors!

## ❓ Contact
For any question, feel free to email `liyaowei01@gmail.com`.

## 🌟 Star History

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tencentarc/brushedit

Awesome Lists containing this project

README