https://github.com/tencentarc/brushedit
[TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"
diffusion-models image-editing image-inpainting
- Host: GitHub
- URL: https://github.com/tencentarc/brushedit
- Owner: TencentARC
- License: other
- Created: 2024-12-16T07:17:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-26T07:48:55.000Z (over 1 year ago)
- Last Synced: 2025-04-11T16:50:41.768Z (about 1 year ago)
- Topics: diffusion-models, image-editing, image-inpainting
- Language: Python
- Homepage: https://liyaowei-stu.github.io/project/BrushEdit/
- Size: 53.3 MB
- Stars: 547
- Watchers: 6
- Forks: 26
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# BrushEdit
This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".
Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing
> TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
> Main Elements: Fully automated / Interactive editing.
> [Yaowei Li](https://github.com/liyaowei-stu)<sup>1*</sup>, [Yuxuan Bian](https://yxbian23.github.io/)<sup>3*</sup>, [Xuan Ju](https://github.com/juxuan27)<sup>3*</sup>, [Zhaoyang Zhang](https://zzyfd.github.io/#/)<sup>2‡</sup>, [Junhao Zhuang](https://github.com/zhuang2002)<sup>4</sup>, [Ying Shan](https://www.linkedin.com/in/YingShanProfile/)<sup>2✉</sup>, [Yuexian Zou](https://www.ece.pku.edu.cn/info/1046/2146.htm)<sup>1✉</sup>, [Qiang Xu](https://cure-lab.github.io/)<sup>3✉</sup>
>
> <sup>1</sup>Peking University <sup>2</sup>ARC Lab, Tencent PCG <sup>3</sup>The Chinese University of Hong Kong <sup>4</sup>Tsinghua University

<sup>*</sup>Equal Contribution <sup>‡</sup>Project Lead <sup>✉</sup>Corresponding Author
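[Project Page](https://liyaowei-stu.github.io/project/BrushEdit/) | [Arxiv](https://arxiv.org/abs/2412.10316) | [Video](https://www.youtube.com/watch?v=nDB7un9Rbdk) | Hugging Face Demo | Hugging Face Model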
https://github.com/user-attachments/assets/fde82f21-8b36-4584-8460-c109c195e614
4K HD Introduction Video: [YouTube](https://www.youtube.com/watch?v=nDB7un9Rbdk).
**Table of Contents**
- [BrushEdit](#brushedit)
  - [TODO](#todo)
  - [Pipeline Overview](#pipeline-overview)
  - [Getting Started](#getting-started)
    - [Environment Requirement](#environment-requirement)
    - [Download Checkpoints](#download-checkpoints)
  - [Running Scripts](#running-scripts)
    - [BrushEdit Demo](#brushedit-demo)
    - [Demo Features](#demo-features)
  - [Cite Us](#cite-us)
  - [Acknowledgement](#acknowledgement)
  - [Contact](#contact)
## TODO
- [X] Release the code of BrushEdit (MLLM-driven agent for image editing and inpainting).
- [X] Release the paper and webpage. More info: [BrushEdit](https://liyaowei-stu.github.io/project/BrushEdit/)
- [X] Release the BrushNetX checkpoint (a more powerful BrushNet).
- [X] Release the Gradio demo.
## Pipeline Overview
BrushEdit consists of four main steps:
- **(i) Editing category classification:** determine the type of editing required.
- **(ii) Identification of the primary editing object:** identify the main object to be edited.
- **(iii) Acquisition of the editing mask and target caption:** generate the editing mask and the corresponding target caption.
- **(iv) Image inpainting:** perform the actual image editing.

Steps (i) to (iii) use pre-trained MLLMs and detection models to determine the editing type, target object, editing mask, and target caption. Step (iv) edits the image with an improved version of the dual-branch inpainting model BrushNet, which inpaints the target areas based on the target caption and editing mask, leveraging the generative capability and background preservation of inpainting models.
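To make the data flow between the four steps concrete, here is a minimal orchestration sketch. Every function, class, and field name in it is hypothetical and is not the repository's API; the MLLM, detection models, and inpainting model are passed in as plain callables so the structure can be read (and run) without any models loaded.

```python
from dataclasses import dataclass
from typing import Any, Callable

Mask = list[list[int]]  # 1 = region to inpaint, 0 = region to keep


@dataclass
class EditPlan:
    category: str        # step (i): type of edit, e.g. "remove" (illustrative label only)
    target_object: str   # step (ii): main object to edit
    mask: Mask           # step (iii): editing mask
    target_caption: str  # step (iii): caption describing the edited region


def plan_edit(
    image: Any,
    instruction: str,
    classify: Callable[[Any, str], str],
    find_object: Callable[[Any, str, str], str],
    mask_and_caption: Callable[[Any, str, str, str], tuple[Mask, str]],
) -> EditPlan:
    """Steps (i)-(iii); in BrushEdit these rely on an MLLM plus detection models."""
    category = classify(image, instruction)
    target = find_object(image, instruction, category)
    mask, caption = mask_and_caption(image, instruction, category, target)
    return EditPlan(category, target, mask, caption)


def run_pipeline(
    image: Any,
    instruction: str,
    stages: tuple[Callable, Callable, Callable],
    inpaint: Callable[[Any, Mask, str], Any],
) -> Any:
    """Step (iv): hand the mask and target caption to the inpainting model."""
    plan = plan_edit(image, instruction, *stages)
    return inpaint(image, plan.mask, plan.target_caption)
```

The point of the split is that only step (iv) touches pixels; steps (i) to (iii) only produce a mask and a caption, which is also why the demo lets you edit the mask and target prompt by hand before inpainting.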

## Getting Started
### Environment Requirement
BrushEdit has been implemented and tested on CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.
Clone the repo:
```
git clone https://github.com/TencentARC/BrushEdit.git
cd BrushEdit
```
We recommend first using `conda` to create a virtual environment, then installing `pytorch` following the [official instructions](https://pytorch.org/). For example:
```
conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
Then, install the version of diffusers implemented in this repo with:
```
pip install -e .
```
After that, install the required packages with:
```
pip install -r app/requirements.txt
```
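A quick way to confirm that your environment matches the tested versions above is a small check script. This is a minimal sketch that is not part of the repository: it only compares the Python version and installed package versions (read via `importlib.metadata`) against the versions listed in this README, and it does not check CUDA itself.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Tested configuration listed in this README.
EXPECTED_PYTHON = (3, 10)
EXPECTED_PACKAGES = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_environment(expected_python=EXPECTED_PYTHON, expected_packages=EXPECTED_PACKAGES):
    """Return a dict mapping each mismatched item to (found, expected)."""
    problems = {}
    found_py = sys.version_info[:2]
    if found_py != tuple(expected_python):
        problems["python"] = (".".join(map(str, found_py)), ".".join(map(str, expected_python)))
    for name, want in expected_packages.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # CUDA wheels report versions such as "2.0.1+cu118", so compare only the base part.
        if have is None or have.split("+")[0] != want:
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for item, (found, wanted) in issues.items():
        print(f"{item}: found {found}, expected {wanted}")
    print("OK" if not issues else f"{len(issues)} mismatch(es)")
```

Newer versions may well work; a mismatch here only means you are outside the configuration the authors tested.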
### Download Checkpoints
Checkpoints of BrushEdit can be downloaded using the following command.
```
sh app/down_load_brushedit.sh
```
**The checkpoint folder contains:**
- BrushNetX pretrained checkpoints for Stable Diffusion v1.5 (`brushnetX`)
- Pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1_v51VAE from [Civitai](https://civitai.com/)). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai.
- Pretrained GroundingDINO checkpoint from the [official release](https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth).
- Pretrained SAM checkpoint from the [official release](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).
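If you convert your own Civitai checkpoint, the command can be built from Python. This sketch assumes that the conversion script in this repo keeps the argument names of the upstream diffusers script (`--checkpoint_path`, `--dump_path`, `--from_safetensors`); run the script with `--help` to confirm before relying on it. The helper name and example file names are hypothetical, and the sketch only builds the command without running it.

```python
import shlex
import sys
from pathlib import Path

# Conversion script mentioned in this README, relative to the repo root.
CONVERT_SCRIPT = "scripts/convert_original_stable_diffusion_to_diffusers.py"


def build_convert_command(checkpoint: str, dump_dir: str) -> list[str]:
    """Return argv that converts a single-file SD v1.5 checkpoint into a diffusers folder."""
    cmd = [sys.executable, CONVERT_SCRIPT, "--checkpoint_path", checkpoint, "--dump_path", dump_dir]
    # Civitai checkpoints are often .safetensors files, which need this extra flag.
    if Path(checkpoint).suffix == ".safetensors":
        cmd.append("--from_safetensors")
    return cmd


if __name__ == "__main__":
    # Hypothetical names: replace them with your own download and target folder.
    cmd = build_convert_command("my_model.safetensors", "models/base_model/my_model")
    print(shlex.join(cmd))  # run it with subprocess.run(cmd, check=True) from the repo root
```

Writing the output under `models/base_model/` keeps the converted model next to the preloaded ones.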
The checkpoint directory should be structured as follows:
```
|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
            |-- ...
        |-- llava-v1.6-vicuna-13b-hf
            |-- ...
        |-- Qwen2-VL-7B-Instruct
            |-- ...
        |-- ...
```
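After running the download script, you can check that the files named in the tree above are in place. This is a minimal sketch, not a script shipped with the repo: it checks only the checkpoints the tree names explicitly, plus the existence of the `base_model` and `vlm` folders, whose contents vary with the models you choose. The function name is hypothetical.

```python
from pathlib import Path

# Files that the tree above names explicitly, relative to models/.
REQUIRED_FILES = [
    "brushnetX/config.json",
    "brushnetX/diffusion_pytorch_model.safetensors",
    "grounding_dino/groundingdino_swint_ogc.pth",
    "sam/sam_vit_h_4b8939.pth",
]
# Folders whose contents depend on which base models and VLMs you download.
REQUIRED_DIRS = ["base_model", "vlm"]


def missing_checkpoints(root: str = "models") -> list[str]:
    """Return the expected paths under `root` that are absent (folders end with '/')."""
    base = Path(root)
    missing = [p for p in REQUIRED_FILES if not (base / p).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (base / d).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_checkpoints()
    print("All checkpoints found." if not gaps else "Missing:\n  " + "\n  ".join(gaps))
```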
We provide five base diffusion models:
- Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
- Epicrealism_naturalSinRC1VAE is a realistic-style model that excels at generating portraits.
- HenmixReal_v5c is a model that specializes in generating realistic images of women.
- Meinamix_meinaV11 is a model that excels at generating images in an anime style.
- RealisticVisionV60B1_v51VAE is a highly generalized realistic-style model.
The BrushNetX checkpoint is an enhanced version of BrushNet, trained on a more diverse dataset to improve editing capabilities such as deletion and replacement.
We provide two VLMs: Qwen2-VL-7B-Instruct and llama3-llava-next-8b-hf. **We strongly recommend using GPT-4o for reasoning.** To use it, select GPT-4o as the VLM, enter your API key, and click the Submit and Verify button; if the output shows success, GPT-4o is ready to use. Our second recommendation is the Qwen2-VL model.
You can download more pretrained VLMs from [QwenVL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) and [LLaVA-Next](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf).
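One way to fetch such a model into the `models/vlm/<model-name>` layout shown in the checkpoint tree is `huggingface_hub.snapshot_download`. This is a sketch under the assumption that the app expects that folder layout; the helper names are hypothetical, and `huggingface_hub` is imported lazily so the path helper works without it installed.

```python
from pathlib import Path


def vlm_local_dir(repo_id: str, root: str = "models/vlm") -> Path:
    """Map a Hugging Face repo id such as 'Qwen/Qwen2-VL-7B-Instruct' to models/vlm/<name>."""
    return Path(root) / repo_id.split("/")[-1]


def download_vlm(repo_id: str, root: str = "models/vlm") -> Path:
    """Download every file of `repo_id` into models/vlm/<name> and return that folder."""
    # Lazy import: only needed when actually downloading.
    from huggingface_hub import snapshot_download

    target = vlm_local_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

For example, `download_vlm("Qwen/Qwen2-VL-7B-Instruct")` places the files in `models/vlm/Qwen2-VL-7B-Instruct`. As described in the demo features, models other than the preloaded ones also need their lines uncommented in `vlm_template.py` before the app lists them.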
## Running Scripts
### BrushEdit Demo
You can run the demo using the script:
```
sh app/run_app.sh
```
### Demo Features

Fundamental Features:
- Aspect Ratio: Select the aspect ratio of the image. To prevent out-of-memory (OOM) errors, the maximum resolution is 1024 px.
- VLM Model: Select the VLM. We use preloaded models to save time. To use other VLMs, download them and uncomment the relevant lines in `vlm_template.py` in our GitHub repo.
- Generate Mask: Generate a mask, based on the input instruction, for the area that may need to be edited.
- Square/Circle Mask: Convert the existing mask into a square or circular mask. (A coarser mask gives the model more freedom when editing.)
- Invert Mask: Invert the mask to create a new mask.
- Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more area.
- Move Mask: Move the mask to a new position.
- Generate Target Prompt: Generate a target prompt based on the input instruction.
- Target Prompt: A description of the masked area. You can enter or modify it manually when the content generated by the VLM does not meet expectations.
- Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning it off works better for removal.)
- Control Strength: The intensity of editing and inpainting.
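The mask tools and blending above are ordinary binary-image operations. The following is a minimal pure-Python sketch, assuming a mask is a 2D list of 0/1 values; it is not the app's implementation, which works on image arrays. Treating "Square Mask" as filling the mask's bounding box and "Blending" as a per-pixel alpha blend are assumptions, and the circle variant is omitted.

```python
Mask = list[list[int]]  # 1 = editable region, 0 = region to keep


def invert(mask: Mask) -> Mask:
    """Invert Mask: swap editable and kept regions."""
    return [[1 - v for v in row] for row in mask]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Dilation with a square kernel: a pixel becomes 1 if any pixel within `radius` is 1."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(mask: Mask, radius: int = 1) -> Mask:
    """Erosion as the dual of dilation: invert, dilate, invert back."""
    return invert(dilate(invert(mask), radius))


def move(mask: Mask, dy: int, dx: int) -> Mask:
    """Move Mask: shift by (dy, dx); pixels pushed past the border are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and 0 <= y + dy < h and 0 <= x + dx < w:
                out[y + dy][x + dx] = 1
    return out


def box(mask: Mask) -> Mask:
    """Square Mask, assumed here to mean filling the bounding box of the current mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not ys:
        return [row[:] for row in mask]  # empty mask: nothing to box
    return [
        [int(ys[0] <= y <= ys[-1] and xs[0] <= x <= xs[-1]) for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def blend(generated, original, mask):
    """Blending: take `generated` where mask is 1 and keep `original` where it is 0.

    Fractional (soft) mask values mix the two proportionally.
    """
    return [
        [m * g + (1 - m) * o for g, o, m in zip(g_row, o_row, m_row)]
        for g_row, o_row, m_row in zip(generated, original, mask)
    ]
```

Blending is what keeps unedited pixels identical to the input; for removal, the README's advice to turn it off is consistent with this picture, since a hard blend can leave visible seams at the mask boundary.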
Advanced Features:
- Base Model: Select the base diffusion model. We use preloaded models to save time. To use other models, download them and uncomment the relevant lines in `vlm_template.py` in our GitHub repo.
- Blending: Blend BrushNet's output with the original input to preserve the original image details in the unedited areas. (Turning it off works better for removal.)
- Control Strength: The intensity of editing and inpainting.
- Num samples: The number of samples to generate.
- Negative prompt: The negative prompt used for classifier-free guidance.
- Guidance scale: The guidance scale used for classifier-free guidance.
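The negative prompt and guidance scale interact through classifier-free guidance. In standard diffusers pipelines the denoiser runs on both prompts at each step and the two noise predictions are combined as ε̂ = ε_neg + s · (ε_cond − ε_neg), where s is the guidance scale and ε_neg comes from the negative prompt (an empty prompt if none is given). That BrushEdit's pipeline uses exactly this standard form is an assumption here. The sketch applies the combination to flat lists; real pipelines apply it element-wise to tensors.

```python
def cfg_combine(eps_neg: list[float], eps_cond: list[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance on flattened noise predictions.

    eps_neg:  prediction conditioned on the negative prompt (empty prompt if none is given)
    eps_cond: prediction conditioned on the target prompt
    """
    if len(eps_neg) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # Start from the negative-prompt prediction and move towards (and past) the conditional one.
    return [n + guidance_scale * (c - n) for n, c in zip(eps_neg, eps_cond)]
```

At s = 1 the result is just the conditional prediction; larger values push the sample harder toward the target prompt and away from the negative prompt, usually at some cost in naturalness.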
## Cite Us
```
@misc{li2024brushedit,
title={BrushEdit: All-In-One Image Inpainting and Editing},
author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
year={2024},
eprint={2412.10316},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Acknowledgement
Our code is based on [diffusers](https://github.com/huggingface/diffusers) and [BrushNet](https://github.com/TencentARC/BrushNet). Thanks to all the contributors!
## Contact
For any questions, feel free to email `liyaowei01@gmail.com`.