# BELM: High-quality Exact Inversion sampler of Diffusion Models 🏆

This repository is the official implementation of the **NeurIPS 2024** paper:
_"BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models"_

Keywords: Diffusion Model, Exact Inversion, ODE Solver

> **Fangyikang Wang<sup>1</sup>, Hubery Yin<sup>2</sup>, Yuejiang Dong<sup>3</sup>, Huminhao Zhu<sup>1</sup>, Chao Zhang<sup>1</sup>, Hanbin Zhao<sup>1</sup>, Hui Qian<sup>1</sup>, Chen Li<sup>2</sup>**
>
> <sup>1</sup>Zhejiang University&emsp;<sup>2</sup>WeChat, Tencent Inc.&emsp;<sup>3</sup>Tsinghua University

[![arXiv](https://img.shields.io/badge/arXiv%20paper-2410.07273-b31b1b.svg)](https://arxiv.org/abs/2410.07273) 
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 
[![Zhihu](https://img.shields.io/badge/zhihu-%E7%9F%A5%E4%B9%8E-informational.svg)](https://zhuanlan.zhihu.com/p/1379396199) 
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fzituitui%2FBELM&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visitors&edge_flat=false)](https://hits.seeyoufarm.com)


Editing and interpolation results:

![Editing results](assets/editing_show.drawio.png)

![Interpolation results](assets/belm_inter_show.drawio.png)

## 🆕 What's New?
### 🔥 We use the idea of bidirectional explicitness to enable exact inversion
![Some edits](assets/belm_linear.drawio.png)
> **Schematic description** of DDIM (left) and BELM (right). DDIM uses $`\mathbf{x}_i`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$ to calculate $`\mathbf{x}_{i-1}`$ based on a linear relation between $`\mathbf{x}_i`$, $`\mathbf{x}_{i-1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$ (represented by the blue line). However, DDIM inversion uses $`\mathbf{x}_{i-1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)`$ to calculate $`\mathbf{x}_{i}`$ based on a different linear relation (represented by the red line). This mismatch leads to the inexact inversion of DDIM. In contrast, BELM seeks to establish a linear relation between $`\mathbf{x}_{i-1}`$, $`\mathbf{x}_i`$, $`\mathbf{x}_{i+1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i}, i)`$ (represented by the green line). BELM and its inversion are derived from this single relation, which facilitates exact inversion. Specifically, BELM uses the linear combination of $`\mathbf{x}_i`$, $`\mathbf{x}_{i+1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)`$ to calculate $`\mathbf{x}_{i-1}`$, and the BELM inversion uses the linear combination of $`\mathbf{x}_{i-1}`$, $`\mathbf{x}_i`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)`$ to calculate $`\mathbf{x}_{i+1}`$. The bidirectional explicit constraint means this linear relation does not involve the derivatives at the bidirectional endpoints, that is, $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i+1},i+1)`$.
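
To make the mismatch concrete, below is a minimal Python sketch of the two DDIM updates, written in the $`\alpha`$, $`\sigma`$, $`h_i`$ notation used in the formulas below. The arrays `alpha`, `sigma` and the noise predictor `eps` are illustrative placeholders, not the repository's actual interfaces.

```python
# Illustrative only: why DDIM inversion is inexact.
# alpha[i], sigma[i]: signal/noise scales at step i; eps(x, i): the noise-prediction network.

def h(alpha, sigma, i):
    # h_i = sigma_i / alpha_i - sigma_{i-1} / alpha_{i-1}
    return sigma[i] / alpha[i] - sigma[i - 1] / alpha[i - 1]

def ddim_step(x_i, eps, alpha, sigma, i):
    # sampling: linear relation among x_i, x_{i-1} and eps(x_i, i)  (the "blue line")
    return (alpha[i - 1] / alpha[i]) * x_i - alpha[i - 1] * h(alpha, sigma, i) * eps(x_i, i)

def ddim_inversion_step(x_im1, eps, alpha, sigma, i):
    # inversion: the exact inverse would need eps(x_i, i), which is unavailable, so DDIM
    # inversion substitutes eps(x_{i-1}, i-1) -- a different linear relation (the "red
    # line"), hence the reconstruction error.
    return (alpha[i] / alpha[i - 1]) * x_im1 + alpha[i] * h(alpha, sigma, i) * eps(x_im1, i - 1)
```

BELM avoids this by using a single three-point relation among $`\mathbf{x}_{i-1}`$, $`\mathbf{x}_i`$, $`\mathbf{x}_{i+1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$, so its inversion is just an algebraic rearrangement of the same update (see the O-BELM sketch further below).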

### 🔥 We introduce a generic formulation of exact inversion samplers: BELM.

The general k-step BELM:
```math
\bar{\mathbf{x}}_{i-1} = \sum_{j=1}^{k} a_{i,j}\cdot \bar{\mathbf{x}}_{i-1+j} +\sum_{j=1}^{k-1}b_{i,j}\cdot h_{i-1+j}\cdot\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_{i-1+j},\bar{\sigma}_{i-1+j}).
```

The 2-step BELM:
```math
\bar{\mathbf{x}}_{i-1} = a_{i,2}\bar{\mathbf{x}}_{i+1} +a_{i,1}\bar{\mathbf{x}}_{i} + b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i).
```
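
Because the sampler and its inversion share this single linear relation, exact inversion follows by simply solving the 2-step relation above for $`\bar{\mathbf{x}}_{i+1}`$:

```math
\bar{\mathbf{x}}_{i+1} = \frac{1}{a_{i,2}}\left(\bar{\mathbf{x}}_{i-1} - a_{i,1}\bar{\mathbf{x}}_{i} - b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i)\right).
```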

### 🔥 We derive the optimal coefficients for BELM via LTE minimization.

> **Proposition** The LTE $`\tau_i`$ of the BELM diffusion sampler, which is given by $`\tau_i = \bar{\mathbf{x}}(t_{i-1}) - a_{i,2}\bar{\mathbf{x}}(t_{i+1}) -a_{i,1}\bar{\mathbf{x}}(t_{i}) - b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}(t_i),\bar{\sigma}_i)`$, can be accurate up to $`\mathcal{O}\left({(h_{i}+h_{i+1})}^3\right)`$ when the coefficients are chosen as $`a_{i,1} = \frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}`$, $`a_{i,2}=\frac{h_i^2}{h_{i+1}^2}`$, $`b_{i,1}=- \frac{h_i+h_{i+1}}{h_{i+1}}`$,

where $`h_i = \frac{\sigma_i}{\alpha_i}-\frac{\sigma_{i-1}}{\alpha_{i-1}}`$.
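
As a quick sanity check, these coefficients sum to one, as consistency of a linear multi-step scheme requires:

```math
a_{i,1} + a_{i,2} = \frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2} + \frac{h_i^2}{h_{i+1}^2} = 1.
```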

The Optimal-BELM (O-BELM) sampler:

```math
\mathbf{x}_{i-1} = \frac{h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i+1}}\mathbf{x}_{i+1} +\frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i}}\mathbf{x}_{i} - \frac{h_i(h_i+h_{i+1})}{h_{i+1}}\alpha_{i-1}\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
```

The inversion of the O-BELM sampler is:

```math
\mathbf{x}_{i+1}= \frac{h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i-1}}\mathbf{x}_{i-1} + \frac{h_i^2-h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i}}\mathbf{x}_{i}+\frac{h_{i+1}(h_i+h_{i+1})}{h_i}\alpha_{i+1} \boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
```
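
For reference, here is a minimal Python sketch of the two O-BELM updates above, reusing the hypothetical `h` helper from the earlier DDIM sketch; `alpha`, `sigma` and `eps_i` are again illustrative placeholders rather than the repository's actual interfaces (see the scripts under `./scripts/` for the official implementation).

```python
def o_belm_step(x_ip1, x_i, eps_i, alpha, sigma, i):
    # O-BELM sampling: x_{i-1} from x_{i+1}, x_i and eps_theta(x_i, i)
    h_i, h_ip1 = h(alpha, sigma, i), h(alpha, sigma, i + 1)
    return ((h_i**2 / h_ip1**2) * (alpha[i - 1] / alpha[i + 1]) * x_ip1
            + ((h_ip1**2 - h_i**2) / h_ip1**2) * (alpha[i - 1] / alpha[i]) * x_i
            - (h_i * (h_i + h_ip1) / h_ip1) * alpha[i - 1] * eps_i)

def o_belm_inversion_step(x_im1, x_i, eps_i, alpha, sigma, i):
    # O-BELM inversion: x_{i+1} from x_{i-1}, x_i and the *same* eps_theta(x_i, i)
    h_i, h_ip1 = h(alpha, sigma, i), h(alpha, sigma, i + 1)
    return ((h_ip1**2 / h_i**2) * (alpha[i + 1] / alpha[i - 1]) * x_im1
            + ((h_i**2 - h_ip1**2) / h_i**2) * (alpha[i + 1] / alpha[i]) * x_i
            + (h_ip1 * (h_i + h_ip1) / h_i) * alpha[i + 1] * eps_i)
```

Because both updates rearrange the same linear relation, feeding the output of `o_belm_step` back into `o_belm_inversion_step` (with the same `eps_i`) recovers `x_ip1` up to floating-point error; this is the sense in which the inversion is exact.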

## 👨🏻‍💻 Run the code

### 1) Getting started

* Python 3.8.12
* CUDA 11.7
* NVIDIA A100 40GB PCIe
* Torch 2.0.0
* Torchvision 0.14.0

Please follow the **[diffusers](https://github.com/huggingface/diffusers)** instructions to install diffusers.

### 2) Run
First, switch to the repository root directory.
#### CIFAR10 sampling
```shell
python3 ./scripts/cifar10.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```

#### CelebA-HQ sampling
```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_celebahq
```

#### FID evaluation
```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```

#### Interpolation
```shell
python3 ./scripts/interpolate.py --test_num 10 --batch_size 1 --num_inference_steps 100 --save_dir YOUR/SAVE/DIR --model_id xx
```

#### Reconstruction error calculation
```shell
python3 ./scripts/reconstruction.py --test_num 10 --num_inference_steps 100 --directory WHERE/YOUR/IMAGES/ARE --sampler_type belm
```

#### Image editing
```shell
python3 ./scripts/image_editing.py --num_inference_steps 200 --freeze_step 50 --guidance 2.0 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxxxx/stable-diffusion-v1-5 --ori_im_path images/imagenet_dog_1.jpg --ori_prompt 'A dog' --res_prompt 'A Dalmatian'
```

## 🪪 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📝 Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
```
@inproceedings{wang2024belm,
  title={{BELM}: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models},
  author={Fangyikang Wang and Hubery Yin and Yue-Jiang Dong and Huminhao Zhu and Chao Zhang and Hanbin Zhao and Hui Qian and Chen Li},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=ccQ4fmwLDb}
}
```

## 📩 Contact me
My e-mail address:
```
[email protected]
```