https://github.com/zituitui/belm
[NeurIPS 2024] Official implementation of "BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models".
- Host: GitHub
- URL: https://github.com/zituitui/belm
- Owner: zituitui
- License: mit
- Created: 2024-10-12T07:45:16.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-25T07:22:09.000Z (6 months ago)
- Last Synced: 2025-03-30T06:03:57.398Z (2 months ago)
- Topics: diffusion-models, image-editing, neurips-2024, numerical-odes, text-to-image-generation
- Language: Python
- Homepage:
- Size: 48.7 MB
- Stars: 122
- Watchers: 2
- Forks: 7
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# BELM: High-quality Exact Inversion Sampler for Diffusion Models 🏆
This repository is the official implementation of the **NeurIPS 2024** paper:
_"BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models"_

Keywords: Diffusion Model, Exact Inversion, ODE Solver
> **Fangyikang Wang¹, Hubery Yin², Yuejiang Dong³, Huminhao Zhu¹,
Chao Zhang¹, Hanbin Zhao¹, Hui Qian¹, Chen Li²**
>
> ¹Zhejiang University  ²WeChat, Tencent Inc.  ³Tsinghua University

[arXiv](https://arxiv.org/abs/2410.07273)
[MIT License](https://opensource.org/licenses/MIT)
[Zhihu blog post](https://zhuanlan.zhihu.com/p/1379396199)


## 🆕 What's New?
### 🔥 We use the thought of bidirectional explicit to enable exact inversion

> **Schematic description** of DDIM (left) and BELM (right). DDIM uses $`\mathbf{x}_i`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$ to calculate $`\mathbf{x}_{i-1}`$ based on a linear relation between $`\mathbf{x}_i`$, $`\mathbf{x}_{i-1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$ (represented by the blue line). However, DDIM inversion uses $`\mathbf{x}_{i-1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)`$ to calculate $`\mathbf{x}_{i}`$ based on a different linear relation, represented by the red line. This mismatch leads to the inexact inversion of DDIM. In contrast, BELM seeks to establish a linear relation between $`\mathbf{x}_{i-1}`$, $`\mathbf{x}_i`$, $`\mathbf{x}_{i+1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i}, i)`$ (represented by the green line). BELM and its inversion are derived from this single relation, which facilitates exact inversion. Specifically, BELM uses the linear combination of $`\mathbf{x}_i`$, $`\mathbf{x}_{i+1}`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)`$ to calculate $`\mathbf{x}_{i-1}`$, and the BELM inversion uses the linear combination of $`\mathbf{x}_{i-1}`$, $`\mathbf{x}_i`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)`$ to calculate $`\mathbf{x}_{i+1}`$. The bidirectional explicit constraint means this linear relation does not include the derivatives at the bidirectional endpoints, that is, $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)`$ and $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i+1},i+1)`$.

### 🔥 We introduce a generic formulation of the exact inversion samplers, BELM.
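The DDIM mismatch described above can be checked numerically. Below is a toy scalar sketch (not the repository's code): the `tanh` noise predictor and all schedule values are hypothetical, chosen only to show that inverting a DDIM step with $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)`$ instead of $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)`$ does not recover the original state.

```python
import math

def eps(x):
    # toy nonlinear noise predictor (hypothetical stand-in for eps_theta)
    return math.tanh(x)

# hypothetical schedule values at steps i and i-1
a_i, a_im1 = 0.90, 0.95      # alpha_i, alpha_{i-1}
s_i, s_im1 = 0.4359, 0.3122  # sigma_i, sigma_{i-1}

x_i = 0.5

# DDIM step: linear relation between x_i, x_{i-1} and eps(x_i) (blue line)
x_im1 = a_im1 * (x_i / a_i + (s_im1 / a_im1 - s_i / a_i) * eps(x_i))

# DDIM "inversion" uses eps(x_{i-1}): a *different* linear relation (red line)
x_i_rec = a_i * (x_im1 / a_im1 + (s_i / a_i - s_im1 / a_im1) * eps(x_im1))

# the round trip does not return to x_i, so DDIM inversion is inexact
print(abs(x_i_rec - x_i))
```

Because the noise predictor is evaluated at two different states, the forward and inverse linear relations differ, and the residual above is strictly positive.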
The general k-step BELM:
```math
\bar{\mathbf{x}}_{i-1} = \sum_{j=1}^{k} a_{i,j}\cdot \bar{\mathbf{x}}_{i-1+j} +\sum_{j=1}^{k-1}b_{i,j}\cdot h_{i-1+j}\cdot\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_{i-1+j},\bar{\sigma}_{i-1+j}).
```
2-step BELM:
```math
\bar{\mathbf{x}}_{i-1} = a_{i,2}\bar{\mathbf{x}}_{i+1} +a_{i,1}\bar{\mathbf{x}}_{i} + b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i).
```
### 🔥 We derive the optimal coefficients for BELM via LTE minimization.
> **Proposition.** The local truncation error (LTE) $`\tau_i`$ of the BELM diffusion sampler, which is given by $`\tau_i = \bar{\mathbf{x}}(t_{i-1}) - a_{i,2}\bar{\mathbf{x}}(t_{i+1}) -a_{i,1}\bar{\mathbf{x}}(t_{i}) - b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}(t_i),\bar{\sigma}_i)`$, can be accurate up to $`\mathcal{O}\left({(h_{i}+h_{i+1})}^3\right)`$ when the coefficients are chosen as $`a_{i,1} = \frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}`$, $`a_{i,2}=\frac{h_i^2}{h_{i+1}^2}`$, $`b_{i,1}=- \frac{h_i+h_{i+1}}{h_{i+1}} `$,

where $`h_i = \frac{\sigma_i}{\alpha_i}-\frac{\sigma_{i-1}}{\alpha_{i-1}}`$.
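These coefficients yield a 2-step update whose inversion is exact by construction, since both directions are derived from the same linear relation. A minimal scalar sketch of the resulting O-BELM update and its inversion (not the repository's code; all schedule values and the toy noise value are hypothetical) checking that the round trip is exact up to floating-point round-off:

```python
def obelm_forward(x_ip1, x_i, eps_i, a_im1, a_i, a_ip1, h_i, h_ip1):
    """x_{i-1} from x_{i+1}, x_i and eps_theta(x_i, i)."""
    return (h_i**2 / h_ip1**2 * a_im1 / a_ip1 * x_ip1
            + (h_ip1**2 - h_i**2) / h_ip1**2 * a_im1 / a_i * x_i
            - h_i * (h_i + h_ip1) / h_ip1 * a_im1 * eps_i)

def obelm_inverse(x_im1, x_i, eps_i, a_im1, a_i, a_ip1, h_i, h_ip1):
    """x_{i+1} from x_{i-1}, x_i and the *same* eps_theta(x_i, i)."""
    return (h_ip1**2 / h_i**2 * a_ip1 / a_im1 * x_im1
            + (h_i**2 - h_ip1**2) / h_i**2 * a_ip1 / a_i * x_i
            + h_ip1 * (h_i + h_ip1) / h_i * a_ip1 * eps_i)

# toy scalars standing in for latents and the noise prediction
x_i, x_ip1, eps_i = 0.3, 0.7, -0.2
# hypothetical alpha schedule and step sizes h_i = sigma_i/alpha_i - sigma_{i-1}/alpha_{i-1}
a_im1, a_i, a_ip1 = 0.95, 0.90, 0.85
h_i, h_ip1 = 0.11, 0.13

x_im1 = obelm_forward(x_ip1, x_i, eps_i, a_im1, a_i, a_ip1, h_i, h_ip1)
x_rec = obelm_inverse(x_im1, x_i, eps_i, a_im1, a_i, a_ip1, h_i, h_ip1)
assert abs(x_rec - x_ip1) < 1e-9  # exact inversion up to float round-off
```

Both directions use the single noise evaluation $`\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i, i)`$, so the inverse step is the exact algebraic solution of the forward step for $`\mathbf{x}_{i+1}`$; no mismatch term remains.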
The Optimal-BELM (O-BELM) sampler:
```math
\mathbf{x}_{i-1} = \frac{h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i+1}}\mathbf{x}_{i+1} +\frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i}}\mathbf{x}_{i} - \frac{h_i(h_i+h_{i+1})}{h_{i+1}}\alpha_{i-1}\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
```
The inversion of the O-BELM diffusion sampler is:
```math
\mathbf{x}_{i+1}= \frac{h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i-1}}\mathbf{x}_{i-1} + \frac{h_i^2-h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i}}\mathbf{x}_{i}+\frac{h_{i+1}(h_i+h_{i+1})}{h_i}\alpha_{i+1} \boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
```
## 👨🏻💻 Run the code
### 1) Getting started
* Python 3.8.12
* CUDA 11.7
* NVIDIA A100 40GB PCIe
* Torch 2.0.0
* Torchvision 0.14.0

Please follow **[diffusers](https://github.com/huggingface/diffusers)** to install diffusers.
### 2) Run
First, switch to the repository root directory.
#### CIFAR10 sampling
```shell
python3 ./scripts/cifar10.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```
#### CelebA-HQ sampling
```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```
#### FID evaluation
```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```
#### Interpolation
```shell
python3 ./scripts/interpolate.py --test_num 10 --batch_size 1 --num_inference_steps 100 --save_dir YOUR/SAVE/DIR --model_id xx
```
#### Reconstruction error calculation
```shell
python3 ./scripts/reconstruction.py --test_num 10 --num_inference_steps 100 --directory WHERE/YOUR/IMAGES/ARE --sampler_type belm
```
#### Image editing
```shell
python3 ./scripts/image_editing.py --num_inference_steps 200 --freeze_step 50 --guidance 2.0 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxxxx/stable-diffusion-v1-5 --ori_im_path images/imagenet_dog_1.jpg --ori_prompt 'A dog' --res_prompt 'A Dalmatian'
```
## 🪪 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📝 Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
```
@inproceedings{
wang2024belm,
title={{BELM}: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models},
author={Fangyikang Wang and Hubery Yin and Yue-Jiang Dong and Huminhao Zhu and Chao Zhang and Hanbin Zhao and Hui Qian and Chen Li},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=ccQ4fmwLDb}
}
```
## 📩 Contact me
My e-mail address:
```
[email protected]
```