https://github.com/divelab/dsearch
- Host: GitHub
- URL: https://github.com/divelab/dsearch
- Owner: divelab
- Created: 2025-03-03T17:50:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-15T21:41:50.000Z (about 1 year ago)
- Last Synced: 2025-06-27T10:43:48.631Z (12 months ago)
- Language: Python
- Size: 39.9 MB
- Stars: 6
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Dynamic Search for Inference-Time Alignment in Diffusion Models (Images)
This code accompanies the paper on Dynamic Search for Inference-Time Alignment in Diffusion Models (DSearch), where the objective is to maximize downstream reward functions in diffusion models. In this implementation, we focus on generating **images** with high scores.
Notably, our algorithm is **derivative-free, training-free, and fine-tuning-free**.


## Code
### Installation
Create a conda environment by following the setup instructions in the AlignProp repository:
https://github.com/mihirp1998/AlignProp/
Then install scikit-learn:
```bash
pip install scikit-learn
```
### Compressibility
We use Stable Diffusion v1.5 as the pre-trained model. We optimize compressibility.
Run the following commands to test DSearch.
### DSearch
DSearch
```bash
CUDA_VISIBLE_DEVICES=0 python inference_decoding_nonp.py --reward 'compressibility' --bs 12 --num_images 12 --duplicate_size 5 --variant PM --w 5 --search_schudule exponential --drop_schudule exponential --oversamplerate 5
```
DSearch-R
Note: choose `replacerate` to balance diversity against reward. Values between 0.01 and 0.05 usually work well, but `bs * replacerate >= 1` must hold. Generate multiple batches.
```bash
CUDA_VISIBLE_DEVICES=0 python inference_decoding_nonp.py --reward 'compressibility' --bs 34 --num_images 204 --duplicate_size 5 --variant PM --w 5 --search_schudule exponential --replacerate 0.03
```
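The constraint `bs * replacerate >= 1` from the note above can be checked before launching a run. The following is a minimal sketch, not part of the repository: `check_replacerate` is a hypothetical helper, and it assumes that `num_images` is produced in batches of `bs`. How the script rounds the product internally is not stated here, so the helper reports the raw value.

```python
import math


def check_replacerate(bs: int, replacerate: float, num_images: int) -> dict:
    """Check bs * replacerate >= 1 and report the implied number of batches.

    Hypothetical helper for illustration; it is not part of the DSearch code.
    """
    expected_replaced = bs * replacerate
    if expected_replaced < 1:
        # Smallest batch size that satisfies the constraint for this rate.
        min_bs = math.ceil(1 / replacerate)
        raise ValueError(
            f"bs * replacerate = {expected_replaced:.2f} < 1; "
            f"use bs >= {min_bs} or a larger replacerate"
        )
    return {
        "expected_replaced_per_batch": expected_replaced,
        # Assumption: images are generated batch by batch, bs at a time.
        "num_batches": math.ceil(num_images / bs),
    }


# Values taken from the DSearch-R command above.
info = check_replacerate(bs=34, replacerate=0.03, num_images=204)
print(info)
```

With the values from the DSearch-R command, the product is just above 1 and the 204 images correspond to 6 batches of 34, which matches the advice to generate multiple batches.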
### Baseline SVDD-PM
```bash
CUDA_VISIBLE_DEVICES=0 python inference_decoding_nonp.py --reward 'compressibility' --bs 12 --num_images 12 --duplicate_size 20 --variant PM
```
Here is the result.
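This README does not describe how the baseline selects samples. As a rough illustration of what derivative-free selection can look like, the toy sketch below draws several candidates per step and keeps the one with the highest reward, using only reward evaluations and no gradients. It assumes that `--duplicate_size` plays the role of the number of candidates per step. The one-dimensional state, `toy_reward`, and `guided_step` are hypothetical and stand in for the diffusion model and the real reward functions; this is not the repository's implementation.

```python
import random


def select_best(candidates, reward_fn):
    """Return the candidate with the highest reward; no gradients are used."""
    return max(candidates, key=reward_fn)


def guided_step(x, duplicate_size, reward_fn, rng, noise=0.5):
    """Propose duplicate_size noisy candidates around x and keep the best one."""
    candidates = [x + rng.gauss(0.0, noise) for _ in range(duplicate_size)]
    return select_best(candidates, reward_fn)


def toy_reward(x):
    """Hypothetical black-box reward: higher is better, maximal at x = 3."""
    return -abs(x - 3.0)


rng = random.Random(0)  # fixed seed so the run is reproducible
x = 0.0
for _ in range(10):
    x = guided_step(x, duplicate_size=20, reward_fn=toy_reward, rng=rng)

print(f"final state: {x:.3f}, reward: {toy_reward(x):.3f}")
```

A larger `duplicate_size` spends more reward evaluations per step in exchange for a better chance of finding a high-reward candidate.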


### Aesthetic score
We use Stable Diffusion v1.5 as the pre-trained model. We optimize aesthetic scores.
Run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python inference_decoding_nonp.py --reward 'aesthetic' --bs 12 --num_images 12 --duplicate_size 5 --variant PM --w 5 --search_schudule exponential --drop_schudule exponential --oversamplerate 5
```
Here is the result.


### Human preference score
We use Stable Diffusion v1.5 as the pre-trained model. We optimize human preference scores.
Run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python inference_decoding_nonp.py --reward 'hps' --bs 12 --num_images 12 --duplicate_size 5 --variant PM --w 6 --search_schudule exponential --drop_schudule exponential --oversamplerate 5
```
Here is the result.
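The three DSearch commands above differ only in `--reward` and `--w`. The sketch below assembles them from one template so the settings can be compared side by side or passed to `subprocess.run`. `build_command` and `W_BY_REWARD` are hypothetical names introduced for illustration; the flags and values are copied from the commands in this README, including the spelling `schudule`.

```python
import shlex

# --w value used for each reward in the commands above.
W_BY_REWARD = {"compressibility": 5, "aesthetic": 5, "hps": 6}


def build_command(reward: str) -> list:
    """Build the DSearch argument list for one reward (hypothetical helper)."""
    return [
        "python", "inference_decoding_nonp.py",
        "--reward", reward,
        "--bs", "12",
        "--num_images", "12",
        "--duplicate_size", "5",
        "--variant", "PM",
        "--w", str(W_BY_REWARD[reward]),
        # Flag names are spelled exactly as in the README commands.
        "--search_schudule", "exponential",
        "--drop_schudule", "exponential",
        "--oversamplerate", "5",
    ]


for reward in W_BY_REWARD:
    # Print only; set CUDA_VISIBLE_DEVICES in the environment when running.
    print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(build_command(reward)))
```

Keeping the per-reward differences in one dictionary makes it obvious that only the human preference score run uses `--w 6`.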


### Acknowledgement
Our codebase is built directly on top of [RCGDM](https://github.com/Kaffaljidhmah2/RCGDM).
## Reference
If you find this work useful in your research, please cite:
```bibtex
@article{li2025dynamic,
title={Dynamic Search for Inference-Time Alignment in Diffusion Models},
author={Li, Xiner and Uehara, Masatoshi and Su, Xingyu and Scalia, Gabriele and Biancalani, Tommaso and Regev, Aviv and Levine, Sergey and Ji, Shuiwang},
journal={arXiv preprint arXiv:2503.02039},
year={2025}
}
```
## Acknowledgments
This work was supported in part by the National Institutes of Health under grant U01AG070112.