TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
- Host: GitHub
- URL: https://github.com/nahyeonkaty/textboost
- Owner: nahyeonkaty
- License: mit
- Created: 2024-09-12T16:11:27.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-25T05:54:21.000Z (4 months ago)
- Last Synced: 2024-10-06T02:05:57.291Z (4 months ago)
- Topics: ai, deep-learning, diffusion, image-generation, pytorch, stable-diffusion, text2image, torch, txt2img
- Language: Python
- Homepage: https://textboost.github.io
- Size: 10.3 MB
- Stars: 43
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
[![arXiv](https://img.shields.io/badge/arXiv-2409.08248-B31B1B.svg)](https://arxiv.org/abs/2409.08248)
[![Project page](https://img.shields.io/badge/Project-Page-brightgreen)](https://textboost.github.io)
Abstract: *Recent breakthroughs in text-to-image models have opened up promising research avenues in personalized image generation, enabling users to create diverse images of a specific subject using natural language prompts. However, existing methods often suffer from performance degradation when given only a single reference image. They tend to overfit the input, producing highly similar outputs regardless of the text prompt. This paper addresses the challenge of one-shot personalization by mitigating overfitting, enabling the creation of controllable images through text prompts. Specifically, we propose a selective fine-tuning strategy that focuses on the text encoder. Furthermore, we introduce three key techniques to enhance personalization performance: (1) augmentation tokens to encourage feature disentanglement and alleviate overfitting, (2) a knowledge-preservation loss to reduce language drift and promote generalizability across diverse prompts, and (3) SNR-weighted sampling for efficient training. Extensive experiments demonstrate that our approach efficiently generates high-quality, diverse images using only a single reference image while significantly reducing memory and storage requirements.*
## Installation
Our code has been tested on Python 3.10 with an NVIDIA A6000 GPU. However, it should work with other recent Python versions and NVIDIA GPUs.
### Installing Python Packages
We recommend using a Python virtual environment or Anaconda to manage dependencies. You can install the required packages using one of the following methods:
#### Using `pip`:
```sh
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
#### Using `conda`:
```sh
conda env create -f environment.yml
conda activate textboost
```
For the exact package versions we used, please refer to the [requirements.txt](requirements.txt) file.
## Training
To get started, you will need to download the human-written prompts dataset. Follow the instructions from [InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix) to download `human-written-prompts.jsonl`, and then place it in the `data` directory.
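Before training, it can help to confirm that the file is in place and parses as JSON Lines. The following is a minimal sketch and not part of this repository: the helper name `count_jsonl_records` is hypothetical, and it assumes only that each non-empty line of the file is a standalone JSON value.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count records in a JSON Lines file, failing loudly on a malformed line."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"expected prompts file at {path}")
    count = 0
    with path.open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc})"
                ) from exc
            count += 1
    return count
```

For example, `count_jsonl_records("data/human-written-prompts.jsonl")` returns the number of records if the file was downloaded completely, and raises otherwise.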
We used a single image from each instance of the [DreamBooth](https://github.com/google/dreambooth) benchmark.
You can find the image chosen for each instance in [data/dreambooth_n1.txt](data/dreambooth_n1.txt).
We provide a simple [script](split_dreambooth.py) to help automate this.
```sh
git clone https://github.com/google/dreambooth
python split_dreambooth.py --dreambooth-dir dreambooth/dataset
```
If not specified, the code will attempt to use the first `n` images in the directory, where `n` is set by `--num_samples`.
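The fallback described above can be pictured roughly as follows. This is a sketch under assumptions, not the repository's code: the function name `first_n_images`, the extension list, and the sort-by-filename ordering are all hypothetical, and the ordering actually used by `train_textboost.py` may differ.

```python
from pathlib import Path

# Assumed set of image extensions; the training script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def first_n_images(instance_dir, num_samples=1):
    """Return the first `num_samples` image paths in `instance_dir`, sorted by name."""
    images = sorted(
        p for p in Path(instance_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    if len(images) < num_samples:
        raise ValueError(
            f"requested {num_samples} images but found {len(images)} in {instance_dir}"
        )
    return images[:num_samples]
```

Because the selection depends on directory contents, preparing the single-image split explicitly with `split_dreambooth.py` keeps the choice of reference image reproducible.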
**Notice**: Our method was primarily tested using Stable Diffusion v1.5; however, this version is currently unavailable. You can use another version such as v1.4.
To train the model, you can use the following command:
```sh
accelerate launch train_textboost.py \
--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 \
--instance_data_dir data/dreambooth_n1_train/dog \
--output_dir=output/tb/dog \
--instance_token ' dog' \
--class_token 'dog' \
--validation_prompt 'a dog in the jungle' \
--validation_steps=50 \
--placeholder_token '' \
--initializer_token 'dog' \
--learning_rate=5e-5 \
--emb_learning_rate=1e-3 \
--train_batch_size=8 \
--max_train_steps=250 \
--checkpointing_steps=50 \
--num_samples=1 \
--augment=paug \
--lora_rank=4 \
--augment_inversion
```
Alternatively, you can use the `torchrun` command. Here's an example:
```sh
CUDA_VISIBLE_DEVICES=0 torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nproc-per-node=1 train_textboost.py \
--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 \
--instance_data_dir data/dreambooth_n1_train/dog \
--output_dir=output/tb/dog \
--instance_token ' dog' \
--class_token 'dog' \
--validation_prompt 'a dog in the jungle' \
--validation_steps=50 \
--placeholder_token '' \
--initializer_token 'dog' \
--learning_rate=5e-5 \
--emb_learning_rate=1e-3 \
--train_batch_size=8 \
--max_train_steps=250 \
--checkpointing_steps=50 \
--num_samples=1 \
--augment=paug \
--lora_rank=4 \
--augment_inversion
```
### Training on All Instances
To train the model on all DreamBooth instances, run the following command:
```sh
python run_textboost_db.py
```
## Inference
After training, you can generate images using the following command:
```sh
python inference.py output/tb/dog --model CompVis/stable-diffusion-v1-4 --prompt "photo of a dog" --output test.jpg
```
## Evaluation
To evaluate the trained model, ensure that the folder structure follows the format shown below:
```
.
├── output
│   └── tb-sd1.5-n1
│       ├── backpack
│       ├── backpack_dog
│       ...
│       └── wolf_plushie
└── ...
```
Once the folder structure is correctly set up, run the following command:
```sh
CUDA_VISIBLE_DEVICES=0 python eval_dreambooth.py output/tb-sd1.5-n1 --token-format ' SUBJECT'
```
* Here, the token placed before `SUBJECT` can be replaced with your own modifier token.
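Checking the layout before launching the evaluation can save a failed run. The sketch below is not part of this repository: `list_subject_dirs` is a hypothetical helper that assumes, as in the tree above, one subdirectory per subject directly under the output directory.

```python
from pathlib import Path


def list_subject_dirs(output_dir):
    """Return the sorted names of the per-subject subdirectories of `output_dir`."""
    root = Path(output_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Only directories count as subjects; stray files are ignored.
    subjects = sorted(p.name for p in root.iterdir() if p.is_dir())
    if not subjects:
        raise ValueError(f"no subject directories found under {root}")
    return subjects
```

For example, `list_subject_dirs("output/tb-sd1.5-n1")` should list names such as `backpack`, `backpack_dog`, and `wolf_plushie` when the structure matches the one shown above.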
## Citation
```bibtex
@article{park2024textboost,
title = {TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder},
author = {Park, NaHyeon and Kim, Kunhee and Shim, Hyunjung},
journal = {arXiv preprint},
year = {2024},
eprint = {arXiv:2409.08248}
}
```
## License
All materials in this repository are available under the [MIT License](LICENSE).