Official PyTorch codes for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"
- Host: GitHub
- URL: https://github.com/haoosz/vico
- Owner: haoosz
- License: MIT
- Created: 2023-06-01T14:51:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-20T14:53:58.000Z (over 1 year ago)
- Last Synced: 2024-10-30T23:36:28.852Z (about 1 year ago)
- Topics: personalized-generation, text-to-image-diffusion
- Language: Jupyter Notebook
- Size: 14.1 MB
- Stars: 238
- Watchers: 19
- Forks: 15
- Open Issues: 11
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
 
 
Awesome Lists containing this project:
- awesome-diffusion-categorized
 
README

# ViCo

### [**ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation**](https://arxiv.org/abs/2306.00971)

## ⏳ To Do
- [x] Release inference code
- [x] Release pretrained models
- [x] Release training code
- [ ] Quantitative evaluation code
- [ ] Hugging Face demo 
## ⚙️ Set-up
Create a conda environment `vico` using
```
conda env create -f environment.yaml
conda activate vico
```
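A quick sanity check after activation (optional; it only verifies that the PyTorch install resolved and whether a GPU is visible):
```
# Optional: confirm the environment works and CUDA is available.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```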
## ⏬ Download
Download the [pretrained Stable Diffusion v1-4 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt) and place it under `models/ldm/stable-diffusion-v1`.
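For example (a sketch; the direct download may require being logged in to Hugging Face and accepting the model license first):
```
# Create the expected directory and fetch the checkpoint from the link above.
mkdir -p models/ldm/stable-diffusion-v1
wget -O models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  "https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt"
```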
We provide pretrained checkpoints at 300, 350, and 400 training steps for 8 objects. You can download the [**sample images**](https://drive.google.com/drive/folders/1m8TCsY-C1tIOflHtWnFzTbw2C6dq67mC?usp=sharing) and their corresponding [**pretrained checkpoints**](https://drive.google.com/drive/folders/1I9BJpTLEGueK2hCaR2RKdQrlTrtF24lC?usp=drive_link). You can also download the data for any single object:
|  Object   | Sample images | Checkpoints |
|  :----  | :----:  | :----:  |
|  barn  | [image](https://drive.google.com/drive/folders/1bS3QYwzAOnOJcdqUNQ4VSGFnlBN87elT?usp=drive_link) | [ckpt](https://drive.google.com/drive/folders/1EsLeRkPUg7WH-nMCept28pVaX0IPlCGu?usp=drive_link) |
|  batman | [image](https://drive.google.com/drive/folders/1S_UFE9mAgaqWHNxrb2XudnuIyWafSwlv?usp=drive_link) | [ckpt](https://drive.google.com/drive/folders/1elwu9CNtzx_hwK23SbJiSfLkpMtbA66d?usp=drive_link) |
|  clock  | [image](https://drive.google.com/drive/folders/1L4AqVO0o6dapAxjjfSUCVGwd9iB5hIv2?usp=drive_link)  |  [ckpt](https://drive.google.com/drive/folders/1N0E-he1GLH_3c-H1E8204xYzOKU-RT_X?usp=drive_link)  |
|  dog7  | [image](https://drive.google.com/drive/folders/107YOi1qXHnGeDuAaxxe4AW9fj17hehxX?usp=drive_link)   |  [ckpt](https://drive.google.com/drive/folders/1SujoFfOBeKbZI74mFrdCsDIov_5xprHb?usp=drive_link)  |
|  monster toy  |  [image](https://drive.google.com/drive/folders/18nIAXQsG5KaGys2yNJtIuYso2cgZh-2f?usp=drive_link)  |  [ckpt](https://drive.google.com/drive/folders/1EzDjyyya7_zOflOG5rPkxY--R5OxejYx?usp=drive_link)   |
|  pink sunglasses  |  [image](https://drive.google.com/drive/folders/10it3Sd9U1wbkfksMWfFHXeAch6uanEDr?usp=drive_link)  |   [ckpt](https://drive.google.com/drive/folders/1aHnAgM4dpWFsqiNeg3mIX68G6xjfuZ-X?usp=drive_link)   |
|  teddybear  |  [image](https://drive.google.com/drive/folders/1lT8mOSgeh0P8DlfIh34qC2cvk2QaqSBo?usp=drive_link)  |  [ckpt](https://drive.google.com/drive/folders/1630qFd06T2Kz46pb-hs9OA99v3LD44IQ?usp=drive_link)   |
|  wooden pot  |  [image](https://drive.google.com/drive/folders/1eVDMNAfAEroqMV8AiFlBqRGNcElmWw70?usp=drive_link)  |  [ckpt](https://drive.google.com/drive/folders/1kXQuzfSsAJ895gHZJDiFF-5BHoX49gOx?usp=drive_link)    |

Datasets were originally collected and provided by [Textual Inversion](https://github.com/rinongal/textual_inversion), [DreamBooth](https://github.com/google/dreambooth), and [Custom Diffusion](https://github.com/adobe-research/custom-diffusion). You can find all [datasets](https://drive.google.com/drive/folders/1o3iTN5P6PX-DK3Ql_wSdH-swVvGYIG9I?usp=sharing) used for quantitative comparison in our paper.
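The Google Drive folders can also be fetched from the command line, for instance with `gdown` (an extra dependency, not part of the provided environment; the barn folder below is just one of the links from the table):
```
# Install gdown into the vico environment, then pull one sample-image folder.
# By default gdown creates a local directory named after the Drive folder.
pip install gdown
gdown --folder "https://drive.google.com/drive/folders/1bS3QYwzAOnOJcdqUNQ4VSGFnlBN87elT"
```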
## 🚀 Inference
Before running the inference command, please set:  
- `REF_IMAGE_PATH`: Path of **the reference image**. It can be any image in the samples like `batman/1.jpg`.
- `CHECKPOINT_PATH`: Path of **the checkpoint weights**. Its subfolder should contain files like `checkpoints/*-399.pt`.
- `OUTPUT_PATH`: Path of **the generated images**. For example, it can be like `outputs/batman`.
```
python scripts/vico_txt2img.py \
--ddim_eta 0.0  --n_samples 4  --n_iter 2  --scale 7.5  --ddim_steps 50  \
--ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
--image_path REF_IMAGE_PATH \
--ft_path CHECKPOINT_PATH \
--load_step 399 \
--prompt "a photo of * on the beach" \
--outdir OUTPUT_PATH
```
You can set `load_step` to 300, 350, or 400 and personalize the `prompt` (a prefix like "a photo of" usually gives better results).
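To render several prompts in one go, the same command can be wrapped in a small shell loop (a sketch; it reuses the placeholders above, and the second prompt is just an illustrative example):
```
# Sweep a few prompts with the same reference image and checkpoint.
for PROMPT in "a photo of * on the beach" "a photo of * in the snow"; do
  python scripts/vico_txt2img.py \
    --ddim_eta 0.0 --n_samples 4 --n_iter 2 --scale 7.5 --ddim_steps 50 \
    --ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
    --image_path REF_IMAGE_PATH \
    --ft_path CHECKPOINT_PATH \
    --load_step 399 \
    --prompt "$PROMPT" \
    --outdir OUTPUT_PATH
done
```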
## 💻 Training
Before running the training command, please set:
- `RUN_NAME`: Your run name. It becomes the name of the log folder.
- `GPUS_USED`: The GPUs to use, e.g., "0,1,2,3" (we used 4 RTX 3090 GPUs).
- `TRAIN_DATA_ROOT`: Path of your **training images**.
- `INIT_WORD`: The word used to initialize the token for your unique object, e.g., "dog" or "toy".
```
python main.py \
--base configs/stable-diffusion/v1-finetune.yaml -t  \
--actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
-n RUN_NAME \
--gpus  GPUS_USED \
--data_root TRAIN_DATA_ROOT \
--init_word INIT_WORD
```
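A filled-in example, assuming the dog7 sample images were saved under `data/dog7` (a hypothetical path) and four GPUs are visible:
```
# Hypothetical values substituted into the template above:
#   RUN_NAME=dog7, GPUS_USED="0,1,2,3", TRAIN_DATA_ROOT=data/dog7, INIT_WORD=dog
python main.py \
--base configs/stable-diffusion/v1-finetune.yaml -t  \
--actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
-n dog7 \
--gpus "0,1,2,3" \
--data_root data/dog7 \
--init_word dog
```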
## 📖 Citation
If you use this code in your research, please consider citing our paper:
```bibtex
@inproceedings{Hao2023ViCo,
  title={ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation},
  author={Shaozhe Hao and Kai Han and Shihao Zhao and Kwan-Yee K. Wong},
  year={2023}
}
```
## 💐 Acknowledgements
This code repository is based on the great work of [Textual Inversion](https://github.com/rinongal/textual_inversion). Thanks!