Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, arXiv 2023 / CVPR 2024
- Host: GitHub
- URL: https://github.com/shi-labs/prompt-free-diffusion
- Owner: SHI-Labs
- License: mit
- Created: 2023-05-19T03:23:22.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-11-16T03:42:55.000Z (almost 2 years ago)
- Last Synced: 2025-03-28T01:45:58.387Z (6 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2305.16223
- Size: 26.3 MB
- Stars: 746
- Watchers: 11
- Forks: 37
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE
# Prompt-Free Diffusion
[HuggingFace Space](https://huggingface.co/spaces/shi-labs/Prompt-Free-Diffusion)
[PyTorch](https://pytorch.org/)
[MIT License](https://opensource.org/licenses/MIT)

This repo hosts the official implementation of:
[Xingqian Xu](https://ifp-uiuc.github.io/), Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, and [Humphrey Shi](https://www.humphreyshi.com/home), **Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models**, [Paper arXiv Link](https://arxiv.org/abs/2305.16223).
## News
- **[2023.06.20]: SD-WebUI plugin created; repo at this [link](https://github.com/xingqian2018/sd-webui-prompt-free-diffusion)**
- [2023.05.25]: Our demo is running on [HuggingFace 🤗](https://huggingface.co/spaces/shi-labs/Prompt-Free-Diffusion)
- [2023.05.25]: Repo created

## Introduction
**Prompt-Free Diffusion** is a diffusion model that relies only on visual inputs to generate new images. It does so with a **Semantic Context Encoder (SeeCoder)** that substitutes for the commonly used CLIP-based text encoder. SeeCoder is **reusable with most public T2I models as well as adaptive layers** such as ControlNet, LoRA, and T2I-Adapter. Just drop in and play!
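To make the drop-in idea concrete, here is a minimal illustrative sketch (not the repo's actual API; `ToySeeCoder` is a hypothetical stand-in). The essential contract is that SeeCoder's image-derived context embeddings match the shape of the CLIP text embeddings a T2I UNet expects, so the UNet's cross-attention layers can consume them unchanged:
```
import torch
import torch.nn as nn

class ToySeeCoder(nn.Module):
    """Hypothetical stand-in for SeeCoder: encodes an image into a
    [B, 77, 768] context tensor, the same shape SD-v1's CLIP text
    encoder produces, so it can be swapped in without touching the UNet."""
    def __init__(self, ctx_tokens: int = 77, ctx_dim: int = 768):
        super().__init__()
        # A toy patch embedder; the real SeeCoder uses a full visual backbone.
        self.patch_embed = nn.Conv2d(3, ctx_dim, kernel_size=16, stride=16)
        self.ctx_tokens = ctx_tokens

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.patch_embed(image)            # [B, C, H/16, W/16]
        feats = feats.flatten(2).transpose(1, 2)   # [B, N, C] token sequence
        return feats[:, : self.ctx_tokens]         # keep the first 77 tokens

reference = torch.randn(1, 3, 512, 512)  # a visual input replaces the text prompt
context = ToySeeCoder()(reference)
print(context.shape)  # torch.Size([1, 77, 768]) -- drop-in for CLIP text context
```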
## Performance
*(figure: performance comparison)*
## Network
*(figure: SeeCoder network architecture)*
## Setup
```
conda create -n prompt-free-diffusion python=3.10
conda activate prompt-free-diffusion
pip install torch==2.0.0+cu117 torchvision==0.15.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
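# Optional sanity check (our addition, not part of the official setup):
# verify the pinned torch build imports and sees your CUDA device.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"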
```

## Demo
We provide a WebUI powered by [Gradio](https://github.com/gradio-app/gradio). Start it with the following command:
```
python app.py
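# Once the models are loaded, Gradio prints a local URL to open in your
# browser (http://127.0.0.1:7860 by default, unless app.py overrides it).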
```

## Pretrained models
To support the full functionality of our demo, you need the following models located at these paths:
```
└── pretrained
    ├── pfd
    │   ├── vae
    │   │   └── sd-v2-0-base-autokl.pth
    │   ├── diffuser
    │   │   ├── AbyssOrangeMix-v2.safetensors
    │   │   ├── AbyssOrangeMix-v3.safetensors
    │   │   ├── Anything-v4.safetensors
    │   │   ├── Deliberate-v2-0.safetensors
    │   │   ├── OpenJouney-v4.safetensors
    │   │   ├── RealisticVision-v2-0.safetensors
    │   │   └── SD-v1-5.safetensors
    │   └── seecoder
    │       ├── seecoder-v1-0.safetensors
    │       ├── seecoder-pa-v1-0.safetensors
    │       └── seecoder-anime-v1-0.safetensors
    ├── controlnet
    │   ├── control_sd15_canny_slimmed.safetensors
    │   ├── control_sd15_depth_slimmed.safetensors
    │   ├── control_sd15_hed_slimmed.safetensors
    │   ├── control_sd15_mlsd_slimmed.safetensors
    │   ├── control_sd15_normal_slimmed.safetensors
    │   ├── control_sd15_openpose_slimmed.safetensors
    │   ├── control_sd15_scribble_slimmed.safetensors
    │   ├── control_sd15_seg_slimmed.safetensors
    │   ├── control_v11p_sd15_canny_slimmed.safetensors
    │   ├── control_v11p_sd15_lineart_slimmed.safetensors
    │   ├── control_v11p_sd15_mlsd_slimmed.safetensors
    │   ├── control_v11p_sd15_openpose_slimmed.safetensors
    │   ├── control_v11p_sd15s2_lineart_anime_slimmed.safetensors
    │   └── control_v11p_sd15_softedge_slimmed.safetensors
    └── preprocess
        ├── hed
        │   └── ControlNetHED.pth
        ├── midas
        │   └── dpt_hybrid-midas-501f0c75.pt
        ├── mlsd
        │   └── mlsd_large_512_fp32.pth
        ├── openpose
        │   ├── body_pose_model.pth
        │   ├── facenet.pth
        │   └── hand_pose_model.pth
        └── pidinet
            └── table5_pidinet.pth
```

All models can be downloaded from this [HuggingFace link](https://huggingface.co/shi-labs/prompt-free-diffusion).
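If you prefer scripting the download, the sketch below uses the `huggingface_hub` package (an extra dependency, not required by this repo) to mirror the model repository locally:
```
# Assumes `pip install huggingface_hub`; mirrors the HuggingFace model repo
# into a local directory. Adjust local_dir if the repo's internal folder
# layout differs from the tree shown above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="shi-labs/prompt-free-diffusion",
    local_dir="pretrained",
)
```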
## Tools
We also provide tools to convert pretrained models from SD-WebUI and the diffusers library to this codebase; please modify the following files:
```
└── tools
    ├── get_controlnet.py
    └── model_conversion.pth
```

You are expected to do some customized coding to make these tools work (e.g., changing hardcoded input/output file paths).
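Before editing the conversion tools, it can help to inspect what a source checkpoint actually contains. A minimal sketch (the checkpoint path is one from the tree above; `safetensors` is assumed to be installed via requirements.txt):
```
# List tensor names and shapes in a .safetensors checkpoint so you know
# which keys your customized conversion code needs to remap.
from safetensors.torch import load_file

state_dict = load_file("pretrained/pfd/diffuser/SD-v1-5.safetensors")
for name, tensor in list(state_dict.items())[:10]:  # first 10 keys only
    print(name, tuple(tensor.shape))
```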
## Performance Anime
*(figure: anime-style generation results)*
## Citation
```
@article{xu2023prompt,
title={Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models},
author={Xu, Xingqian and Guo, Jiayi and Wang, Zhangyang and Huang, Gao and Essa, Irfan and Shi, Humphrey},
journal={arXiv preprint arXiv:2305.16223},
year={2023}
}
```

## Acknowledgement
Part of the code reorganizes/reimplements code from the following repositories: [Versatile Diffusion official Github](https://github.com/SHI-Labs/Versatile-Diffusion) and [ControlNet sdwebui Github](https://github.com/Mikubill/sd-webui-controlnet), which are in turn greatly influenced by [LDM official Github](https://github.com/CompVis/latent-diffusion) and [DDPM official Github](https://github.com/lucidrains/denoising-diffusion-pytorch).