https://github.com/open-mmlab/styleshot

StyleShot: A SnapShot on Any Style. A model that can transfer any style onto any content, producing high-quality, personalized stylized images without per-image fine-tuning!
- Host: GitHub
- URL: https://github.com/open-mmlab/styleshot
- Owner: open-mmlab
- License: mit
- Created: 2024-07-01T05:08:12.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-02-11T11:00:48.000Z (4 months ago)
- Last Synced: 2025-04-12T04:45:27.754Z (2 months ago)
- Topics: controllable-generation, style-transfer, text-to-image
- Language: Python
- Homepage: https://styleshot.github.io/
- Size: 97.1 MB
- Stars: 372
- Watchers: 4
- Forks: 23
- Open Issues: 26
Metadata Files:
- Readme: README.md
- License: LICENSE
# ___***StyleShot: A SnapShot on Any Style***___
_**[Junyao Gao](https://jeoyal.github.io/home/), Yanchen Liu, [Yanan Sun](https://scholar.google.com/citations?hl=zh-CN&user=6TA1oPkAAAAJ)‡, Yinhao Tang, [Yanhong Zeng](https://zengyh1900.github.io/), [Kai Chen*](https://chenkai.site/), [Cairong Zhao*](https://vill-lab.github.io/)**_
(* corresponding authors, ‡ project leader). From Tongji University and Shanghai AI Lab.
## Abstract
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.
We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery.
With a dedicated design for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, and StyleGallery enables its generalization ability.
We further employ a content-fusion encoder to enhance image-driven style transfer.
We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods.
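For intuition only, the sketch below shows what a style-aware encoder of this kind can look like: a frozen CLIP vision backbone extracts image features, and a small trainable projection maps them into a fixed number of style tokens that a diffusion UNet could attend to via cross-attention. This is a conceptual illustration, not the repository's implementation; the backbone name, token count, and dimensions are assumptions.

```python
# Conceptual sketch only -- NOT StyleShot's actual implementation.
# A frozen vision backbone plus a trainable projection yields style tokens
# that a diffusion UNet could consume through cross-attention.
import torch.nn as nn
from transformers import CLIPVisionModel

class StyleAwareEncoderSketch(nn.Module):
    def __init__(self, clip_name="openai/clip-vit-base-patch32",
                 num_style_tokens=8, cross_attention_dim=768):
        super().__init__()
        self.backbone = CLIPVisionModel.from_pretrained(clip_name)
        self.backbone.requires_grad_(False)  # keep the backbone frozen
        hidden = self.backbone.config.hidden_size
        # trainable head: map the pooled image feature to a fixed set of style tokens
        self.proj = nn.Linear(hidden, num_style_tokens * cross_attention_dim)
        self.num_style_tokens = num_style_tokens
        self.cross_attention_dim = cross_attention_dim

    def forward(self, pixel_values):
        # pixel_values: (B, 3, 224, 224) preprocessed style images
        feats = self.backbone(pixel_values).pooler_output            # (B, hidden)
        tokens = self.proj(feats)                                     # (B, T*D)
        return tokens.view(-1, self.num_style_tokens, self.cross_attention_dim)

# style_tokens = StyleAwareEncoderSketch()(style_batch) would then be injected
# into the UNet's cross-attention layers alongside the text embeddings.
```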
## News
- [2024/8/29] 🔥 Thanks to @neverbiasu's contribution, StyleShot is now available in [ComfyUI](https://github.com/neverbiasu/ComfyUI-StyleShot).
- [2024/7/5] 🔥 We release an [online demo](https://huggingface.co/spaces/nowsyn/StyleShot) on HuggingFace.
- [2024/7/3] 🔥 We release [StyleShot_lineart](https://huggingface.co/Gaojunyao/StyleShot_lineart), a version that takes the lineart of the content image as control.
- [2024/7/2] 🔥 We release the [paper](https://arxiv.org/abs/2407.01414).
- [2024/7/1] 🔥 We release the code, [checkpoint](https://huggingface.co/Gaojunyao/StyleShot), [project page](https://styleshot.github.io/) and [online demo](https://openxlab.org.cn/apps/detail/lianchen/StyleShot).

## Start
```
# install styleshot
git clone https://github.com/Jeoyal/StyleShot.git
cd StyleShot

# create conda env
conda create -n styleshot python==3.8
conda activate styleshot
pip install -r requirements.txt

# download the models
git lfs install
git clone https://huggingface.co/Gaojunyao/StyleShot
git clone https://huggingface.co/Gaojunyao/StyleShot_lineart
```
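Because the checkpoints come through Git LFS, a failed `git lfs` setup can leave small pointer stubs instead of the actual weights. Below is a small optional check, assuming the two Hugging Face repositories were cloned into `./StyleShot` and `./StyleShot_lineart` as in the commands above.

```python
# Optional sanity check: verify that git-lfs actually pulled the weight files
# instead of leaving pointer stubs. Run from the repository root.
from pathlib import Path

def check_weights(folder, min_bytes=1_000_000):
    for f in Path(folder).rglob("*"):
        if f.is_file() and f.suffix in {".bin", ".safetensors", ".ckpt"}:
            size = f.stat().st_size
            status = "OK" if size >= min_bytes else "looks like an LFS pointer, re-run `git lfs pull`"
            print(f"{f} ({size / 1e6:.1f} MB): {status}")

for folder in ["StyleShot", "StyleShot_lineart"]:
    check_weights(folder)
```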
## Models

You can download our pretrained weights from [here](https://huggingface.co/Gaojunyao/StyleShot). To run the demos, you should also download the following models (see the download sketch after this list):
- [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
- [T2I-Adapter Models](https://huggingface.co/TencentARC)
- [ControlNet models](https://huggingface.co/lllyasviel)
- [CLIP Model](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K)
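If you prefer scripting the downloads, `huggingface_hub` can fetch these repositories programmatically. This is a minimal sketch using the repo IDs linked above; the specific T2I-Adapter and ControlNet checkpoints you need depend on which conditions you plan to use, and some repositories may be gated or mirrored elsewhere.

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# Repo IDs are the ones linked in this README; add the specific T2I-Adapter /
# ControlNet variants you actually need. If a repo is gated or has moved,
# swap in an accessible mirror.
from huggingface_hub import snapshot_download

for repo_id in [
    "Gaojunyao/StyleShot",                    # StyleShot weights
    "Gaojunyao/StyleShot_lineart",            # lineart-conditioned variant
    "runwayml/stable-diffusion-v1-5",         # base diffusion model
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",  # CLIP image encoder
]:
    path = snapshot_download(repo_id=repo_id)
    print(f"{repo_id} -> {path}")
```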
## Inference

For inference, download the pretrained weights and prepare your own reference style image or content image.

```
# run text-driven style transfer demo
python styleshot_text_driven_demo.py --style "{style_image_path}" --prompt "{prompt}" --output "{save_path}"

# run image-driven style transfer demo
python styleshot_image_driven_demo.py --style "{style_image_path}" --content "{content_image_path}" --preprocessor "Contour" --prompt "{prompt}" --output "{save_path}"

# integrate styleshot with controlnet and t2i-adapter
python styleshot_t2i-adapter_demo.py --style "{style_image_path}" --condition "{condition_image_path}" --prompt "{prompt}" --output "{save_path}"
python styleshot_controlnet_demo.py --style "{style_image_path}" --condition "{condition_image_path}" --prompt "{prompt}" --output "{save_path}"
```

- [**styleshot_text_driven_demo**](styleshot_text_driven_demo.py): text-driven style transfer with reference style image and text prompt.
  *(Figure: text-driven style transfer visualization)*
- [**styleshot_image_driven_demo**](styleshot_image_driven_demo.py): image-driven style transfer with reference style image and content image.
  *(Figure: image-driven style transfer visualization)*
- [**styleshot_controlnet_demo**](styleshot_controlnet_demo.py), [**styleshot_t2i-adapter_demo**](styleshot_t2i-adapter_demo.py): integration with ControlNet and T2I-Adapter (a batch wrapper sketch follows below).
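To stylize many reference images at once, the demo scripts can be driven from a small wrapper. The sketch below loops over a folder of style images and calls the text-driven demo with the flags documented above; the directory layout and prompt are placeholders for your own data.

```python
# Hypothetical batch wrapper around styleshot_text_driven_demo.py.
# It only uses the CLI flags shown above (--style, --prompt, --output);
# the folder names and prompt are placeholders.
import subprocess
from pathlib import Path

style_dir = Path("assets/styles")   # folder of reference style images (assumed)
out_dir = Path("results")
out_dir.mkdir(exist_ok=True)
prompt = "a cat sitting on a windowsill"

for style_image in sorted(style_dir.glob("*.png")):
    output = out_dir / f"{style_image.stem}.png"
    subprocess.run(
        [
            "python", "styleshot_text_driven_demo.py",
            "--style", str(style_image),
            "--prompt", prompt,
            "--output", str(output),
        ],
        check=True,
    )
    print(f"wrote {output}")
```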
## Train
We employ a two-stage training strategy to train StyleShot for better integration of content and style. For training data, you can use our training dataset [StyleGallery](#style_gallery) or organize your own dataset into a json file (a sketch for building such a file follows the training commands below).

```
# training stage-1, only training the style component.
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
tutorial_train_styleshot_stage_1.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
--image_encoder_path="{image_encoder_path}" \
--image_json_file="{data.json}" \
--image_root_path="{image_path}" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=16 \
--dataloader_num_workers=4 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="{output_dir}" \
--save_steps=10000

# training stage-2, only training the content component.
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
tutorial_train_styleshot_stage_2.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
--pretrained_ip_adapter_path="./pretrained_weight/ip.bin" \
--pretrained_style_encoder_path="./pretrained_weight/style_aware_encoder.bin" \
--image_encoder_path="{image_encoder_path}" \
--image_json_file="{data.json}" \
--image_root_path="{image_path}" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=16 \
--dataloader_num_workers=4 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="{output_dir}" \
--save_steps=10000
```
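Regarding the json training file mentioned above, the sketch below assembles one from a folder of images with caption sidecar files. The key names `"image_file"` and `"text"` follow the IP-Adapter-style convention this codebase builds on, but they are an assumption here; verify the exact schema against the dataset class in the training scripts.

```python
# Hypothetical helper that assembles a training json file from a folder of
# images with matching .txt caption files. The keys "image_file" and "text"
# are an ASSUMPTION (IP-Adapter-style); check the dataset class in
# tutorial_train_styleshot_stage_1.py before training.
import json
from pathlib import Path

def build_data_json(image_root, out_file="data.json"):
    root = Path(image_root)
    records = []
    for img in sorted(root.rglob("*.jpg")):
        caption_file = img.with_suffix(".txt")
        if not caption_file.exists():
            continue
        records.append({
            "image_file": str(img.relative_to(root)),  # path relative to --image_root_path
            "text": caption_file.read_text().strip(),  # caption for this image
        })
    Path(out_file).write_text(json.dumps(records, indent=2))
    print(f"wrote {len(records)} records to {out_file}")

# build_data_json("{image_path}")  # pass the same root you give --image_root_path
```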
## StyleGallery

We have carefully curated a style-balanced dataset, called **StyleGallery**, with diverse image styles drawn from publicly available datasets for training StyleShot.
To prepare our dataset StyleGallery, please refer to the [tutorial](DATASET.md), or download the json file from [here](https://drive.google.com/drive/folders/10T3t58rQKDmYOLschUYj0tzm6zuOngMd?usp=drive_link).

## StyleBench
To address the lack of a benchmark for reference-based stylized generation, we establish a style evaluation benchmark containing 40 content images and 73 distinct styles across 490 reference images.

## Disclaimer
This project strives to positively impact the domain of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it in a responsible manner. **The developers do not assume any responsibility for potential misuse by users.**
## Citation
If you find StyleShot useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{gao2024styleshot,
title={Styleshot: A snapshot on any style},
author={Gao, Junyao and Liu, Yanchen and Sun, Yanan and Tang, Yinhao and Zeng, Yanhong and Chen, Kai and Zhao, Cairong},
journal={arXiv preprint arXiv:2407.01414},
year={2024}
}
```

## Acknowledgements
The code is built upon IP-Adapter.