{"id":17714311,"url":"https://idkiro.github.io/sdxs/","last_synced_at":"2025-03-13T22:32:21.453Z","repository":{"id":229614397,"uuid":"775830979","full_name":"IDKiro/sdxs","owner":"IDKiro","description":"Official repo of our paper \"SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions\"","archived":false,"fork":false,"pushed_at":"2024-05-27T02:56:08.000Z","size":11120,"stargazers_count":608,"open_issues_count":10,"forks_count":23,"subscribers_count":24,"default_branch":"main","last_synced_at":"2024-11-14T13:35:52.931Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDKiro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-22T06:03:41.000Z","updated_at":"2024-11-13T10:16:58.000Z","dependencies_parsed_at":"2024-03-28T16:04:39.661Z","dependency_job_id":"a8e77823-e060-4a73-8cab-91975d9cc447","html_url":"https://github.com/IDKiro/sdxs","commit_stats":null,"previous_names":["idkiro/sdxs"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDKiro%2Fsdxs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDKiro%2Fsdxs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDKiro%2Fsdxs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDKiro%2Fsdxs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDKiro","download_url":"https://codeload
.github.com/IDKiro/sdxs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243494518,"owners_count":20299827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-25T11:02:21.193Z","updated_at":"2025-03-13T22:32:21.447Z","avatar_url":"https://github.com/IDKiro.png","language":"Python","funding_links":[],"categories":["Accelerate"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n## SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions\n\n[![Project](https://img.shields.io/badge/Home-Project-green?logo=Houzz\u0026logoColor=white)](https://idkiro.github.io/sdxs)\n[![Paper](https://img.shields.io/badge/arxiv-Paper-blue?logo=arxiv)](https://arxiv.org/abs/2403.16627) 
\n[![SDXS-512-0.9](https://img.shields.io/badge/🤗Model-512--0.9-gold)](https://huggingface.co/IDKiro/sdxs-512-0.9)\n[![SDXS-512-DreamShaper](https://img.shields.io/badge/🤗Model-512--DreamShaper-gold)](https://huggingface.co/IDKiro/sdxs-512-dreamshaper)\n[![SDXS-512-DreamShaper-Anime](https://img.shields.io/badge/🤗Model-512--DreamShaper--Anime-gold)](https://huggingface.co/IDKiro/sdxs-512-dreamshaper-anime)\n[![SDXS-512-DreamShaper-Sketch](https://img.shields.io/badge/🤗Model-512--DreamShaper--Sketch-gold)](https://huggingface.co/IDKiro/sdxs-512-dreamshaper-sketch)\n[![SDXS-512-DreamShaper-Demo](https://img.shields.io/badge/🤗Demo-Text2Image-pink)](https://huggingface.co/spaces/IDKiro/SDXS-512-DreamShaper)\n[![SDXS-512-DreamShaper-Anime-Demo](https://img.shields.io/badge/🤗Demo-Text2Image--Anime-pink)](https://huggingface.co/spaces/IDKiro/SDXS-512-DreamShaper-Anime)\n[![SDXS-512-DreamShaper-Sketch-Demo](https://img.shields.io/badge/🤗Demo-Sketch2Image-pink)](https://huggingface.co/spaces/IDKiro/SDXS-512-DreamShaper-Sketch)\n\n\n*Yuda Song, Zehao Sun, Xuanwu Yin*\n\n\u003c/div\u003e\n\nWe present two models, SDXS-512 and SDXS-1024, which achieve inference speeds of approximately \u003cb\u003e100 FPS\u003c/b\u003e (30x faster than SD v1.5) and \u003cb\u003e30 FPS\u003c/b\u003e (60x faster than SDXL), respectively, on a single GPU. If the image generation time is limited to \u003cb\u003e1 second\u003c/b\u003e, SDXL can use only 16 NFEs and produces a slightly blurry image, while SDXS-1024 can generate 30 clear images.\n\n![](images/intro.png)\n\nMoreover, our proposed method can also be used to train ControlNet, offering promising applications in image-conditioned control and facilitating efficient image-to-image translation.\n\n\u003cp align=\"left\" \u003e\n\u003cimg src=\"images/sketch.gif\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n## 🔥News\n\n- **April 11, 2024:** [SDXS-512-DreamShaper-Anime](https://huggingface.co/IDKiro/sdxs-512-dreamshaper-anime) is released. 
We also created some Gradio demos on Hugging Face.\n- **April 10, 2024:** [SDXS-512-DreamShaper](https://huggingface.co/IDKiro/sdxs-512-dreamshaper) and [SDXS-512-DreamShaper-Sketch](https://huggingface.co/IDKiro/sdxs-512-dreamshaper-sketch) are released. We also uploaded our demo code.\n- **March 25, 2024:** [SDXS-512-0.9](https://huggingface.co/IDKiro/sdxs-512-0.9) is released; it is an older version of SDXS-512.\n\n## ⚡️Demo\n\nCreate a new environment:\n\n```sh\nconda create -n sdxs\n```\n\nActivate the new environment:\n\n```sh\nconda activate sdxs\n```\n\nInstall requirements:\n\n```sh\nconda install python=3.10 pytorch=2.2.1 torchvision torchaudio pytorch-cuda=11.8 xformers=0.0.25 -c pytorch -c nvidia -c xformers\npip install -r requirements.txt\n```\n\nRun text-to-image demo:\n\n```sh\npython demo.py\n```\n\nRun anime-style text-to-image (LoRA) demo:\n\n```sh\npython demo_anime.py\n```\n\nRun sketch-to-image (ControlNet) demo:\n\n```sh\npython demo_sketch.py\n```\n\n## 💡Train\n\n[DMD2](https://github.com/tianweiy/DMD2) has released its training code, and its training scheme is identical to that of the new version of SDXS, so you can refer to it.\nUnfortunately, the SDXS training code cannot be open-sourced, and it will most likely not be updated again.\n\n## ✒️Method\n\n### Model Acceleration\n\nWe train an extremely lightweight image decoder to mimic the original VAE decoder’s output through a combination of output distillation loss and GAN loss. 
We also leverage a block removal distillation strategy to efficiently transfer the knowledge from the original U-Net to a more compact version.\n\n![](images/method1.png)\n\nSDXS is far more efficient than the base models, achieving image generation at 100 FPS for 512x512 images and 30 FPS for 1024x1024 images on a single GPU.\n\n![](images/speed.png)\n\n### Text-to-Image\n\nTo reduce the number of function evaluations (NFEs), we suggest straightening the sampling trajectory and quickly finetuning the multi-step model into a one-step model by replacing the distillation loss function with the proposed feature matching loss. Then, we extend the Diff-Instruct training strategy, using the gradient of the proposed feature matching loss to replace the gradient provided by score distillation in the latter half of the timesteps.\n\n![](images/method2.png)\n\nDespite a noticeable reduction in both model size and the number of sampling steps required, the prompt-following capability of SDXS-512 remains superior to that of SD v1.5. The same holds for SDXS-1024.\n\n![](images/imgs.png)\n\n### Image-to-Image\n\nWe extend our proposed training strategy to the training of ControlNet by adding the pretrained ControlNet to the score function. 
\n\n![](images/method3.png)\n\nWe demonstrate its efficacy for ControlNet-based image-to-image translation, specifically for transformations involving Canny edges and depth maps.\n\n![](images/control_imgs.png)\n\n\n## Citation\n\nIf you find this work useful for your research, please cite our paper:\n\n```bibtex\n@article{song2024sdxs,\n  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},\n  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},\n  journal   = {arXiv preprint arXiv:2403.16627},\n  year      = {2024},\n}\n```\n\n**Acknowledgment**: the demo code is based on https://github.com/GaParmar/img2img-turbo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/idkiro.github.io%2Fsdxs%2F","html_url":"https://awesome.ecosyste.ms/projects/idkiro.github.io%2Fsdxs%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/idkiro.github.io%2Fsdxs%2F/lists"}