{"id":13693368,"url":"https://github.com/modelscope/DiffSynth-Studio","last_synced_at":"2025-05-02T21:32:00.423Z","repository":{"id":211375036,"uuid":"728770766","full_name":"modelscope/DiffSynth-Studio","owner":"modelscope","description":"Enjoy the magic of Diffusion models!","archived":false,"fork":false,"pushed_at":"2025-04-22T06:54:16.000Z","size":12616,"stargazers_count":8440,"open_issues_count":225,"forks_count":752,"subscribers_count":77,"default_branch":"main","last_synced_at":"2025-04-23T17:19:35.754Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modelscope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-12-07T16:52:15.000Z","updated_at":"2025-04-23T11:53:24.000Z","dependencies_parsed_at":"2023-12-30T14:24:48.834Z","dependency_job_id":"842aca9e-475f-4d7c-8a96-eb8ad0423244","html_url":"https://github.com/modelscope/DiffSynth-Studio","commit_stats":null,"previous_names":["artiprocher/diffsynth-studio","modelscope/diffsynth-studio"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FDiffSynth-Studio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FDiffSynth-Studio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FDiffSynth-Studio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modelscope%2FDiffSynth-Studio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modelscope","download_url":"https://codeload.github.com/modelscope/DiffSynth-Studio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252108916,"owners_count":21696158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T17:01:08.953Z","updated_at":"2025-05-02T21:31:55.412Z","avatar_url":"https://github.com/modelscope.png","language":"Python","readme":"# DiffSynth Studio\n[![PyPI](https://img.shields.io/pypi/v/DiffSynth)](https://pypi.org/project/DiffSynth/)\n[![license](https://img.shields.io/github/license/modelscope/DiffSynth-Studio.svg)](https://github.com/modelscope/DiffSynth-Studio/blob/master/LICENSE)\n[![open issues](https://isitmaintained.com/badge/open/modelscope/DiffSynth-Studio.svg)](https://github.com/modelscope/DiffSynth-Studio/issues)\n[![GitHub pull-requests](https://img.shields.io/github/issues-pr/modelscope/DiffSynth-Studio.svg)](https://GitHub.com/modelscope/DiffSynth-Studio/pull/)\n[![GitHub latest 
<p align="center">
<a href="https://trendshift.io/repositories/10946" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10946" alt="modelscope%2FDiffSynth-Studio | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

Documentation: https://diffsynth-studio.readthedocs.io/zh-cn/latest/index.html

## Introduction

DiffSynth Studio is a Diffusion engine. We have restructured architectures, including the Text Encoder, UNet, and VAE, among others, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of Diffusion models!

To date, DiffSynth Studio supports the following models:

* [CogVideoX](https://huggingface.co/THUDM/CogVideoX-5b)
* [FLUX](https://huggingface.co/black-forest-labs/FLUX.1-dev)
* [ExVideo](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1)
* [Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
* [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
* [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
* [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT)
* [RIFE](https://github.com/hzwer/ECCV2022-RIFE)
* [ESRGAN](https://github.com/xinntao/ESRGAN)
* [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter)
* [AnimateDiff](https://github.com/guoyww/animatediff/)
* [ControlNet](https://github.com/lllyasviel/ControlNet)
* [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
* [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5)

## News

- **October 25, 2024.** We provide extensive FLUX ControlNet support. This project supports many different ControlNet models that can be freely combined, even if their structures differ. Additionally, ControlNet models are compatible with high-resolution refinement and partition control techniques, enabling very powerful controllable image generation. See [`./examples/ControlNet/`](./examples/ControlNet/).

- **October 8, 2024.** We released an extended LoRA based on CogVideoX-5B and ExVideo. You can download this model from [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1) or [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-CogVideoX-LoRA-129f-v1).

- **August 22, 2024.** CogVideoX-5B is supported in this project. See [here](/examples/video_synthesis/). We provide several interesting features for this text-to-video model, including
  - Text-to-video
  - Video editing
  - Self-upscaling
  - Video interpolation

- **August 22, 2024.** We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!
  - Use it in our [WebUI](#usage-in-webui).

- **August 21, 2024.** FLUX is supported in DiffSynth-Studio.
  - Enable CFG and highres-fix to improve visual quality. See [here](/examples/image_synthesis/README.md).
  - LoRA, ControlNet, and additional models will be available soon.

- **June 21, 2024.** 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
  - [Project Page](https://ecnu-cilab.github.io/ExVideoProjectPage/)
  - The source code is released in this repo. See [`examples/ExVideo`](./examples/ExVideo/).
  - Models are released on [HuggingFace](https://huggingface.co/ECNU-CILab/ExVideo-SVD-128f-v1) and [ModelScope](https://modelscope.cn/models/ECNU-CILab/ExVideo-SVD-128f-v1).
  - The technical report is released on [arXiv](https://arxiv.org/abs/2406.14130).
  - You can try ExVideo in this [Demo](https://huggingface.co/spaces/modelscope/ExVideo-SVD-128f-v1)!
- **June 13, 2024.** DiffSynth Studio was transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.

- **January 29, 2024.** We propose Diffutoon, a fantastic solution for toon shading.
  - [Project Page](https://ecnu-cilab.github.io/DiffutoonProjectPage/)
  - The source code is released in this project.
  - The technical report (IJCAI 2024) is released on [arXiv](https://arxiv.org/abs/2401.16224).

- **December 8, 2023.** We decided to develop a new project aimed at unleashing the potential of diffusion models, especially in video synthesis. Development of this project has begun.

- **November 15, 2023.** We propose FastBlend, a powerful video deflickering algorithm.
  - The sd-webui extension is released on [GitHub](https://github.com/Artiprocher/sd-webui-fastblend).
  - Demo videos are shown on Bilibili, covering three tasks:
    - [Video deflickering](https://www.bilibili.com/video/BV1d94y1W7PE)
    - [Video interpolation](https://www.bilibili.com/video/BV1Lw411m71p)
    - [Image-driven video rendering](https://www.bilibili.com/video/BV1RB4y1Z7LF)
  - The technical report is released on [arXiv](https://arxiv.org/abs/2311.09265).
  - An unofficial ComfyUI extension developed by other users is released on [GitHub](https://github.com/AInseven/ComfyUI-fastblend).

- **October 1, 2023.** We released an early version of this project, namely FastSDXL, an attempt at building a diffusion engine.
  - The source code is released on [GitHub](https://github.com/Artiprocher/FastSDXL).
  - FastSDXL includes a trainable OLSS scheduler for efficiency improvement.
    - The original repo of OLSS is [here](https://github.com/alibaba/EasyNLP/tree/master/diffusion/olss_scheduler).
    - The technical report (CIKM 2023) is released on [arXiv](https://arxiv.org/abs/2305.14677).
    - A demo video is shown on [Bilibili](https://www.bilibili.com/video/BV1w8411y7uj).
    - Since OLSS requires additional training, we don't implement it in this project.

- **August 29, 2023.** We propose DiffSynth, a video synthesis framework.
  - [Project Page](https://ecnu-cilab.github.io/DiffSynth.github.io/)
  - The source code is released in [EasyNLP](https://github.com/alibaba/EasyNLP/tree/master/diffusion/DiffSynth).
  - The technical report (ECML PKDD 2024) is released on [arXiv](https://arxiv.org/abs/2308.03463).


## Installation

Install from source code (recommended):

```
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Or install from PyPI:

```
pip install diffsynth
```

## Usage (in Python code)

The Python examples are in [`examples`](./examples/). We provide an overview here.

### Download Models

Download the pre-set models. Model IDs can be found in the [config file](/diffsynth/configs/model_config.py).

```python
from diffsynth import download_models

download_models(["FLUX.1-dev", "Kolors"])
```

Download your own models.

```python
from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope

# From ModelScope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From HuggingFace
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
```

A sketch of how downloaded weights are typically loaded into a pipeline is shown below.
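Once the weights are on disk, they are loaded into a pipeline for inference. The snippet below is a minimal sketch of that flow, not a verbatim script from this repository: the `ModelManager`/`FluxImagePipeline` usage and the file paths under `models/` are assumptions modeled on the scripts in [`examples/image_synthesis`](./examples/image_synthesis/), which remain the authoritative reference.

```python
# Minimal sketch (assumed API): load downloaded weights, then run text-to-image.
# Class names and file paths are assumptions; see examples/image_synthesis for
# the exact scripts.
import torch
from diffsynth import ModelManager, FluxImagePipeline, download_models

download_models(["FLUX.1-dev"])  # fetch the pre-set FLUX.1-dev weights

# ModelManager loads checkpoints and detects what kind of model each one is.
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models([
    "models/FLUX/FLUX.1-dev/text_encoder/model.safetensors",  # assumed path
    "models/FLUX/FLUX.1-dev/text_encoder_2",                  # assumed path
    "models/FLUX/FLUX.1-dev/ae.safetensors",                  # assumed path
    "models/FLUX/FLUX.1-dev/flux1-dev.safetensors",           # assumed path
])

# Build the pipeline from the loaded components and generate an image.
pipe = FluxImagePipeline.from_model_manager(model_manager)
torch.manual_seed(0)
image = pipe(prompt="a cat sitting on a windowsill, detailed, photorealistic")
image.save("image.jpg")
```

The same download, load, and `from_model_manager` pattern appears throughout the examples for the other supported models.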
### Video Synthesis

#### Text-to-video using CogVideoX-5B

CogVideoX-5B is released by Zhipu. We provide an improved pipeline supporting text-to-video, video editing, self-upscaling, and video interpolation. See [`examples/video_synthesis`](./examples/video_synthesis/); a minimal sketch also appears at the end of this section.

The video on the left is generated using the original text-to-video pipeline, while the video on the right is the result after editing and frame interpolation.

https://github.com/user-attachments/assets/26b044c1-4a60-44a4-842f-627ff289d006

#### Long Video Synthesis

We trained extended video synthesis models that can generate up to 128 frames. [`examples/ExVideo`](./examples/ExVideo/)

https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc

https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e

#### Toon Shading

Render realistic videos in a flat, toon style, with video editing features enabled. [`examples/Diffutoon`](./examples/Diffutoon/)

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/b54c05c5-d747-4709-be5e-b39af82404dd

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/20528af5-5100-474a-8cdc-440b9efdd86c

#### Video Stylization

Video stylization without video models. [`examples/diffsynth`](./examples/diffsynth/)

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/59fb2f7b-8de0-4481-b79f-0c3a7361a1ea
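As referenced above, here is a rough sketch of the CogVideoX-5B text-to-video flow. It is a guess at the shape of the API rather than a verbatim example: the `CogVideoPipeline` class name, the `"CogVideoX-5b"` model ID, and the checkpoint path are assumptions; consult [`examples/video_synthesis`](./examples/video_synthesis/) for the exact scripts.

```python
# Minimal sketch (assumed API): text-to-video with CogVideoX-5B.
# The pipeline class, model ID, and paths are assumptions; see
# examples/video_synthesis for the exact scripts.
import torch
from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models

download_models(["CogVideoX-5b"])  # assumed pre-set model ID

model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cuda")
model_manager.load_models(["models/CogVideo/CogVideoX-5b"])  # assumed path

# Generate frames, then write them out as an mp4 file.
pipe = CogVideoPipeline.from_model_manager(model_manager)
torch.manual_seed(0)
video = pipe(prompt="an astronaut riding a horse on the moon")
save_video(video, "video.mp4", fps=8)
```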
### Image Synthesis

Generate high-resolution images by breaking the resolution limitation of diffusion models! [`examples/image_synthesis`](./examples/image_synthesis/)

LoRA fine-tuning is supported in [`examples/train`](./examples/train/).

|FLUX|Stable Diffusion 3|
|-|-|
|![image_1024_cfg](https://github.com/user-attachments/assets/984561e9-553d-4952-9443-79ce144f379f)|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4df346db-6f91-420a-b4c1-26e205376098)|

|Kolors|Hunyuan-DiT|
|-|-|
|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/53ef6f41-da11-4701-8665-9f64392607bf)|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/60b022c8-df3f-4541-95ab-bf39f2fa8bb5)|

|Stable Diffusion|Stable Diffusion XL|
|-|-|
|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|

## Usage (in WebUI)

Create stunning images using the painter, with assistance from AI!

https://github.com/user-attachments/assets/95265d21-cdd6-4125-a7cb-9fbcf6ceb7b0

**This video is not rendered in real time.**

Before launching the WebUI, please download models to the folder `./models`. See [here](#download-models).

* `Gradio` version

```
pip install gradio
```

```
python apps/gradio/DiffSynth_Studio.py
```

![20240822102002](https://github.com/user-attachments/assets/59613157-de51-4109-99b3-97cbffd88076)

* `Streamlit` version

```
pip install streamlit streamlit-drawable-canvas
```

```
python -m streamlit run apps/streamlit/DiffSynth_Studio.py
```

https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/93085557-73f3-4eee-a205-9829591ef954