{"id":14057484,"url":"https://github.com/bghira/SimpleTuner","last_synced_at":"2025-07-29T03:31:23.671Z","repository":{"id":170406643,"uuid":"646537482","full_name":"bghira/SimpleTuner","owner":"bghira","description":"A general fine-tuning kit geared toward diffusion models.","archived":false,"fork":false,"pushed_at":"2025-07-01T19:00:23.000Z","size":11929,"stargazers_count":2418,"open_issues_count":10,"forks_count":227,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-07-01T19:42:36.536Z","etag":null,"topics":["diffusers","diffusion-models","fine-tuning","flux-dev","machine-learning","stable-diffusion"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bghira.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-05-28T18:01:02.000Z","updated_at":"2025-07-01T19:00:23.000Z","dependencies_parsed_at":"2024-04-24T02:59:05.284Z","dependency_job_id":"f243ce43-4ed8-43fc-acae-c91a3c23621b","html_url":"https://github.com/bghira/SimpleTuner","commit_stats":null,"previous_names":["bghira/simpletuner"],"tags_count":128,"template":false,"template_full_name":null,"purl":"pkg:github/bghira/SimpleTuner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bghira%2FSimpleTuner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bghira%2FSimpleTuner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bghira%2FSimpleTuner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bghira%2FSimpleTuner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bghira","download_url":"https://codeload.github.com/bghira/SimpleTuner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bghira%2FSimpleTuner/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267622614,"owners_count":24117018,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusers","diffusion-models","fine-tuning","flux-dev","machine-learning","stable-diffusion"],"created_at":"2024-08-13T02:00:47.709Z","updated_at":"2025-07-29T03:31:23.632Z","avatar_url":"https://github.com/bghira.png","language":"Python","readme":"# SimpleTuner 💹\n\n\u003e ℹ️ No data is sent to any third parties except through opt-in flag `report_to`, `push_to_hub`, or webhooks which must be 
**SimpleTuner** is geared towards simplicity, with a focus on making the code easy to understand. This codebase serves as a shared academic exercise, and contributions are welcome.

If you'd like to join our community, we can be found [on Discord](https://discord.com/invite/eq3cAMZtCC) via Terminus Research Group.
If you have any questions, please feel free to reach out to us there.

## Table of Contents

- [Design Philosophy](#design-philosophy)
- [Tutorial](#tutorial)
- [Features](#features)
  - [HiDream](#hidream)
  - [Flux.1](#flux1)
  - [Wan Video](#wan-video)
  - [LTX Video](#ltx-video)
  - [PixArt Sigma](#pixart-sigma)
  - [NVLabs Sana](#nvlabs-sana)
  - [Stable Diffusion 3](#stable-diffusion-3)
  - [Kwai Kolors](#kwai-kolors)
  - [Lumina2](#lumina2)
  - [Cosmos2 Predict (Image)](#cosmos2-predict-image)
  - [Legacy Stable Diffusion models](#legacy-stable-diffusion-models)
- [Hardware Requirements](#hardware-requirements)
  - [HiDream](#hidream-dev-full)
  - [Flux.1](#flux1-dev-schnell)
  - [Auraflow](#auraflow)
  - [SDXL](#sdxl-1024px)
  - [Stable Diffusion (Legacy)](#stable-diffusion-2x-768px)
- [Toolkit](#toolkit)
- [Setup](#setup)
- [Troubleshooting](#troubleshooting)

## Design Philosophy

- **Simplicity**: Aiming to have good default settings for most use cases, so less tinkering is required.
- **Versatility**: Designed to handle a wide range of image quantities, from small datasets to extensive collections.
- **Cutting-Edge Features**: Only incorporates features that have proven efficacy, avoiding the addition of untested options.

## Tutorial

Please fully explore this README before embarking on [the tutorial](/TUTORIAL.md), as it contains vital information that you might need to know first.

For a quick start without reading the full documentation, you can use the [Quick Start](/documentation/QUICKSTART.md) guide.

For memory-constrained systems, see the [DeepSpeed document](/documentation/DEEPSPEED.md), which explains how to use 🤗Accelerate to configure Microsoft's DeepSpeed for optimiser state offload (a minimal sketch appears below).

For multi-node distributed training, [this guide](/documentation/DISTRIBUTED.md) explains how to adapt the configurations from the INSTALL and Quickstart guides for multi-node training, and how to optimise for image datasets numbering in the billions of samples.

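As a rough illustration of the DeepSpeed workflow mentioned above: the two `accelerate` commands are standard 🤗Accelerate usage, while `train.py` as the entrypoint is an assumption here, not a confirmed detail; follow the DeepSpeed document for the authoritative steps.

```bash
# Interactively generate an Accelerate config; answer "yes" to DeepSpeed and
# choose optimiser state offload when prompted.
accelerate config

# Launch training through Accelerate so the DeepSpeed settings take effect.
# (train.py is assumed as the entrypoint; use the project's documented launch
# method if it differs.)
accelerate launch train.py
```
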
---

## Features

- Multi-GPU training
- New token-wise dropout techniques like [TREAD](/documentation/TREAD.md) for speeding up Flux training, including Kontext
- Image, video, and caption features (embeds) are cached to the hard drive in advance, so that training runs faster and consumes less memory
- Aspect bucketing: support for a variety of image/video sizes and aspect ratios, enabling widescreen and portrait training
- Refiner LoRA or full u-net training for SDXL
- Most models are trainable on a 24G GPU, or even down to 16G at lower base resolutions
  - LoRA/LyCORIS training for PixArt, SDXL, SD3, and SD 2.x that uses less than 16G VRAM
- DeepSpeed integration allowing for [training SDXL's full u-net on 12G of VRAM](/documentation/DEEPSPEED.md), albeit very slowly
- Quantised NF4/INT8/FP8 LoRA training, using a low-precision base model to reduce VRAM consumption
- Optional EMA (exponential moving average) weight network to counteract model overfitting and improve training stability
- Train directly from an S3-compatible storage provider, eliminating the requirement for expensive local storage (tested with Cloudflare R2 and Wasabi S3)
- For SDXL, SD 1.x/2.x, and Flux, full or LoRA-based [ControlNet model training](/documentation/CONTROLNET.md) (not ControlLite)
- Training [Mixture of Experts](/documentation/MIXTURE_OF_EXPERTS.md) for lightweight, high-quality diffusion models
- [Masked loss training](/documentation/DREAMBOOTH.md#masked-loss) for superior convergence and reduced overfitting on any model
- Strong [prior regularisation](/documentation/DATALOADER.md#is_regularisation_data) training support for LyCORIS models
- Webhook support for sending updates on your training progress, validations, and errors to e.g. a Discord channel
- Integration with the [Hugging Face Hub](https://huggingface.co) for seamless model upload and nice automatically-generated model cards
  - Use the [datasets library](/documentation/data_presets/preset_subjects200k.md) ([more info](/documentation/HUGGINGFACE_DATASETS.md)) to load compatible datasets directly from the hub

### HiDream

Full training support for HiDream is included:

- Custom ControlNet implementation for training via full-rank, LoRA or Lycoris
- Memory-efficient training for NVIDIA GPUs (AMD support is planned)
- Dev and Full both functioning and trainable; Fast is untested
- Optional MoEGate loss augmentation
- Lycoris or full tuning via DeepSpeed ZeRO on a single GPU
- Quantise the base model by setting `--base_model_precision` to `int8-quanto` or `fp8-quanto` for major memory savings
- Quantise the Llama LLM by setting `--text_encoder_4_precision` to `int4-quanto` or `int8-quanto` to run on 24G cards

See [hardware requirements](#hidream-dev-full) or the [quickstart guide](/documentation/quickstart/HIDREAM.md); an example invocation using these flags follows.

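The flags below are the ones named in this section; the `train.py` entrypoint and everything else about the command are assumptions, not a complete working configuration:

```bash
# Sketch only: quantise both the base model and the Llama text encoder so
# HiDream LoRA training can fit on a 24G card. Dataset, model, and output
# arguments are omitted; see the HiDream quickstart for a full configuration.
accelerate launch train.py \
  --base_model_precision=int8-quanto \
  --text_encoder_4_precision=int4-quanto
```
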
### Flux.1

Full training support for Flux.1 is included:

- Double the training speed of Flux.1 with the new `--fuse_qkv_projections` option, which takes advantage of Flash Attention 3 on Hopper systems
- ControlNet training via full-rank, LoRA or Lycoris
- Instruct fine-tuning for the Kontext \[dev] editing model, an implementation generously provided by [Runware](https://runware.ai)
- Classifier-free guidance training
  - Leave it disabled and preserve the dev model's distillation qualities
  - Or, reintroduce CFG to the model and improve its creativity at the cost of inference speed and training time
- (optional) T5 attention-masked training for superior fine details and generalisation capabilities
- LoRA or full tuning via DeepSpeed ZeRO on a single GPU
- Quantise the base model by setting `--base_model_precision` to `int8-quanto` or `fp8-torchao` for major memory savings

See [hardware requirements](#flux1-dev-schnell) or the [quickstart guide](/documentation/quickstart/FLUX.md); an example follows below.

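As with HiDream above, only the flags themselves come from this README; the rest of the command is an assumed placeholder:

```bash
# Sketch only: quantised Flux training with fused QKV projections, which can
# roughly double training speed on Hopper GPUs via Flash Attention 3.
# Dataset, model, and output arguments are omitted; see the Flux quickstart.
accelerate launch train.py \
  --base_model_precision=int8-quanto \
  --fuse_qkv_projections
```
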
### Wan Video

SimpleTuner has preliminary training integration for Wan 2.1, which comes in 14B and 1.3B variants; both work.

- Text-to-video training is supported.
- Image-to-video training is not yet supported.
- Text encoder training is not supported.
- VAE training is not supported.
- LyCORIS, PEFT, and full tuning all work as expected.
- ControlNet training is not yet supported.

See the [Wan Video Quickstart](/documentation/quickstart/WAN.md) guide to start training.

### LTX Video

SimpleTuner has preliminary training integration for LTX Video, training efficiently in under 16G.

- Text encoder training is not supported.
- VAE training is not supported.
- LyCORIS, PEFT, and full tuning all work as expected.
- ControlNet training is not yet supported.

See the [LTX Video Quickstart](/documentation/quickstart/LTXVIDEO.md) guide to start training.

### PixArt Sigma

SimpleTuner has extensive training integration with PixArt Sigma: both the 600M & 900M models load without modification.

- Text encoder training is not supported.
- LyCORIS and full tuning both work as expected.
- ControlNet training is supported for full and PEFT LoRA training.
- [Two-stage PixArt](https://huggingface.co/ptx0/pixart-900m-1024-ft-v0.7-stage1) training support (see: [MIXTURE_OF_EXPERTS](/documentation/MIXTURE_OF_EXPERTS.md))

See the [PixArt Quickstart](/documentation/quickstart/SIGMA.md) guide to start training.

### NVLabs Sana

SimpleTuner has extensive training integration with NVLabs Sana.

This is a lightweight, fun, and fast model that makes getting into model training highly accessible to a wider audience.

- LyCORIS and full tuning both work as expected.
- Text encoder training is not supported.
- PEFT standard LoRA is not supported.
- ControlNet training is not yet supported.

See the [NVLabs Sana Quickstart](/documentation/quickstart/SANA.md) guide to start training.

### Stable Diffusion 3

- LoRA and full finetuning are supported as usual.
- ControlNet training via full-rank, PEFT LoRA, or Lycoris.
- Certain features such as segmented timestep selection and Compel long prompt weighting are not yet supported.
- Parameters have been optimised for the best results, validated through from-scratch training of SD3 models.

See the [Stable Diffusion 3 Quickstart](/documentation/quickstart/SD3.md) to get going.

### Kwai Kolors

An SDXL-based model with ChatGLM (General Language Model) 6B as its text encoder, **doubling** the hidden dimension size and substantially increasing the level of local detail included in the prompt embeds.

Kolors support is almost as deep as SDXL's, minus ControlNet training support.

### Lumina2

A 2B-parameter flow-matching model that uses the 16-channel Flux VAE.

- LoRA, Lycoris, and full finetuning are supported.
- ControlNet training is not yet supported.

A [Lumina2 Quickstart](/documentation/quickstart/LUMINA2.md) is available with example configurations.

### Cosmos2 Predict (Image)

A 2B / 14B parameter model that can do video as well as text-to-image.

- Currently, only the text-to-image variant is supported.
- Lycoris or full-rank tuning are supported, but PEFT LoRAs are currently not.
- ControlNet training is not yet supported.

A [Cosmos2 Predict Quickstart](/documentation/quickstart/COSMOS2IMAGE.md) is available with a full example configuration and dataset.

### Legacy Stable Diffusion models

RunwayML's SD 1.5 and StabilityAI's SD 2.x are both trainable under the `legacy` designation.

---

## Hardware Requirements

### NVIDIA

Pretty much anything 3080 and up is a safe bet. YMMV.

### AMD

LoRA and full-rank tuning are verified working on a 7900 XTX 24GB and an MI300X.

Lacking `xformers`, it will use more memory than equivalent NVIDIA hardware.

### Apple

LoRA and full-rank tuning are tested to work on an M3 Max with 128G memory, taking about **12G** of "Wired" memory and **4G** of system memory for SDXL.

- You likely need a 24G or greater machine for machine learning with M-series hardware due to the lack of memory-efficient attention.
- Subscribing to PyTorch issues for MPS is probably a good idea, as random bugs will make training stop working.

### HiDream [dev, full]

- A100-80G (full tune with DeepSpeed)
- A100-40G (LoRA, LoKr)
- 3090 24G (LoRA, LoKr)

HiDream has not been tested on 16G cards, but with aggressive quantisation and pre-caching of embeds, you might make it work; even 24G is pushing the limits.

### Flux.1 [dev, schnell]

- A100-80G (full tune with DeepSpeed)
- A100-40G (LoRA, LoKr)
- 3090 24G (LoRA, LoKr)
- 4060 Ti 16G, 4070 Ti 16G, 3080 16G (int8, LoRA, LoKr)
- 4070 Super 12G, 3080 10G, 3060 12G (nf4, LoRA, LoKr)

Flux prefers being trained on multiple large GPUs, but a single 16G card should be able to manage it with quantisation of the transformer and text encoders.

Kontext requires somewhat beefier compute and memory allocation; a 4090 will go from ~3 to ~6 seconds per step when it is enabled.

### Auraflow

- A100-80G (full tune with DeepSpeed)
- A100-40G (LoRA, LoKr)
- 3090 24G (LoRA, LoKr)
- 4060 Ti 16G, 4070 Ti 16G, 3080 16G (int8, LoRA, LoKr)
- 4070 Super 12G, 3080 10G, 3060 12G (nf4, LoRA, LoKr)

### SDXL, 1024px

- A100-80G (EMA, large batches, LoRA @ insane batch sizes)
- A6000-48G (EMA@768px, no EMA@1024px, LoRA @ high batch sizes)
- A100-40G (EMA@1024px, EMA@768px, EMA@512px, LoRA @ high batch sizes)
- 4090-24G (EMA@1024px, batch size 1-4, LoRA @ medium-high batch sizes)
- 4080-12G (LoRA @ low-medium batch sizes)

### Stable Diffusion 2.x, 768px

- 16G or better

## Toolkit

For more information about the associated toolkit distributed with SimpleTuner, refer to [the toolkit documentation](/toolkit/README.md).

## Setup

Detailed setup information is available in the [installation documentation](/INSTALL.md).

## Troubleshooting

Enable debug logs for more detailed insight by adding `export SIMPLETUNER_LOG_LEVEL=DEBUG` to your environment file (`config/config.env`).

For performance analysis of the training loop, setting `SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG` will produce timestamped log output that highlights any issues in your configuration.

For a comprehensive list of options available, consult [this documentation](/OPTIONS.md).

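As a concrete example of the debug settings above, a `config/config.env` might contain lines like these (the file path and both variable names are the ones this README itself mentions; the rest is plain shell syntax):

```bash
# config/config.env - sketch enabling both debug channels named above.
# Verbose logging across the trainer:
export SIMPLETUNER_LOG_LEVEL=DEBUG
# Timestamped training-loop logging for performance analysis:
export SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=DEBUG
```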