{"id":29009875,"url":"https://github.com/tencentarc/fluxkits","last_synced_at":"2025-06-25T15:33:34.422Z","repository":{"id":265076111,"uuid":"892505408","full_name":"TencentARC/FluxKits","owner":"TencentARC","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-27T13:51:09.000Z","size":3232,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-11-27T14:44:10.299Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-22T08:38:30.000Z","updated_at":"2024-11-27T13:51:13.000Z","dependencies_parsed_at":"2024-11-27T14:54:14.724Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/FluxKits","commit_stats":null,"previous_names":["tencentarc/fluxkits"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/FluxKits","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FFluxKits","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FFluxKits/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FFluxKits/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FFluxKits/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentA
RC/FluxKits/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FFluxKits/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261901405,"owners_count":23227593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-25T15:33:24.209Z","updated_at":"2025-06-25T15:33:34.412Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cdiv align='center'\u003e🌟 FluxKits \u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n\n[![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/TencentARC/flux-mini)\n[![Static Badge](https://img.shields.io/badge/%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange)](https://huggingface.co/spaces/TencentARC/Flux-Mini)\n\n\u003c/div\u003e\n\nWe present FluxKits, a repo that facilitates the usage of Flux series models. It consists of the following two parts:\n\n\u003e [**Flux-mini**](./flux-mini/): A 3.2B MMDiT model distilled from Flux-dev.\n\n\u003e [**Flux-NPU**](./flux-npu/): A toolkit that helps you run Flux on NPUs.\n\n\nImages generated with Flux-Mini:\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/flux_distill-flux-mini-teaser.jpg\" width=\"800\" alt=\"Teaser image\"\u003e\n\u003c/div\u003e\n\n\n## 🔥 Flux-Mini\n\nText-to-image (T2I) models are growing stronger but also larger, which limits their practical applicability, especially on consumer-level devices. 
To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `Flux-mini` model while trying to preserve its strong image generation capabilities. Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` (number of double blocks and single blocks) to `5 + 10`. The pruned model is further tuned with denoising and feature alignment objectives on a curated image-text dataset.\n\n🔥🔥 Nonetheless, with limited computing and data resources, the capability of our Flux-mini is still limited in certain domains. To facilitate the development of Flux-based models, we open-sourced the code to distill Flux in [this folder](./flux-npu/). **We invite everyone interested in this project to collaborate on building a more applicable and powerful text-to-image model!**\n\n\n### ⏰ Timeline\n\n**[2024.11.26]** We are delighted to release the first version of Flux-Mini!\n\n\n### ⚡️ Efficiency Comparison\nWe compared our Flux-Mini with Flux-Dev on a single `H20 GPU` with `BF16` precision, `batch-size=1`, `deepspeed stage=2`, and `gradient_checkpoint=True`. For inference, we adopt `num_steps=50`. The costs of T5, CLIP and VAE are included. 
`OOM` means out-of-memory.\n\n\n|  Resolution  | Training Strategy | Model | Training Speed (s/img) | Training Memory (GB) | Inference Speed (s/img) | Inference Memory (GB) |\n|-------|------|---------|---------|---------|---------|---------|\n| 512 | LoRA(r=16) | Flux-dev | 1.10 | 35.91 | 11.11 | 35.11 |\n| 512 | LoRA(r=16) | Flux-Mini | 0.33 | 19.06 | 3.07 | 18.49 | \n| 512 | Fully Finetune | Flux-dev | OOM | OOM | 11.11 | 35.11 | \n| 512 | Fully Finetune | Flux-Mini | 0.57 | 83.7 | 3.07 | 18.49 | \n| 1024 | LoRA(r=16) | Flux-dev | 2.93 | 38.03 | 38.26 | 42.24 |\n| 1024 | LoRA(r=16) | Flux-Mini | 1.05 | 22.21 | 10.31 | 25.61 |\n| 1024 | Fully Finetune | Flux-dev | OOM | OOM | 38.26 | 42.24 |\n| 1024 | Fully Finetune | Flux-Mini | 1.30 | 83.71 | 10.31 | 25.61 |\n\n\n### ⛅ Limitations\nCompared with advanced text-to-image models, our model was trained with limited computing resources and synthetic data of mediocre quality. \nThus, the generation capability of our model is still limited in certain categories.\n\nThe current model handles common images such as human/animal faces, landscapes, fantasy and abstract scenes reasonably well.  \nUnfortunately, it still struggles in many scenarios, including but not limited to:\n* Fine-grained details, such as human and animal structures\n* Typography \n* Perspective and Geometric Structure\n* Dynamics and Motion\n* Commonsense knowledge, e.g., brand logos\n* Physical Plausibility\n* Cultural Diversity\n\nSince our model is trained with prompts from JourneyDB, we encourage users to apply our model with **similar prompt formats** (compositions of nouns and adjectives) to achieve the best quality. 
\nFor example: \"profile of sad Socrates, full body, high detail, dramatic scene, Epic dynamic action, wide angle, cinematic, hyper-realistic, concept art, warm muted tones as painted by Bernie Wrightson, Frank Frazetta.\"\n\n\nWe welcome everyone in the community to collaborate and open PRs for this model.\n\n\n## 💻 Flux-NPU\n\nThe widespread development of NPUs has provided extra device options for model training and inference. To facilitate the usage of Flux, we provide a codebase that can run the training and inference code of FLUX on NPUs. \n\nPlease find more details in [this folder](./flux-npu).   \n\n### ⚡️ Efficiency Comparison on NPU\nWe compared our Flux-Mini with Flux-Dev on a single `Ascend 910B NPU` with `BF16` precision, `batch-size=1`, `deepspeed stage=2`, and `gradient_checkpoint=True`. For inference, we adopt `num_steps=50`. The costs of T5, CLIP and VAE are included. `OOM` means out-of-memory.\n\n\n|  Resolution  | Training Strategy | Model | Training Speed (s/img) | Training Memory (GB) | Inference Speed (s/img) | Inference Memory (GB) |\n|-------|------|---------|---------|---------|---------|---------|\n| 512 | LoRA(r=16) | Flux-dev | 1.07 | 38.45 | 11.00 | 58.62 |\n| 512 | LoRA(r=16) | Flux-Mini | 0.37 | 20.64 | 3.26 | 19.48 | \n| 512 | Fully Finetune | Flux-dev | OOM | OOM | 11.00 | 58.62 | \n| 512 | Fully Finetune | Flux-Mini | OOM | OOM | 3.26 | 19.48 | \n| 1024 | LoRA(r=16) | Flux-dev | 3.01 | 44.69 | OOM | OOM |\n| 1024 | LoRA(r=16) | Flux-Mini | 1.06 | 25.84 | 10.60 | 27.76 |\n| 1024 | Fully Finetune | Flux-dev | OOM | OOM | OOM | OOM |\n| 1024 | Fully Finetune | Flux-Mini | OOM | OOM | 10.60 | 27.76 |\n\n## 🐾 Disclaimer\nUsers are free to create images using our model and tools, but they are expected to comply with local laws and use them responsibly. 
The developers do not assume any responsibility for potential misuse by users.\n\n## 👍 Acknowledgements\nWe thank the authors of the following repos for their excellent contributions!\n\n- [Flux](https://github.com/black-forest-labs/flux)\n- [x-flux](https://github.com/XLabs-AI/x-flux)\n- [MLLM-NPU](https://github.com/TencentARC/mllm-npu)\n\n## 🔎 License\nOur Flux-mini model weights follow the [Flux-Dev non-commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).\n\nAll other code follows the Apache-2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Ffluxkits","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Ffluxkits","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Ffluxkits/lists"}