{"id":33285184,"url":"https://github.com/aidc-ai/diffusion-sdpo","last_synced_at":"2025-12-24T20:56:54.340Z","repository":{"id":323677881,"uuid":"1094000983","full_name":"AIDC-AI/Diffusion-SDPO","owner":"AIDC-AI","description":"Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models","archived":false,"fork":false,"pushed_at":"2025-11-11T13:37:16.000Z","size":5465,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-11T15:21:07.885Z","etag":null,"topics":["diffusion-model","dpo","flowmatching","text-to-image"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AIDC-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-11T05:49:53.000Z","updated_at":"2025-11-11T14:18:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AIDC-AI/Diffusion-SDPO","commit_stats":null,"previous_names":["aidc-ai/diffusion-sdpo"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/AIDC-AI/Diffusion-SDPO","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIDC-AI%2FDiffusion-SDPO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIDC-AI%2FDiffusion-SDPO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIDC-AI%2FDiffusion-SDPO/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIDC-AI%2FDiffusion-SDPO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AIDC-AI","download_url":"https://codeload.github.com/AIDC-AI/Diffusion-SDPO/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIDC-AI%2FDiffusion-SDPO/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284902580,"owners_count":27081908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-17T02:00:06.431Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-model","dpo","flowmatching","text-to-image"],"created_at":"2025-11-17T15:03:52.840Z","updated_at":"2025-11-17T15:03:57.838Z","avatar_url":"https://github.com/AIDC-AI.png","language":"Python","readme":"# Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://huggingface.co/AIDC-AI/Diffusion-SDPO\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/HuggingFace-Repo-FFD21E?logo=huggingface\" alt=\"Hugging Face Repo\" height=\"20\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## 📝 Introduction\n\n**Diffusion-SDPO** is a plug-in training rule for preference alignment of diffusion models. 
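The winner-preserving idea can be sketched in a few lines. The snippet below is an illustrative toy, not this repository's implementation: the names (`safeguarded_scale`, `mu`, `eps`) are hypothetical, and the paper derives the actual closed-form scale in output space.

```python
def dot(u, v):
    # Plain-Python inner product, standing in for an output-space gradient dot.
    return sum(a * b for a, b in zip(u, v))

def safeguarded_scale(g_w, g_l, mu=1.0, eps=1e-12):
    """Scale lam for the loser gradient g_l such that the combined direction
    g_w + lam * g_l keeps dot(g_w, g_w + lam * g_l) >= 0, i.e. the winner's
    loss does not increase to first order."""
    align = dot(g_w, g_l)
    if align >= 0.0:
        # No conflict with the winner's descent direction: keep the full loser term.
        return 1.0
    # Conflict: cap lam at mu * ||g_w||^2 / |<g_w, g_l>| so the first-order
    # change in the winner loss stays non-positive (mu <= 1 is more conservative).
    cap = mu * dot(g_w, g_w) / (-align + eps)
    return min(1.0, cap)
```

In an actual DPO-style step, `g_w` and `g_l` would be the output-space gradients of the winner and loser branches, and the loser branch's backward signal would be multiplied by the returned scale before backpropagation.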
It computes an adaptive scale for the loser branch based on the alignment between winner and loser output-space gradients, so that each update theoretically **does not increase the winner's loss to first order**. This preserves the preferred output while still widening the preference margin. The safeguard is model-agnostic and drops into Diffusion-DPO, DSPO, and DMPO with negligible overhead. See [our paper](https://arxiv.org/abs/2511.03317) for details (derivation of the safety bound, the output-space approximation, and the closed-form solution). \n\nThis repository is the official implementation of the paper [Diffusion-SDPO](https://arxiv.org/abs/2511.03317).\n\n\u003cimg width=\"987\" alt=\"image\" src=\"figs/sdpo_img.png\"\u003e\n\n## 🔧 Setup\n\n```bash\npip install -r requirements.txt\n```\n\n## 📦 Model Checkpoints\n\nAll checkpoints are initialized from Stable Diffusion (SD1.5 or SDXL) and trained as described in the paper.  \nEach name below means *{base model} + {DPO variant} with our safeguarded winner-preserving rule (SDPO)*:\n\n- [**SD1.5-Diffusion-DPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sd1.5/diffusion-dpo/unet.pth) — SD1.5 + Diffusion-DPO augmented by our safeguard  \n- [**SD1.5-DSPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sd1.5/dspo/unet.pth) — SD1.5 + DSPO augmented by our safeguard  \n- [**SD1.5-DMPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sd1.5/dmpo/unet.pth) — SD1.5 + DMPO augmented by our safeguard  \n- [**SDXL-Diffusion-DPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sdxl/diffusion-dpo/unet.pth) — SDXL + Diffusion-DPO augmented by our safeguard  \n- [**SDXL-DSPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sdxl/dspo/unet.pth) — SDXL + DSPO augmented by our safeguard  \n- [**SDXL-DMPO (with SDPO)**](https://huggingface.co/AIDC-AI/Diffusion-SDPO/blob/main/sdxl/dmpo/unet.pth) — SDXL + DMPO augmented by 
our safeguard\n\n\n## 🚀 Model Training\n\n### Example: SD1.5 + Diffusion-DPO with SDPO safeguard\n\nStart training by running the provided script. It auto-detects the number of GPUs and launches with `accelerate`.\n\n```bash\nbash scripts/train/sd15_diffusion_dpo.sh\n```\n\n**Key arguments in this example**\n\n* `--train_method` selects Diffusion-DPO as the baseline. Choices: `diffusion-dpo`, `dspo`, `dmpo`.\n* `--beta_dpo` controls the DPO temperature (regularization strength).\n* `--use_winner_preserving` enables our SDPO safeguard that rescales only the loser branch’s backward signal to avoid increasing the winner loss to first order.\n* `--winner_preserving_mu` sets the safeguard strength. Larger values are more conservative.\n* `--mixed_precision bf16` and `--allow_tf32` improve throughput on recent NVIDIA GPUs.\n\n## 📊 Evaluation\n\nWe provide one-click evaluation scripts for SD1.5 and SDXL. They take a `unet.pth` checkpoint and will:\n1) generate images for three prompt groups: **papv2**, **hpsv2**, **partiprompts**\n2) compute **PickScore**, **HPSv2**, **Aesthetics**, **CLIP**, and **ImageReward**\n3) print a summary to the console\n4) optionally, compare two model checkpoints and report per-metric win rates across all prompts\n\n\nThe prompts come from `prompts/`:\n- `papv2.json` is deduplicated from the [Pick-a-Pic v2](https://huggingface.co/datasets/yuvalkirstain/pickapic_v2) test set to ensure prompts are unique\n- `hpsv2.json` and `partiprompts.json` are standard prompt suites for qualitative and quantitative checks, drawn from [HPDv2](https://huggingface.co/datasets/zhwang/HPDv2/tree/main/benchmark) and [Parti](https://github.com/google-research/parti).\n\n### Quick start\n\n**SD1.5 checkpoint**\n```bash\nbash scripts/eval/test_sd15.sh /path/to/your/unet.pth\n```\n\n**SDXL checkpoint**\n```bash\nbash scripts/eval/test_sdxl.sh /path/to/your/unet.pth\n```\n\n**Win-rate comparison**\n```bash\n# A/B win-rate comparison across all prompts from one group (papv2, 
hpsv2, partiprompts)\n# A.json / B.json are the generation manifests produced by your eval runs.\n\nbash scripts/eval/test_vs.sh \\\n  --json_a path/to/A.json \\\n  --json_b path/to/B.json \\\n  --label_a \"your label A\" \\\n  --label_b \"your label B\"\n```\nExample:\n```bash\nbash scripts/eval/test_vs.sh \\\n  --json_a /path/to/sdxl/diffusion-dpo/hpsv2_seed0_1024x1024_50s_7.5cfg.json \\\n  --json_b /path/to/sdxl/dmpo/hpsv2_seed0_1024x1024_50s_7.5cfg.json \\\n  --label_a \"diffusion_dpo_sdxl_hpsv2\" \\\n  --label_b \"dmpo_sdxl_hpsv2\"\n```\n\n\n## 📚 Citation\n\nIf you find Diffusion-SDPO helpful, please cite our paper:\n\n```bibtex\n@article{fu2025diffusion,\n  title={{Diffusion-SDPO}: Safeguarded Direct Preference Optimization for Diffusion Models},\n  author={Fu, Minghao and Wang, Guo-Hua and Cui, Tianyu and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu},\n  journal={arXiv:2511.03317},\n  year={2025}\n}\n```\n\n## 🙏 Acknowledgments\n\nThe code is built upon [Diffusers](https://github.com/huggingface/diffusers), [Transformers](https://github.com/huggingface/transformers), [Diffusion-DPO](https://github.com/SalesforceAIResearch/DiffusionDPO) and [DSPO](https://github.com/huaishengzhu/DSPO/tree/main).\n\n## 📄 License\n\nThis project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0).\n\n## 🚨 Disclaimer\n\nWe used compliance-checking algorithms during the training process to ensure the compliance of the trained model(s) to the best of our ability. Due to the complexity of the data and the diversity of model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. 
If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faidc-ai%2Fdiffusion-sdpo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faidc-ai%2Fdiffusion-sdpo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faidc-ai%2Fdiffusion-sdpo/lists"}