{"id":13488337,"url":"https://github.com/mihirp1998/AlignProp","last_synced_at":"2025-03-28T00:33:42.135Z","repository":{"id":198600444,"uuid":"701109860","full_name":"mihirp1998/AlignProp","owner":"mihirp1998","description":"AlignProp uses direct reward backpropogation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample and compute efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion","archived":false,"fork":false,"pushed_at":"2024-03-14T03:47:33.000Z","size":4050,"stargazers_count":189,"open_issues_count":8,"forks_count":7,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-05-15T23:58:45.185Z","etag":null,"topics":["alignment","diffusion-models","reinforcement-learning","stable-diffusion","text-to-image"],"latest_commit_sha":null,"homepage":"https://align-prop.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mihirp1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-06T00:03:26.000Z","updated_at":"2024-07-31T20:53:01.449Z","dependencies_parsed_at":"2023-10-12T06:04:00.254Z","dependency_job_id":"232633d2-2e0d-436b-8b9b-3ffe144b3190","html_url":"https://github.com/mihirp1998/AlignProp","commit_stats":{"total_commits":10,"total_committers":4,"mean_commits":2.5,"dds":0.4,"last_synced_commit":"b519ad87484cdafcfca761013928e59a3fac0923"},"previous_names":["mihirp1998/alignprop"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/mihirp1998%2FAlignProp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mihirp1998%2FAlignProp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mihirp1998%2FAlignProp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mihirp1998%2FAlignProp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mihirp1998","download_url":"https://codeload.github.com/mihirp1998/AlignProp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245949275,"owners_count":20698912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","diffusion-models","reinforcement-learning","stable-diffusion","text-to-image"],"created_at":"2024-07-31T18:01:13.977Z","updated_at":"2025-03-28T00:33:37.118Z","avatar_url":"https://github.com/mihirp1998.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n\u003c!-- TITLE --\u003e\n# **Aligning Text-to-Image Diffusion Models with Reward Backpropagation**\n![AlignProp](assets/method.png)\n\n[![arXiv](https://img.shields.io/badge/cs.LG-arXiv:2310.03739-b31b1b.svg)](https://arxiv.org/abs/2310.03739)\n[![Website](https://img.shields.io/badge/🌎-Website-blue.svg)](http://align-prop.github.io)\n\u003c/div\u003e\n\nThis is the official implementation of our paper [Aligning Text-to-Image Diffusion Models with Reward Backpropagation](https://arxiv.org/abs/2310.03739) by Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, and Katerina Fragkiadaki.\n\n\n\u003c!-- DESCRIPTION --\u003e\n## 
Abstract\nText-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to the weakly supervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, which is notorious for the high variance of its gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While a naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility, and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest.\n\n## Code\n\n### Installation\nCreate a conda environment with the following commands:\n```bash\nconda create -n alignprop python=3.10\nconda activate alignprop\npip install -r requirements.txt\n```\nPlease use accelerate==0.17.0; other library dependencies may be flexible.\n\n### Training Code\n\nAccelerate will automatically handle the multi-GPU setting. 
\nThe code can also work on a single GPU, as we automatically handle gradient accumulation based on the GPUs available in the CUDA_VISIBLE_DEVICES environment variable.\nFor our experiments, we used 4 A100s (40GB RAM) to run our code. If you are using a GPU with less RAM, please edit the `per_gpu_capacity` variable accordingly. Further, if you are bottlenecked by GPU memory, consider using AlignProp with K=1; this will significantly reduce the memory usage.\n\n#### Aesthetic Reward model.\nCurrently we early-stop training to prevent overfitting, but feel free to adjust the `num_epochs` variable as per your needs.\n\n```bash\naccelerate launch main.py --config config/align_prop.py:aesthetic\n```\n\nIf you are memory bottlenecked, use AlignProp with K=1 via the following command, and feel free to vary `trunc_backprop_timestep` as per your memory availability. Lower values of `trunc_backprop_timestep` (higher values of K) can help with focusing on more semantic details:\n\n```bash\naccelerate launch main.py --config config/align_prop.py:aesthetic_k1\n```\n\n#### HPSv2 Reward model.\n\n```bash\naccelerate launch main.py --config config/align_prop.py:hps\n```\n\nIf you are memory bottlenecked, use AlignProp with K=1 via the following command, and feel free to vary `trunc_backprop_timestep` as per your memory availability. Lower values of `trunc_backprop_timestep` (higher values of K) can help with focusing on more semantic details:\n\n```bash\naccelerate launch main.py --config config/align_prop.py:hps_k1\n```\n\n### Evaluation \u0026 Checkpoints\nPlease find the checkpoints for the Aesthetic reward function [here](https://drive.google.com/file/d/1r7291awe3z37drfKyxLyqcNq6dHl6Egf/view?usp=sharing) and the HPS-v2 reward function [here](https://drive.google.com/file/d/1nvSxwxf-OnDrKq4ob-j5islfUSif8lQb/view?usp=sharing).\n\nThis evaluates the model checkpoint specified by the `resume_from` variable in the config file. 
Evaluation includes calculating the reward and storing/uploading the images locally or to wandb.\n\n#### Normal evaluation.\n\n```bash\naccelerate launch main.py --config config/align_prop.py:evaluate\n```\n#### With mixing.\nUpdate the `resume_from` and `resume_from_2` variables to specify the checkpoints to mix. Set `resume_from_2` to `stablediffusion` to interpolate between `resume_from` and the Stable Diffusion weights. The mixing coefficient is set by the variable `mixing_coef_1`, which can be edited in the config file.\n\n```bash\naccelerate launch main.py --config config/align_prop.py:evaluate_soup\n```\n\n### Acknowledgement\n\nOur codebase is directly built on top of [DDPO](https://github.com/kvablack/ddpo-pytorch).\nWe would like to thank Kevin Black and his team for open-sourcing their code.\n\n## Citation\n\nIf you find this work useful in your research, please cite:\n\n```bibtex\n@misc{prabhudesai2023aligning,\n      title={Aligning Text-to-Image Diffusion Models with Reward Backpropagation}, \n      author={Mihir Prabhudesai and Anirudh Goyal and Deepak Pathak and Katerina Fragkiadaki},\n      year={2023},\n      eprint={2310.03739},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n","funding_links":[],"categories":["T2I Diffusion Model augmentation"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmihirp1998%2FAlignProp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmihirp1998%2FAlignProp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmihirp1998%2FAlignProp/lists"}