{"id":13638001,"url":"https://github.com/ZiyiZhang27/tdpo","last_synced_at":"2025-04-19T17:32:36.438Z","repository":{"id":240516186,"uuid":"802744094","full_name":"ZiyiZhang27/tdpo","owner":"ZiyiZhang27","description":"[ICML 2024] Code for the paper \"Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases\"","archived":false,"fork":false,"pushed_at":"2024-05-20T15:11:00.000Z","size":3437,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-21T13:44:15.120Z","etag":null,"topics":["alignment","diffusion-models","human-feedback","reinforcement-learning","rlhf","stable-diffusion","text-to-image"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2402.08552","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZiyiZhang27.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-19T06:29:40.000Z","updated_at":"2024-07-12T16:39:52.372Z","dependencies_parsed_at":"2024-05-19T13:43:48.837Z","dependency_job_id":"5d47970f-9829-45fd-acfc-4e25fe26cde1","html_url":"https://github.com/ZiyiZhang27/tdpo","commit_stats":null,"previous_names":["ziyizhang27/tdpo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZiyiZhang27%2Ftdpo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZiyiZhang27%2Ftdpo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZiyiZhang27%2Ftdpo/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZiyiZhang27%2Ftdpo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZiyiZhang27","download_url":"https://codeload.github.com/ZiyiZhang27/tdpo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223805032,"owners_count":17205839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","diffusion-models","human-feedback","reinforcement-learning","rlhf","stable-diffusion","text-to-image"],"created_at":"2024-08-02T01:00:38.376Z","updated_at":"2024-11-09T08:30:34.895Z","avatar_url":"https://github.com/ZiyiZhang27.png","language":"Python","funding_links":[],"categories":["Papers","A01_文本生成_文本对话"],"sub_categories":["2024","大语言对话模型及数据"],"readme":"# Temporal Diffusion Policy Optimization (TDPO)\n\nThis is an official PyTorch implementation of **Temporal Diffusion Policy Optimization (TDPO)** from our paper [*Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases*](https://openreview.net/pdf?id=v2o9rRJcEv), which is accepted by **ICML 2024**.\n\n## Installation\nPython 3.10 or a newer version is required. In order to install the requirements, create a conda environment and run the `setup.py` file in this repository, e.g. 
run the following commands:\n\n```bash\nconda create -n tdpo python=3.10.12 -y\nconda activate tdpo\n\ngit clone git@github.com:ZiyiZhang27/tdpo.git\ncd tdpo\npip install -e .\n```\n\n## Training\n\nTo train on **Aesthetic Score** and evaluate *cross-reward generalization* with out-of-domain reward functions, run this command:\n\n```bash\naccelerate launch scripts/train_tdpo.py --config config/config_tdpo.py:aesthetic\n```\n\nTo train on **PickScore** and evaluate *cross-reward generalization* with out-of-domain reward functions, run this command:\n\n```bash\naccelerate launch scripts/train_tdpo.py --config config/config_tdpo.py:pickscore\n```\n\nTo train on **HPSv2** and evaluate *cross-reward generalization* with out-of-domain reward functions, run this command:\n\n```bash\naccelerate launch scripts/train_tdpo.py --config config/config_tdpo.py:hpsv2\n```\n\nFor detailed explanations of all hyperparameters, please refer to the configuration files `config/base_tdpo.py` and `config/config_tdpo.py`. These files are pre-configured for training with 8 x NVIDIA A100 GPUs (each with 40 GB of memory).\n\n**Note:** Some hyperparameters might appear in both configuration files. In such cases, only the values set in `config/config_tdpo.py` are used during training, because this file has higher priority.\n\n## Citation\n\nIf you find this work useful in your research, please consider citing:\n\n```bibtex\n@inproceedings{zhang2024confronting,\n  title={Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases},\n  author={Ziyi Zhang and Sen Zhang and Yibing Zhan and Yong Luo and Yonggang Wen and Dacheng Tao},\n  booktitle={Forty-first International Conference on Machine Learning},\n  year={2024}\n}\n```\n\n## Acknowledgement\n\n- This repository is built upon the [PyTorch codebase of DDPO](https://github.com/kvablack/ddpo-pytorch) developed by Kevin Black and his team. 
We are grateful for their contribution to the field.\n\n- We also extend our thanks to Timo Klein for open-sourcing the [PyTorch reimplementation](https://github.com/timoklein/redo/) of [ReDo](https://arxiv.org/abs/2302.12902).\n\n- We also acknowledge the contributions of [PickScore](https://github.com/yuvalkirstain/PickScore), [HPSv2](https://github.com/tgxs002/HPSv2), and [ImageReward](https://github.com/THUDM/ImageReward) projects to this work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZiyiZhang27%2Ftdpo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FZiyiZhang27%2Ftdpo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZiyiZhang27%2Ftdpo/lists"}