{"id":19287171,"url":"https://github.com/opendilab/generativerl","last_synced_at":"2025-10-30T20:36:59.668Z","repository":{"id":244087404,"uuid":"785061703","full_name":"opendilab/GenerativeRL","owner":"opendilab","description":"Python library for solving reinforcement learning (RL) problems using generative models (e.g. Diffusion Models).","archived":false,"fork":false,"pushed_at":"2025-02-18T04:49:47.000Z","size":9301,"stargazers_count":123,"open_issues_count":1,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-28T21:01:29.573Z","etag":null,"topics":["diffusion","diffusion-models","diffusion-policy","flow-model","generative-ai","generative-model","offline-rl","reinforcement-learning","rl"],"latest_commit_sha":null,"homepage":"https://opendilab.github.io/GenerativeRL/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opendilab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-11T05:45:37.000Z","updated_at":"2025-03-27T04:01:44.000Z","dependencies_parsed_at":"2024-12-03T07:21:31.470Z","dependency_job_id":"25e29cbe-6a01-4770-b478-0634a4f76404","html_url":"https://github.com/opendilab/GenerativeRL","commit_stats":null,"previous_names":["opendilab/generativerl"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FGenerativeRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FGenerativeRL/tags","releases_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FGenerativeRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FGenerativeRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opendilab","download_url":"https://codeload.github.com/opendilab/GenerativeRL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247256093,"owners_count":20909240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","diffusion-models","diffusion-policy","flow-model","generative-ai","generative-model","offline-rl","reinforcement-learning","rl"],"created_at":"2024-11-09T22:05:35.606Z","updated_at":"2025-10-30T20:36:54.625Z","avatar_url":"https://github.com/opendilab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Generative Reinforcement Learning\n\n[![Twitter](https://img.shields.io/twitter/url?style=social\u0026url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https://twitter.com/opendilab)    \n[![GitHub stars](https://img.shields.io/github/stars/opendilab/GenerativeRL)](https://github.com/opendilab/GenerativeRL/stargazers)\n[![GitHub forks](https://img.shields.io/github/forks/opendilab/GenerativeRL)](https://github.com/opendilab/GenerativeRL/network)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/opendilab/GenerativeRL)\n[![GitHub issues](https://img.shields.io/github/issues/opendilab/GenerativeRL)](https://github.com/opendilab/GenerativeRL/issues)\n[![GitHub 
pulls](https://img.shields.io/github/issues-pr/opendilab/GenerativeRL)](https://github.com/opendilab/GenerativeRL/pulls)\n[![Contributors](https://img.shields.io/github/contributors/opendilab/GenerativeRL)](https://github.com/opendilab/GenerativeRL/graphs/contributors)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![GenerativeRL Preprint](http://img.shields.io/badge/paper-arxiv.2412.01245-B31B1B.svg)](https://arxiv.org/abs/2412.01245)\n\nEnglish | [简体中文(Simplified Chinese)](https://github.com/opendilab/GenerativeRL/blob/main/README.zh.md)\n\n**GenerativeRL**, short for Generative Reinforcement Learning, is a Python library for solving reinforcement learning (RL) problems using generative models, such as diffusion models and flow models. This library aims to provide a framework for combining the power of generative models with the decision-making capabilities of reinforcement learning algorithms.\n\n## Outline\n\n- [Features](#features)\n- [Framework Structure](#framework-structure)\n- [Integrated Generative Models](#integrated-generative-models)\n- [Integrated Algorithms](#integrated-algorithms)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Documentation](#documentation)\n- [Tutorials](#tutorials)\n- [Benchmark experiments](#benchmark-experiments)\n\n## Features\n\n- Support for training, evaluating and deploying diverse generative models, including diffusion models and flow models\n- Integration of generative models for state representation, action representation, policy learning and dynamics model learning in RL\n- Implementation of popular RL algorithms tailored for generative models, such as Q-guided policy optimization (QGPO)\n- Support for various RL environments and benchmarks\n- Easy-to-use API for training and evaluation\n\n## Framework Structure\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/framework.png\" alt=\"Image Description 1\" width=\"80%\" 
height=\"auto\" style=\"margin: 0 1%;\"\u003e\n\u003c/p\u003e\n\n## Integrated Generative Models\n\n|                                                                                     | [Score Matching](https://ieeexplore.ieee.org/document/6795935) | [Flow Matching](https://arxiv.org/abs/2210.02747) |\n|-------------------------------------------------------------------------------------| -------------------------------------------------------------- | ------------------------------------------------- |\n| **Diffusion Model**   [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/18yHUAmcMh_7xq2U6TBCtcLKX2y4YvNyk)    |             |         |\n| [Linear VP SDE](https://arxiv.org/abs/2011.13456)                                   | ✔                                                              | ✔                                                |\n| [Generalized VP SDE](https://arxiv.org/abs/2209.15571)                              | ✔                                                              | ✔                                                |\n| [Linear SDE](https://arxiv.org/abs/2206.00364)                                      | ✔                                                              | ✔                                                |\n| **Flow Model**    [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1vrxREVXKsSbnsv9G2CnKPVvrbFZleElI)    |            |              |\n| [Independent Conditional Flow Matching](https://arxiv.org/abs/2302.00482)           |  🚫                                                            | ✔                                                |\n| [Optimal Transport Conditional Flow Matching](https://arxiv.org/abs/2302.00482)     |  🚫                                                            | ✔                                                |\n\n\n\n## Integrated Algorithms\n\n| Algo./Models     
                                   | Diffusion Model                                                                                                                                             |  Flow Model            |\n|---------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- |\n| [IDQL](https://arxiv.org/abs/2304.10573)            | ✔                                                                                                                                                           |  🚫                   |\n| [QGPO](https://arxiv.org/abs/2304.12824)            | ✔                                                                                                                                                           |  🚫                   |\n| [SRPO](https://arxiv.org/abs/2310.07297)            | ✔                                                                                                                                                           |  🚫                   |\n| GMPO                                                | ✔  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1A79ueOdLvTfrytjOPyfxb6zSKXi1aePv)  | ✔                     |\n| GMPG                                                | ✔  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hhMvQsrV-mruvpSCpmnsOxmCb6bMPOBq)  | ✔                     |\n\n\n## Installation\n\n```bash\npip install GenerativeRL\n```\n\nOr, if you want to install from source:\n\n```bash\ngit clone https://github.com/opendilab/GenerativeRL.git\ncd GenerativeRL\npip install -e .\n```\n\nOr you can use the docker image:\n```bash\ndocker pull opendilab/grl:torch2.3.0-cuda12.1-cudnn8-runtime\ndocker run -it --rm 
--gpus all opendilab/grl:torch2.3.0-cuda12.1-cudnn8-runtime /bin/bash\n```\n\n## Quick Start\n\nHere is an example of how to train a diffusion model for Q-guided policy optimization (QGPO) in the [LunarLanderContinuous-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/) environment using GenerativeRL.\n\nInstall the required dependencies:\n```bash\npip install 'gym[box2d]==0.23.1'\n```\n(The gym version can be from 0.23 to 0.25 for box2d environments, but it is recommended to use 0.23.1 for compatibility with D4RL.)\n\nDownload the dataset from [here](https://drive.google.com/file/d/1YnT-Oeu9LPKuS_ZqNc5kol_pMlJ1DwyG/view?usp=drive_link) and save it as `data.npz` in the current directory.\n\nGenerativeRL uses WandB for logging. It will ask you to log in to your account when you use it. To skip the login and keep logs local without syncing, run:\n```bash\nwandb offline\n```\n\nThen train the policy and deploy the resulting agent:\n\n```python\nimport gym\n\nfrom grl.algorithms.qgpo import QGPOAlgorithm\nfrom grl.datasets import QGPOCustomizedTensorDictDataset\nfrom grl.utils.log import log\nfrom grl_pipelines.diffusion_model.configurations.lunarlander_continuous_qgpo import config\n\ndef qgpo_pipeline(config):\n    # Train QGPO on the offline dataset saved as ./data.npz.\n    qgpo = QGPOAlgorithm(config, dataset=QGPOCustomizedTensorDictDataset(numpy_data_path=\"./data.npz\", action_augment_num=config.train.parameter.action_augment_num))\n    qgpo.train()\n\n    # Roll out the trained agent in the environment.\n    agent = qgpo.deploy()\n    env = gym.make(config.deploy.env.env_id)\n    observation = env.reset()\n    for _ in range(config.deploy.num_deploy_steps):\n        env.render()\n        observation, reward, done, _ = env.step(agent.act(observation))\n        if done:\n            # Start a new episode once the current one ends.\n            observation = env.reset()\n\nif __name__ == '__main__':\n    log.info(\"config: \\n{}\".format(config))\n    qgpo_pipeline(config)\n```\n\nFor more detailed examples and documentation, please refer to the GenerativeRL documentation.\n\n## Documentation\n\nThe full documentation for GenerativeRL can be found at [GenerativeRL Documentation](https://opendilab.github.io/GenerativeRL/).\n\n## Tutorials\n\nWe provide several case 
tutorials to help you better understand GenerativeRL. See more at [tutorials](https://github.com/opendilab/GenerativeRL/tree/main/grl_pipelines/tutorials).\n\n## Benchmark experiments\n\nWe offer some baseline experiments to evaluate the performance of generative reinforcement learning algorithms. See more at [benchmark](https://github.com/opendilab/GenerativeRL/tree/main/grl_pipelines/benchmark).\n\n## Contributing\n\nWe welcome contributions to GenerativeRL! If you are interested in contributing, please refer to the [Contributing Guide](CONTRIBUTING.md).\n\n## Citation\n\nIf you find GenerativeRL useful in your research, please consider citing the following paper:\n\n```latex\n@misc{zhang2024generative_rl,\n      title={Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective}, \n      author={Jinouwen Zhang and Rongkun Xue and Yazhe Niu and Yun Chen and Jing Yang and Hongsheng Li and Yu Liu},\n      year={2024},\n      eprint={2412.01245},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2412.01245}, \n}\n```\n\n### Papers implemented in GenerativeRL\n\n- [Data-driven Aerodynamic Shape Optimization and Multi-fidelity Design Exploration using Conditional Diffusion-based Geometry Sampling Method](https://www.icas.org/ICAS_ARCHIVE/ICAS2024/data/papers/ICAS2024_0431_paper.pdf) (Yang et al. 2024)\n- [Pretrained Reversible Generation as Unsupervised Visual Representation Learning](https://arxiv.org/abs/2412.01787) (Xue et al. 2024)\n\n## License\n\nGenerativeRL is licensed under the Apache License 2.0. See [LICENSE](LICENSE) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendilab%2Fgenerativerl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopendilab%2Fgenerativerl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendilab%2Fgenerativerl/lists"}