{"id":13528399,"url":"https://github.com/chainer/chainerrl","last_synced_at":"2025-05-16T06:07:01.757Z","repository":{"id":38417142,"uuid":"80394882","full_name":"chainer/chainerrl","owner":"chainer","description":"ChainerRL is a deep reinforcement learning library built on top of Chainer.","archived":false,"fork":false,"pushed_at":"2021-08-10T18:25:48.000Z","size":14530,"stargazers_count":1170,"open_issues_count":65,"forks_count":224,"subscribers_count":70,"default_branch":"master","last_synced_at":"2024-10-29T20:32:56.986Z","etag":null,"topics":["actor-critic","chainer","deep-learning","dqn","machine-learning","python","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chainer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-30T04:58:15.000Z","updated_at":"2024-10-24T00:54:00.000Z","dependencies_parsed_at":"2022-07-14T23:30:41.281Z","dependency_job_id":null,"html_url":"https://github.com/chainer/chainerrl","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chainer%2Fchainerrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chainer%2Fchainerrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chainer%2Fchainerrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chainer%2Fchainerrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chainer","download_url":"https://codeload.github.com/ch
ainer/chainerrl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254478190,"owners_count":22077676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","chainer","deep-learning","dqn","machine-learning","python","reinforcement-learning"],"created_at":"2024-08-01T07:00:18.393Z","updated_at":"2025-05-16T06:06:56.736Z","avatar_url":"https://github.com/chainer.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/chainer/chainerrl/master/assets/ChainerRL.png\" width=\"400\"/\u003e\u003c/div\u003e\n\n# ChainerRL and PFRL\n[![Build Status](https://travis-ci.org/chainer/chainerrl.svg?branch=master)](https://travis-ci.org/chainer/chainerrl)\n[![Coverage Status](https://coveralls.io/repos/github/chainer/chainerrl/badge.svg?branch=master)](https://coveralls.io/github/chainer/chainerrl?branch=master)\n[![Documentation Status](https://readthedocs.org/projects/chainerrl/badge/?version=latest)](http://chainerrl.readthedocs.io/en/latest/?badge=latest)\n[![PyPI](https://img.shields.io/pypi/v/chainerrl.svg)](https://pypi.python.org/pypi/chainerrl)\n\nChainerRL (this repository) is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using [Chainer](https://github.com/chainer/chainer), a flexible deep learning framework. 
[PFRL](https://github.com/pfnet/pfrl) is the PyTorch analog of ChainerRL.\n\n![Breakout](assets/breakout.gif)\n![Humanoid](assets/humanoid.gif)\n![Grasping](assets/grasping.gif)\n![Atlas](examples/atlas/assets/atlas.gif)\n\n## Installation\n\nChainerRL is tested with Python 3.6. For other requirements, see [requirements.txt](requirements.txt).\n\nChainerRL can be installed via PyPI:\n```\npip install chainerrl\n```\n\nIt can also be installed from the source code:\n```\npython setup.py install\n```\n\nRefer to [Installation](http://chainerrl.readthedocs.io/en/latest/install.html) for more information.\n\n## Getting started\n\nYou can try the [ChainerRL Quickstart Guide](examples/quickstart/quickstart.ipynb) first, or check the [examples](examples) prepared for Atari 2600 and OpenAI Gym.\n\nFor more information, you can refer to [ChainerRL's documentation](http://chainerrl.readthedocs.io/en/latest/index.html).\n\n## Algorithms\n\n| Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training |\n|:----------|:---------------:|:----------------:|:---------------:|:--------------:|:------------------:|\n| DQN (including DoubleDQN etc.) 
| ✓ | ✓ (NAF) | ✓ | ✓ | x |\n| Categorical DQN | ✓ | x | ✓ | ✓ | x |\n| Rainbow | ✓ | x | ✓ | ✓ | x |\n| IQN | ✓ | x | ✓ | ✓ | x |\n| DDPG | x | ✓ | ✓ | ✓ | x |\n| A3C  | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ |\n| ACER | ✓ | ✓ | ✓ | x | ✓ |\n| NSQ (N-step Q-learning) | ✓ | ✓ (NAF) | ✓ | x | ✓ |\n| PCL (Path Consistency Learning) | ✓ | ✓ | ✓ | x | ✓ |\n| PPO  | ✓ | ✓ | ✓ | ✓ | x |\n| TRPO | ✓ | ✓ | ✓ | ✓ | x |\n| TD3 | x | ✓ | x | ✓ | x |\n| SAC | x | ✓ | x | ✓ | x |\n\nThe following algorithms have been implemented in ChainerRL:\n- [A2C (Synchronous variant of A3C)](https://openai.com/blog/baselines-acktr-a2c/)\n  - examples: [[atari (batched)]](examples/atari/train_a2c_ale.py) [[general gym (batched)]](examples/gym/train_a2c_gym.py)\n- [A3C (Asynchronous Advantage Actor-Critic)](https://arxiv.org/abs/1602.01783)\n  - examples: [[atari reproduction]](examples/atari/reproduction/a3c) [[atari]](examples/atari/train_a3c_ale.py) [[general gym]](examples/gym/train_a3c_gym.py)\n- [ACER (Actor-Critic with Experience Replay)](https://arxiv.org/abs/1611.01224)\n  - examples: [[atari]](examples/atari/train_acer_ale.py) [[general gym]](examples/gym/train_acer_gym.py)\n- [Asynchronous N-step Q-learning](https://arxiv.org/abs/1602.01783)\n  - examples: [[atari]](examples/atari/train_nsq_ale.py)\n- [Categorical DQN](https://arxiv.org/abs/1707.06887)\n  - examples: [[atari]](examples/atari/train_categorical_dqn_ale.py) [[general gym]](examples/gym/train_categorical_dqn_gym.py)\n- [DQN (Deep Q-Network)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) (including [Double DQN](https://arxiv.org/abs/1509.06461), [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860), Double PAL, [Dynamic Policy Programming (DPP)](http://www.jmlr.org/papers/volume13/azar12a/azar12a.pdf))\n  - examples: [[atari reproduction]](examples/atari/reproduction/dqn) [[atari]](examples/atari/train_dqn_ale.py) [[atari (batched)]](examples/atari/train_dqn_batch_ale.py) [[flickering 
atari]](examples/atari/train_drqn_ale.py) [[general gym]](examples/gym/train_dqn_gym.py)\n- [DDPG (Deep Deterministic Policy Gradients)](https://arxiv.org/abs/1509.02971) (including [SVG(0)](https://arxiv.org/abs/1510.09142))\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/ddpg) [[mujoco]](examples/mujoco/train_ddpg_gym.py) [[mujoco (batched)]](examples/mujoco/train_ddpg_batch_gym.py)\n- [IQN (Implicit Quantile Networks)](https://arxiv.org/abs/1806.06923)\n  - examples: [[atari reproduction]](examples/atari/reproduction/iqn) [[general gym]](examples/gym/train_iqn_gym.py)\n- [PCL (Path Consistency Learning)](https://arxiv.org/abs/1702.08892)\n  - examples: [[general gym]](examples/gym/train_pcl_gym.py)\n- [PPO (Proximal Policy Optimization)](https://arxiv.org/abs/1707.06347)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/ppo) [[atari]](examples/atari/train_ppo_ale.py) [[mujoco]](examples/mujoco/train_ppo_gym.py) [[mujoco (batched)]](examples/mujoco/train_ppo_batch_gym.py)\n- [Rainbow](https://arxiv.org/abs/1710.02298)\n  - examples: [[atari reproduction]](examples/atari/reproduction/rainbow)\n- [REINFORCE](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)\n  - examples: [[general gym]](examples/gym/train_reinforce_gym.py)\n- [SAC (Soft Actor-Critic)](https://arxiv.org/abs/1812.05905)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/soft_actor_critic)\n- [TRPO (Trust Region Policy Optimization)](https://arxiv.org/abs/1502.05477) with [GAE (Generalized Advantage Estimation)](https://arxiv.org/abs/1506.02438)\n  - examples: [[mujoco]](examples/mujoco/train_trpo_gym.py)\n- [TD3 (Twin Delayed Deep Deterministic policy gradient algorithm)](https://arxiv.org/abs/1802.09477)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/td3)\n\nThe following useful techniques have also been implemented in ChainerRL:\n- [NoisyNet](https://arxiv.org/abs/1706.10295)\n  - examples: 
[[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)\n  - examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Dueling Network](https://arxiv.org/abs/1511.06581)\n  - examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Normalized Advantage Function](https://arxiv.org/abs/1603.00748)\n  - examples: [[DQN]](examples/gym/train_dqn_gym.py) (for continuous-action envs only)\n- [Deep Recurrent Q-Network](https://arxiv.org/abs/1507.06527)\n  - examples: [[DQN]](examples/atari/train_drqn_ale.py)\n\n\n## Visualization\n\nChainerRL has a set of accompanying [visualization tools](https://github.com/chainer/chainerrl-visualizer) that help developers understand and debug their RL agents. With these tools, the behavior of ChainerRL agents can be easily inspected from a browser UI.\n\n\n## Environments\n\nEnvironments that support a subset of OpenAI Gym's interface (the `reset` and `step` methods) can be used.\n\n## Contributing\n\nAny kind of contribution to ChainerRL would be highly appreciated! 
If you are interested in contributing to ChainerRL, please read [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n[MIT License](LICENSE).\n\n## Citations\n\nTo cite ChainerRL in publications, please cite our [JMLR paper](https://www.jmlr.org/papers/v22/20-376.html):\n\n```\n@article{JMLR:v22:20-376,\n  author  = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},\n  title   = {ChainerRL: A Deep Reinforcement Learning Library},\n  journal = {Journal of Machine Learning Research},\n  year    = {2021},\n  volume  = {22},\n  number  = {77},\n  pages   = {1-14},\n  url     = {http://jmlr.org/papers/v22/20-376.html}\n}\n```\n","funding_links":[],"categories":["Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL)","Reinforcement Learning","Libraries","Models/Projects","Python","强化学习","Codes","General benchmark frameworks","Uncategorized"],"sub_categories":["RL/DRL Algorithm Implementations and Software Frameworks","NLP","Official Add-on Packages","Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchainer%2Fchainerrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchainer%2Fchainerrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchainer%2Fchainerrl/lists"}