{"id":13653490,"url":"https://github.com/pfnet/pfrl","last_synced_at":"2025-05-15T04:02:50.875Z","repository":{"id":41050106,"uuid":"274629316","full_name":"pfnet/pfrl","owner":"pfnet","description":"PFRL: a PyTorch-based deep reinforcement learning library","archived":false,"fork":false,"pushed_at":"2024-08-04T22:39:35.000Z","size":15533,"stargazers_count":1225,"open_issues_count":35,"forks_count":159,"subscribers_count":89,"default_branch":"master","last_synced_at":"2025-04-03T20:08:05.072Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pfnet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-24T09:31:50.000Z","updated_at":"2025-03-31T02:23:46.000Z","dependencies_parsed_at":"2024-06-18T22:41:18.591Z","dependency_job_id":"76331f1c-6dfe-4df8-8781-93b6919431b1","html_url":"https://github.com/pfnet/pfrl","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pfnet%2Fpfrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pfnet%2Fpfrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pfnet%2Fpfrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pfnet%2Fpfrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pfnet","download_url":"https://codeload.github.com/pfnet/pfrl/tar.gz/refs/heads/master","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248368822,"owners_count":21092460,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T02:01:11.294Z","updated_at":"2025-04-11T09:38:43.872Z","avatar_url":"https://github.com/pfnet.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/pfnet/pfrl/master/assets/PFRL.png\" height=150/\u003e\u003c/div\u003e\n\n# PFRL\n[![Documentation Status](https://readthedocs.org/projects/pfrl/badge/?version=latest)](http://pfrl.readthedocs.io/en/latest/?badge=latest)\n[![PyPI](https://img.shields.io/pypi/v/pfrl.svg)](https://pypi.python.org/pypi/pfrl)\n\nPFRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using [PyTorch](https://github.com/pytorch/pytorch).\n\n![Boxing](assets/boxing.gif)\n![Humanoid](assets/humanoid.gif)\n![Grasping](assets/grasping.gif)\n![Atlas](examples/atlas/assets/atlas.gif)\n![SlimeVolley](examples/slimevolley/assets/slimevolley.gif)\n\n## Installation\n\nPFRL is tested with Python 3.7.7. For other requirements, see [requirements.txt](requirements.txt).\n\nPFRL can be installed via PyPI:\n```\npip install pfrl\n```\n\nIt can also be installed from source:\n```\npython setup.py install\n```\n\nRefer to [Installation](http://pfrl.readthedocs.io/en/latest/install.html) for more information on installation. 
\n\n## Getting started\n\nYou can try the [PFRL Quickstart Guide](examples/quickstart/quickstart.ipynb) first, or check the [examples](examples) ready for Atari 2600 and OpenAI Gym.\n\nFor more information, you can refer to [PFRL's documentation](http://pfrl.readthedocs.io/en/latest/index.html).\n\n### Blog Posts\n- [Introducing PFRL: A PyTorch-based Deep RL Library](https://t.co/VaT06nejSC?amp=1)\n- [PFRL’s Pretrained Model Zoo](https://bit.ly/3fNx5xH)\n\n## Algorithms\n\n| Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training | Pretrained models\u003csup\u003e*\u003c/sup\u003e |\n|:----------|:---------------:|:----------------:|:---------------:|:--------------:|:------------------:|:------------------:|\n| DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x | ✓ |\n| Categorical DQN | ✓ | x | ✓ | ✓ | x | x |\n| Rainbow | ✓ | x | ✓ | ✓ | x | ✓ |\n| IQN | ✓ | x | ✓ | ✓ | x | ✓ |\n| DDPG | x | ✓ | x | ✓ | x | ✓ |\n| A3C  | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ | ✓ |\n| ACER | ✓ | ✓ | ✓ | x | ✓ | x |\n| PPO  | ✓ | ✓ | ✓ | ✓ | x | ✓ |\n| TRPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |\n| TD3 | x | ✓ | x | ✓ | x | ✓ |\n| SAC | x | ✓ | x | ✓ | x | ✓ |\n\n**\u003csup\u003e*\u003c/sup\u003eNote on Pretrained models**: PFRL provides pretrained models (sometimes called a 'model zoo') for our reproducibility scripts on [Atari environments](https://github.com/pfnet/pfrl/tree/master/examples/atari/reproduction) (DQN, IQN, Rainbow, and A3C) and [Mujoco environments](https://github.com/pfnet/pfrl/tree/master/examples/mujoco/reproduction) (DDPG, TRPO, PPO, TD3, SAC), for each benchmarked environment. 
\n\nThe following algorithms have been implemented in PFRL:\n- [A2C (Synchronous variant of A3C)](https://openai.com/blog/baselines-acktr-a2c/)\n  - examples: [[atari (batched)]](examples/atari/train_a2c_ale.py)\n- [A3C (Asynchronous Advantage Actor-Critic)](https://arxiv.org/abs/1602.01783)\n  - examples: [[atari reproduction]](examples/atari/reproduction/a3c) [[atari]](examples/atari/train_a3c_ale.py)\n- [ACER (Actor-Critic with Experience Replay)](https://arxiv.org/abs/1611.01224)\n  - examples: [[atari]](examples/atari/train_acer_ale.py)\n- [Categorical DQN](https://arxiv.org/abs/1707.06887)\n  - examples: [[atari]](examples/atari/train_categorical_dqn_ale.py) [[general gym]](examples/gym/train_categorical_dqn_gym.py)\n- [DQN (Deep Q-Network)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) (including [Double DQN](https://arxiv.org/abs/1509.06461), [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860), Double PAL, [Dynamic Policy Programming (DPP)](http://www.jmlr.org/papers/volume13/azar12a/azar12a.pdf))\n  - examples: [[atari reproduction]](examples/atari/reproduction/dqn) [[atari]](examples/atari/train_dqn_ale.py) [[atari (batched)]](examples/atari/train_dqn_batch_ale.py) [[flickering atari]](examples/atari/train_drqn_ale.py) [[general gym]](examples/gym/train_dqn_gym.py)\n- [DDPG (Deep Deterministic Policy Gradients)](https://arxiv.org/abs/1509.02971) (including [SVG(0)](https://arxiv.org/abs/1510.09142))\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/ddpg)\n- [IQN (Implicit Quantile Networks)](https://arxiv.org/abs/1806.06923)\n  - examples: [[atari reproduction]](examples/atari/reproduction/iqn)\n- [PPO (Proximal Policy Optimization)](https://arxiv.org/abs/1707.06347)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/ppo) [[atari]](examples/atari/train_ppo_ale.py)\n- [Rainbow](https://arxiv.org/abs/1710.02298)\n  - examples: [[atari 
reproduction]](examples/atari/reproduction/rainbow) [[Slime volleyball]](examples/slimevolley/)\n- [REINFORCE](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)\n  - examples: [[general gym]](examples/gym/train_reinforce_gym.py)\n- [SAC (Soft Actor-Critic)](https://arxiv.org/abs/1812.05905)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/soft_actor_critic) [[Atlas walk]](examples/atlas/)\n- [TRPO (Trust Region Policy Optimization)](https://arxiv.org/abs/1502.05477) with [GAE (Generalized Advantage Estimation)](https://arxiv.org/abs/1506.02438)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/trpo)\n- [TD3 (Twin Delayed Deep Deterministic policy gradient algorithm)](https://arxiv.org/abs/1802.09477)\n  - examples: [[mujoco reproduction]](examples/mujoco/reproduction/td3)\n\nThe following useful techniques have also been implemented in PFRL:\n- [NoisyNet](https://arxiv.org/abs/1706.10295)\n  - examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)\n  - examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Dueling Network](https://arxiv.org/abs/1511.06581)\n  - examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)\n- [Normalized Advantage Function](https://arxiv.org/abs/1603.00748)\n  - examples: [[DQN]](examples/gym/train_dqn_gym.py) (for continuous-action envs only)\n- [Deep Recurrent Q-Network](https://arxiv.org/abs/1507.06527)\n  - examples: [[DQN]](examples/atari/train_drqn_ale.py)\n\n\n## Environments\n\nEnvironments that support a subset of OpenAI Gym's interface (the `reset` and `step` methods) can be used.\n\n## Contributing\n\nAny kind of contribution to PFRL would be highly appreciated! 
If you are interested in contributing to PFRL, please read [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n[MIT License](LICENSE).\n\n## Citations\n\nTo cite PFRL in publications, please cite our [paper](https://www.jmlr.org/papers/v22/20-376.html) on ChainerRL, the library on which PFRL is based:\n\n```\n@article{JMLR:v22:20-376,\n  author  = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},\n  title   = {ChainerRL: A Deep Reinforcement Learning Library},\n  journal = {Journal of Machine Learning Research},\n  year    = {2021},\n  volume  = {22},\n  number  = {77},\n  pages   = {1-14},\n  url     = {http://jmlr.org/papers/v22/20-376.html}\n}\n```\n","funding_links":[],"categories":["Python","\u003cspan id=\"head41\"\u003e3.5. Machine Learning and Deep Learning\u003c/span\u003e","Reinforcement Learning","强化学习","PyTorch Tools, Libraries, and Frameworks"],"sub_categories":["\u003cspan id=\"head44\"\u003e3.5.3. Reinforce Learning\u003c/span\u003e","Others"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpfnet%2Fpfrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpfnet%2Fpfrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpfnet%2Fpfrl/lists"}