{"id":16271687,"url":"https://github.com/jianzhnie/rltoolkit","last_synced_at":"2025-08-03T01:32:11.240Z","repository":{"id":45822156,"uuid":"512641623","full_name":"jianzhnie/RLToolkit","owner":"jianzhnie","description":"RLToolkit is a flexible and high-efficient reinforcement learning framework. Include implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....","archived":false,"fork":false,"pushed_at":"2023-12-14T05:29:41.000Z","size":15469,"stargazers_count":17,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-02T08:54:21.329Z","etag":null,"topics":["a2c","actor-critic","ddpg","ddqn","dqn","maddpg","mappo","ppo","qmix","rl","sac","td3","trpo"],"latest_commit_sha":null,"homepage":"https://jianzhnie.github.io/machine-learning-wiki/#/deep-rl/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jianzhnie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-07-11T06:30:04.000Z","updated_at":"2024-11-15T02:10:03.000Z","dependencies_parsed_at":"2024-02-20T09:25:45.528Z","dependency_job_id":null,"html_url":"https://github.com/jianzhnie/RLToolkit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FRLToolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FRLToolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FRLToolkit/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jianzhnie%2FRLToolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jianzhnie","download_url":"https://codeload.github.com/jianzhnie/RLToolkit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228514331,"owners_count":17932384,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","actor-critic","ddpg","ddqn","dqn","maddpg","mappo","ppo","qmix","rl","sac","td3","trpo"],"created_at":"2024-10-10T18:14:25.728Z","updated_at":"2024-12-06T18:39:11.855Z","avatar_url":"https://github.com/jianzhnie.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\n\n * @Author: jianzhnie\n * @LastEditors: jianzhnie\n * @Description: RLToolKit is a flexible and high-efficient reinforcement learning framework.\n * Copyright (c) 2022 by jianzhnie@126.com, All Rights Reserved.\n--\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/images/logo.png\" alt=\"logo\" width=\"1000\"/\u003e\n\u003c/p\u003e\n\n---\n\n[![Documentation Status](https://readthedocs.org/projects/deep-rl-docs/badge/?version=latest)](https://deep-rl-docs.readthedocs.io/en/latest/?badge=latest)\n\n## Overview\n\nRLToolkit is a flexible and highly efficient reinforcement learning framework. RLToolkit ([website](https://github.com/jianzhnie/deep-rl-toolkit)) is developed for practitioners and offers the following advantages:\n\n- **Reproducible**. 
We provide implementations that stably reproduce the results of many influential reinforcement learning algorithms.\n\n- **Extensible**. Build new algorithms quickly by inheriting from the abstract class in the framework.\n\n- **Reusable**. Algorithms provided in the repository can be adapted directly to a new task by defining a forward network; the training mechanism is then built automatically.\n\n- **Elastic**: allocates computing resources on the cloud elastically and automatically.\n\n- **Lightweight**: the core code is under 1,000 lines (see the [Demo](./examples/tutorials/lesson3/DQN/train.py)).\n\n- **Stable**: much more stable than [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) by utilizing various ensemble methods.\n\n\n## Table of Contents\n- [Overview](#overview)\n- [Table of Contents](#table-of-contents)\n- [Abstractions](#abstractions)\n  - [Model](#model)\n  - [Algorithm](#algorithm)\n  - [Agent](#agent)\n- [Supported Algorithms](#supported-algorithms)\n- [Supported Envs](#supported-envs)\n- [Examples](#examples)\n- [Experimental Demos](#experimental-demos)\n- [Contributions](#contributions)\n- [References](#references)\n- [Citation](#citation)\n\n\n## Abstractions\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./docs/images/abstractions.png\" alt=\"abstractions\" width=\"400\"/\u003e\n\u003c/p\u003e\n\nRLToolkit aims to build an agent for training algorithms to perform complex tasks.\nThe main abstractions introduced by PARL that are used to build an agent recursively are the following:\n\n### Model\n`Model` constructs the forward network, which defines a policy network or critic network that takes the state as input.\n\n### Algorithm\n`Algorithm` describes the mechanism for updating the parameters in `Model` and typically contains at least one model.\n\n### Agent\n`Agent`, a data bridge between the environment and the algorithm, is responsible for data I/O with the outside environment and describes the data preprocessing applied before feeding data 
into the training process.\n\n## Supported Algorithms\n\nRLToolkit implements the following model-free deep reinforcement learning (DRL) algorithms:\n\n![../_images/rl_algorithms_9_15.svg](https://spinningup.openai.com/en/latest/_images/rl_algorithms_9_15.svg)\n\nA non-exhaustive, but useful taxonomy of algorithms in modern RL.\n\n\u003cimg src=\"docs/images/algorithms.png\" alt=\"Coach Design\" style=\"width: 800px;\"/\u003e\n\n## Supported Envs\n\n- **OpenAI Gym**\n- **Atari**\n- **MuJoCo**\n- **PyBullet**\n\nFor the details of DRL algorithms, please check out the educational webpage [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/).\n\n## Examples\nIf you want to learn more about deep reinforcement learning, please read the [deep-rl-class](https://jianzhnie.github.io/machine-learning-wiki/#/deep-rl/) and run the [examples](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/tutorials).\n\n\n\n[//]: # (Image References)\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/images/trained.gif\" alt=\"logo\" width=\"810\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"examples/tutorials/assets/img/breakout.gif\" width = \"200\" height =\"200\"/\u003e \u003cimg src=\"examples/tutorials/assets/img/spaceinvaders.gif\" width = \"200\" height =\"200\"/\u003e \u003cimg src=\"examples/tutorials/assets/img/seaquest.gif\" width = \"200\" height =\"200\"/\u003e\u003cimg src=\"docs/images/Breakout.gif\" width = \"200\" height =\"200\" alt=\"Breakout\"/\u003e\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/images/performance.gif\" width = \"265\" height =\"200\" alt=\"NeurlIPS2018\"/\u003e \u003cimg src=\"docs/images/Half-Cheetah.gif\" width = \"265\" height =\"200\" alt=\"Half-Cheetah\"/\u003e \u003cimg src=\"examples/tutorials/assets/img/snowballfight.gif\" width = \"265\" height =\"200\"/\u003e\n\u003cbr\u003e\n\n- [QuickStart](./benchmark/quickstart/train.py)\n- 
[DQN](./examples/tutorials/lesson3/DQN/train.py)\n- [N-step-DQN](./examples/tutorials/lesson3/N-step-DQN/train.py)\n- [Noisy-DQN](./examples/tutorials/lesson3/Noisy-DQN/train.py)\n- [Rainbow](./examples/tutorials/lesson3/Rainbow/train.py)\n- [PG](./examples/tutorials/lesson4/pg/train.py)\n- [TRPO](./examples/tutorials/lesson4/trpo/train.py)\n- [PPO](./examples/tutorials/lesson4/ppo/train.py)\n- [AC](./examples/tutorials/lesson4/ac\u0026a2c/train.py)\n- [A2C](./examples/tutorials/lesson4/ac\u0026a2c/train.py)\n- [DDPG](./examples/tutorials/lesson5/ddpg-pendulum)\n- [SAC](./examples/tutorials/lesson5/sac/train.py)\n- [TD3](./examples/tutorials/lesson5/td3/train.py)\n- [QMIX](./examples/tutorials/lesson6/qmix/train.py)\n- [IDQN](./examples/tutorials/lesson6/idqn/train.py)\n- [MADDPG](./examples/tutorials/lesson6/maddpg/train.py)\n- [VDN](./examples/tutorials/lesson6/vdn/train.py)\n\n\n## Experimental Demos\n\n**Quick start**\n```bash\n# change into the demo directory\ncd benchmark/quickstart/\n# start training\npython train.py\n```\n\n**DQN example**\n```bash\n# change into the demo directory\ncd examples/tutorials/lesson3/DQN/\n# start training\npython train.py\n```\n\n**PPO example**\n```bash\n# change into the demo directory\ncd examples/tutorials/lesson4/ppo/\n# start training\npython train.py\n```\n\n**DDPG for Pendulum-v1**\n\n```bash\n# change into the demo directory\ncd examples/tutorials/lesson5/ddpg/\n# start training\npython train.py\n```\n...\n\n\n## Contributions\n\nWe welcome any contributions to the codebase, but we ask that you please **do not** submit/push code that breaks the tests. Also, please shy away from modifying the tests just to get your proposed changes to pass them. As it stands, the tests on their own are quite minimal (instantiating environments, training agents for one step, etc.), so if they're breaking, it's almost certainly a problem with your code and not with the tests.\n\nWe're actively working on refactoring and trying to make the codebase cleaner and more performant as a whole. 
If you'd like to help us clean up some code, we'd strongly encourage you to also watch [Uncle Bob's clean coding lessons](https://www.youtube.com/playlist?list=PLmmYSbUCWJ4x1GO839azG_BBw8rkh-zOj) if you haven't already.\n\n## References\n\n1. Deep Q-Network (DQN) \u003csub\u003e\u003csup\u003e ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n2. Double DQN (DDQN) \u003csub\u003e\u003csup\u003e ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461)) \u003c/sup\u003e\u003c/sub\u003e\n3. Advantage Actor Critic (A2C)\n4. Vanilla Policy Gradient (VPG)\n5. Natural Policy Gradient (NPG) \u003csub\u003e\u003csup\u003e ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n6. Trust Region Policy Optimization (TRPO) \u003csub\u003e\u003csup\u003e ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477)) \u003c/sup\u003e\u003c/sub\u003e\n7. Proximal Policy Optimization (PPO) \u003csub\u003e\u003csup\u003e ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347)) \u003c/sup\u003e\u003c/sub\u003e\n8. Deep Deterministic Policy Gradient (DDPG) \u003csub\u003e\u003csup\u003e ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971)) \u003c/sup\u003e\u003c/sub\u003e\n9. Twin Delayed DDPG (TD3) \u003csub\u003e\u003csup\u003e ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477)) \u003c/sup\u003e\u003c/sub\u003e\n10. Soft Actor-Critic (SAC) \u003csub\u003e\u003csup\u003e ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290)) \u003c/sup\u003e\u003c/sub\u003e\n11. SAC with automatic entropy adjustment (SAC-AEA) \u003csub\u003e\u003csup\u003e ([T. Haarnoja et al. 
2018](https://arxiv.org/abs/1812.05905)) \u003c/sup\u003e\u003c/sub\u003e\n\n\n\n## Citation\n\nTo cite this repository:\n\n```\n@misc{erl,\n  author = {jianzhnie},\n  title = {{RLToolkit}: An Easy  Deep Reinforcement Learning Toolkit},\n  year = {2022},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/jianzhnie/deep-rl-toolkit}},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjianzhnie%2Frltoolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjianzhnie%2Frltoolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjianzhnie%2Frltoolkit/lists"}