https://github.com/jianzhnie/deep-rl-toolkit
RLToolkit is a flexible and highly efficient reinforcement learning framework. It includes implementations of DQN, AC, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3, and more.
actor-critic atari ddpg deep-reinforcement-learning dqn gym mujoco ppo sac td3 trpo
- Host: GitHub
- URL: https://github.com/jianzhnie/deep-rl-toolkit
- Owner: jianzhnie
- License: apache-2.0
- Created: 2024-02-20T06:59:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-30T07:09:26.000Z (12 months ago)
- Last Synced: 2024-12-29T13:44:24.479Z (10 months ago)
- Topics: actor-critic, atari, ddpg, deep-reinforcement-learning, dqn, gym, mujoco, ppo, sac, td3, trpo
- Language: Python
- Homepage: https://jianzhnie.github.io/llmtech/
- Size: 536 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deep-RL-Toolkit
## Overview
Deep RL Toolkit is a flexible and highly efficient reinforcement learning framework, developed for practitioners with the following advantages:
- **Reproducible**. We provide algorithms that stably reproduce the results of many influential reinforcement learning papers.
- **Extensible**. Build new algorithms quickly by inheriting the abstract classes in the framework.
- **Reusable**. Algorithms in the repository can be adapted to a new task simply by defining a forward network; the training mechanism is built automatically.
- **Elastic**. Computing resources can be allocated elastically and automatically on the cloud.
- **Lightweight**. The core code is under 1,000 lines (see the [Demo](examples/cleanrl/cleanrl_runner.py)).
- **Stable**. Much more stable than [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) thanks to the use of various ensemble methods.
## Table of Content
- [Deep-RL-Toolkit](#deep-rl-toolkit)
- [Overview](#overview)
- [Table of Content](#table-of-content)
- [Supported Algorithms](#supported-algorithms)
- [Supported Envs](#supported-envs)
- [Examples](#examples)
- [Quick Start](#quick-start)
- [References](#references)
- [Reference Papers](#reference-papers)
- [References code](#references-code)

## Supported Algorithms
RLToolkit implements the following model-free deep reinforcement learning (DRL) algorithms:

- DQN, Double DQN, Dueling DQN, C51
- Policy Gradient (PG), A2C, A3C
- TRPO, PPO
- DDPG, TD3, SAC
## Supported Envs
- **OpenAI Gym**
- **Atari**
- **MuJoCo**
- **PyBullet**

For the details of DRL algorithms, please check out the educational webpage [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/).
## Examples
If you want to learn more about deep reinforcement learning, please read the [deep-rl-class](https://jianzhnie.github.io/llmtech/) and run the [examples](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples).
- [Classic Control](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/discrete)
- [Atari Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/atari)
- [Box2d Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/box2d)
- [MuJoCo Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/mujoco)
- [Petting Zoo](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/pettingzoo)

### Quick Start
```bash
git clone https://github.com/jianzhnie/deep-rl-toolkit.git
cd deep-rl-toolkit

# Run the DQN algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ddqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_ddqn

# Run the C51 algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo c51

# Run the DDPG algorithm on the Pendulum-v1 environment
python examples/cleanrl/cleanrl_runner.py --env Pendulum-v1 --algo ddpg

# Run the PPO algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ppo
```

## References
### Reference Papers
01. Deep Q-Network (DQN) ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf))
02. Double DQN (DDQN) ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461))
03. Advantage Actor Critic (A2C)
04. Vanilla Policy Gradient (VPG)
05. Natural Policy Gradient (NPG) ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf))
06. Trust Region Policy Optimization (TRPO) ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477))
07. Proximal Policy Optimization (PPO) ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347))
08. Deep Deterministic Policy Gradient (DDPG) ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971))
09. Twin Delayed DDPG (TD3) ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477))
10. Soft Actor-Critic (SAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290))
11. SAC with automatic entropy adjustment (SAC-AEA) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905))

### References code
- rllib
  - https://github.com/ray-project/ray
  - https://docs.ray.io/en/latest/rllib/index.html
- coach
  - https://github.com/IntelLabs/coach
  - https://intellabs.github.io/coach
- Pearl
  - https://github.com/facebookresearch/Pearl
  - https://pearlagent.github.io/
- tianshou
  - https://github.com/thu-ml/tianshou
  - https://tianshou.org/en/stable/
- stable-baselines3
  - https://github.com/DLR-RM/stable-baselines3
  - https://stable-baselines3.readthedocs.io/en/master/
- PARL
  - https://github.com/PaddlePaddle/PARL
  - https://parl.readthedocs.io/zh-cn/latest/
- openrl
  - https://github.com/OpenRL-Lab/openrl/
  - https://openrl-docs.readthedocs.io/zh/latest/
- cleanrl
  - https://github.com/vwxyzjn/cleanrl
  - https://docs.cleanrl.dev/