https://github.com/jianzhnie/deep-rl-toolkit
RLToolkit is a flexible and highly efficient reinforcement learning framework. It includes implementations of DQN, AC, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3, and more.
actor-critic atari ddpg deep-reinforcement-learning dqn gym mujoco ppo sac td3 trpo
- Host: GitHub
- URL: https://github.com/jianzhnie/deep-rl-toolkit
- Owner: jianzhnie
- License: apache-2.0
- Created: 2024-02-20T06:59:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-30T07:09:26.000Z (12 months ago)
- Last Synced: 2024-12-29T13:44:24.479Z (10 months ago)
- Topics: actor-critic, atari, ddpg, deep-reinforcement-learning, dqn, gym, mujoco, ppo, sac, td3, trpo
- Language: Python
- Homepage: https://jianzhnie.github.io/llmtech/
- Size: 536 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deep-RL-Toolkit
## Overview
Deep RL Toolkit is a flexible and highly efficient reinforcement learning framework, developed for practitioners with the following advantages:
- **Reproducible**. We provide algorithms that stably reproduce the results of many influential reinforcement learning papers.
- **Extensible**. Build new algorithms quickly by inheriting the abstract classes in the framework.
- **Reusable**. Algorithms in the repository can be adapted to a new task simply by defining a forward network; the training mechanism is built automatically.
- **Elastic**. Computing resources can be allocated elastically and automatically on the cloud.
- **Lightweight**. The core code is under 1,000 lines (see the [Demo](examples/cleanrl/cleanrl_runner.py)).
- **Stable**. Much more stable than [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) thanks to the use of various ensemble methods.
## Table of Content
- [Deep-RL-Toolkit](#deep-rl-toolkit)
- [Overview](#overview)
- [Table of Content](#table-of-content)
- [Supported Algorithms](#supported-algorithms)
- [Supported Envs](#supported-envs)
- [Examples](#examples)
- [Quick Start](#quick-start)
- [References](#references)
- [Reference Papers](#reference-papers)
- [References code](#references-code)

## Supported Algorithms
RLToolkit implements the following model-free deep reinforcement learning (DRL) algorithms:

- DQN, Double DQN, Dueling DQN, C51
- Policy Gradient (PG), A2C, A3C
- TRPO, PPO
- DDPG, TD3, SAC
## Supported Envs
- **OpenAI Gym**
- **Atari**
- **MuJoCo**
- **PyBullet**

For the details of DRL algorithms, please check out the educational webpage [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/).
## Examples
If you want to learn more about deep reinforcement learning, please read the [deep-rl-class](https://jianzhnie.github.io/llmtech/) and run the [examples](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples).
- [Classic Control](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/discrete)
- [Atari Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/atari)
- [Box2d Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/box2d)
- [MuJoCo Benchmark](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/mujoco)
- [Petting Zoo](https://github.com/jianzhnie/deep-rl-toolkit/blob/main/examples/pettingzoo)

### Quick Start
```bash
git clone https://github.com/jianzhnie/deep-rl-toolkit.git
cd deep-rl-toolkit

# Run the DQN algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ddqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_ddqn

# Run the C51 algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo c51

# Run the DDPG algorithm on the Pendulum-v1 environment
python examples/cleanrl/cleanrl_runner.py --env Pendulum-v1 --algo ddpg

# Run the PPO algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ppo
```

## References
### Reference Papers
01. Deep Q-Network (DQN) ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf))
02. Double DQN (DDQN) ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461))
03. Advantage Actor Critic (A2C)
04. Vanilla Policy Gradient (VPG)
05. Natural Policy Gradient (NPG) ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf))
06. Trust Region Policy Optimization (TRPO) ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477))
07. Proximal Policy Optimization (PPO) ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347))
08. Deep Deterministic Policy Gradient (DDPG) ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971))
09. Twin Delayed DDPG (TD3) ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477))
10. Soft Actor-Critic (SAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290))
11. SAC with automatic entropy adjustment (SAC-AEA) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905))

### References code
- rllib
  - https://github.com/ray-project/ray
  - https://docs.ray.io/en/latest/rllib/index.html
- coach
  - https://github.com/IntelLabs/coach
  - https://intellabs.github.io/coach
- Pearl
  - https://github.com/facebookresearch/Pearl
  - https://pearlagent.github.io/
- tianshou
  - https://github.com/thu-ml/tianshou
  - https://tianshou.org/en/stable/
- stable-baselines3
  - https://github.com/DLR-RM/stable-baselines3
  - https://stable-baselines3.readthedocs.io/en/master/
- PARL
  - https://github.com/PaddlePaddle/PARL
  - https://parl.readthedocs.io/zh-cn/latest/
- openrl
  - https://github.com/OpenRL-Lab/openrl/
  - https://openrl-docs.readthedocs.io/zh/latest/
- cleanrl
  - https://github.com/vwxyzjn/cleanrl
  - https://docs.cleanrl.dev/