Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lgvaz/rlbox
RLbox: Solving OpenAI Gym with TensorFlow
https://github.com/lgvaz/rlbox
atari continuous-control deep-reinforcement-learning deep-rl dqn mujoco openai-gym ppo proximal-policy-optimization tensorflow
Last synced: 2 months ago
JSON representation
RLbox: Solving OpenAI Gym with TensorFlow
- Host: GitHub
- URL: https://github.com/lgvaz/rlbox
- Owner: lgvaz
- License: mit
- Created: 2017-07-15T19:52:02.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-19T18:43:10.000Z (over 6 years ago)
- Last Synced: 2024-08-15T00:20:36.004Z (5 months ago)
- Topics: atari, continuous-control, deep-reinforcement-learning, deep-rl, dqn, mujoco, openai-gym, ppo, proximal-policy-optimization, tensorflow
- Language: Python
- Homepage:
- Size: 2.62 MB
- Stars: 7
- Watchers: 4
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Reinforcement Learning Box
*Discountinued I'm now working on a [Pytorch RL library](https://github.com/lgvaz/torchrl)*
RLbox provides a framework for rapid experimentation with popular Deep Reinforcement Learning algorithms, it focus on making very easy to implement new ideias, which can be rapidly evaluated using [OpenAI Gym](https://github.com/openai/gym).## Installation
```bash
git clone https://github.com/apparatusbox/rlbox.git
cd rlbox
pip install -e .
```## How to use
Examples on how to run different agents can be found on the [examples](https://github.com/lgvaz/rlbox/tree/master/examples) folder.## Implemented algorithms
### State of the art
* [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) and extensions
* [Double Q-learning](https://arxiv.org/pdf/1509.06461.pdf)
* [Dueling networks](https://arxiv.org/pdf/1511.06581.pdf)
* N-step learning
* Soft target update
* [PPO](https://arxiv.org/pdf/1707.06347.pdf)
* Clipped Surrogate Objective
* Adaptive KL Penalty Coefficient
### Classical
* Vanilla Policy Gradient
* REINFORCE
* Actor-Critic## Results
* __DQN on BreakoutNoFrameskip-v4__
Episode 0 ---------------- Episode 3500 ----------- Episode 6000 ----------- Episode 7500 ----------- Episode 21500
![episode 0](assets/ep0_nolegend.gif)
![episode 3500](assets/ep3500_nolegend.gif)
![episode 6000](assets/ep6000_nolegend.gif)
![episode 7500](assets/ep7500_nolegend.gif)
![episode 21500](assets/ep21500_nolegend.gif)
Mean reward after training: 421 (Averaged over 100 episodes)
Dark blue: Standard DQN
Light blue: Double DQN
![Breakout reward](assets/breakout_plots.png)* __PPO on Hopper-v1__ [Video](https://www.youtube.com/watch?v=QHAu8EWRJJ0&feature=youtu.be)
![Hopper reward](assets/ppo_reward.png)