spinning-up-basic
Basic versions of agents from Spinning Up in Deep RL written in PyTorch
https://github.com/Kaixhin/spinning-up-basic
- Host: GitHub
- URL: https://github.com/Kaixhin/spinning-up-basic
- Owner: Kaixhin
- License: mit
- Created: 2019-01-18T15:07:49.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-05-20T06:53:54.000Z (over 3 years ago)
- Last Synced: 2024-08-03T15:16:45.340Z (3 months ago)
- Topics: deep-learning, deep-reinforcement-learning
- Language: Python
- Size: 785 KB
- Stars: 195
- Watchers: 9
- Forks: 19
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
README
# spinning-up-basic
Basic versions of agents from [Spinning Up in Deep RL](https://spinningup.openai.com/) written in [PyTorch](https://pytorch.org/). Designed to run quickly on CPU on [`Pendulum-v0`](https://gym.openai.com/envs/Pendulum-v0/) from [OpenAI Gym](https://gym.openai.com/).
To see the differences between algorithms, try running `diff -y <file1> <file2>`, e.g., `diff -y ddpg.py td3.py`.
For MPI versions of on-policy algorithms, see the [`mpi` branch](https://github.com/Kaixhin/spinning-up-basic/tree/mpi).
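For context, here is a minimal sketch of interacting with the target environment, assuming an older Gym release in which `Pendulum-v0` and the 4-tuple `step` API are still available (this is illustrative, not code from this repository):

```python
import gym  # assumption: an older Gym version that still registers Pendulum-v0

# Random-action rollout on the environment the agents are designed to run on
env = gym.make('Pendulum-v0')
state, done, total_reward = env.reset(), False, 0
while not done:  # Pendulum-v0 terminates via a 200-step time limit
  action = env.action_space.sample()         # stand-in for an agent's action
  state, reward, done, _ = env.step(action)  # old 4-tuple Gym step API
  total_reward += reward
print('Return of one random episode:', total_reward)
```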
## Algorithms
- [Vanilla Policy Gradient](https://spinningup.openai.com/en/latest/algorithms/vpg.html)/Advantage Actor-Critic (`vpg.py`)
- [Trust Region Policy Optimization](https://spinningup.openai.com/en/latest/algorithms/trpo.html) (`trpo.py`)
- [Proximal Policy Optimization](https://spinningup.openai.com/en/latest/algorithms/ppo.html) (`ppo.py`)
- [Deep Deterministic Policy Gradient](https://spinningup.openai.com/en/latest/algorithms/ddpg.html) (`ddpg.py`)
- [Twin Delayed DDPG](https://spinningup.openai.com/en/latest/algorithms/td3.html) (`td3.py`)
- [Soft Actor-Critic](https://spinningup.openai.com/en/latest/algorithms/sac.html) (`sac.py`)
- Deep Q-Network (`dqn.py`)

## Implementation Details
Implementation details can have a significant effect on performance, as discussed in [What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study](https://arxiv.org/abs/2006.05990). This codebase aims to be as simple as possible, but note that, for instance, the on-policy algorithms use separate actor and critic networks, a state-independent policy standard deviation, per-minibatch advantage normalisation, and several critic updates per minibatch, while the deterministic off-policy algorithms use layer normalisation. Likewise, soft actor-critic uses a transformed Normal distribution by default, a choice that can also help the on-policy algorithms.
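As an illustration of two of these details, the sketch below shows per-minibatch advantage normalisation and a tanh-transformed Normal action distribution in PyTorch; the function names and the `1e-8` epsilon are assumptions made for the example rather than code taken from this repository.

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

def normalise_advantages(advantages, eps=1e-8):
  # Standardise advantages within each minibatch before computing the policy loss
  return (advantages - advantages.mean()) / (advantages.std() + eps)

def tanh_normal(mean, std):
  # Normal distribution squashed through tanh, keeping actions in (-1, 1)
  # while retaining reparameterised sampling and exact log-probabilities
  return TransformedDistribution(Normal(mean, std), TanhTransform(cache_size=1))

if __name__ == '__main__':
  adv = normalise_advantages(torch.randn(64))
  print(adv.mean().item(), adv.std().item())  # approximately 0 and 1

  policy = tanh_normal(torch.zeros(3), torch.ones(3))
  action = policy.rsample()           # reparameterised sample in (-1, 1)
  log_prob = policy.log_prob(action)  # includes the tanh change-of-variables term
  print(action, log_prob)
```

Normalising advantages per minibatch keeps the scale of the policy gradient roughly constant across updates, which is one of the choices highlighted in the study linked above.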
## Results
### Vanilla Policy Gradient/Advantage Actor-Critic
![VPG](results/vpg.png)
### Trust Region Policy Optimization
![TRPO](results/trpo.png)
### Proximal Policy Optimization
![PPO](results/ppo.png)
### Deep Deterministic Policy Gradient
![DDPG](results/ddpg.png)
### Twin Delayed DDPG
![TD3](results/td3.png)
### Soft Actor-Critic
![SAC](results/sac.png)
### Deep Q-Network
![DQN](results/dqn.png)
## Code Links
- [Spinning Up in Deep RL](https://github.com/openai/spinningup) (TensorFlow)
- [Fired Up in Deep RL](https://github.com/kashif/firedup) (PyTorch)