The source code for the blog post *The 37 Implementation Details of Proximal Policy Optimization*.
- Host: GitHub
- URL: https://github.com/vwxyzjn/ppo-implementation-details
- Owner: vwxyzjn
- License: other
- Created: 2022-01-14T18:46:35.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-23T04:47:28.000Z (8 months ago)
- Last Synced: 2024-10-11T05:50:58.253Z (about 1 month ago)
- Language: Python
- Homepage: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
- Size: 173 KB
- Stars: 626
- Watchers: 3
- Forks: 98
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# The 37 Implementation Details of Proximal Policy Optimization
This repo contains the source code for the blog post *The 37 Implementation Details of Proximal Policy Optimization*.
* Blog post url: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
* Tracked Weights and Biases experiments: https://wandb.ai/vwxyzjn/ppo-details

If you like this repo, consider checking out [CleanRL](https://github.com/vwxyzjn/cleanrl), the RL library that we used to build this repo.
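As a taste of what the post covers, here is a minimal sketch of PPO's clipped surrogate objective, the update that the 37 details are built around (illustrative PyTorch with placeholder variable names, not the repo's exact code):

```python
import torch

def ppo_clip_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = (new_logprob - old_logprob).exp()
    pg_loss1 = -advantages * ratio  # unclipped objective
    pg_loss2 = -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef)  # clipped objective
    # Take the pessimistic (maximum-loss) bound of the two, then average.
    return torch.max(pg_loss1, pg_loss2).mean()
```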
## Get started
Prerequisites:
* Python 3.8+
* [Poetry](https://python-poetry.org)

Install dependencies:
```
poetry install
```
Train agents:
```
poetry run python ppo.py
```
Train agents with experiment tracking:
```
poetry run python ppo.py --track --capture-video
```
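In these scripts, `--track` follows the CleanRL pattern of mirroring TensorBoard metrics to Weights and Biases. A hedged sketch of that setup (project name and config values below are placeholders, not the scripts' exact values):

```python
import wandb

config = {"env_id": "CartPole-v1", "learning_rate": 2.5e-4}  # stand-in for the parsed CLI args
wandb.init(
    project="ppo-details",   # hypothetical project name
    sync_tensorboard=True,   # mirror TensorBoard scalars to W&B
    config=config,
    monitor_gym=True,        # upload the videos recorded by --capture-video
    save_code=True,
)
```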
### Atari
Install dependencies:
```
poetry install -E atari
```
Train agents:
```
poetry run python ppo_atari.py
```
Train agents with experiment tracking:
```
poetry run python ppo_atari.py --track --capture-video
```
### Pybullet
Install dependencies:
```
poetry install -E pybullet
```
Train agents:
```
poetry run python ppo_continuous_action.py
```
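`ppo_continuous_action.py` targets continuous action spaces, where one of the relevant implementation details is a diagonal Gaussian policy with a state-independent log standard deviation. A minimal sketch of that idea (not the repo's exact module):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        # Log std is a learned parameter, independent of the observation.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        dist = Normal(self.mean(obs), self.log_std.exp())
        action = dist.sample()
        # Sum log probs over action dimensions (diagonal Gaussian).
        return action, dist.log_prob(action).sum(-1)
```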
Train agents with experiment tracking:
```
poetry run python ppo_continuous_action.py --track --capture-video
```
### Gym-microrts (MultiDiscrete)
Install dependencies:
```
poetry install -E gym-microrts
```
Train agents:
```
poetry run python ppo_multidiscrete.py
```
Train agents with experiment tracking:
```
poetry run python ppo_multidiscrete.py --track --capture-video
```
Train agents with invalid action masking:
```
poetry run python ppo_multidiscrete_mask.py
```
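Invalid action masking replaces the logits of invalid actions with a large negative number before sampling, so those actions receive (effectively) zero probability and contribute no gradient. A toy sketch of the idea (not the exact code in `ppo_multidiscrete_mask.py`):

```python
import torch
from torch.distributions import Categorical

def masked_categorical(logits, mask):
    # mask: 1 for valid actions, 0 for invalid ones.
    masked_logits = torch.where(mask.bool(), logits, torch.tensor(-1e8))
    return Categorical(logits=masked_logits)

logits = torch.randn(4)            # toy logits for 4 discrete actions
mask = torch.tensor([1, 0, 1, 0])  # actions 1 and 3 are invalid
dist = masked_categorical(logits, mask)
print(dist.probs)                  # invalid actions get ~0 probability
```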
Train agents with invalid action masking and experiment tracking:
```
poetry run python ppo_multidiscrete_mask.py --track --capture-video
```
### Atari with Envpool
Install dependencies:
```
poetry install -E envpool
```
Train agents:
```
poetry run python ppo_atari_envpool.py
```
Train agents with experiment tracking:
```
poetry run python ppo_atari_envpool.py --track
```
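EnvPool steps many environments in a batched C++ backend, which is where the speedup over `SyncVectorEnv` comes from. A minimal sketch of its gym-style API (assuming an installed `envpool`; the exact return signature varies with the gym API version):

```python
import numpy as np
import envpool

envs = envpool.make("Pong-v5", env_type="gym", num_envs=8)
obs = envs.reset()                              # batched observations, one row per env
actions = np.zeros(8, dtype=int)                # e.g. NOOP in every environment
obs, rewards, dones, info = envs.step(actions)  # one batched step across all envs
```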
Solve `Pong-v5` in 5 mins:
```
poetry run python ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
```
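For reference, the flags above imply the following batch arithmetic, assuming the CleanRL-style convention that one update consumes `num_envs * num_steps` transitions (an assumption, not something this README states):

```python
num_envs, num_steps, num_minibatches = 16, 128, 8
batch_size = num_envs * num_steps               # 2048 transitions collected per update
minibatch_size = batch_size // num_minibatches  # 256 transitions per gradient step
print(batch_size, minibatch_size)
```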
Reach a 400 game score in `Breakout-v5` with PPO in ~1 hour (a side-effect-free 3-4x speedup compared to `ppo_atari.py` with `SyncVectorEnv`):
```
poetry run python ppo_atari_envpool.py --gym-id Breakout-v5
```
### Procgen
Install dependencies:
```
poetry install -E procgen
```
Train agents:
```
poetry run python ppo_procgen.py
```
Train agents with experiment tracking:
```
poetry run python ppo_procgen.py --track
```
## Reproduction of all of our results
To reproduce the results run with `openai/baselines`, install our fork at [https://github.com/vwxyzjn/baselines](https://github.com/vwxyzjn/baselines), then follow the scripts in `scripts/baselines`. To reproduce our results, follow the scripts in `scripts/ours`.
## Citation
```bibtex
@inproceedings{shengyi2022the37implementation,
  author    = {Huang, Shengyi and Dossa, Rousslan Fernand Julien and Raffin, Antonin and Kanervisto, Anssi and Wang, Weixun},
  title     = {The 37 Implementation Details of Proximal Policy Optimization},
  booktitle = {ICLR Blog Track},
  year      = {2022},
  note      = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/},
  url       = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/}
}
```