# PPO-PyTorch

### UPDATE [April 2021]:

- merged the discrete and continuous algorithms
- added linear decay of the continuous action space `action_std` to make training more stable in complex environments (a decay sketch follows this list)
- added separate learning rates for the actor and critic
- episodes, timesteps and rewards are now logged in `.csv` files
- added utils to plot graphs from the log files
- added utils to test and make gifs from preTrained networks
- added `PPO_colab.ipynb`, which combines all the files to train / test / plot graphs / make gifs on Google Colab in a single convenient Jupyter notebook
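The `action_std` decay mentioned above follows a simple linear schedule. A minimal sketch of the idea (illustrative names and values, not the repository's exact code):

```python
# Linearly decay the action standard deviation from a starting value down to a floor.
def decayed_action_std(start_std, min_std, decay_rate, decay_frequency, timestep):
    """Action std after one decay step every `decay_frequency` timesteps."""
    std = start_std - decay_rate * (timestep // decay_frequency)
    return max(std, min_std)

# e.g. start at 0.6, decay by 0.05 every 250k timesteps, never drop below 0.1
print(decayed_action_std(0.6, 0.1, 0.05, 250_000, 1_000_000))  # prints ~0.4 (0.6 - 4 * 0.05)
```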

#### [Open `PPO_colab.ipynb` in Google Colab](https://colab.research.google.com/github/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb)

## Introduction

This repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective for OpenAI Gym environments. It is primarily intended for beginners in [Reinforcement Learning](https://en.wikipedia.org/wiki/Reinforcement_learning) who want to understand the PPO algorithm. It can still be used for complex environments, but may require some hyperparameter tuning or changes to the code. A concise explanation of the PPO algorithm can be found [here](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl), and a thorough explanation of all the details needed for a best-performing PPO implementation can be found [here](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) (not all of those details are implemented in this repo yet).
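For reference, the clipped surrogate objective from the [PPO paper](https://arxiv.org/abs/1707.06347) (the clipped objective referred to above) is:

```math
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

where $\hat{A}_t$ is the advantage estimate and $\epsilon$ is the clip range.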

To keep the training procedure simple:
- It uses a **constant standard deviation** for the output action distribution (**a multivariate normal with diagonal covariance matrix**) in continuous environments, i.e. the standard deviation is a hyperparameter and NOT a trainable parameter. It is, however, **linearly decayed** over training (`action_std` significantly affects performance).
- It uses a simple **Monte Carlo estimate** for calculating advantages and NOT Generalized Advantage Estimation (check out the OpenAI Spinning Up implementation for that); a short sketch of this and the action distribution follows this list.
- It is a **single-threaded implementation**, i.e. only one worker collects experience. [One of the older forks](https://github.com/rhklite/Parallel-PPO-PyTorch) of this repository has been modified to use parallel workers.
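The first two simplifications can be sketched in a few lines; the names and numbers below are illustrative, not the repository's exact code:

```python
import torch
from torch.distributions import MultivariateNormal

# Constant (non-trainable) standard deviation -> diagonal covariance for the continuous policy.
action_std, action_dim = 0.6, 4                      # illustrative values
cov_mat = torch.diag(torch.full((action_dim,), action_std ** 2))
# dist = MultivariateNormal(action_mean, cov_mat)    # action_mean comes from the actor network

def monte_carlo_returns(rewards, is_terminals, gamma=0.99):
    """Discounted Monte Carlo returns, computed backwards over one rollout batch (no GAE)."""
    returns, discounted = [], 0.0
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:                                     # reset the running return at episode boundaries
            discounted = 0.0
        discounted = reward + gamma * discounted
        returns.insert(0, discounted)
    returns = torch.tensor(returns, dtype=torch.float32)
    return (returns - returns.mean()) / (returns.std() + 1e-7)  # normalized before computing advantages
```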

## Usage

- To train a new network: run `train.py`
- To test a preTrained network: run `test.py`
- To plot graphs using the log files: run `plot_graph.py`
- To save images for a gif and make the gif using a preTrained network: run `make_gif.py`
- All parameters and hyperparameters to control training / testing / graphs / gifs are in their respective `.py` files
- `PPO_colab.ipynb` combines all the files in a single Jupyter notebook
- All the **hyperparameters used for training the (preTrained) policies are listed** in the [`README.md` in the PPO_preTrained directory](https://github.com/nikhilbarhate99/PPO-PyTorch/tree/master/PPO_preTrained)

#### Note:
- If the environment runs on the CPU, use the CPU as the device for faster training. Box-2d and Roboschool environments run on the CPU, and training them on a GPU will be significantly slower because data is moved between the CPU and GPU often (a minimal device-selection sketch follows).
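A minimal sketch of explicit device selection (the repository sets the device inside each script; `use_gpu` here is just an illustrative flag, not a flag from the repo):

```python
import torch

use_gpu = False  # keep False for CPU-bound environments such as Box-2d / Roboschool
device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
print("training device:", device)
```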

## Citing

Please use this BibTeX entry if you want to cite this repository in your publications:

    @misc{pytorch_minimal_ppo,
        author = {Barhate, Nikhil},
        title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
        year = {2021},
        publisher = {GitHub},
        journal = {GitHub repository},
        howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
    }

## Results

| PPO Continuous RoboschoolHalfCheetah-v1 (gif) | PPO Continuous RoboschoolHalfCheetah-v1 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolHalfCheetah-v1/PPO_RoboschoolHalfCheetah-v1_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolHalfCheetah-v1/PPO_RoboschoolHalfCheetah-v1_fig_0.png) |

| PPO Continuous RoboschoolHopper-v1 (gif) | PPO Continuous RoboschoolHopper-v1 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolHopper-v1/PPO_RoboschoolHopper-v1_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolHopper-v1/PPO_RoboschoolHopper-v1_fig_0.png) |

| PPO Continuous RoboschoolWalker2d-v1 (gif) | PPO Continuous RoboschoolWalker2d-v1 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolWalker2d-v1/PPO_RoboschoolWalker2d-v1_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolWalker2d-v1/PPO_RoboschoolWalker2d-v1_fig_0.png) |

| PPO Continuous BipedalWalker-v2 (gif) | PPO Continuous BipedalWalker-v2 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/BipedalWalker-v2/PPO_BipedalWalker-v2_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/BipedalWalker-v2/PPO_BipedalWalker-v2_fig_0.png) |

| PPO Discrete CartPole-v1 (gif) | PPO Discrete CartPole-v1 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/CartPole-v1/PPO_CartPole-v1_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/CartPole-v1/PPO_CartPole-v1_fig_0.png) |

| PPO Discrete LunarLander-v2 (gif) | PPO Discrete LunarLander-v2 (graph) |
| :-------------------------:|:-------------------------: |
| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/LunarLander-v2/PPO_LunarLander-v2_gif_0.gif) | ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/LunarLander-v2/PPO_LunarLander-v2_fig_0.png) |

## Dependencies
Trained and Tested on:
```
Python 3
PyTorch
NumPy
gym
```
Training Environments:
```
Box-2d
Roboschool
pybullet
```
Graphs and gifs:
```
pandas
matplotlib
Pillow
```

## References

- [PPO paper](https://arxiv.org/abs/1707.06347)
- [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/)