{"id":13678662,"url":"https://github.com/nikhilbarhate99/PPO-PyTorch","last_synced_at":"2025-04-29T15:32:35.834Z","repository":{"id":41164329,"uuid":"150605839","full_name":"nikhilbarhate99/PPO-PyTorch","owner":"nikhilbarhate99","description":"Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch","archived":false,"fork":false,"pushed_at":"2024-07-09T21:11:04.000Z","size":12673,"stargazers_count":1712,"open_issues_count":16,"forks_count":351,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-11T16:24:48.463Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","policy-gradient","ppo","ppo-pytorch","proximal-policy-optimization","pytorch","pytorch-implmention","pytorch-tutorial","reinforcement-learning","reinforcement-learning-algorithms"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nikhilbarhate99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-27T15:07:12.000Z","updated_at":"2024-11-11T08:27:38.000Z","dependencies_parsed_at":"2023-02-06T09:32:06.992Z","dependency_job_id":"fab4b25c-b2ef-4595-bd94-e78cc8a3e403","html_url":"https://github.com/nikhilbarhate99/PPO-PyTorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhilbarhate99%2FPPO-PyTorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhilbarhate99%2FPPO-PyTorch/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhilbarhate99%2FPPO-PyTorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhilbarhate99%2FPPO-PyTorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nikhilbarhate99","download_url":"https://codeload.github.com/nikhilbarhate99/PPO-PyTorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224179014,"owners_count":17268989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","policy-gradient","ppo","ppo-pytorch","proximal-policy-optimization","pytorch","pytorch-implmention","pytorch-tutorial","reinforcement-learning","reinforcement-learning-algorithms"],"created_at":"2024-08-02T13:00:56.755Z","updated_at":"2024-11-11T21:31:04.737Z","avatar_url":"https://github.com/nikhilbarhate99.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PPO-PyTorch\n\n### UPDATE [April 2021] : \n\n- merged discrete and continuous algorithms\n- added linear decaying for the continuous action space `action_std`; to make training more stable for complex environments\n- added different learning rates for actor and critic\n- episodes, timesteps and rewards are now logged in `.csv` files\n- utils to plot graphs from log files\n- utils to test and make gifs from preTrained networks\n- `PPO_colab.ipynb` combining all the files to train / test / plot graphs / make gifs on google colab in a convenient 
jupyter-notebook\n\n#### [Open `PPO_colab.ipynb` in Google Colab](https://colab.research.google.com/github/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb)\n\n\n## Introduction\n\nThis repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with clipped objective for OpenAI Gym environments. It is primarily intended for beginners in [Reinforcement Learning](https://en.wikipedia.org/wiki/Reinforcement_learning) for understanding the PPO algorithm. It can still be used for complex environments but may require some hyperparameter tuning or changes in the code. A concise explanation of the PPO algorithm can be found [here](https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl) and a thorough explanation of all the details for implementing the best-performing PPO can be found [here](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/) (not all of them are implemented in this repo yet). \n\n\nTo keep the training procedure simple : \n  - It has a **constant standard deviation** for the output action distribution (**multivariate normal with diagonal covariance matrix**) for the continuous environments, i.e. it is a hyperparameter and NOT a trainable parameter. However, it is **linearly decayed**. (`action_std` significantly affects performance)\n  - It uses a simple **Monte Carlo estimate** for calculating advantages and NOT Generalized Advantage Estimation (check out the OpenAI Spinning Up implementation for that).\n  - It is a **single-threaded implementation**, i.e. only one worker collects experience. 
[One of the older forks](https://github.com/rhklite/Parallel-PPO-PyTorch) of this repository has been modified to use parallel workers.\n\n## Usage\n\n- To train a new network : run `train.py`\n- To test a preTrained network : run `test.py`\n- To plot graphs using log files : run `plot_graph.py`\n- To save images for gif and make gif using a preTrained network : run `make_gif.py`\n- All parameters and hyperparameters to control training / testing / graphs / gifs are in their respective `.py` files\n- `PPO_colab.ipynb` combines all the files in a jupyter-notebook\n- All the **hyperparameters used for training (preTrained) policies are listed** in the [`README.md` in PPO_preTrained directory](https://github.com/nikhilbarhate99/PPO-PyTorch/tree/master/PPO_preTrained)\n\n#### Note :\n  - If the environment runs on CPU, use CPU as the device for faster training. Box-2d and Roboschool run on CPU, and training them on a GPU device will be significantly slower because the data will be moved between CPU and GPU often.\n\n## Citing \n\nPlease use this bibtex if you want to cite this repository in your publications :\n\n    @misc{pytorch_minimal_ppo,\n        author = {Barhate, Nikhil},\n        title = {Minimal PyTorch Implementation of Proximal Policy Optimization},\n        year = {2021},\n        publisher = {GitHub},\n        journal = {GitHub repository},\n        howpublished = {\\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},\n    }\n\n## Results\n\n| PPO Continuous RoboschoolHalfCheetah-v1  | PPO Continuous RoboschoolHalfCheetah-v1 |\n| :-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolHalfCheetah-v1/PPO_RoboschoolHalfCheetah-v1_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolHalfCheetah-v1/PPO_RoboschoolHalfCheetah-v1_fig_0.png) |\n\n\n| PPO Continuous RoboschoolHopper-v1  | PPO Continuous RoboschoolHopper-v1 |\n| 
:-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolHopper-v1/PPO_RoboschoolHopper-v1_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolHopper-v1/PPO_RoboschoolHopper-v1_fig_0.png) |\n\n\n| PPO Continuous RoboschoolWalker2d-v1  | PPO Continuous RoboschoolWalker2d-v1 |\n| :-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/RoboschoolWalker2d-v1/PPO_RoboschoolWalker2d-v1_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/RoboschoolWalker2d-v1/PPO_RoboschoolWalker2d-v1_fig_0.png) |\n\n\n| PPO Continuous BipedalWalker-v2  | PPO Continuous BipedalWalker-v2 |\n| :-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/BipedalWalker-v2/PPO_BipedalWalker-v2_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/BipedalWalker-v2/PPO_BipedalWalker-v2_fig_0.png) |\n\n\n| PPO Discrete CartPole-v1  | PPO Discrete CartPole-v1 |\n| :-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/CartPole-v1/PPO_CartPole-v1_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/CartPole-v1/PPO_CartPole-v1_fig_0.png) |\n\n\n| PPO Discrete LunarLander-v2  | PPO Discrete LunarLander-v2 |\n| :-------------------------:|:-------------------------: |\n| ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_gifs/LunarLander-v2/PPO_LunarLander-v2_gif_0.gif) |  ![](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_figs/LunarLander-v2/PPO_LunarLander-v2_fig_0.png) |\n\n\n## Dependencies\nTrained and Tested on:\n```\nPython 3\nPyTorch\nNumPy\ngym\n```\nTraining Environments 
\n```\nBox-2d\nRoboschool\npybullet\n```\nGraphs and gifs\n```\npandas\nmatplotlib\nPillow\n```\n\n\n## References\n\n- [PPO paper](https://arxiv.org/abs/1707.06347)\n- [OpenAI Spinning up](https://spinningup.openai.com/en/latest/)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhilbarhate99%2FPPO-PyTorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnikhilbarhate99%2FPPO-PyTorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhilbarhate99%2FPPO-PyTorch/lists"}