https://github.com/rwightman/obstacle-tower-pytorch-a2c-ppo
PPO/A2C in PyTorch for the Obstacle Tower Challenge
- Host: GitHub
- URL: https://github.com/rwightman/obstacle-tower-pytorch-a2c-ppo
- Owner: rwightman
- License: mit
- Created: 2019-03-12T06:23:33.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-05-16T19:21:17.000Z (over 5 years ago)
- Last Synced: 2024-10-23T04:17:45.889Z (16 days ago)
- Topics: obstacle-tower-challenge, ppo, pytorch, reinforcement-learning
- Language: Python
- Size: 433 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# PyTorch A2C/PPO for Obstacle Tower Challenge
Adapted from my A2C/PPO experiments for Pommerman, this repo was the basis for some experiments with actor-critic policy gradient algorithms for the Obstacle Tower Challenge (https://github.com/Unity-Technologies/obstacle-tower-challenge).

The reinforcement learning codebase is based upon Ilya Kostrikov's awesome work (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).
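For context, the central update in that style of codebase is PPO's clipped surrogate objective. Below is a minimal sketch of the policy loss, with illustrative variable names rather than the repo's actual code (the `clip_param` default mirrors the `--clip-param 0.1` flag used in the training command later in this README):

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_param=0.1):
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives; PPO takes the pessimistic minimum.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    return -torch.min(surr1, surr2).mean()
```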
## Changes
In short:
* Add Noisy Networks to replace entropy regularization for exploration
* Replace the GRU RNN policy with an LSTM

My initial attempts with A2C and PPO went nowhere. After training for days, the average floor remained at 1 with little progress made by either algorithm. Seeing the success of my experiments with Rainbow (https://github.com/rwightman/obstacle-towers-pytorch-rainbow), I decided to bring in Noisy Networks from https://github.com/Kaixhin/Rainbow to improve exploration. Additionally, I replaced the GRU with an LSTM.
This resulted in some progress: an average floor of around 6-7 was reached, not quite as good as my Rainbow experiments. A top floor of 9-10 was hit on occasion, but the same policy also failed to move past floor 2 fairly often.

Minimal time was spent searching the hyperparameter space; I'm sure much better results could be achieved.
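For illustration, the kind of layer this refers to is a factorised-Gaussian noisy linear layer in the style of the Kaixhin/Rainbow implementation. The following is a simplified sketch, not the exact code pulled into this repo:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with factorised Gaussian noise on weights and biases."""

    def __init__(self, in_features, out_features, std_init=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.std_init = std_init
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.register_buffer('weight_epsilon', torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer('bias_epsilon', torch.empty(out_features))
        self.reset_parameters()
        self.reset_noise()

    def reset_parameters(self):
        mu_range = 1.0 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.in_features))
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.out_features))

    @staticmethod
    def _scale_noise(size):
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Factorised noise: outer product of per-input and per-output noise vectors.
        eps_in = self._scale_noise(self.in_features)
        eps_out = self._scale_noise(self.out_features)
        self.weight_epsilon.copy_(eps_out.unsqueeze(1) * eps_in.unsqueeze(0))
        self.bias_epsilon.copy_(eps_out)

    def forward(self, x):
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
        else:
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)
```

With exploration driven by the learned noise parameters, the entropy bonus in the A2C/PPO loss can be turned off, which is why the training command below passes `--entropy-coef 0.0`.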
## Usage
* Set up a new Conda Python 3.6 environment (do not use 3.7; there are compatibility issues with Unity's support modules)
* Install a recent (1.x) version of PyTorch
* Set up the environment and download the engine as per https://github.com/Unity-Technologies/obstacle-tower-challenge#local-setup-for-training, but use this repo in place of that clone and do it within the same Conda env
* Run `python main.py --env-name obt --algo ppo --use-gae --recurrent-policy --num-processes 32 --num-mini-batch 8 --num-steps 120 --entropy-coef 0.0 --lr 1e-4 --clip-param 0.1` and wait...
* `enjoy.py` can be used to watch the trained policy in real time
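To sanity-check the environment setup before training, a minimal sketch using Unity's `obstacle_tower_env` package is shown below. The binary path and arguments are illustrative; this repo's own wrapper code may create the environment differently:

```python
# Minimal smoke test of the Obstacle Tower Gym environment.
# Assumes the `obstacle_tower_env` package from the Unity challenge repo is
# installed and the game binary has been downloaded; the path is illustrative.
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=True, realtime_mode=False)
obs = env.reset()
done = False
while not done:
    # Random actions, just to verify the environment steps end to end.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```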