# PyTorch A2C/PPO for Obstacle Tower Challenge

Adapted from my A2C/PPO experiments for Pommerman, this repo was the basis for some experiments with actor-critic policy-gradient algorithms for the Obstacle Tower Challenge (https://github.com/Unity-Technologies/obstacle-tower-challenge).

The reinforcement learning codebase is based upon Ilya Kostrikov's awesome work (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).

## Changes

In short:
* Add Noisy Networks to replace entropy regularization for exploration
* Replace GRU RNN policy with LSTM

My initial attempts with A2C and PPO went nowhere. After training for days, the average floor remained at 1, with little progress made by either algorithm. Seeing the success of my experiments with Rainbow (https://github.com/rwightman/obstacle-towers-pytorch-rainbow), I decided to bring in Noisy Networks from https://github.com/Kaixhin/Rainbow to improve exploration. Additionally, I replaced the GRU with an LSTM.
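To make the Noisy Networks change concrete, below is a minimal sketch of a factorised-Gaussian noisy linear layer in PyTorch, along the lines of the Kaixhin/Rainbow implementation referenced above. The class and parameter names are illustrative rather than the exact ones used in this repo; the point is that learned per-weight noise, rather than an entropy bonus, drives exploration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (Fortunato et al., 2017) -- illustrative sketch."""

    def __init__(self, in_features, out_features, std_init=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.std_init = std_init
        # Learnable mean and noise-scale parameters for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.register_buffer('weight_epsilon', torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer('bias_epsilon', torch.empty(out_features))
        self.reset_parameters()
        self.reset_noise()

    def reset_parameters(self):
        mu_range = 1.0 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.in_features))
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.out_features))

    @staticmethod
    def _scale_noise(size):
        x = torch.randn(size)
        return x.sign().mul_(x.abs().sqrt_())

    def reset_noise(self):
        # Factorised noise: outer product of per-input and per-output noise vectors.
        eps_in = self._scale_noise(self.in_features)
        eps_out = self._scale_noise(self.out_features)
        self.weight_epsilon.copy_(eps_out.ger(eps_in))
        self.bias_epsilon.copy_(eps_out)

    def forward(self, x):
        if self.training:
            return F.linear(x,
                            self.weight_mu + self.weight_sigma * self.weight_epsilon,
                            self.bias_mu + self.bias_sigma * self.bias_epsilon)
        # Evaluation uses the mean weights only.
        return F.linear(x, self.weight_mu, self.bias_mu)
```

Swapping the policy/value heads from `nn.Linear` to a layer like this (and resampling noise with `reset_noise()` each update) is what makes the `--entropy-coef 0.0` setting in the training command below reasonable: exploration comes from the parameter noise instead of the entropy bonus.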

This resulted in some progress. An average floor of around 6-7 was reached, not quite as good as my Rainbow experiments. A top floor of 9-10 was hit on occasion, but the same policy also frequently failed to move past floor 2.

Minimal time was spent searching the hyper-parameter space; I'm sure much better results could be achieved.

## Usage

* Set up a new Conda Python 3.6 environment (do not use 3.7! there are compatibility issues with Unity's support modules)
* Install a recent (1.x) version of PyTorch
* Set up the environment and download the engine as per https://github.com/Unity-Technologies/obstacle-tower-challenge#local-setup-for-training, but use this repo in place of that clone and do everything within the same Conda env (see the setup sketch after this list)
* Run `python main.py --env-name obt --algo ppo --use-gae --recurrent-policy --num-processes 32 --num-mini-batch 8 --num-steps 120 --entropy-coef 0.0 --lr 1e-4 --clip-param 0.1` and wait...
* `enjoy.py` can be used to watch the trained policy in real time
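A rough sketch of the setup steps above, assuming a standard Conda install; the environment name `otc` and the dependency commands are illustrative, so defer to the Unity local-setup instructions linked above for the authoritative steps:

```bash
# Illustrative only -- the env name and dependency details are assumptions.
conda create -n otc python=3.6   # 3.6, not 3.7 (Unity support module compatibility)
conda activate otc

# Install a 1.x PyTorch build; see pytorch.org for the command matching your platform/CUDA.
pip install torch

# Use this repo in place of the Unity starter-kit clone, inside the same env,
# then install the Obstacle Tower dependencies per the linked local-setup guide.
git clone https://github.com/rwightman/obstacle-tower-pytorch-a2c-ppo.git
cd obstacle-tower-pytorch-a2c-ppo
```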