https://github.com/takuseno/ppo
Proximal Policy Optimization implementation with TensorFlow
- Host: GitHub
- URL: https://github.com/takuseno/ppo
- Owner: takuseno
- License: MIT
- Created: 2017-10-31T06:35:10.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-09T12:21:24.000Z (about 7 years ago)
- Last Synced: 2025-04-12T07:12:41.244Z (7 months ago)
- Topics: reinforcement-learning, tensorflow
- Language: Python
- Homepage:
- Size: 60.5 KB
- Stars: 106
- Watchers: 5
- Forks: 22
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## PPO
Proximal Policy Optimization implementation with TensorFlow.
https://arxiv.org/pdf/1707.06347.pdf
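For reference, the core of PPO is the clipped surrogate objective described in the paper above. Below is a minimal illustrative sketch in NumPy; the names `ratio`, `advantage`, and `epsilon` are mine and are not taken from this repository's code.
```
import numpy as np

# Illustrative sketch of PPO's clipped surrogate objective; variable names
# are hypothetical and not taken from this repository.
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), advantage = estimated advantage A_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO maximizes the elementwise minimum of the clipped and unclipped terms.
    return np.minimum(unclipped, clipped).mean()
```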
This repository has been substantially updated since commit `a4fbd383f0f89ce2d881a8b78d6b8a03294e5c7c`.
The new version requires an additional dependency, [rlsaber](https://github.com/imai-laboratory/rlsaber), my utility repository that can be shared across different algorithms.
Some of the design follows [OpenAI baselines](https://github.com/openai/baselines).
Unlike baselines, however, I use standard TensorFlow packages as much as possible, which makes the code easier to read.
In addition, this PPO automatically switches between continuous and discrete action spaces depending on the environment (see the sketch below).
To change hyperparameters, edit `atari_constants.py` or `box_constants.py`; the appropriate file is likewise loaded depending on the environment.
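That switching can be illustrated with a small, hypothetical sketch (not this repository's actual code) that dispatches on the Gym action space:
```
import gym

# Hypothetical sketch (not this repository's code): choose the policy head by
# inspecting the environment's action space, as described above.
def build_policy_head(env):
    space = env.action_space
    if isinstance(space, gym.spaces.Box):
        # Continuous control: a Gaussian policy, one mean/std per action dimension.
        return {"type": "gaussian", "dim": space.shape[0]}
    if isinstance(space, gym.spaces.Discrete):
        # Discrete control: a categorical policy over space.n actions.
        return {"type": "categorical", "dim": space.n}
    raise ValueError("unsupported action space: {}".format(space))
```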
## requirements
- Python3
## dependencies
- tensorflow
- gym[atari]
- opencv-python
- git+https://github.com/imai-laboratory/rlsaber
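Assuming a standard pip setup, the dependencies above can be installed with something like:
```
$ pip install tensorflow 'gym[atari]' opencv-python
$ pip install git+https://github.com/imai-laboratory/rlsaber
```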
## usage
### training
```
$ python train.py [--env env-id] [--render] [--logdir log-name]
```
example
```
$ python train.py --env BreakoutNoFrameskip-v4 --logdir breakout
```
### playing
```
$ python train.py --demo --load results/path-to-model [--env env-id] [--render]
```
example
```
$ python train.py --demo --load results/breakout/model.ckpt-xxxx --env BreakoutNoFrameskip-v4 --render
```
### performance examples
#### Pendulum-v0

#### BreakoutNoFrameskip-v4

### implementation
This implementation is inspired by the following projects.
- [DQN](https://github.com/imai-laboratory/dqn)
- [OpenAI Baselines](https://github.com/openai/baselines)
## License
This repository is MIT-licensed.