https://github.com/takuseno/ppo
Proximal Policy Optimization implementation with TensorFlow
- Host: GitHub
- URL: https://github.com/takuseno/ppo
- Owner: takuseno
- License: MIT
- Created: 2017-10-31T06:35:10.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-09T12:21:24.000Z (about 7 years ago)
- Last Synced: 2025-04-12T07:12:41.244Z (7 months ago)
- Topics: reinforcement-learning, tensorflow
- Language: Python
- Homepage:
- Size: 60.5 KB
- Stars: 106
- Watchers: 5
- Forks: 22
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## PPO
Proximal Policy Optimization implementation with TensorFlow.
https://arxiv.org/pdf/1707.06347.pdf
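For reference, the core of PPO is the clipped surrogate objective described in the paper above. Below is a minimal illustrative sketch in NumPy; the names `ratio`, `advantage`, and `epsilon` are mine and are not taken from this repository's code.
```
import numpy as np

# Illustrative sketch of PPO's clipped surrogate objective; variable names
# are hypothetical and not taken from this repository.
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s), advantage = estimated advantage A_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO maximizes the elementwise minimum of the clipped and unclipped terms.
    return np.minimum(unclipped, clipped).mean()
```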
This repository has been substantially updated since commit `a4fbd383f0f89ce2d881a8b78d6b8a03294e5c7c`.
The new version requires an additional dependency, [rlsaber](https://github.com/imai-laboratory/rlsaber), my utility repository that can be shared across different algorithms.
Some of the design follows [OpenAI baselines](https://github.com/openai/baselines).
Unlike baselines, however, I use standard TensorFlow packages as much as possible, which makes the code easier to read.
In addition, this PPO automatically switches between continuous and discrete action spaces depending on the environment (see the sketch below).
To change hyperparameters, edit `atari_constants.py` or `box_constants.py`; the appropriate file is likewise loaded depending on the environment.
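That switching can be illustrated with a small, hypothetical sketch (not this repository's actual code) that dispatches on the Gym action space:
```
import gym

# Hypothetical sketch (not this repository's code): choose the policy head by
# inspecting the environment's action space, as described above.
def build_policy_head(env):
    space = env.action_space
    if isinstance(space, gym.spaces.Box):
        # Continuous control: a Gaussian policy, one mean/std per action dimension.
        return {"type": "gaussian", "dim": space.shape[0]}
    if isinstance(space, gym.spaces.Discrete):
        # Discrete control: a categorical policy over space.n actions.
        return {"type": "categorical", "dim": space.n}
    raise ValueError("unsupported action space: {}".format(space))
```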
## requirements
- Python3
## dependencies
- tensorflow
- gym[atari]
- opencv-python
- git+https://github.com/imai-laboratory/rlsaber
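Assuming a standard pip setup, the dependencies above can be installed with something like:
```
$ pip install tensorflow 'gym[atari]' opencv-python
$ pip install git+https://github.com/imai-laboratory/rlsaber
```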
## usage
### training
```
$ python train.py [--env env-id] [--render] [--logdir log-name]
```
example
```
$ python train.py --env BreakoutNoFrameskip-v4 --logdir breakout
```
### playing
```
$ python train.py --demo --load results/path-to-model [--env env-id] [--render]
```
example
```
$ python train.py --demo --load results/breakout/model.ckpt-xxxx --env BreakoutNoFrameskip-v4 --render
```
### performance examples
#### Pendulum-v0

#### BreakoutNoFrameskip-v4

### implementation
This implementation is inspired by the following projects.
- [DQN](https://github.com/imai-laboratory/dqn)
- [OpenAI Baselines](https://github.com/openai/baselines)
## License
This repository is MIT-licensed.