Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/NVlabs/GA3C
Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
https://github.com/NVlabs/GA3C
Last synced: about 2 months ago
JSON representation
Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
- Host: GitHub
- URL: https://github.com/NVlabs/GA3C
- Owner: NVlabs
- License: bsd-3-clause
- Created: 2016-11-19T21:22:59.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2020-02-25T00:15:09.000Z (over 4 years ago)
- Last Synced: 2024-06-16T00:37:29.083Z (3 months ago)
- Language: Python
- Size: 26.4 KB
- Stars: 647
- Watchers: 72
- Forks: 196
- Open Issues: 22
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# GA3C: Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
A hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. This CPU/GPU implementation, based on TensorFlow, achieves a significant speed up compared to a similar CPU implementation.
## How do I get set up? ###
* Install [Python > 3.0](https://www.python.org/)
* Install [TensorFlow 1.0](https://www.tensorflow.org/install/install_linux)
* Install [OpenAI Gym](https://github.com/openai/gym)
* Clone the repo.
* That's it folks!## How to Train a model from scratch? ###
Run `sh _clean.sh` first, and then `sh _train.sh`.
The script `_clean.sh` cleans the checkpoints folder, which contains the network models saved during the training process, as well as removing `results.txt`, which is a log of the scores achieved during training.> Remember to save your trained models and scores in a different folder if needed before cleaning.
`_train.sh` launches the training procedure, following the parameters in `Config.py`.
You can modify the training parameters directly in `Config.py`, or pass them as argument to `_train.sh`.
E.g., launching `sh _train.sh LEARNING_RATE_START=0.001` overwrites the starting value of the learning rate in `Config.py` with the one passed as argument (see below).
You may want to modify `_train.sh` for your particular needs.The output should look like below:
...
[Time: 33] [Episode: 26 Score: -19.0000] [RScore: -20.5000 RPPS: 822] [PPS: 823 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 33] [Episode: 27 Score: -20.0000] [RScore: -20.4815 RPPS: 855] [PPS: 856 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 28 Score: -20.0000] [RScore: -20.4643 RPPS: 854] [PPS: 855 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 29 Score: -19.0000] [RScore: -20.4138 RPPS: 877] [PPS: 878 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 36] [Episode: 30 Score: -20.0000] [RScore: -20.4000 RPPS: 899] [PPS: 900 TPS: 186] [NT: 2 NP: 2 NA: 32]
...**PPS** (predictions per second) demonstrates the speed of processing frames, while **Score** shows the achieved score.
**RPPS** and **RScore** are the rolling average of the above values.To stop the training procedure, adjuts `EPISODES` in `Config.py` propoerly, or simply use ctrl + c.
## How to continue training a model? ###
If you want to continue training a model, set `LOAD_CHECKPOINTS=True` in `Config.py`, and set `LOAD_EPISODE` to the episode number you want to load.
Be sure that the corresponding model has been saved in the checkpoints folder (the model name includes the number of the episode).> Be sure not to use `_clean.sh` if you want to stop and then continue training!
## How to play a game with a trained agent? ###
Run `_play.sh`
You may want to modify this script for your particular needs.## How to change the game, configurations, etc.? ###
All the configurations are in `Config.py`
As mentioned before, one useful way of modifying a config is to pass it as an argument to `_train.sh`. For example, to save the models while training, just run: `train.sh TRAINERS=4`.## Sample learning curves
Typical learning curves for Pong and Boxing are shown here. These are easily obtained from the results.txt file.
![Convergence Curves](http://mb2.web.engr.illinois.edu/images/pong_boxing.png)### References ###
If you use this code, please refer to our [ICLR 2017 paper](https://openreview.net/forum?id=r1VGvBcxl):
```
@conference{babaeizadeh2017ga3c,
title={Reinforcement Learning thorugh Asynchronous Advantage Actor-Critic on a GPU},
author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
booktitle={ICLR},
biurl={https://openreview.net/forum?id=r1VGvBcxl},
year={2017}
}
```
This work was first presented in an oral talk at the [The 1st International Workshop on Efficient Methods for Deep Neural Networks](http://allenai.org/plato/emdnn/papers.html), NIPS Workshop, Barcelona (Spain), Dec. 9, 2016:```
@article{babaeizadeh2016ga3c,
title={{GA3C:} {GPU}-based {A3C} for Deep Reinforcement Learning},
author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
journal={NIPS Workshop},
biurl={arXiv preprint arXiv:1611.06256},
year={2016}
}
```