# GA3C: Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

A hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. This CPU/GPU implementation, based on TensorFlow, achieves a significant speed up compared to a similar CPU implementation.

## How do I get set up? ###

* Install [Python > 3.0](https://www.python.org/)
* Install [TensorFlow 1.0](https://www.tensorflow.org/install/install_linux)
* Install [OpenAI Gym](https://github.com/openai/gym)
* Clone the repo.
* That's it folks!
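
Before training, you can optionally run a quick sanity check to confirm the dependencies import and an Atari environment can be created. The snippet below is not part of the repository; the environment name is only an example (any installed Atari environment works), and it targets the older Gym/TensorFlow versions contemporary with this repo.

```python
# Hypothetical sanity check -- not part of the repo.
# Requires the Atari extras of gym (e.g. `pip install gym[atari]`).
import tensorflow as tf
import gym

print('TensorFlow version:', tf.__version__)   # expect a 1.x release

env = gym.make('PongDeterministic-v0')          # example Atari environment name
observation = env.reset()
print('Observation shape:', observation.shape)  # raw Atari frame, e.g. (210, 160, 3)
print('Number of actions:', env.action_space.n)
```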

## How to Train a model from scratch? ###

Run `sh _clean.sh` first, and then `sh _train.sh`.
The script `_clean.sh` empties the checkpoints folder, which contains the network models saved during training, and removes `results.txt`, the log of scores achieved during training.

> Remember to save your trained models and scores in a different folder if needed before cleaning.

`_train.sh` launches the training procedure, following the parameters in `Config.py`.
You can modify the training parameters directly in `Config.py`, or pass them as arguments to `_train.sh`.
For example, running `sh _train.sh LEARNING_RATE_START=0.001` overrides the starting learning rate defined in `Config.py` with the value passed as an argument (see below).
You may want to modify `_train.sh` for your particular needs.
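
For context, overrides of this form can be implemented by parsing each `KEY=value` argument and setting the matching attribute on the configuration. The snippet below is only a minimal sketch of that idea, assuming `Config.py` exposes its settings as class attributes on a `Config` class; it is not the repository's exact code.

```python
# Minimal sketch (not the repo's exact code) of applying KEY=value
# command-line overrides to attributes defined in Config.py.
import sys
from Config import Config  # assumes Config.py defines a Config class

for arg in sys.argv[1:]:
    key, value = arg.split('=', 1)
    current = getattr(Config, key)               # fails loudly on unknown keys
    setattr(Config, key, type(current)(value))   # cast to the attribute's type
    # Note: casting strings to bool this way is naive ("False" is truthy).

# e.g. `LEARNING_RATE_START=0.001` would set Config.LEARNING_RATE_START = 0.001
# before training starts.
```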

The output should look like the following:

```
...
[Time: 33] [Episode: 26 Score: -19.0000] [RScore: -20.5000 RPPS: 822] [PPS: 823 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 33] [Episode: 27 Score: -20.0000] [RScore: -20.4815 RPPS: 855] [PPS: 856 TPS: 183] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 28 Score: -20.0000] [RScore: -20.4643 RPPS: 854] [PPS: 855 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 35] [Episode: 29 Score: -19.0000] [RScore: -20.4138 RPPS: 877] [PPS: 878 TPS: 185] [NT: 2 NP: 2 NA: 32]
[Time: 36] [Episode: 30 Score: -20.0000] [RScore: -20.4000 RPPS: 899] [PPS: 900 TPS: 186] [NT: 2 NP: 2 NA: 32]
...
```

**PPS** (predictions per second) indicates how fast frames are being processed, while **Score** is the score achieved in the episode.
**RPPS** and **RScore** are rolling averages of these values.
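
As a concrete illustration (not the repository's code), a rolling score can be maintained with a fixed-size window of recent episodes; the window size here is an arbitrary assumption.

```python
# Illustrative sketch of a rolling (windowed) average like RScore;
# the window size is an arbitrary assumption, not the repo's setting.
from collections import deque

class RollingMean:
    def __init__(self, window=1000):
        self.scores = deque(maxlen=window)

    def add(self, score):
        self.scores.append(score)
        return sum(self.scores) / len(self.scores)

rscore = RollingMean()
for episode_score in [-19.0, -20.0, -20.0, -19.0, -20.0]:
    print('RScore: %.4f' % rscore.add(episode_score))
```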

To stop the training procedure, set `EPISODES` in `Config.py` to the desired number of episodes, or simply press Ctrl+C.

## How to continue training a model? ###

If you want to continue training a model, set `LOAD_CHECKPOINTS=True` in `Config.py`, and set `LOAD_EPISODE` to the episode number you want to load.
Make sure the corresponding model has been saved in the checkpoints folder (the model filename includes the episode number).

> Be sure not to use `_clean.sh` if you want to stop and then continue training!
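
For reference, resuming might then look like the following in `Config.py`. The flag names below follow this README and should be checked against your copy of `Config.py`.

```python
# Hypothetical excerpt of Config.py for resuming training.
# Flag names follow the README above; verify them in your Config.py.
LOAD_CHECKPOINTS = True   # load a saved model instead of starting from scratch
LOAD_EPISODE = 10000      # episode number encoded in the checkpoint filename
```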

## How to play a game with a trained agent? ###

Run `sh _play.sh`.
You may want to modify this script for your particular needs.

## How to change the game, configurations, etc.? ###
All the configurations are in `Config.py`.
As mentioned before, one useful way of modifying a configuration is to pass it as an argument to `_train.sh`. For example, to train with four trainer threads, just run: `sh _train.sh TRAINERS=4`.

## Sample learning curves
Typical learning curves for Pong and Boxing are shown below. They are easily generated from the `results.txt` file.
![Convergence Curves](http://mb2.web.engr.illinois.edu/images/pong_boxing.png)
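
For example, a curve like the ones above could be produced with a short script along these lines. The exact format of `results.txt` may vary between versions; this sketch assumes one episode per line with the score as the last comma-separated field.

```python
# Hypothetical plotting script; assumes each line of results.txt ends with
# the episode score as its last comma-separated field.
import matplotlib.pyplot as plt

scores = []
with open('results.txt') as f:
    for line in f:
        parts = line.strip().split(',')
        if parts and parts[-1].strip():
            scores.append(float(parts[-1]))

plt.plot(range(1, len(scores) + 1), scores)
plt.xlabel('Episode')
plt.ylabel('Score')
plt.title('Training score per episode')
plt.savefig('learning_curve.png')
```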

### References ###

If you use this code, please refer to our [ICLR 2017 paper](https://openreview.net/forum?id=r1VGvBcxl):

```
@conference{babaeizadeh2017ga3c,
  title={Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU},
  author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
  booktitle={ICLR},
  url={https://openreview.net/forum?id=r1VGvBcxl},
  year={2017}
}
```
This work was first presented in an oral talk at [The 1st International Workshop on Efficient Methods for Deep Neural Networks](http://allenai.org/plato/emdnn/papers.html) (NIPS Workshop), Barcelona, Spain, Dec. 9, 2016:

```
@article{babaeizadeh2016ga3c,
  title={{GA3C:} {GPU}-based {A3C} for Deep Reinforcement Learning},
  author={Babaeizadeh, Mohammad and Frosio, Iuri and Tyree, Stephen and Clemons, Jason and Kautz, Jan},
  journal={NIPS Workshop},
  url={https://arxiv.org/abs/1611.06256},
  year={2016}
}
```