https://github.com/miyosuda/async_deep_reinforce

Asynchronous Methods for Deep Reinforcement Learning
a3c deep-learning reinforcement-learning tensorflow

Host: GitHub
URL: https://github.com/miyosuda/async_deep_reinforce
Owner: miyosuda
License: apache-2.0
Created: 2016-04-10T16:10:40.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2018-08-09T09:29:30.000Z (almost 7 years ago)
Last Synced: 2024-08-03T23:03:32.554Z (11 months ago)
Topics: a3c, deep-learning, reinforcement-learning, tensorflow
Language: Python
Homepage:
Size: 299 KB
Stars: 590
Watchers: 49
Forks: 194
Open Issues: 36
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

Github-Repositories - Asynchronous Methods for Deep Reinforcement Learning

README

# async_deep_reinforce

Asynchronous deep reinforcement learning

## About

An attempt to reproduce Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."

http://arxiv.org/abs/1602.01783

The Asynchronous Advantage Actor-Critic (A3C) method for playing Atari Pong is implemented with TensorFlow.

Both A3C-FF and A3C-LSTM are implemented.
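
As a rough illustration of the "advantage" part of A3C, the sketch below computes n-step discounted returns and advantages for one short rollout, following the general scheme in the paper. The function name `n_step_returns`, its arguments, and the default `gamma` are assumptions made for this example and are not taken from this repository's code.

```python
import numpy as np


def n_step_returns(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout.

    rewards:         reward received at each step of the rollout
    values:          critic estimate V(s_t) at each step
    bootstrap_value: V of the state after the last step,
                     or 0.0 if the episode ended there
    gamma:           discount factor (0.99 is an assumed default)
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    # Walk backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage: how much better the observed return was than predicted.
    advantages = returns - np.asarray(values, dtype=np.float64)
    return returns, advantages


returns, advantages = n_step_returns(
    rewards=[0.0, 0.0, 1.0],
    values=[0.5, 0.6, 0.7],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

In the paper's formulation, the advantages scale the policy gradient, while the returns serve as regression targets for the value estimate.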

The learning result after 26 hours (A3C-FF) looks like this.

[![Learning result after 26 hours](http://narr.jp/private/miyoshi/deep_learning/a3c_preview_image.jpg)](https://youtu.be/ZU71YdAedZs)

Any advice or suggestions are very welcome in the issues thread.

https://github.com/miyosuda/async_deep_reinforce/issues/1

## How to build

First, we need to build a multi-thread-ready version of the Arcade Learning Environment.

I made some modifications to it so that it runs in a multi-threaded environment.

    $ git clone https://github.com/miyosuda/Arcade-Learning-Environment.git
    $ cd Arcade-Learning-Environment
    $ cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=OFF .
    $ make -j 4

    $ pip install .

I recommend installing it in a virtualenv environment.

## How to run

To train,

    $ python a3c.py

To display the result with game play,

    $ python a3c_disp.py

## Using GPU

To enable the GPU, change the "USE_GPU" flag in "constants.py".
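
As a minimal sketch of how such a flag is typically consumed, the snippet below maps a boolean to a TensorFlow device string. The helper `select_device` is hypothetical, and the `USE_GPU` value here is only a stand-in for the one defined in "constants.py"; the actual training script may wire this up differently.

```python
# Stand-in for the flag defined in constants.py (assumed to be a boolean).
USE_GPU = True


def select_device(use_gpu):
    """Return a TensorFlow device string for the given flag (hypothetical helper)."""
    return "/gpu:0" if use_gpu else "/cpu:0"


device = select_device(USE_GPU)

# With TensorFlow installed, graph ops could then be placed on that device:
#   with tf.device(device):
#       ...build the network...
```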

When running with 8 parallel game environments, the speeds of the GPU (GTX980Ti) and the CPU (Core i7 6700) were as follows (recorded with the LOCAL_T_MAX=20 setting).

| Device | A3C-FF             | A3C-LSTM          |
|--------|--------------------|-------------------|
| GPU    | 1722 steps per sec | 864 steps per sec |
| CPU    | 1077 steps per sec | 540 steps per sec |
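
To put these throughput figures in perspective, the short calculation below derives the GPU-over-CPU speedup for each model from the numbers in the table. The dictionary layout and variable names are chosen only for this example.

```python
# Steps per second, copied from the table above.
steps_per_sec = {
    "A3C-FF": {"GPU": 1722, "CPU": 1077},
    "A3C-LSTM": {"GPU": 864, "CPU": 540},
}

# Speedup of the GPU over the CPU, rounded to two decimals.
speedups = {
    model: round(speed["GPU"] / speed["CPU"], 2)
    for model, speed in steps_per_sec.items()
}

print(speedups)  # {'A3C-FF': 1.6, 'A3C-LSTM': 1.6}
```

On this hardware, the GPU was roughly 1.6 times faster than the CPU for both models.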

## Result

Score plots of the local threads for Pong were as follows (with a GTX980Ti).

### A3C-LSTM LOCAL_T_MAX = 5

![A3C-LSTM T=5](./docs/graph_t5.png)

### A3C-LSTM LOCAL_T_MAX = 20

![A3C-LSTM T=20](./docs/graph_t20.png)

Unlike in the original paper, the scores are not averaged using the global network.

## Requirements

- TensorFlow r1.0

- numpy

- cv2

- matplotlib

## References

This project uses the settings described in muupan's wiki: [muupan/async-rl](https://github.com/muupan/async-rl/wiki)

## Acknowledgements

- [@aravindsrinivas](https://github.com/aravindsrinivas) for providing information on some of the hyperparameters.
