# Simple implementation of Reinforcement Learning (A3C) using Pytorch

This is a toy example of using multiprocessing in Python to asynchronously train a
neural network to play the discrete-action [CartPole](https://gym.openai.com/envs/CartPole-v0/) and
continuous-action [Pendulum](https://gym.openai.com/envs/Pendulum-v0/) games.
The asynchronous algorithm used here is called [Asynchronous Advantage Actor-Critic](https://arxiv.org/pdf/1602.01783.pdf), or A3C.

I believe this is the simplest toy implementation you can find at the moment (2018-01).
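For context, each A3C worker optimizes a loss that combines a value-regression (critic) term with an advantage-weighted policy-gradient (actor) term, plus an optional entropy bonus for exploration. The sketch below is only illustrative; the function and argument names are hypothetical and not taken from this repo's code.

```python
import torch

def a3c_loss(values, log_probs, entropies, rewards, gamma=0.9, beta=0.01):
    """Illustrative A3C loss for one rollout (names are hypothetical).

    values:    V(s_t) predicted by the critic, shape [T]
    log_probs: log pi(a_t | s_t) of the actions taken, shape [T]
    entropies: entropy of the policy at each step, shape [T]
    rewards:   list of immediate rewards r_t collected by the worker
    """
    returns, R = [], 0.0
    for r in reversed(rewards):                    # bootstrap from 0 at episode end
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns, dtype=values.dtype)

    advantage = returns - values                   # A_t = R_t - V(s_t)
    critic_loss = advantage.pow(2)                 # value regression
    actor_loss = -log_probs * advantage.detach()   # policy-gradient term
    return (critic_loss + actor_loss - beta * entropies).mean()
```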

## What are the main focuses of this implementation?

* Pytorch + multiprocessing (NOT threading) for parallel training (see the sketch after this list)
* Both discrete and continuous action environments
* Simple code that is easy to dig into (fewer than 200 lines)
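
A typical way to set this up (a minimal sketch under assumptions, not the exact code in this repo) is to keep one global network in shared memory and spawn several worker processes that train against it:

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class Net(nn.Module):
    """Toy stand-in for the actor-critic network defined in this repo."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

def worker(global_net, opt, rank):
    # A real A3C worker would build its own local net and gym environment here,
    # collect rollouts, compute the loss, and push gradients into global_net.
    local_net = Net()
    local_net.load_state_dict(global_net.state_dict())

if __name__ == "__main__":
    global_net = Net()
    global_net.share_memory()                 # put parameters in shared memory
    opt = torch.optim.Adam(global_net.parameters(), lr=1e-4)
    workers = [mp.Process(target=worker, args=(global_net, opt, r))
               for r in range(mp.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```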

## Reasons for using [Pytorch](http://pytorch.org/) instead of [Tensorflow](https://www.tensorflow.org/)

Both of them are great for building customized neural networks, but Tensorflow is not a good fit here
due to its low compatibility with multiprocessing.
I have an implementation of [Tensorflow A3C built on threading](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/10_A3C).
I even tried to implement [distributed Tensorflow](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/10_A3C/A3C_distributed_tf.py).
However, the distributed version is intended for cluster computing, which I don't have access to.
On a single machine, it is slower than the threading version I wrote.

Fortunately, Pytorch offers [multiprocessing compatibility](http://pytorch.org/docs/master/notes/multiprocessing.html).
I went through many Pytorch A3C examples ([here](https://github.com/ikostrikov/pytorch-a3c), [here](https://github.com/jingweiz/pytorch-rl)
and [here](https://github.com/ShangtongZhang/DeepRL)). They are great, but too complicated to dig into quickly,
which motivated me to write this simple example code.
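
The part that makes this compatibility useful is that gradients computed in a worker process can be handed directly to a network whose parameters live in shared memory. Below is a sketch of that hand-off; the helper name and signature are hypothetical and not this repo's API:

```python
def sync_with_global(opt, local_net, global_net, loss):
    # Hypothetical helper: push the worker's gradients onto the shared global
    # network, step the shared optimizer, then pull the fresh weights back.
    opt.zero_grad()
    loss.backward()
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp._grad = lp.grad                    # hand local gradient to the shared param
    opt.step()                                # update the shared parameters
    local_net.load_state_dict(global_net.state_dict())   # sync the worker back
```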

By the way, if you are interested in learning Pytorch, [here](https://github.com/MorvanZhou/PyTorch-Tutorial)
is my simple tutorial code with many visualizations. I also made an equivalent Tensorflow tutorial available [here](https://github.com/MorvanZhou/Tensorflow-Tutorial).

## Codes & Results

* [shared_adam.py](/shared_adam.py): Adam optimizer whose state is shared across worker processes (see the sketch after this list)
* [utils.py](/utils.py): utility functions used by both examples
* [discrete_A3C.py](/discrete_A3C.py): CartPole, neural net and training for a discrete action space
* [continuous_A3C.py](/continuous_A3C.py): Pendulum, neural net and training for a continuous action space
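
For reference, a common way to build such a shared optimizer (a sketch, not necessarily identical to shared_adam.py) is to pre-allocate Adam's per-parameter state and move it into shared memory, so every worker process updates the same buffers:

```python
import torch

class SharedAdam(torch.optim.Adam):
    """Adam whose state tensors live in shared memory (illustrative sketch)."""
    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, lr=lr, betas=betas, eps=eps)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = torch.tensor(0.0)
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)
                # move the running averages to shared memory so all
                # worker processes update the same buffers
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()
```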

CartPole result
![cartpole](/results/cartpole.png)

Pendulum result
![pendulum](/results/pendulum.png)

## Dependencies

* pytorch >= 0.4.0
* numpy
* gym
* matplotlib
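
These can usually be installed with `pip install torch gym numpy matplotlib`, though the exact pytorch install command may vary with your platform and Python version.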