Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ikostrikov/pytorch-a3c
PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
- Host: GitHub
- URL: https://github.com/ikostrikov/pytorch-a3c
- Owner: ikostrikov
- License: mit
- Created: 2017-02-13T03:57:55.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-09-25T18:08:56.000Z (about 5 years ago)
- Last Synced: 2023-11-07T12:14:55.693Z (about 1 year ago)
- Topics: a3c, actor-critic, asynch, asynchronous-advantage-actor-critic, asynchronous-methods, deep-learning, deep-reinforcement-learning, python, pytorch, pytorch-a3c, reinforcement-learning
- Language: Python
- Homepage:
- Size: 205 KB
- Stars: 1,138
- Watchers: 44
- Forks: 281
- Open Issues: 24
- Metadata Files:
- Readme: README.md
- License: LICENSE.md
# pytorch-a3c
This is a PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from ["Asynchronous Methods for Deep Reinforcement Learning"](https://arxiv.org/pdf/1602.01783v1.pdf).
This implementation is inspired by [Universe Starter Agent](https://github.com/openai/universe-starter-agent).
In contrast to the starter agent, it uses an optimizer with shared statistics, as in the original paper.

Please use this BibTeX entry if you want to cite this repository in your publications:
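The core of the shared-statistics idea is that both the model parameters and the optimizer's running moment estimates live in shared memory, so every worker process updates the same tensors. A minimal sketch of the shared-parameter side of that pattern (the helper names `make_model` and `push_grads_to_shared` and the tiny linear network are illustrative stand-ins, not the repository's actual code):

```python
import torch
import torch.nn as nn

def make_model():
    # Stand-in for the actor-critic network.
    return nn.Linear(4, 2)

def push_grads_to_shared(local_model, shared_model):
    # Point the shared model's .grad slots at the worker's gradients,
    # so a step on the shared optimizer applies the worker's update.
    for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
        if sp.grad is not None:
            return
        sp._grad = lp.grad

shared = make_model()
shared.share_memory()  # move parameters into shared memory for all workers

local = make_model()
local.load_state_dict(shared.state_dict())  # sync weights before a rollout

loss = local(torch.randn(1, 4)).sum()
loss.backward()
push_grads_to_shared(local, shared)
```

In the full algorithm, each worker repeats this sync/rollout/backward/push cycle asynchronously, and the shared optimizer (whose Adam moment buffers are likewise placed in shared memory) steps on the shared parameters.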
```bibtex
@misc{pytorchaaac,
  author = {Kostrikov, Ilya},
  title = {PyTorch Implementations of Asynchronous Advantage Actor Critic},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ikostrikov/pytorch-a3c}},
}
```

## A2C
I **highly recommend** checking out the synchronous version and other algorithms: [pytorch-a2c-ppo-acktr](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).
In my experience, A2C works better than A3C, and ACKTR works better than both of them. Moreover, PPO is a great algorithm for continuous control. Thus, I recommend trying A2C/PPO/ACKTR first and using A3C only if you specifically need it for some reason.
Also read [OpenAI blog](https://blog.openai.com/baselines-acktr-a2c/) for more information.
## Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
## Usage
```bash
# Works only with Python 3.
python3 main.py --env-name "PongDeterministic-v4" --num-processes 16
```

This code runs evaluation in a separate thread in addition to the 16 training processes.
## Results
With 16 processes it converges for PongDeterministic-v4 in 15 minutes.
![PongDeterministic-v4](images/PongReward.png)

For BreakoutDeterministic-v4 it takes several hours or more.