https://github.com/ikostrikov/pytorch-rl
- Host: GitHub
- URL: https://github.com/ikostrikov/pytorch-rl
- Owner: ikostrikov
- Created: 2017-09-08T00:00:28.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-28T02:06:01.000Z (about 7 years ago)
- Last Synced: 2025-04-05T22:31:44.633Z (6 months ago)
- Topics: pytorch, reinforcement-learning, reinforcement-learning-algorithms
- Size: 1.95 KB
- Stars: 56
- Watchers: 3
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# pytorch-rl
A list of references to my reimplementations of RL algorithms:
* Asynchronous Methods for Deep Reinforcement Learning (A3C) ([arxiv](https://arxiv.org/abs/1602.01783), [my code](https://github.com/ikostrikov/pytorch-a3c))
* Advantage Actor Critic (A2C) ([my code](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr))
* Proximal Policy Optimization Algorithms (PPO) ([arxiv](https://arxiv.org/abs/1707.06347), [my code](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr))
* Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) ([arxiv](https://arxiv.org/abs/1708.05144), [my code](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr))
* Trust Region Policy Optimization (TRPO) ([arxiv](https://arxiv.org/abs/1502.05477), [my code](https://github.com/ikostrikov/pytorch-trpo))
* Continuous Deep Q-Learning with Model-based Acceleration (NAF) ([arxiv](https://arxiv.org/abs/1603.00748), [my code](https://github.com/ikostrikov/pytorch-naf))
# TODO (volunteers are welcome)
* Move TRPO into the a2c-ppo-acktr code and implement it as a Hessian-free optimizer (in the same way that ACKTR is implemented as a K-FAC optimizer)
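
For anyone picking up the TODO above: a Hessian-free optimizer usually relies on conjugate gradient, which solves `A x = b` using only matrix-vector products and never builds `A` explicitly. The sketch below is a minimal, dependency-free illustration of that solver, not code from any of the linked repositories. The names `conjugate_gradient`, `matvec`, and `toy_matvec` are hypothetical, and the fixed 2x2 matrix is an assumption that stands in for a Fisher- or Hessian-vector product, which a real implementation would compute with autograd.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    iters: int = 10,
    tol: float = 1e-10,
) -> Vector:
    """Approximately solve A x = b for symmetric positive-definite A,
    using only products A @ v supplied by `matvec`."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(b)  # current search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if rs_old < tol:  # squared residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Assumption: a fixed SPD matrix stands in for the curvature matrix.
A = [[4.0, 1.0], [1.0, 3.0]]


def toy_matvec(v: Vector) -> Vector:
    """Hypothetical stand-in for a Fisher/Hessian-vector product."""
    return [dot(row, v) for row in A]


# b plays the role of the policy gradient; the result is the step direction.
step = conjugate_gradient(toy_matvec, [1.0, 2.0])
```

In a TRPO-style update, `b` would be the policy gradient and `matvec` would return the curvature-vector product for the current policy, so the optimizer only ever needs a callable and not the full matrix.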