https://github.com/choru-k/reinforcement-learning-pytorch-cartpole
Simple CartPole examples written with PyTorch.
- Host: GitHub
- URL: https://github.com/choru-k/reinforcement-learning-pytorch-cartpole
- Owner: choru-k
- License: mit
- Created: 2018-11-19T08:58:25.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-10-29T02:51:02.000Z (almost 6 years ago)
- Last Synced: 2025-04-12T21:12:03.337Z (6 months ago)
- Topics: cartpole, deep-reinforcement-learning, pytorch, pytorch-cartpole, reinforcement-learning
- Language: Python
- Homepage:
- Size: 288 KB
- Stars: 167
- Watchers: 8
- Forks: 23
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyTorch CartPole Example
Simple CartPole examples written with PyTorch.

## Why Cartpole?
CartPole is a very easy problem and converges quickly in most cases, so you can run these examples on your own computer (training usually takes only 1–2 minutes).

## Rainbow
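All of the value-based methods in this checklist build on the DQN temporal-difference update. As a reference point, a minimal sketch of that update on CartPole might look like the following (layer sizes and the toy batch are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical Q-network for CartPole (4-dim state, 2 actions);
# the 128-unit hidden layer is an assumption, not the repo's architecture.
q_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net.load_state_dict(q_net.state_dict())  # target starts as a copy

# A toy batch of transitions (state, action, reward, next_state, done).
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8, 1))
rewards = torch.rand(8)
next_states = torch.randn(8, 4)
dones = torch.zeros(8)

gamma = 0.99
q_values = q_net(states).gather(1, actions).squeeze(1)  # Q(s, a) for taken actions
with torch.no_grad():
    next_q = target_net(next_states).max(1)[0]          # max_a' Q_target(s', a')
    targets = rewards + gamma * (1 - dones) * next_q    # TD target
loss = nn.functional.mse_loss(q_values, targets)        # minimized by SGD/Adam
```

Double DQN changes only the target line (actions chosen by `q_net`, evaluated by `target_net`), and the other Rainbow components are similarly local modifications of this loop.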
- [x] DQN [[1]](#reference)
- [x] Double [[2]](#reference)
- [x] Duel [[3]](#reference)
- [x] Multi-step [[4]](#reference)
- [x] PER(Prioritized Experience Replay) [[5]](#reference)
- [x] Noisy-Net [[6]](#reference)
- [x] Distributional(C51) [[7]](#reference)
- [x] Rainbow [[8]](#reference)

## PG (Policy Gradient)
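The common baseline for this family is REINFORCE: the loss is the negative log-probability of each taken action weighted by the discounted return. A hedged sketch with a hypothetical policy network and a toy 5-step episode:

```python
import torch
import torch.nn as nn

# Hypothetical policy network for CartPole; sizes are illustrative.
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

states = torch.randn(5, 4)            # toy episode of 5 steps
actions = torch.randint(0, 2, (5,))
rewards = [1.0, 1.0, 1.0, 1.0, 1.0]   # CartPole gives +1 per surviving step

# Discounted returns G_t, computed backwards over the episode.
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)

log_probs = torch.log_softmax(policy(states), dim=1)
chosen = log_probs[torch.arange(5), actions]   # log pi(a_t | s_t)
loss = -(chosen * returns).mean()              # REINFORCE objective
```

Actor-Critic, GAE, TRPO, and PPO refine the same gradient by replacing the raw return with a learned baseline or advantage and by constraining the policy update.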
- [x] REINFORCE [[9]](#reference)
- [x] Actor Critic [[10]](#reference)
- [x] Advantage Actor Critic
- [x] GAE(Generalized Advantage Estimation) [[12]](#reference)
- [x] TNPG [[20]](#reference)
- [x] TRPO [[13]](#reference)
- [x] PPO - Single Version [[14]](#reference)

## Parallel
- [x] Asynchronous Q-learning [[11]](#reference)
- [x] A3C (Asynchronous Advantage Actor Critic) [[11]](#reference)
- [x] ACER [[21]](#reference)
- [ ] PPO [[14]](#reference)
- [x] APE-X DQN [[15]](#reference)
- [ ] IMPALA [[23]](#reference)
- [ ] R2D2 [[16]](#reference)

## Distributional DQN
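QR-DQN and IQN replace C51's categorical projection with quantile regression; the central piece is the quantile Huber loss. A sketch with toy tensors (shapes and the midpoint fractions are illustrative):

```python
import torch

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """pred: (batch, N) predicted quantiles; target: (batch, N) target samples;
    taus: (N,) quantile fractions in (0, 1)."""
    u = target.unsqueeze(1) - pred.unsqueeze(2)   # pairwise TD errors, (batch, N, N)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| tilts the Huber loss per quantile.
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber).mean()

N = 4
taus = (torch.arange(N, dtype=torch.float32) + 0.5) / N   # midpoint fractions
pred = torch.randn(8, N)
target = torch.randn(8, N)
loss = quantile_huber_loss(pred, target, taus)
```

IQN differs mainly in sampling the fractions `taus` and embedding them into the network rather than fixing them.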
- [x] QRDQN [[18]](#reference)
- [x] IQN [[19]](#reference)

## Exploration
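Neither exploration method is implemented yet. For intuition, RND derives an intrinsic bonus from how poorly a trained predictor matches a fixed, randomly initialized target network on a given state; roughly (network sizes are assumptions):

```python
import torch
import torch.nn as nn

# Fixed random target network and a trainable predictor (sizes illustrative).
target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 32))
for p in target.parameters():
    p.requires_grad_(False)   # the target network is never trained

states = torch.randn(8, 4)
with torch.no_grad():
    embedding = target(states)
error = (predictor(states) - embedding).pow(2).mean(dim=1)
bonus = error.detach()        # intrinsic reward: high on rarely seen states
# Minimizing `error` trains the predictor, so familiar states lose their bonus.
```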
- [ ] ICM [[22]](#reference)
- [ ] RND [[17]](#reference)

## POMDP (With RNN)
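The DRQN variants below share one idea: an RNN over the observation sequence replaces frame stacking, so the hidden state summarizes history under partial observability. A minimal sketch (hypothetical sizes, not the repository's exact module):

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Q-network with an LSTM over the observation sequence."""
    def __init__(self, obs_dim=4, hidden=64, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden=None):
        out, hidden = self.lstm(obs_seq, hidden)   # out: (batch, time, hidden)
        return self.head(out), hidden              # Q-values at every timestep

net = DRQN()
q, h = net(torch.randn(3, 10, 4))   # batch of 3 episodes, 10 timesteps each
```

Storing `hidden` alongside transitions in the replay buffer is exactly the "store RNN state" strategy from R2D2 [[16]](#reference).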
- [x] DQN (use state stack)
- [x] DRQN [[24]](#reference) [[25]](#reference)
- [x] DRQN (use state stack)
- [x] DRQN (store RNN state) [[16]](#reference)
- [x] R2D2 - Single Version [[16]](#reference)

## Reference
[1][Playing Atari with Deep Reinforcement Learning](http://arxiv.org/abs/1312.5602)
[2][Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461)
[3][Dueling Network Architectures for Deep Reinforcement Learning](http://arxiv.org/abs/1511.06581)
[4][Reinforcement Learning: An Introduction](http://www.incompleteideas.net/sutton/book/ebook/the-book.html)
[5][Prioritized Experience Replay](http://arxiv.org/abs/1511.05952)
[6][Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)
[7][A Distributional Perspective on Reinforcement Learning](https://arxiv.org/abs/1707.06887)
[8][Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/abs/1710.02298)
[9][Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
[10][Actor-Critic Algorithms](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)
[11][Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf)
[12][High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/pdf/1506.02438.pdf)
[13][Trust Region Policy Optimization](https://arxiv.org/pdf/1502.05477.pdf)
[14][Proximal Policy Optimization](https://arxiv.org/pdf/1707.06347.pdf)
[15][Distributed Prioritized Experience Replay](https://arxiv.org/pdf/1803.00933.pdf)
[16][Recurrent Experience Replay in Distributed Reinforcement Learning](https://openreview.net/pdf?id=r1lyTjAqYX)
[17][Exploration by Random Network Distillation](https://openreview.net/pdf?id=H1lJJnR5Ym)
[18][Distributional Reinforcement Learning with Quantile Regression](https://arxiv.org/pdf/1710.10044.pdf)
[19][Implicit Quantile Networks for Distributional Reinforcement Learning](https://arxiv.org/pdf/1806.06923.pdf)
[20][A Natural Policy Gradient](https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)
[21][Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/pdf/1611.01224.pdf)
[22][Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363.pdf)
[23][IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf)
[24][Deep Recurrent Q-Learning for Partially Observable MDPs](https://arxiv.org/pdf/1507.06527.pdf)
[25][Playing FPS Games with Deep Reinforcement Learning](https://arxiv.org/pdf/1609.05521.pdf)

## Acknowledgements
- https://github.com/openai/baselines
- https://github.com/reinforcement-learning-kr/pg_travel
- https://github.com/reinforcement-learning-kr/distributional_rl
- https://github.com/Kaixhin/Rainbow
- https://github.com/Kaixhin/ACER
- https://github.com/higgsfield/RL-Adventure-2

## Use CUDA
See this issue: https://github.com/g6ling/Reinforcement-Learning-Pytorch-Cartpole/issues/1
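Independent of that issue, the usual PyTorch pattern for opting into CUDA when a GPU is available, while falling back to the CPU otherwise, is:

```python
import torch

# Fall back to CPU automatically when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # move parameters to the device
state = torch.randn(1, 4, device=device)   # create tensors on the same device
q_values = model(state)                    # inputs and weights must share a device
```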