https://github.com/choru-k/reinforcement-learning-pytorch-cartpole
Simple CartPole examples written with PyTorch.
- Host: GitHub
- URL: https://github.com/choru-k/reinforcement-learning-pytorch-cartpole
- Owner: choru-k
- License: mit
- Created: 2018-11-19T08:58:25.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-10-29T02:51:02.000Z (almost 6 years ago)
- Last Synced: 2025-04-12T21:12:03.337Z (6 months ago)
- Topics: cartpole, deep-reinforcement-learning, pytorch, pytorch-cartpole, reinforcement-learning
- Language: Python
- Homepage:
- Size: 288 KB
- Stars: 167
- Watchers: 8
- Forks: 23
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyTorch CartPole Example
Simple CartPole examples written with PyTorch.

## Why Cartpole?
CartPole is a very easy problem and converges quickly in most cases, so you can run these examples on your own computer (training usually takes only 1–2 minutes).

## Rainbow
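All of the value-based methods in this checklist build on the DQN temporal-difference update. As a reference point, a minimal sketch of that update on CartPole might look like the following (layer sizes and the toy batch are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical Q-network for CartPole (4-dim state, 2 actions);
# the 128-unit hidden layer is an assumption, not the repo's architecture.
q_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
target_net.load_state_dict(q_net.state_dict())  # target starts as a copy

# A toy batch of transitions (state, action, reward, next_state, done).
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8, 1))
rewards = torch.rand(8)
next_states = torch.randn(8, 4)
dones = torch.zeros(8)

gamma = 0.99
q_values = q_net(states).gather(1, actions).squeeze(1)  # Q(s, a) for taken actions
with torch.no_grad():
    next_q = target_net(next_states).max(1)[0]          # max_a' Q_target(s', a')
    targets = rewards + gamma * (1 - dones) * next_q    # TD target
loss = nn.functional.mse_loss(q_values, targets)        # minimized by SGD/Adam
```

Double DQN changes only the target line (actions chosen by `q_net`, evaluated by `target_net`), and the other Rainbow components are similarly local modifications of this loop.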
- [x] DQN [[1]](#reference)
- [x] Double [[2]](#reference)
- [x] Duel [[3]](#reference)
- [x] Multi-step [[4]](#reference)
- [x] PER(Prioritized Experience Replay) [[5]](#reference)
- [x] Noisy-Net [[6]](#reference)
- [x] Distributional(C51) [[7]](#reference)
- [x] Rainbow [[8]](#reference)

## PG (Policy Gradient)
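The common baseline for this family is REINFORCE: the loss is the negative log-probability of each taken action weighted by the discounted return. A hedged sketch with a hypothetical policy network and a toy 5-step episode:

```python
import torch
import torch.nn as nn

# Hypothetical policy network for CartPole; sizes are illustrative.
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

states = torch.randn(5, 4)            # toy episode of 5 steps
actions = torch.randint(0, 2, (5,))
rewards = [1.0, 1.0, 1.0, 1.0, 1.0]   # CartPole gives +1 per surviving step

# Discounted returns G_t, computed backwards over the episode.
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)

log_probs = torch.log_softmax(policy(states), dim=1)
chosen = log_probs[torch.arange(5), actions]   # log pi(a_t | s_t)
loss = -(chosen * returns).mean()              # REINFORCE objective
```

Actor-Critic, GAE, TRPO, and PPO refine the same gradient by replacing the raw return with a learned baseline or advantage and by constraining the policy update.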
- [x] REINFORCE [[9]](#reference)
- [x] Actor Critic [[10]](#reference)
- [x] Advantage Actor Critic
- [x] GAE(Generalized Advantage Estimation) [[12]](#reference)
- [x] TNPG [[20]](#reference)
- [x] TRPO [[13]](#reference)
- [x] PPO - Single Version [[14]](#reference)

## Parallel
- [x] Asynchronous Q-learning [[11]](#reference)
- [x] A3C (Asynchronous Advantage Actor Critic) [[11]](#reference)
- [x] ACER [[21]](#reference)
- [ ] PPO [[14]](#reference)
- [x] APE-X DQN [[15]](#reference)
- [ ] IMPALA [[23]](#reference)
- [ ] R2D2 [[16]](#reference)

## Distributional DQN
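QR-DQN and IQN replace C51's categorical projection with quantile regression; the central piece is the quantile Huber loss. A sketch with toy tensors (shapes and the midpoint fractions are illustrative):

```python
import torch

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """pred: (batch, N) predicted quantiles; target: (batch, N) target samples;
    taus: (N,) quantile fractions in (0, 1)."""
    u = target.unsqueeze(1) - pred.unsqueeze(2)   # pairwise TD errors, (batch, N, N)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| tilts the Huber loss per quantile.
    weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    return (weight * huber).mean()

N = 4
taus = (torch.arange(N, dtype=torch.float32) + 0.5) / N   # midpoint fractions
pred = torch.randn(8, N)
target = torch.randn(8, N)
loss = quantile_huber_loss(pred, target, taus)
```

IQN differs mainly in sampling the fractions `taus` and embedding them into the network rather than fixing them.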
- [x] QRDQN [[18]](#reference)
- [x] IQN [[19]](#reference)

## Exploration
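Neither exploration method is implemented yet. For intuition, RND derives an intrinsic bonus from how poorly a trained predictor matches a fixed, randomly initialized target network on a given state; roughly (network sizes are assumptions):

```python
import torch
import torch.nn as nn

# Fixed random target network and a trainable predictor (sizes illustrative).
target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 32))
for p in target.parameters():
    p.requires_grad_(False)   # the target network is never trained

states = torch.randn(8, 4)
with torch.no_grad():
    embedding = target(states)
error = (predictor(states) - embedding).pow(2).mean(dim=1)
bonus = error.detach()        # intrinsic reward: high on rarely seen states
# Minimizing `error` trains the predictor, so familiar states lose their bonus.
```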
- [ ] ICM [[22]](#reference)
- [ ] RND [[17]](#reference)

## POMDP (With RNN)
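The DRQN variants below share one idea: an RNN over the observation sequence replaces frame stacking, so the hidden state summarizes history under partial observability. A minimal sketch (hypothetical sizes, not the repository's exact module):

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Q-network with an LSTM over the observation sequence."""
    def __init__(self, obs_dim=4, hidden=64, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden=None):
        out, hidden = self.lstm(obs_seq, hidden)   # out: (batch, time, hidden)
        return self.head(out), hidden              # Q-values at every timestep

net = DRQN()
q, h = net(torch.randn(3, 10, 4))   # batch of 3 episodes, 10 timesteps each
```

Storing `hidden` alongside transitions in the replay buffer is exactly the "store RNN state" strategy from R2D2 [[16]](#reference).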
- [x] DQN (use state stack)
- [x] DRQN [[24]](#reference) [[25]](#reference)
- [x] DRQN (use state stack)
- [x] DRQN (store RNN state) [[16]](#reference)
- [x] R2D2 - Single Version [[16]](#reference)

## Reference
[1][Playing Atari with Deep Reinforcement Learning](http://arxiv.org/abs/1312.5602)
[2][Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461)
[3][Dueling Network Architectures for Deep Reinforcement Learning](http://arxiv.org/abs/1511.06581)
[4][Reinforcement Learning: An Introduction](http://www.incompleteideas.net/sutton/book/ebook/the-book.html)
[5][Prioritized Experience Replay](http://arxiv.org/abs/1511.05952)
[6][Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295)
[7][A Distributional Perspective on Reinforcement Learning](https://arxiv.org/abs/1707.06887)
[8][Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/abs/1710.02298)
[9][Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)
[10][Actor-Critic Algorithms](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)
[11][Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf)
[12][High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/pdf/1506.02438.pdf)
[13][Trust Region Policy Optimization](https://arxiv.org/pdf/1502.05477.pdf)
[14][Proximal Policy Optimization](https://arxiv.org/pdf/1707.06347.pdf)
[15][Distributed Prioritized Experience Replay](https://arxiv.org/pdf/1803.00933.pdf)
[16][Recurrent Experience Replay in Distributed Reinforcement Learning](https://openreview.net/pdf?id=r1lyTjAqYX)
[17][Exploration by Random Network Distillation](https://openreview.net/pdf?id=H1lJJnR5Ym)
[18][Distributional Reinforcement Learning with Quantile Regression](https://arxiv.org/pdf/1710.10044.pdf)
[19][Implicit Quantile Networks for Distributional Reinforcement Learning](https://arxiv.org/pdf/1806.06923.pdf)
[20][A Natural Policy Gradient](https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)
[21][Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/pdf/1611.01224.pdf)
[22][Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363.pdf)
[23][IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf)
[24][Deep Recurrent Q-Learning for Partially Observable MDPs](https://arxiv.org/pdf/1507.06527.pdf)
[25][Playing FPS Games with Deep Reinforcement Learning](https://arxiv.org/pdf/1609.05521.pdf)

## Acknowledgements
- https://github.com/openai/baselines
- https://github.com/reinforcement-learning-kr/pg_travel
- https://github.com/reinforcement-learning-kr/distributional_rl
- https://github.com/Kaixhin/Rainbow
- https://github.com/Kaixhin/ACER
- https://github.com/higgsfield/RL-Adventure-2

## Use CUDA
See this issue: https://github.com/g6ling/Reinforcement-Learning-Pytorch-Cartpole/issues/1
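Independent of that issue, the usual PyTorch pattern for opting into CUDA when a GPU is available, while falling back to the CPU otherwise, is:

```python
import torch

# Fall back to CPU automatically when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # move parameters to the device
state = torch.randn(1, 4, device=device)   # create tensors on the same device
q_values = model(state)                    # inputs and weights must share a device
```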