https://github.com/dongminlee94/reinforcement-learning-code
A repository for code of reinforcement learning algorithms with PyTorch
https://github.com/dongminlee94/reinforcement-learning-code
algorithms inverse-reinforcement-learning model-free-rl pytorch pytorch-rl reinforcement-learning
Last synced: 3 months ago
JSON representation
A repository for code of reinforcement learning algorithms with PyTorch
- Host: GitHub
- URL: https://github.com/dongminlee94/reinforcement-learning-code
- Owner: dongminlee94
- License: mit
- Created: 2018-12-15T19:44:08.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-09-20T04:17:59.000Z (about 4 years ago)
- Last Synced: 2025-05-01T11:37:10.339Z (5 months ago)
- Topics: algorithms, inverse-reinforcement-learning, model-free-rl, pytorch, pytorch-rl, reinforcement-learning
- Language: Python
- Homepage:
- Size: 12.1 MB
- Stars: 30
- Watchers: 2
- Forks: 7
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Reinforcement Learning Code with PyTorch
## Papers
- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
- [Double DQN (DDQN)](https://arxiv.org/pdf/1509.06461.pdf)
- [Advantage Actor-Critic (A2C)](http://incompleteideas.net/book/RLbook2018.pdf)
- [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/pdf/1602.01783.pdf)
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/pdf/1509.02971.pdf)
- [Truncated Natural Policy Gradient (TNPG)](https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/pdf/1502.05477.pdf)
- [Generalized Advantage Estimator (GAE)](https://arxiv.org/pdf/1506.02438.pdf)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf)
- [Soft Actor-Critic (SAC)](https://arxiv.org/pdf/1812.05905.pdf)
- [Apprenticeship Learning via Inverse Reinforcement Learning (APP)](http://people.eecs.berkeley.edu/~russell/classes/cs294/s11/readings/Abbeel+Ng:2004.pdf)
- [Maximum Entropy Inverse Reinforcement Learning (MaxEnt)](http://new.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf)
- [Generative Adversarial Imitation Learning (GAIL)](https://papers.nips.cc/paper/6391-generative-adversarial-imitation-learning.pdf)
- [Variational Adversarial Imitation Learning (VAIL)](https://arxiv.org/pdf/1810.00821.pdf)## Algorithms
### 01. Model-Free Reinforcement Learning
#### Deep Q-Network (DQN)
- [CartPole(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/cartpole/dqn)
#### Double DQN (DDQN)
- [CartPole(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/cartpole/ddqn)
#### Advantage Actor-Critic (A2C)
- [CartPole(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/cartpole/a2c)
#### Asynchronous Advantage Actor-Critic (A3C)
- [CartPole(Classic control)]()
#### Deep Deterministic Policy Gradient (DDPG)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/ddpg)
#### Truncated Natural Policy Gradient (TNPG)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/tnpg)
- [Hopper(MoJoCo)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mujoco/tnpg)#### Trust Region Policy Optimization (TRPO)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/trpo)
#### TRPO + Generalized Advantage Estimator (GAE)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/trpo_gae)
- [Hopper(MoJoCo)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mujoco/trpo)#### Proximal Policy Optimization (PPO)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/ppo)
#### PPO + Generalized Advantage Estimator (GAE)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/ppo_gae)
- [Hopper(MoJoCo)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mujoco/ppo)#### Soft Actor-Critic (SAC)
- [Pendulum(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/pendulum/sac)
- [Hopper(MoJoCo)]()---
### 02. Inverse Reinforcement Learning
#### Apprenticeship Learning via Inverse Reinforcement Learning (APP)
- [MountainCar(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mountaincar/app)
#### Maximum Entropy Inverse Reinforcement Learning (MaxEnt)
- [MountainCar(Classic control)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mountaincar/maxent)
#### Generative Adversarial Imitation Learning (GAIL)
- [Hopper(MoJoCo)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mujoco/gail)
#### Variational Adversarial Imitation Learning (VAIL)
- [Hopper(MoJoCo)](https://github.com/dongminleeai/Reinforcement-Learning-Code/tree/master/mujoco/vail)
---
## Learning curve
### CartPole
### Pendulum
### Hopper
---
## Reference
- [Minimal and Clean Reinforcement Learning Examples in PyTorch](https://github.com/reinforcement-learning-kr/reinforcement-learning-pytorch)
- [Pytorch implementation for Policy Gradient algorithms (REINFORCE, NPG, TRPO, PPO)](https://github.com/reinforcement-learning-kr/pg_travel)
- [Implementation of APP](https://github.com/jangirrishabh/toyCarIRL)
- [Implementation of MaxEnt](https://github.com/MatthewJA/Inverse-Reinforcement-Learning)
- [Pytorch implementation of GAIL](https://github.com/Khrylx/PyTorch-RL)
- [Pytorch implementation of SAC1](https://github.com/vitchyr/rlkit/tree/master/rlkit/torch/sac)
- [Pytorch implementation of SAC2](https://github.com/pranz24/pytorch-soft-actor-critic)