https://github.com/dongminlee94/samsung-drl-code
A repository for implementation of deep reinforcement learning lectured at Samsung
- Host: GitHub
- URL: https://github.com/dongminlee94/samsung-drl-code
- Owner: dongminlee94
- License: mit
- Created: 2019-07-02T04:52:52.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-09-20T04:18:52.000Z (about 4 years ago)
- Last Synced: 2025-05-01T11:37:11.497Z (5 months ago)
- Topics: deep-reinforcement-learning, model-free-rl, pytorch, pytorch-rl, reinforcement-learning
- Language: Python
- Homepage:
- Size: 92.2 MB
- Stars: 108
- Watchers: 4
- Forks: 25
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Deep Reinforcement Learning, Summer 2019 (Samsung)
This repository contains code for **Deep Reinforcement Learning (DRL) algorithms** implemented in PyTorch (v0.4.1). It also provides lecture slides that explain the code in detail.
The agents are implemented and trained on classic control environments from OpenAI Gym:
- [CartPole](https://gym.openai.com/envs/CartPole-v1/)
- [Pendulum](https://gym.openai.com/envs/Pendulum-v0/)

## Table of Contents
### 00. Prerequisite
1. [Install from Anaconda to OpenAI Gym (Windows Ver. & macOS Ver.)](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/0_Prerequisite/01_Install)
2. [Numpy](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/0_Prerequisite/02_Numpy)

### 01. Deep Learning with PyTorch
- [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/1_DL_Pytorch/DL_PyTorch.pdf)
- [Code](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/1_DL_Pytorch/PyTorch.py)

### 02. Deep Q-Network (DQN) & Double DQN (DDQN)
- [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/2_DQN_DDQN/DDQN.pdf)
- [DQN Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/2_DQN_DDQN/dqn)
- [DDQN Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/2_DQN_DDQN/ddqn)

### 03. Advantage Actor-Critic (A2C) & Deep Deterministic Policy Gradient (DDPG)
1. A2C
    - [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/3_A2C_DDPG/A2C.pdf)
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/3_A2C_DDPG/a2c)
2. DDPG
    - [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/3_A2C_DDPG/DDPG.pdf)
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/3_A2C_DDPG/ddpg)

### 04. Trust Region Policy Optimization (TRPO) & Proximal Policy Optimization (PPO)
1. TRPO
    - [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/4_TRPO_PPO/TRPO.pdf)
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/4_TRPO_PPO/trpo)
2. TRPO + GAE
    - [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/4_TRPO_PPO/GAE.pdf)
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/4_TRPO_PPO/trpo_gae)
3. PPO
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/4_TRPO_PPO/ppo)
4. PPO + GAE
    - [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/4_TRPO_PPO/PPO.pdf)
    - [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/4_TRPO_PPO/ppo_gae)

### 05. Soft Actor-Critic (SAC)
- [Slide](https://github.com/dongminlee94/Samsung-DRL-Code/blob/master/5_SAC/SAC.pdf)
- [Code](https://github.com/dongminlee94/Samsung-DRL-Code/tree/master/5_SAC/sac)

## Learning curve
### CartPole
### Pendulum
## Paper
- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
- [Double DQN (DDQN)](https://arxiv.org/pdf/1509.06461.pdf)
- [Advantage Actor-Critic (A2C)](http://incompleteideas.net/book/RLbook2018.pdf)
- [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/pdf/1602.01783.pdf)
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/pdf/1509.02971.pdf)
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/pdf/1502.05477.pdf)
- [Generalized Advantage Estimator (GAE)](https://arxiv.org/pdf/1506.02438.pdf)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf)
- [Soft Actor-Critic (SAC)](https://arxiv.org/pdf/1812.05905.pdf)

## Reference
- [Minimal and Clean Reinforcement Learning Examples in PyTorch](https://github.com/reinforcement-learning-kr/reinforcement-learning-pytorch)
- [Pytorch implementation for Policy Gradient algorithms (REINFORCE, NPG, TRPO, PPO)](https://github.com/reinforcement-learning-kr/pg_travel)
- [Pytorch implementation of SAC1](https://github.com/vitchyr/rlkit/tree/master/rlkit/torch/sac)
- [Pytorch implementation of SAC2](https://github.com/pranz24/pytorch-soft-actor-critic)