https://github.com/dongminlee94/deep_rl

PyTorch implementation of deep reinforcement learning algorithms
a2c ddpg ddqn deep-reinforcement-learning dqn model-free-rl npg ppo pytorch sac sac-aea td3 trpo vpg

Host: GitHub
URL: https://github.com/dongminlee94/deep_rl
Owner: dongminlee94
License: mit
Created: 2019-09-17T07:12:40.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2021-11-19T14:22:50.000Z (over 3 years ago)
Last Synced: 2025-03-29T17:12:06.576Z (3 months ago)
Topics: a2c, ddpg, ddqn, deep-reinforcement-learning, dqn, model-free-rl, npg, ppo, pytorch, sac, sac-aea, td3, trpo, vpg
Language: Python
Homepage:
Size: 30.2 MB
Stars: 496
Watchers: 12
Forks: 59
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# Deep Reinforcement Learning (DRL) Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms. **The repository will soon be updated to include the PyBullet environments!**

## Algorithms Implemented

1. Deep Q-Network (DQN) ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf))

2. Double DQN (DDQN) ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461))

3. Advantage Actor Critic (A2C)

4. Vanilla Policy Gradient (VPG)

5. Natural Policy Gradient (NPG) ([S. Kakade 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf))

6. Trust Region Policy Optimization (TRPO) ([J. Schulman et al. 2015](https://arxiv.org/abs/1502.05477))

7. Proximal Policy Optimization (PPO) ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347))

8. Deep Deterministic Policy Gradient (DDPG) ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971))

9. Twin Delayed DDPG (TD3) ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477))

10. Soft Actor-Critic (SAC) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290))

11. SAC with automatic entropy adjustment (SAC-AEA) ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905))
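
As a rough illustration of how the first two algorithms in the list differ, the sketch below computes the one-step bootstrap target for DQN and Double DQN on plain Python lists. It follows the target definitions in the cited papers and is not taken from this repository; the function names and the toy Q-values are made up for the example.

```python
from typing import Sequence


def dqn_target(reward: float, gamma: float, done: bool,
               q_target_next: Sequence[float]) -> float:
    """DQN: bootstrap from the target network's maximum next-state Q-value."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward: float, gamma: float, done: bool,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float]) -> float:
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Toy next-state Q-values for two actions (illustrative numbers only).
q_online_next = [1.0, 2.0]
q_target_next = [3.0, 0.5]

print(dqn_target(1.0, 0.5, False, q_target_next))                  # 1 + 0.5 * 3.0
print(ddqn_target(1.0, 0.5, False, q_online_next, q_target_next))  # 1 + 0.5 * 0.5
```

With these numbers the two targets differ because the online and target networks disagree about the best next action, which is the case Double DQN is designed to handle.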

## Environments Implemented

1. Classic control environments (CartPole-v1, Pendulum-v0, etc.) (described [here](https://gym.openai.com/envs/#classic_control))

2. MuJoCo environments (Hopper-v2, HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.) (described [here](https://gym.openai.com/envs/#mujoco))

3. **PyBullet environments (HopperBulletEnv-v0, HalfCheetahBulletEnv-v0, AntBulletEnv-v0, HumanoidDeepMimicWalkBulletEnv-v1, etc.)** (described [here](https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs))

## Results (MuJoCo, PyBullet)

### MuJoCo environments

#### Hopper-v2

- Observation space: 11

- Action space: 3

#### HalfCheetah-v2

- Observation space: 17

- Action space: 6

#### Ant-v2

- Observation space: 111

- Action space: 8

#### Humanoid-v2

- Observation space: 376

- Action space: 17

### PyBullet environments

#### HopperBulletEnv-v0

- Observation space: 15

- Action space: 3

#### HalfCheetahBulletEnv-v0

- Observation space: 26

- Action space: 6

#### AntBulletEnv-v0

- Observation space: 28

- Action space: 8

#### HumanoidDeepMimicWalkBulletEnv-v1

- Observation space: 197

- Action space: 36
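
The dimensions listed in this section can be read off an environment object directly, since gym-style environments expose `observation_space` and `action_space`, each with a `shape` for continuous (Box) spaces. The following is a minimal sketch of that check; `space_dims` is a hypothetical helper, and a stand-in object with HalfCheetah-v2's sizes is used so the snippet runs without a simulator installed.

```python
from types import SimpleNamespace


def space_dims(env):
    """Return (observation_dim, action_dim) for an env with flat Box-like spaces."""
    return env.observation_space.shape[0], env.action_space.shape[0]


# Stand-in object so the sketch runs without gym or a physics engine.
# With gym and MuJoCo installed, an env created by gym.make("HalfCheetah-v2")
# would be passed to space_dims instead.
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(17,)),
    action_space=SimpleNamespace(shape=(6,)),
)

print(space_dims(fake_env))
```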

## Requirements

- [PyTorch](https://pytorch.org)

- [TensorBoard](https://pytorch.org/docs/stable/tensorboard.html)

- [gym](https://github.com/openai/gym)

- [mujoco-py](https://github.com/openai/mujoco-py)

- [PyBullet](https://pybullet.org/wordpress/)
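
Before running anything, it can help to confirm that these packages are visible to the current Python interpreter. The sketch below only looks the modules up and does not import them; `missing_modules` is a hypothetical helper, and the import names are assumed to be `torch`, `tensorboard`, `gym`, `mujoco_py`, and `pybullet`.

```python
from importlib.util import find_spec

# Import names for the packages listed above (mujoco-py is imported as mujoco_py).
REQUIRED_MODULES = ["torch", "tensorboard", "gym", "mujoco_py", "pybullet"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
print("Missing:", ", ".join(missing) if missing else "none")
```

This check says nothing about versions; the environment IDs used in this README (for example `Pendulum-v0` and the `-v2` MuJoCo tasks) may require older gym releases.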

## Usage

The repository's high-level structure is:

    ├── agents
    │   └── common
    ├── results
    │   ├── data
    │   └── graphs
    └── save_model

### 1) To train the agents on the environments

To train all the different agents on PyBullet environments, follow these steps:

```commandline
git clone https://github.com/dongminlee94/deep_rl.git
cd deep_rl
python run_bullet.py
```

For other environments, replace `run_bullet.py` in the last line with `run_cartpole.py`, `run_pendulum.py`, or `run_mujoco.py`.

To change an agent's configuration, pass the options on the command line:

```commandline
python run_bullet.py \
    --env=HumanoidDeepMimicWalkBulletEnv-v1 \
    --algo=sac-aea \
    --phase=train \
    --render=False \
    --load=None \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000 \
    --tensorboard=True \
    --gpu_index=0
```
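
For readers wondering how values such as `--render=False` and `--load=None` can be interpreted, here is a minimal `argparse` sketch. It is not the repository's actual parser: the flag names and defaults are copied from the command above, while `str2bool`, `none_or_str`, and `build_parser` are hypothetical helpers. The last line also assumes that each iteration collects `steps_per_iter` environment steps, which this README does not state.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings; plain bool() would treat 'False' as truthy."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def none_or_str(value: str):
    """Map the literal string 'None' to None so that --load=None means 'no checkpoint'."""
    return None if value == "None" else value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names and defaults mirror the command above only.
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", type=str, default="HumanoidDeepMimicWalkBulletEnv-v1")
    parser.add_argument("--algo", type=str, default="sac-aea")
    parser.add_argument("--phase", type=str, default="train")
    parser.add_argument("--render", type=str2bool, default=False)
    parser.add_argument("--load", type=none_or_str, default=None)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--iterations", type=int, default=200)
    parser.add_argument("--steps_per_iter", type=int, default=5000)
    parser.add_argument("--max_step", type=int, default=1000)
    parser.add_argument("--tensorboard", type=str2bool, default=True)
    parser.add_argument("--gpu_index", type=int, default=0)
    return parser


args = build_parser().parse_args(["--algo=sac-aea", "--render=False", "--load=None"])
# Total environment steps under the assumption stated above: iterations * steps_per_iter.
print(args.render, args.load, args.iterations * args.steps_per_iter)
```

The explicit converters matter because `type=bool` would turn the string `"False"` into `True`, and `type=str` would keep `"None"` as a non-empty string.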

### 2) To watch the learned agents on the above environments

To watch all the learned agents on PyBullet environments, follow these steps:

```commandline
python run_bullet.py \
    --env=HumanoidDeepMimicWalkBulletEnv-v1 \
    --algo=sac-aea \
    --phase=test \
    --render=True \
    --load=envname_algoname_... \
    --seed=0 \
    --iterations=200 \
    --steps_per_iter=5000 \
    --max_step=1000 \
    --tensorboard=False \
    --gpu_index=0
```

Copy the saved model's name from `save_model/envname_algoname_...` and pass it as the value of `--load` in place of `envname_algoname_...`. The saved model will then be loaded.
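
Looking up the saved model name by hand can be replaced with a small directory listing. The sketch below assumes only what is stated above, namely that saved names begin with the environment name and the algorithm name joined by underscores; `find_saved_models` is a hypothetical helper, and whether `--load` expects the file extension is not specified here.

```python
from pathlib import Path


def find_saved_models(save_dir, env_name, algo_name):
    """Return names in save_dir that start with '<env_name>_<algo_name>_', sorted."""
    prefix = f"{env_name}_{algo_name}_"
    root = Path(save_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith(prefix))


# Prints nothing if save_model/ does not exist or holds no matching entries.
for name in find_saved_models("save_model", "HumanoidDeepMimicWalkBulletEnv-v1", "sac-aea"):
    print(name)
```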
