https://github.com/jason-cky/deeprl-pytorch
Pytorch implementations of various Deep Reinforcement Learning algorithms on pybullet environments.
- Host: GitHub
- URL: https://github.com/jason-cky/deeprl-pytorch
- Owner: Jason-CKY
- Created: 2020-09-12T12:44:30.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-18T16:46:26.000Z (over 3 years ago)
- Last Synced: 2024-04-28T05:11:40.678Z (about 1 year ago)
- Topics: ddpg, ppo, pybullet-environments, python3, pytorch-implementation, reinforcement-learning-algorithms, rlbench, td3, trpo
- Language: Python
- Homepage:
- Size: 258 MB
- Stars: 21
- Watchers: 3
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Project Deprecated
Please note that there will not be any updates to this project in the foreseeable future. Please do not open issues on this repo expecting a fix or explanation. Some of the libraries have had breaking updates (e.g. gym), and since `requirements.txt` did not pin version requirements, it is now pretty much impossible to reproduce the experiments. However, there is still value in looking at the implementations of the various RL algorithms.
Please consider forking this project if you want to continue working on it and add support for newer environments and libraries.
# Deep RL policies on Pybullet Environments
This repo contains PyTorch implementations of various deep RL algorithms, trained and evaluated on PyBullet robotic environments.
## Dependencies:
* CUDA >= 10.2
* [RLBench](https://github.com/stepjam/RLBench)

## Implemented Algorithms:
| Name | Discrete actions | Continuous actions | Stochastic policy | Deterministic policy |
| --- | --- | --- | --- | --- |
| DDPG | :x: | :heavy_check_mark: | :x: | :heavy_check_mark: |
| TD3 | :x: | :heavy_check_mark: | :x: | :heavy_check_mark: |
| TRPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| Option-Critic | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| DAC_PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
## Environments Supported
The following gym environments are supported by this repo:
* OpenAI gym environments
* PyBullet gym environments
* RLBench gym environments
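As a minimal usage sketch (not taken from this repo, and assuming the older `gym`/`pybullet_envs` API that was current when the project was written), a PyBullet environment is created like any other gym environment:
```
import gym
import pybullet_envs  # importing registers the *BulletEnv-v0 ids with gym

env = gym.make("HopperBulletEnv-v0")
obs = env.reset()  # old gym API: reset() returns only the observation
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```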
## Types of Networks Implemented:
* Multi-Layered Perceptron (MLP)
* Convolutional Neural Network (CNN)
* Variational Autoencoder (VAE)

* `hidden_sizes` is the number of neurons in each dense layer of the MLP.
* `conv_layer_sizes` is a list containing the parameters of each convolutional layer, i.e. `[output_channel, kernel_size, stride]`.

To use the MLP network, set `ac_kwargs['model_type']` to `'mlp'`:
```
"ac_kwargs": {
"model_type": "mlp"
"hidden_sizes": [256, 256]
}
```
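As a rough sketch of what `hidden_sizes` means in PyTorch (illustrative, not the repo's actual network code; the input/output dimensions and ReLU activation are assumptions):
```
import torch.nn as nn

def build_mlp(input_dim, hidden_sizes, output_dim, activation=nn.ReLU):
    # one dense layer per entry in hidden_sizes, e.g. [256, 256]
    layers, in_dim = [], input_dim
    for h in hidden_sizes:
        layers += [nn.Linear(in_dim, h), activation()]
        in_dim = h
    layers.append(nn.Linear(in_dim, output_dim))
    return nn.Sequential(*layers)

policy_net = build_mlp(input_dim=8, hidden_sizes=[256, 256], output_dim=2)
```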
To use the CNN network, set `ac_kwargs['model_type']` to `'cnn'`:
```
"ac_kwargs": {
"model_type": "cnn"
"hidden_sizes": [512, 256],
"conv_layer_sizes": [[16, 5, 2],
[32, 5, 2],
[64, 5, 2],
[64, 3, 1]]
}
```
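Similarly, each `[output_channel, kernel_size, stride]` entry in `conv_layer_sizes` can be read as one convolutional layer, roughly along these lines (the input channel count and ReLU activation are assumptions, not the repo's actual code):
```
import torch
import torch.nn as nn

def build_cnn(in_channels, conv_layer_sizes):
    # one Conv2d + ReLU per [output_channel, kernel_size, stride] entry
    layers = []
    for out_channels, kernel_size, stride in conv_layer_sizes:
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size, stride), nn.ReLU()]
        in_channels = out_channels
    return nn.Sequential(*layers, nn.Flatten())

conv_trunk = build_cnn(3, [[16, 5, 2], [32, 5, 2], [64, 5, 2], [64, 3, 1]])
features = conv_trunk(torch.rand(1, 3, 64, 64))  # flattened features fed to the MLP head
```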
To use the VAE network, set `ac_kwargs['model_type']` to `'vae'`:
```
"ac_kwargs": {
"model_type": "vae",
"vae_weights_path": "VAE/output/vae_reach_target-vision-v0_wrist_rgb.pth",
"hidden_sizes": [512, 256]
}
```

## VAE network
The VAE network needs to be pretrained on the environment's images before it is used in the RL algorithm. The data generation and training code are provided in the [VAE directory](VAE/README.md).
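To make the role of `vae_weights_path` concrete, here is a rough sketch of the idea (the class name, architecture, and checkpoint layout are hypothetical, not the repo's actual code): a VAE encoder pretrained on environment images is frozen and used to compress image observations into a latent vector for the MLP policy.
```
import torch
import torch.nn as nn

class ConvVAEEncoder(nn.Module):
    # hypothetical encoder half of a VAE for 3x64x64 image observations
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 14 * 14, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(64 * 14 * 14, latent_dim)   # log-variance of q(z|x)

    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

encoder = ConvVAEEncoder()
# Weights pretrained as described in VAE/README.md would be loaded from the
# path given by "vae_weights_path" (the exact checkpoint format is an assumption):
# encoder.load_state_dict(torch.load("VAE/output/vae_reach_target-vision-v0_wrist_rgb.pth"))
for p in encoder.parameters():
    p.requires_grad_(False)  # keep the encoder frozen during RL training

obs = torch.rand(1, 3, 64, 64)   # dummy image observation
mu, _ = encoder(obs)             # use the posterior mean as a deterministic feature
policy_input = mu                # fed to the MLP defined by hidden_sizes
```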
## Comparison of results in PyBullet Environments

| Environment | Learning Curve | Episode Recording |
| --- | --- | --- |
| CartPoleContinuousBulletEnv-v0 | ![]() | ![]() |
| HopperBulletEnv-v0 | ![]() | ![]() |
| AntBulletEnv-v0 | ![]() | ![]() |
| HalfCheetahBulletEnv-v0 | ![]() | ![]() |
## Results of Option-Critic on RLBench Environments
The agents are trained on the front-rgb camera view to solve the RLBench Manipulation Tasks.
| Environment | Learning Curve | Episode Recording |
| --- | --- | --- |
| open-box | ![]() | ![]() |
| close-box | ![]() | ![]() |
## How to use
* Clone this repo
* `pip install -r requirements.txt`

### Training model for OpenAI gym environment
* Edit the training parameters in `./Algorithms/<algorithm>/<algorithm>_config.json`
```
python train.py
usage: train.py [-h] [--env ENV] [--agent {ddpg,trpo,ppo,td3,random}]
[--arch {mlp,cnn}] --timesteps TIMESTEPS [--seed SEED]
                [--num_trials NUM_TRIALS] [--normalize] [--rlbench] [--image]

optional arguments:
-h, --help show this help message and exit
--env ENV environment_id
--agent {ddpg,trpo,ppo,td3,random}
specify type of agent
--arch {mlp,cnn} specify architecture of neural net
--timesteps TIMESTEPS
specify number of timesteps to train for
--seed SEED seed number for reproducibility
--num_trials NUM_TRIALS
Number of times to train the algo
--normalize if true, normalize environment observations
--rlbench if true, use rlbench environment wrappers
--image               if true, use image observations from the environment
```
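For example, a training run might be launched like this (the environment id and values are illustrative, not defaults prescribed by the repo):
```
python train.py --env HopperBulletEnv-v0 --agent ppo --arch mlp --timesteps 1000000 --seed 0 --normalize
```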
### Testing trained model performance
```
python test.py
usage: test.py [-h] [--env ENV] [--agent {ddpg,trpo,ppo,td3,random}]
[--arch {mlp,cnn}] [--render] [--gif] [--timesteps TIMESTEPS]
               [--seed SEED] [--normalize] [--rlbench] [--image]

optional arguments:
-h, --help show this help message and exit
--env ENV environment_id
--agent {ddpg,trpo,ppo,td3,random}
specify type of agent
--arch {mlp,cnn} specify architecture of neural net
--render if true, display human renders of the environment
--gif if true, make gif of the trained agent
--timesteps TIMESTEPS
specify number of timesteps to train for
--seed SEED seed number for reproducibility
--normalize if true, normalize environment observations
--rlbench if true, use rlbench environment wrappers
--image               if true, use image observations from the environment
```
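A matching evaluation run might then look like this (again purely illustrative):
```
python test.py --env HopperBulletEnv-v0 --agent ppo --arch mlp --seed 0 --normalize --render --gif
```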