https://github.com/rmst/rlrd

PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2021)

deep-learning deep-reinforcement-learning pytorch reinforcement-learning


Host: GitHub
URL: https://github.com/rmst/rlrd
Owner: rmst
License: mit
Created: 2021-01-09T10:23:54.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-05-25T19:45:36.000Z (about 4 years ago)
Last Synced: 2025-04-20T05:32:14.402Z (about 1 year ago)
Topics: deep-learning, deep-reinforcement-learning, pytorch, reinforcement-learning
Language: Python
Homepage:
Size: 1.47 MB
Stars: 40
Watchers: 5
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# Reinforcement Learning with Random Delays

PyTorch implementation of our paper [Reinforcement Learning with Random Delays (ICLR 2021)](https://openreview.net/forum?id=QFYnKlBJYR) – [[arXiv]](https://arxiv.org/abs/2010.02966)

### Getting Started
This repository can be pip-installed via:
```bash
pip install git+https://github.com/rmst/rlrd.git
```

DC/AC (Delay-Correcting Actor-Critic) can be run on a simple 1-step delayed `Pendulum-v0` task via:
```bash
python -m rlrd run rlrd:DcacTraining Env.id=Pendulum-v0
```

Hyperparameters can be set via the command line, e.g.:
```bash
python -m rlrd run rlrd:DcacTraining \
Env.id=Pendulum-v0 \
Env.min_observation_delay=0 \
Env.sup_observation_delay=2 \
Env.min_action_delay=0 \
Env.sup_action_delay=3 \
Agent.batchsize=128 \
Agent.memory_size=1000000 \
Agent.lr=0.0003 \
Agent.discount=0.99 \
Agent.target_update=0.005 \
Agent.reward_scale=5.0 \
Agent.entropy_scale=1.0 \
Agent.start_training=10000 \
Agent.device=cuda \
Agent.training_steps=1.0 \
Agent.loss_alpha=0.2 \
Agent.Model.hidden_units=256 \
Agent.Model.num_critics=2
```

Note that our gym wrapper adds a constant 1-step delay to the action delay: `Env.min_action_delay=0` actually means that the minimum action delay is 1, whereas `Env.min_observation_delay=0` means that the minimum observation delay is 0. We assume that the action delay cannot be less than 1 time-step (e.g. for action inference). The `sup` values are exclusive upper bounds.
For instance:
- `Env.min_observation_delay=0 Env.sup_observation_delay=2` means that the observation delay is randomly 0 or 1.
- `Env.min_action_delay=0 Env.sup_action_delay=2` means that the action delay is randomly 1 or 2.
- `Env.min_observation_delay=1 Env.sup_observation_delay=2` means that the observation delay is always 1.
- `Env.min_observation_delay=0 Env.sup_observation_delay=3` means that the observation delay is randomly 0, 1 or 2.
- etc.
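
The rules above can be summarized in a few lines of Python. This is only an illustrative sketch: `possible_delays` is a hypothetical helper, not part of `rlrd`, and it assumes (as the examples suggest) that the `sup` value is an exclusive upper bound on integer delays.

```python
def possible_delays(min_delay, sup_delay, is_action=False):
    """Return the effective delays (in time-steps) implied by a min/sup pair.

    Hypothetical helper for illustration only; it is not part of rlrd.
    Assumes `sup_delay` is an exclusive upper bound.
    """
    if sup_delay <= min_delay:
        raise ValueError("sup_delay must be strictly greater than min_delay")
    # The gym wrapper adds a constant 1-step delay to actions only.
    offset = 1 if is_action else 0
    return [d + offset for d in range(min_delay, sup_delay)]


print(possible_delays(0, 2))                  # observation delay: [0, 1]
print(possible_delays(0, 2, is_action=True))  # action delay: [1, 2]
print(possible_delays(1, 2))                  # observation delay: [1]
print(possible_delays(0, 3))                  # observation delay: [0, 1, 2]
```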

### MuJoCo Experiments
To install MuJoCo, follow the instructions at [openai/gym](https://github.com/openai/gym).
The following environments were used in the paper:

![MuJoCo](resources/mujoco_horizontal.png)

To train DC/AC on a 1-step delayed version of `HalfCheetah-v2`, run:
```bash
python -m rlrd run rlrd:DcacTraining Env.id=HalfCheetah-v2
```

To train SAC on a 1-step delayed version of `Ant-v2`, run:
```bash
python -m rlrd run rlrd:DelayedSacTraining Env.id=Ant-v2
```

### Weights and Biases API
Your curves can be exported directly to the Weights and Biases (wandb) website by using `run-wandb`.
For example, to run DC/AC on Pendulum with a 1-step delay and export the curves to your wandb project:

```bash
python -m rlrd run-wandb \
yourWandbID \
yourWandbProjectName \
aNameForTheWandbRun \
aFileNameForLocalCheckpoints \
rlrd:DcacTraining Env.id=Pendulum-v0
```

Use the optional hyperparameters described above to experiment with more meaningful delays.

### Contribute / known issues
Contributions are welcome.
Please submit a PR with your name in the contributors list.

We have not yet optimized our Python implementation of DC/AC. This is the most important thing to do right now, as it is quite slow.

In particular, a lot of time is wasted when artificially re-creating a batched tensor for computing the value estimates in one forward pass, and the replay buffer is inefficient.
See the `#FIXME` in [dcac.py](https://github.com/rmst/rlrd/blob/master/rlrd/dcac.py).
