# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ktlNni_vwFpFtCgUez-RHW0OdGc2U_Wv?usp=sharing) [![Website](https://img.shields.io/badge/www-Website-green)](https://offline-rl.github.io) [![Blog](https://img.shields.io/badge/b-Blog-blue)](https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html) [![JAX Code](https://img.shields.io/badge/JAX-Code-orange)](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl)

# An Optimistic Perspective on Offline Reinforcement Learning (ICML, 2020)

This project provides an open-source implementation, built on the
[Dopamine][dopamine] framework, of the experiments in [An Optimistic Perspective on Offline Reinforcement Learning][paper].
In this work, we use the logged experiences of a DQN agent to train off-policy
agents (shown below) in an offline setting (*i.e.*, [batch RL][batch_rl]) without any new
interaction with the environment during training. See
[offline-rl.github.io][project_page] for the project page.

Architecture of different off-policy agents

[paper]: https://arxiv.org/pdf/1907.04543.pdf
[dopamine]: https://github.com/google/dopamine

# Important notes on Atari ROM versions

The DQN replay dataset is generated using [a legacy set of Atari ROMs](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) specified in [`atari-py<=0.2.5`](https://github.com/openai/atari-py/tree/0.2.5), which is different from the ones specified in [`atari-py>=0.2.6`](https://github.com/openai/atari-py/tree/0.2.6) or in recent versions of [`ale-py`](https://github.com/mgbellemare/Arcade-Learning-Environment). To avoid train/evaluation mismatches, it is important to use `atari-py<=0.2.5` and also `gym<=0.19.0`, as higher versions of `gym` no longer support `atari-py`.

Alternatively, if you prefer to use recent versions of `ale-py` and `gym`, you can manually download the legacy ROMs from [`atari-py==0.2.5`](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) and specify the ROM paths in `ale-py`. For example, assuming `atari_py_rom_breakout` is the path to the downloaded ROM file `breakout.bin`, you can do the following before creating the gym environment:

```
import ale_py.roms
ale_py.roms.Breakout = atari_py_rom_breakout
```

Note that this is an ad-hoc trick to circumvent the md5 checks in `ale-py<=0.7.5` and it may not work in future versions of `ale-py`. **Do not use this solution unless you know what you are doing**.
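
For concreteness, here is a minimal end-to-end sketch of the workaround above. The ROM path and the `BreakoutNoFrameskip-v4` environment id are illustrative assumptions; use the path where you saved the legacy `breakout.bin` and whichever ALE environment id your `gym`/`ale-py` versions register:

```
# Hypothetical sketch of the ROM override described above (ale-py<=0.7.5).
# The path below is an assumption; point it at your downloaded breakout.bin.
import ale_py.roms
import gym

atari_py_rom_breakout = "/path/to/legacy_roms/breakout.bin"
ale_py.roms.Breakout = atari_py_rom_breakout  # override ale-py's bundled ROM

env = gym.make("BreakoutNoFrameskip-v4")  # env id assumed to be registered by ale-py
env.reset()
```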

# How to train offline agents on the 50M dataset without RAM errors?
Please refer to https://github.com/google-research/batch_rl/issues/10.

# JAX codebase
[https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl).

## DQN Replay Dataset (Logged DQN data)

The DQN Replay Dataset was collected as follows:
We first train a [DQN][nature_dqn] agent on all 60 [Atari 2600 games][ale]
with [sticky actions][stochastic_ale] enabled for 200 million frames (the standard protocol) and save all of the experience tuples
of *(observation, action, reward, next observation)* (approximately 50 million per game)
encountered during training.

This logged DQN data can be found in the public [GCP bucket][gcp_bucket]
`gs://atari-replay-datasets` which can be downloaded using [`gsutil`][gsutil].
To install gsutil, follow the instructions [here][gsutil_install].

After installing gsutil, run the following command to copy the entire dataset:

```
gsutil -m cp -R gs://atari-replay-datasets/dqn ./
```

To download the dataset for only a specific Atari 2600 game (*e.g.*, replace `GAME_NAME`
with `Pong` to download the logged DQN replay data for the game of Pong),
run the command:

```
gsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./
```
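
Each downloaded game directory contains data from multiple DQN training runs (*e.g.*, `Pong/1`), each with a `replay_logs` folder of gzipped NumPy checkpoint files. The sketch below shows one way to inspect a single observation checkpoint; the `$store$_observation_ckpt.0.gz` filename follows Dopamine's replay-buffer checkpoint naming and should be treated as an assumption about the exact on-disk layout:

```
# Hedged sketch: load one gzipped observation checkpoint from the replay logs.
# Filenames and paths are assumptions; decompressed checkpoints can be several GB.
import gzip
import numpy as np

path = "./dqn/Pong/1/replay_logs/$store$_observation_ckpt.0.gz"
with gzip.open(path, "rb") as f:
    observations = np.load(f)  # array of 84x84 grayscale frames
print(observations.shape, observations.dtype)
```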

This data can be generated by running the online agents using
[`batch_rl/baselines/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/baselines/train.py) for 200 million frames
(standard protocol). Note that the dataset consists of approximately 50 million
experience tuples because of the frame skip of 4 (*i.e.*, each selected action is
repeated for 4 consecutive frames). The stickiness parameter is set to 0.25, *i.e.*,
at every time step there is a 25% chance that the environment will execute the
agent's previous action again instead of the agent's new action.
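
As a small illustration of the sticky-action mechanism (a conceptual sketch only, not the ALE implementation):

```
# Conceptual sketch of sticky actions with stickiness 0.25.
import random

STICKY_PROB = 0.25  # stickiness parameter used when generating the dataset

def apply_sticky_action(chosen_action, previous_action, rng=random):
    # With probability 0.25 the environment repeats the previous action
    # instead of the action the agent just chose.
    if rng.random() < STICKY_PROB:
        return previous_action
    return chosen_action
```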

#### Some Publications using DQN Replay Dataset (please open a pull request for missing entries):
- [Revisiting Fundamentals of Experience Replay](https://arxiv.org/abs/2007.06700)
- [RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888)
- [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779)
- [Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning](https://arxiv.org/abs/2010.14498)
- [Acme: A new framework for distributed reinforcement learning](https://arxiv.org/abs/2006.00979)
- [Regularized Behavior Value Estimation](https://arxiv.org/abs/2103.09575)
- [Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
- [Provable Representation Learning for Imitation with Contrastive Fourier Features](https://arxiv.org/abs/2105.12272)
- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
- [DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization](https://arxiv.org/abs/2112.04716)
- [Pretraining Representations for Data-Efficient Reinforcement Learning](https://arxiv.org/abs/2106.04799)
- [Multi-Game Decision Transformers](https://arxiv.org/abs/2205.15241)
- [Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes](https://arxiv.org/abs/2211.15144)

[nature_dqn]: https://www.nature.com/articles/nature14236?wm=book_wap_0005
[gsutil_install]: https://cloud.google.com/storage/docs/gsutil_install#install
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[batch_rl]: http://tgabel.de/cms/fileadmin/user_upload/documents/Lange_Gabel_EtAl_RL-Book-12.pdf
[stochastic_ale]: https://arxiv.org/abs/1709.06009
[ale]: https://github.com/mgbellemare/Arcade-Learning-Environment
[gcp_bucket]: https://console.cloud.google.com/storage/browser/atari-replay-datasets
[project_page]: https://offline-rl.github.io

## Asymptotic Performance of offline agents on Atari-replay dataset


Figures: number of games where a batch agent outperforms online DQN, and asymptotic performance of offline agents on DQN data.

## Installation
Install the dependencies below, based on your operating system, and then
install Dopamine, *e.g.*:

```
pip install git+https://github.com/google/dopamine.git
```

Finally, download the source code for batch RL, *e.g.*:

```
git clone https://github.com/google-research/batch_rl.git
```

### Ubuntu

If you don't have access to a GPU, then replace `tensorflow-gpu` with
`tensorflow` in the line below (see the [TensorFlow
instructions](https://www.tensorflow.org/install/install_linux) for details).

```
sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow-gpu
```

### Mac OS X

```
brew install cmake zlib
pip install absl-py atari-py gin-config gym opencv-python tensorflow
```
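
As an optional sanity check (not part of the official instructions), you can confirm the main dependencies import cleanly; `atari_py.list_games()` is assumed to be available in `atari-py`:

```
# Optional, hedged sanity check: verify the core dependencies after installation.
import atari_py
import gym
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Gym:", gym.__version__)
print("Atari games available:", len(atari_py.list_games()))
```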

## Running Tests

Assuming that you have cloned the
[batch_rl](https://github.com/google-research/batch_rl.git) repository,
follow the instructions below to run unit tests.

#### Basic test
You can test whether basic code is working by running the following:

```
cd batch_rl
python -um batch_rl.tests.atari_init_test
```

#### Test for training an agent with fixed replay buffer
To test an agent using a fixed replay buffer, first download the logged data for the
Atari 2600 game of `Pong` to `$DATA_DIR`:

```
export DATA_DIR="Insert directory name here"
mkdir -p $DATA_DIR/Pong
gsutil -m cp -R gs://atari-replay-datasets/dqn/Pong/1 $DATA_DIR/Pong
```

Assuming the replay data is present in `$DATA_DIR/Pong/1/replay_logs`, run the `FixedReplayDQNAgent` on `Pong` using the logged DQN data:

```
cd batch_rl
python -um batch_rl.tests.fixed_replay_runner_test \
--replay_dir=$DATA_DIR/Pong/1
```

## Training batch agents on DQN data

The entry point to the standard Atari 2600 experiment is
[`batch_rl/fixed_replay/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/train.py).
Run the batch `DQN` agent using the following command:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--gin_files='batch_rl/fixed_replay/configs/dqn.gin'
```

By default, this will kick off an experiment lasting 200 training iterations
(equivalent to experiencing 200 million frames for an online agent).

The experiment parameters can be adjusted in
[`batch_rl/fixed_replay/configs/dqn.gin`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/configs/dqn.gin),
in particular by increasing `FixedReplayRunner.num_iterations` to see
the asymptotic performance of the batch agents. For example, the following
command runs the batch `REM` agent for 1000 training iterations on the game
of Pong:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--agent_name=multi_head_dqn \
--gin_files='batch_rl/fixed_replay/configs/rem.gin' \
--gin_bindings='FixedReplayRunner.num_iterations=1000' \
--gin_bindings='atari_lib.create_atari_environment.game_name = "Pong"'
```

More generally, since this code is based on Dopamine, it can be
easily configured using the
[gin configuration framework](https://github.com/google/gin-config).

## Dependencies

The code was tested under Ubuntu 16 and uses these packages:

- tensorflow-gpu>=1.13
- absl-py
- atari-py
- gin-config
- opencv-python
- gym
- numpy

Python versions up to `3.7.9` have been [reported to work](https://github.com/google-research/batch_rl/issues/21).

Citing
------
If you find this open-source release useful, please cite it in your paper:

> Agarwal, R., Schuurmans, D. & Norouzi, M. (2020).
> An Optimistic Perspective on Offline Reinforcement Learning.
> *International Conference on Machine Learning (ICML)*.

```
@inproceedings{agarwal2020optimistic,
  title={An Optimistic Perspective on Offline Reinforcement Learning},
  author={Agarwal, Rishabh and Schuurmans, Dale and Norouzi, Mohammad},
  booktitle={International Conference on Machine Learning},
  year={2020}
}
```

Note: A previous version of this work was titled "Striving for Simplicity in
Off-Policy Deep Reinforcement Learning" and was presented as a contributed talk at the
NeurIPS 2019 Deep RL Workshop.

Disclaimer: This is not an official Google product.