# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ktlNni_vwFpFtCgUez-RHW0OdGc2U_Wv?usp=sharing) [![Website](https://img.shields.io/badge/www-Website-green)](https://offline-rl.github.io) [![Blog](https://img.shields.io/badge/b-Blog-blue)](https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html) [![JAX Code](https://img.shields.io/badge/JAX-Code-orange)](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl)

# An Optimistic Perspective on Offline Reinforcement Learning (ICML, 2020)

This project provides an open-source implementation, built on the
[Dopamine][dopamine] framework, of the experiments in [An Optimistic Perspective on Offline Reinforcement Learning][paper].
In this work, we use the logged experiences of a DQN agent to train off-policy
agents (shown below) in an offline setting (*i.e.*, [batch RL][batch_rl]) without any new
interaction with the environment during training. See
[offline-rl.github.io][project_page] for the project page.

Architecture of different off-policy agents

[paper]: https://arxiv.org/pdf/1907.04543.pdf
[dopamine]: https://github.com/google/dopamine

# Important notes on Atari ROM versions

The DQN replay dataset is generated using [a legacy set of Atari ROMs](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) specified in [`atari-py<=0.2.5`](https://github.com/openai/atari-py/tree/0.2.5), which is different from the ones specified in [`atari-py>=0.2.6`](https://github.com/openai/atari-py/tree/0.2.6) or in recent versions of [`ale-py`](https://github.com/mgbellemare/Arcade-Learning-Environment). To avoid train/evaluation mismatches, it is important to use `atari-py<=0.2.5` and also `gym<=0.19.0`, as higher versions of `gym` no longer support `atari-py`.

Alternatively, if you prefer to use recent versions of `ale-py` and `gym`, you can manually download the legacy ROMs from [`atari-py==0.2.5`](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) and specify the ROM paths in `ale-py`. For example, assuming `atari_py_rom_breakout` is the path to the downloaded ROM file `breakout.bin`, you can do the following before creating the gym environment:

```
import ale_py.roms
ale_py.roms.Breakout = atari_py_rom_breakout
```

Note that this is an ad-hoc trick to circumvent the md5 checks in `ale-py<=0.7.5` and it may not work in future versions of `ale-py`. **Do not use this solution unless you know what you are doing**.
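
For concreteness, here is a minimal end-to-end sketch of the workaround above. The ROM path and the `BreakoutNoFrameskip-v4` environment id are illustrative assumptions; use the path where you saved the legacy `breakout.bin` and whichever ALE environment id your `gym`/`ale-py` versions register:

```
# Hypothetical sketch of the ROM override described above (ale-py<=0.7.5).
# The path below is an assumption; point it at your downloaded breakout.bin.
import ale_py.roms
import gym

atari_py_rom_breakout = "/path/to/legacy_roms/breakout.bin"
ale_py.roms.Breakout = atari_py_rom_breakout  # override ale-py's bundled ROM

env = gym.make("BreakoutNoFrameskip-v4")  # env id assumed to be registered by ale-py
env.reset()
```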

# How to train offline agents on the 50M dataset without RAM errors?
Please refer to https://github.com/google-research/batch_rl/issues/10.

# JAX codebase
[https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl).

## DQN Replay Dataset (Logged DQN data)

The DQN Replay Dataset was collected as follows:
We first train a [DQN][nature_dqn] agent on all 60 [Atari 2600 games][ale]
with [sticky actions][stochastic_ale] enabled for 200 million frames (the standard protocol) and save all of the experience tuples
of *(observation, action, reward, next observation)* (approximately 50 million per game)
encountered during training.

This logged DQN data can be found in the public [GCP bucket][gcp_bucket]
`gs://atari-replay-datasets` which can be downloaded using [`gsutil`][gsutil].
To install gsutil, follow the instructions [here][gsutil_install].

After installing gsutil, run the following command to copy the entire dataset:

```
gsutil -m cp -R gs://atari-replay-datasets/dqn ./
```

To download the dataset for only a specific Atari 2600 game (*e.g.*, replace `GAME_NAME`
with `Pong` to download the logged DQN replay data for the game of Pong),
run the command:

```
gsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./
```
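
Each downloaded game directory contains data from multiple DQN training runs (*e.g.*, `Pong/1`), each with a `replay_logs` folder of gzipped NumPy checkpoint files. The sketch below shows one way to inspect a single observation checkpoint; the `$store$_observation_ckpt.0.gz` filename follows Dopamine's replay-buffer checkpoint naming and should be treated as an assumption about the exact on-disk layout:

```
# Hedged sketch: load one gzipped observation checkpoint from the replay logs.
# Filenames and paths are assumptions; decompressed checkpoints can be several GB.
import gzip
import numpy as np

path = "./dqn/Pong/1/replay_logs/$store$_observation_ckpt.0.gz"
with gzip.open(path, "rb") as f:
    observations = np.load(f)  # array of 84x84 grayscale frames
print(observations.shape, observations.dtype)
```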

This data can be generated by running the online agents using
[`batch_rl/baselines/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/baselines/train.py) for 200 million frames
(standard protocol). Note that the dataset consists of approximately 50 million
experience tuples because of the frame skip of 4 (*i.e.*, each selected action is
repeated for 4 consecutive frames). The stickiness parameter is set to 0.25, *i.e.*,
at every time step there is a 25% chance that the environment will execute the
agent's previous action again instead of the agent's new action.
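
As a small illustration of the sticky-action mechanism (a conceptual sketch only, not the ALE implementation):

```
# Conceptual sketch of sticky actions with stickiness 0.25.
import random

STICKY_PROB = 0.25  # stickiness parameter used when generating the dataset

def apply_sticky_action(chosen_action, previous_action, rng=random):
    # With probability 0.25 the environment repeats the previous action
    # instead of the action the agent just chose.
    if rng.random() < STICKY_PROB:
        return previous_action
    return chosen_action
```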

#### Some Publications using DQN Replay Dataset (please open a pull request for missing entries):
- [Revisiting Fundamentals of Experience Replay](https://arxiv.org/abs/2007.06700)
- [RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888)
- [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779)
- [Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning](https://arxiv.org/abs/2010.14498)
- [Acme: A new framework for distributed reinforcement learning](https://arxiv.org/abs/2006.00979)
- [Regularized Behavior Value Estimation](https://arxiv.org/abs/2103.09575)
- [Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
- [Provable Representation Learning for Imitation with Contrastive Fourier Features](https://arxiv.org/abs/2105.12272)
- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
- [DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization](https://arxiv.org/abs/2112.04716)
- [Pretraining Representations for Data-Efficient Reinforcement Learning](https://arxiv.org/abs/2106.04799)
- [Multi-Game Decision Transformers](https://arxiv.org/abs/2205.15241)
- [Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes](https://arxiv.org/abs/2211.15144)

[nature_dqn]: https://www.nature.com/articles/nature14236?wm=book_wap_0005
[gsutil_install]: https://cloud.google.com/storage/docs/gsutil_install#install
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[batch_rl]: http://tgabel.de/cms/fileadmin/user_upload/documents/Lange_Gabel_EtAl_RL-Book-12.pdf
[stochastic_ale]: https://arxiv.org/abs/1709.06009
[ale]: https://github.com/mgbellemare/Arcade-Learning-Environment
[gcp_bucket]: https://console.cloud.google.com/storage/browser/atari-replay-datasets
[project_page]: https://offline-rl.github.io

## Asymptotic Performance of offline agents on Atari-replay dataset


Figures: number of games where a batch agent outperforms online DQN, and asymptotic performance of offline agents on DQN data.

## Installation
Install the dependencies below, based on your operating system, and then
install Dopamine, *e.g.*:

```
pip install git+https://github.com/google/dopamine.git
```

Finally, download the source code for batch RL, *e.g.*:

```
git clone https://github.com/google-research/batch_rl.git
```

### Ubuntu

If you don't have access to a GPU, then replace `tensorflow-gpu` with
`tensorflow` in the line below (see the [TensorFlow
instructions](https://www.tensorflow.org/install/install_linux) for details).

```
sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow-gpu
```

### Mac OS X

```
brew install cmake zlib
pip install absl-py atari-py gin-config gym opencv-python tensorflow
```
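
As an optional sanity check (not part of the official instructions), you can confirm the main dependencies import cleanly; `atari_py.list_games()` is assumed to be available in `atari-py`:

```
# Optional, hedged sanity check: verify the core dependencies after installation.
import atari_py
import gym
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Gym:", gym.__version__)
print("Atari games available:", len(atari_py.list_games()))
```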

## Running Tests

Assuming that you have cloned the
[batch_rl](https://github.com/google-research/batch_rl.git) repository,
follow the instructions below to run unit tests.

#### Basic test
You can test whether basic code is working by running the following:

```
cd batch_rl
python -um batch_rl.tests.atari_init_test
```

#### Test for training an agent with fixed replay buffer
To test an agent using a fixed replay buffer, first download the logged data for the
Atari 2600 game of `Pong` to `$DATA_DIR`:

```
export DATA_DIR="Insert directory name here"
mkdir -p $DATA_DIR/Pong
gsutil -m cp -R gs://atari-replay-datasets/dqn/Pong/1 $DATA_DIR/Pong
```

Assuming the replay data is present in `$DATA_DIR/Pong/1/replay_logs`, run the `FixedReplayDQNAgent` on `Pong` using the logged DQN data:

```
cd batch_rl
python -um batch_rl.tests.fixed_replay_runner_test \
--replay_dir=$DATA_DIR/Pong/1
```

## Training batch agents on DQN data

The entry point to the standard Atari 2600 experiment is
[`batch_rl/fixed_replay/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/train.py).
Run the batch `DQN` agent using the following command:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--gin_files='batch_rl/fixed_replay/configs/dqn.gin'
```

By default, this will kick off an experiment lasting 200 training iterations
(equivalent to experiencing 200 million frames for an online agent).

The experiment parameters can be adjusted in
[`batch_rl/fixed_replay/configs/dqn.gin`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/configs/dqn.gin),
in particular by increasing `FixedReplayRunner.num_iterations` to see
the asymptotic performance of the batch agents. For example, the following
command runs the batch `REM` agent for 1000 training iterations on the game
of Pong:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--agent_name=multi_head_dqn \
--gin_files='batch_rl/fixed_replay/configs/rem.gin' \
--gin_bindings='FixedReplayRunner.num_iterations=1000' \
--gin_bindings='atari_lib.create_atari_environment.game_name = "Pong"'
```

More generally, since this code is based on Dopamine, it can be
easily configured using the
[gin configuration framework](https://github.com/google/gin-config).

## Dependencies

The code was tested under Ubuntu 16 and uses these packages:

- tensorflow-gpu>=1.13
- absl-py
- atari-py
- gin-config
- opencv-python
- gym
- numpy

Python versions up to `3.7.9` have been [reported to work](https://github.com/google-research/batch_rl/issues/21).

Citing
------
If you find this open-source release useful, please cite it in your paper:

> Agarwal, R., Schuurmans, D. & Norouzi, M. (2020).
> An Optimistic Perspective on Offline Reinforcement Learning.
> *International Conference on Machine Learning (ICML)*.

```
@inproceedings{agarwal2020optimistic,
  title={An Optimistic Perspective on Offline Reinforcement Learning},
  author={Agarwal, Rishabh and Schuurmans, Dale and Norouzi, Mohammad},
  booktitle={International Conference on Machine Learning},
  year={2020}
}
```

Note: A previous version of this work was titled "Striving for Simplicity in
Off-Policy Deep Reinforcement Learning" and was presented as a contributed talk at the
NeurIPS 2019 Deep RL Workshop.

Disclaimer: This is not an official Google product.