Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/google-research/batch_rl
Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
- Host: GitHub
- URL: https://github.com/google-research/batch_rl
- Owner: google-research
- License: apache-2.0
- Created: 2019-07-25T00:21:20.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-06-26T15:14:05.000Z (over 1 year ago)
- Last Synced: 2024-08-02T13:26:58.825Z (5 months ago)
- Language: Python
- Homepage: https://offline-rl.github.io/
- Size: 85 KB
- Stars: 523
- Watchers: 13
- Forks: 73
- Open Issues: 11
- Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - google-research/batch_rl
README
# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ktlNni_vwFpFtCgUez-RHW0OdGc2U_Wv?usp=sharing) [![Website](https://img.shields.io/badge/www-Website-green)](https://offline-rl.github.io) [![Blog](https://img.shields.io/badge/b-Blog-blue)](https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html) [![JAX Code](https://img.shields.io/badge/JAX-Code-orange)](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl)
# An Optimistic Perspective on Offline Reinforcement Learning (ICML, 2020)
This project provides the open source implementation using the
[Dopamine][dopamine] framework for running experiments mentioned in [An Optimistic Perspective on Offline Reinforcement Learning][paper].
In this work, we use the logged experiences of a DQN agent for training off-policy
agents (shown below) in an offline setting (*i.e.*, [batch RL][batch_rl]) without any new
interaction with the environment during training. Refer to
[offline-rl.github.io][project_page] for the project page.

[paper]: https://arxiv.org/pdf/1907.04543.pdf
[dopamine]: https://github.com/google/dopamine

# Important notes on Atari ROM versions
The DQN replay dataset is generated using [a legacy set of Atari ROMs](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) specified in [`atari-py<=0.2.5`](https://github.com/openai/atari-py/tree/0.2.5), which is different from the ones specified in [`atari-py>=0.2.6`](https://github.com/openai/atari-py/tree/0.2.6) or in recent versions of [`ale-py`](https://github.com/mgbellemare/Arcade-Learning-Environment). To avoid train/evaluation mismatches, it is important to use `atari-py<=0.2.5` and also `gym<=0.19.0`, as higher versions of `gym` no longer support `atari-py`.
Alternatively, if you prefer to use recent versions of `ale-py` and `gym`, you can manually download the legacy ROMs from [`atari-py==0.2.5`](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) and specify the ROM paths in `ale-py`. For example, assuming `atari_py_rom_breakout` is the path to the downloaded ROM file `breakout.bin`, you can do the following before creating the gym environment:
```
import ale_py.roms
ale_py.roms.Breakout = atari_py_rom_breakout
```

Note that this is an ad-hoc trick to circumvent the md5 checks in `ale-py<=0.7.5` and it may not work in future versions of `ale-py`. **Do not use this solution unless you know what you are doing**.
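To make the workaround above concrete, here is a minimal, hedged sketch of the full sequence. The local ROM path and the environment id are illustrative assumptions, not part of this repository; adapt them to your setup.

```
# Sketch (assumptions noted): patch ale-py (<=0.7.5) to use a legacy ROM
# downloaded from atari-py==0.2.5, then build the gym environment afterwards.
import pathlib

import gym
import ale_py.roms

# Hypothetical local path to the legacy ROM file; adapt to your setup.
atari_py_rom_breakout = pathlib.Path("/path/to/legacy_roms/breakout.bin")
ale_py.roms.Breakout = atari_py_rom_breakout

# The environment must be created only after the ROM path has been patched.
# The environment id below is an assumption; use whichever Breakout id your
# gym/ale-py combination registers.
env = gym.make("BreakoutNoFrameskip-v4")
observation = env.reset()
```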
# How to train offline agents on 50M dataset without RAM errors?
Please refer to https://github.com/google-research/batch_rl/issues/10.

# JAX codebase

[https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl).

## DQN Replay Dataset (Logged DQN data)
The DQN Replay Dataset was collected as follows:
We first train a [DQN][nature_dqn] agent on all 60 [Atari 2600 games][ale]
with [sticky actions][stochastic_ale] enabled for 200 million frames (standard protocol) and save all of the experience tuples
of *(observation, action, reward, next observation)* (approximately 50 million)
encountered during training.

This logged DQN data can be found in the public [GCP bucket][gcp_bucket]
`gs://atari-replay-datasets` which can be downloaded using [`gsutil`][gsutil].
To install gsutil, follow the instructions [here][gsutil_install].

After installing gsutil, run the command to copy the entire dataset:
```
gsutil -m cp -R gs://atari-replay-datasets/dqn ./
```

To download the data for a specific Atari 2600 game only (*e.g.*, replace `GAME_NAME`
by `Pong` to download the logged DQN replay datasets for the game of Pong),
run the command:

```
gsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./
```

This data can be generated by running the online agents using
[`batch_rl/baselines/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/baselines/train.py) for 200 million frames
(standard protocol). Note that the dataset consists of approximately 50 million
experience tuples due to frame skipping (*i.e.*, repeating a selected action for
`k` consecutive frames) with `k = 4`, so 200 million frames correspond to roughly
50 million agent steps. The stickiness parameter is set to 0.25, *i.e.*,
there is a 25% chance at every time step that the environment will execute the
agent's previous action again, instead of the agent's new action.
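For reference, below is a small, hedged sketch of how the downloaded replay logs might be read directly with NumPy. It assumes the Dopamine replay-buffer checkpoint naming (files such as `$store$_observation_ckpt.0.gz` under `replay_logs/`); that layout is an assumption here rather than something this README specifies, so verify the filenames in your download. For actual training, the fixed-replay entry point described below loads these logs for you.

```
# Sketch: read one checkpoint of the logged DQN data with plain NumPy.
# Assumes the files follow Dopamine's replay-buffer checkpoint naming, e.g.
#   dqn/Pong/1/replay_logs/$store$_observation_ckpt.0.gz  (uint8 frames)
#   dqn/Pong/1/replay_logs/$store$_action_ckpt.0.gz
#   dqn/Pong/1/replay_logs/$store$_reward_ckpt.0.gz
#   dqn/Pong/1/replay_logs/$store$_terminal_ckpt.0.gz
import gzip
import os

import numpy as np

replay_dir = "dqn/Pong/1/replay_logs"  # path after the gsutil download above
suffix = 0  # checkpoint index (0..49 in the full per-game dataset)

def load_ckpt(name, suffix):
    path = os.path.join(replay_dir, f"$store$_{name}_ckpt.{suffix}.gz")
    with gzip.open(path, "rb") as f:
        return np.load(f, allow_pickle=False)

observations = load_ckpt("observation", suffix)  # e.g. (N, 84, 84) uint8 frames
actions = load_ckpt("action", suffix)
rewards = load_ckpt("reward", suffix)
terminals = load_ckpt("terminal", suffix)
print(observations.shape, actions.shape, rewards.shape, terminals.shape)
```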
#### Some Publications using the DQN Replay Dataset (please open a pull request for missing entries):

- [Revisiting Fundamentals of Experience Replay](https://arxiv.org/abs/2007.06700)
- [RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888)
- [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779)
- [Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning](https://arxiv.org/abs/2010.14498)
- [Acme: A new framework for distributed reinforcement learning](https://arxiv.org/abs/2006.00979)
- [Regularized Behavior Value Estimation](https://arxiv.org/abs/2103.09575)
- [Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
- [Provable Representation Learning for Imitation with Contrastive Fourier Features](https://arxiv.org/abs/2105.12272)
- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)
- [DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization](https://arxiv.org/abs/2112.04716)
- [Pretraining Representations for Data-Efficient Reinforcement Learning](https://arxiv.org/abs/2106.04799)
- [Multi-Game Decision Transformers](https://arxiv.org/abs/2205.15241)
- [Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes](https://arxiv.org/abs/2211.15144)

[nature_dqn]: https://www.nature.com/articles/nature14236?wm=book_wap_0005
[gsutil_install]: https://cloud.google.com/storage/docs/gsutil_install#install
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[batch_rl]: http://tgabel.de/cms/fileadmin/user_upload/documents/Lange_Gabel_EtAl_RL-Book-12.pdf
[stochastic_ale]: https://arxiv.org/abs/1709.06009
[ale]: https://github.com/mgbellemare/Arcade-Learning-Environment
[gcp_bucket]: https://console.cloud.google.com/storage/browser/atari-replay-datasets
[project_page]: https://offline-rl.github.io

## Asymptotic Performance of offline agents on Atari-replay dataset
## Installation
Install the dependencies below, based on your operating system, and then
install Dopamine, *e.g.*,

```
pip install git+https://github.com/google/dopamine.git
```

Finally, download the source code for batch RL, *e.g.*,
```
git clone https://github.com/google-research/batch_rl.git
```

### Ubuntu
If you don't have access to a GPU, then replace `tensorflow-gpu` with
`tensorflow` in the line below (see [Tensorflow
instructions](https://www.tensorflow.org/install/install_linux) for details).

```
sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow-gpu
```

### Mac OS X
```
brew install cmake zlib
pip install absl-py atari-py gin-config gym opencv-python tensorflow
```

## Running Tests
Assuming that you have cloned the
[batch_rl](https://github.com/google-research/batch_rl.git) repository,
follow the instructions below to run unit tests.

#### Basic test
You can test whether the basic code is working by running the following:

```
cd batch_rl
python -um batch_rl.tests.atari_init_test
```

#### Test for training an agent with a fixed replay buffer
To test an agent using a fixed replay buffer, first generate the data for the
Atari 2600 game of `Pong` to `$DATA_DIR`.

```
export DATA_DIR="Insert directory name here"
mkdir -p $DATA_DIR/Pong
gsutil -m cp -R gs://atari-replay-datasets/dqn/Pong/1 $DATA_DIR/Pong
```

Assuming the replay data is present in `$DATA_DIR/Pong/1/replay_logs`, run the `FixedReplayDQNAgent` on `Pong` using the logged DQN data:
```
cd batch_rl
python -um batch_rl.tests.fixed_replay_runner_test \
--replay_dir=$DATA_DIR/Pong/1
```

## Training batch agents on DQN data
The entry point to the standard Atari 2600 experiment is
[`batch_rl/fixed_replay/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/train.py).
Run the batch `DQN` agent using the following command:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--gin_files='batch_rl/fixed_replay/configs/dqn.gin'
```

By default, this will kick off an experiment lasting 200 training iterations
(equivalent to experiencing 200 million frames for an online agent).

To get finer-grained control over the experiment,
you can adjust the experiment parameters in
[`batch_rl/fixed_replay/configs/dqn.gin`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/configs/dqn.gin),
in particular by increasing `FixedReplayRunner.num_iterations` to see
the asymptotic performance of the batch agents. For example,
run the batch `REM` agent for 1000 training iterations on the game of Pong
using the following command:

```
python -um batch_rl.fixed_replay.train \
--base_dir=/tmp/batch_rl \
--replay_dir=$DATA_DIR/Pong/1 \
--agent_name=multi_head_dqn \
--gin_files='batch_rl/fixed_replay/configs/rem.gin' \
--gin_bindings='FixedReplayRunner.num_iterations=1000' \
--gin_bindings='atari_lib.create_atari_environment.game_name = "Pong"'
```

More generally, since this code is based on Dopamine, it can be
easily configured using the
[gin configuration framework](https://github.com/google/gin-config).
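As an illustration of what the `--gin_files` and `--gin_bindings` flags do, here is a minimal, hypothetical sketch using the standard gin-config Python API. The config path and binding strings simply mirror the flags in the command above; `skip_unknown=True` is used only because the batch_rl configurables are not imported in this standalone snippet.

```
# Minimal sketch (not part of batch_rl): apply the same overrides as the
# --gin_files / --gin_bindings flags above via the gin-config Python API.
import gin

gin.parse_config_files_and_bindings(
    config_files=['batch_rl/fixed_replay/configs/rem.gin'],
    bindings=[
        'FixedReplayRunner.num_iterations = 1000',
        'atari_lib.create_atari_environment.game_name = "Pong"',
    ],
    # The configurables above live in batch_rl/Dopamine modules that are not
    # imported here, so unknown names are skipped in this standalone sketch.
    skip_unknown=True,
)
```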
## Dependencies

The code was tested under Ubuntu 16 and uses these packages:
- tensorflow-gpu>=1.13
- absl-py
- atari-py
- gin-config
- opencv-python
- gym
- numpy

Python versions up to `3.7.9` have been [reported to work](https://github.com/google-research/batch_rl/issues/21).
Citing
------
If you find this open source release useful, please cite it in your paper:

> Agarwal, R., Schuurmans, D. & Norouzi, M. (2020).
> An Optimistic Perspective on Offline Reinforcement Learning
> *International Conference on Machine Learning (ICML)*.

@inproceedings{agarwal2020optimistic,
title={An Optimistic Perspective on Offline Reinforcement Learning},
author={Agarwal, Rishabh and Schuurmans, Dale and Norouzi, Mohammad},
journal={International Conference on Machine Learning},
year={2020}
}

Note: A previous version of this work was titled "Striving for Simplicity in Off-Policy
Deep Reinforcement Learning" and was presented as a contributed talk at
NeurIPS 2019 DRL Workshop.

Disclaimer: This is not an official Google product.