{"id":13679335,"url":"https://github.com/google-research/batch_rl","last_synced_at":"2025-05-11T09:32:14.517Z","repository":{"id":40256308,"uuid":"198726920","full_name":"google-research/batch_rl","owner":"google-research","description":"Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games","archived":true,"fork":false,"pushed_at":"2023-06-26T15:14:05.000Z","size":87,"stargazers_count":548,"open_issues_count":11,"forks_count":75,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-18T07:23:00.461Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://offline-rl.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-07-25T00:21:20.000Z","updated_at":"2025-03-16T12:27:26.000Z","dependencies_parsed_at":"2024-01-14T15:23:12.977Z","dependency_job_id":"f6544b75-990f-413f-9736-402965d7ed01","html_url":"https://github.com/google-research/batch_rl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch_rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch_rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch_rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch_rl/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/batch_rl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253545132,"owners_count":21925340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T13:01:04.411Z","updated_at":"2025-05-11T09:32:14.215Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":["Python","时间序列"],"sub_categories":["网络服务_其他"],"readme":"\n# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ktlNni_vwFpFtCgUez-RHW0OdGc2U_Wv?usp=sharing) [![Website](https://img.shields.io/badge/www-Website-green)](https://offline-rl.github.io) [![Blog](https://img.shields.io/badge/b-Blog-blue)](https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html) [![JAX Code](https://img.shields.io/badge/JAX-Code-orange)](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl)  \n\n\n\n# An Optimistic Perspective on Offline Reinforcement Learning (ICML, 2020)\n\n\n\nThis project provides the open source implementation using the\n[Dopamine][dopamine] framework for running experiments mentioned in [An Optimistic Perspective on Offline Reinforcement Learning][paper].\nIn this work, we use the logged experiences of a DQN agent for training off-policy\nagents (shown below) in an offline setting (*i.e.*, [batch RL][batch_rl]) without any new\ninteraction with the environment during training. 
Refer to\n[offline-rl.github.io][project_page] for the project page.\n\n\u003cimg src=\"https://i.imgur.com/Ntgcecq.png\" width=\"95%\"\n     alt=\"Architecture of different off-policy agents\" \u003e\n\n[paper]: https://arxiv.org/pdf/1907.04543.pdf\n[dopamine]: https://github.com/google/dopamine\n\n# Important notes on Atari ROM versions\n\nThe DQN replay dataset was generated using [a legacy set of Atari ROMs](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) specified in [`atari-py\u003c=0.2.5`](https://github.com/openai/atari-py/tree/0.2.5), which differs from the ROMs specified in [`atari-py\u003e=0.2.6`](https://github.com/openai/atari-py/tree/0.2.6) or in recent versions of [`ale-py`](https://github.com/mgbellemare/Arcade-Learning-Environment). To avoid train/evaluation mismatches, it is important to use `atari-py\u003c=0.2.5` and also `gym\u003c=0.19.0`, as higher versions of `gym` no longer support `atari-py`. \n\nAlternatively, if you prefer to use recent versions of `ale-py` and `gym`, you can manually download the legacy ROMs from [`atari-py==0.2.5`](https://github.com/openai/atari-py/tree/0.2.5/atari_py/atari_roms) and specify the ROM paths in `ale-py`. For example, assuming `atari_py_rom_breakout` is the path to the downloaded ROM file `breakout.bin`, you can do the following before creating the gym environment:\n\n```\nimport ale_py.roms\nale_py.roms.Breakout = atari_py_rom_breakout\n```\n\nNote that this is an ad-hoc trick to circumvent the md5 checks in `ale-py\u003c=0.7.5` and it may not work in future versions of `ale-py`. 
**Do not use this solution unless you know what you are doing**.\n\n# How to train offline agents on the 50M dataset without RAM errors?\nPlease refer to https://github.com/google-research/batch_rl/issues/10.\n\n# JAX codebase \n[https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl](https://github.com/google/dopamine/tree/master/dopamine/labs/offline_rl).\n\n## DQN Replay Dataset (Logged DQN data)\n\nThe DQN Replay Dataset was collected as follows:\nWe first train a [DQN][nature_dqn] agent on all 60 [Atari 2600 games][ale]\nwith [sticky actions][stochastic_ale] enabled for 200 million frames (standard protocol) and save all of the experience tuples\nof *(observation, action, reward, next observation)* (approximately 50 million)\nencountered during training.\n\nThis logged DQN data can be found in the public [GCP bucket][gcp_bucket]\n`gs://atari-replay-datasets` which can be downloaded using [`gsutil`][gsutil].\nTo install gsutil, follow the instructions [here][gsutil_install].\n\nAfter installing gsutil, run the following command to copy the entire dataset:\n\n```\ngsutil -m cp -R gs://atari-replay-datasets/dqn ./\n```\n\nTo download the dataset only for a specific Atari 2600 game (*e.g.*, replace `GAME_NAME`\nby `Pong` to download the logged DQN replay datasets for the game of Pong),\nrun the command:\n\n```\ngsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./\n```\n\nThis data can be generated by running the online agents using\n[`batch_rl/baselines/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/baselines/train.py) for 200 million frames\n(standard protocol). Note that the dataset consists of approximately 50 million\nexperience tuples due to frame skipping (*i.e.*, repeating a selected action for\n`k` consecutive frames) of 4. 
The stickiness parameter is set to 0.25, *i.e.*,\nthere is a 25% chance at every time step that the environment will execute the\nagent's previous action again, instead of the agent's new action.\n\n#### Some Publications using DQN Replay Dataset (please open a pull request for missing entries):\n- [Revisiting Fundamentals of Experience Replay](https://arxiv.org/abs/2007.06700) \n- [RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888)\n- [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779) \n- [Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning](https://arxiv.org/abs/2010.14498) \n- [Acme: A new framework for distributed reinforcement learning](https://arxiv.org/abs/2006.00979) \n- [Regularized Behavior Value Estimation](https://arxiv.org/abs/2103.09575)\n- [Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)\n- [Provable Representation Learning for Imitation with Contrastive Fourier Features](https://arxiv.org/abs/2105.12272)\n- [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345)\n- [DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization](https://arxiv.org/abs/2112.04716)\n- [Pretraining Representations for Data-Efficient Reinforcement Learning](https://arxiv.org/abs/2106.04799)\n- [Multi-Game Decision Transformers](https://arxiv.org/abs/2205.15241)\n- [Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes](https://arxiv.org/abs/2211.15144)\n\n[nature_dqn]: https://www.nature.com/articles/nature14236?wm=book_wap_0005\n[gsutil_install]: https://cloud.google.com/storage/docs/gsutil_install#install\n[gsutil]: https://cloud.google.com/storage/docs/gsutil\n[batch_rl]: http://tgabel.de/cms/fileadmin/user_upload/documents/Lange_Gabel_EtAl_RL-Book-12.pdf\n[stochastic_ale]: 
https://arxiv.org/abs/1709.06009\n[ale]: https://github.com/mgbellemare/Arcade-Learning-Environment\n[gcp_bucket]: https://console.cloud.google.com/storage/browser/atari-replay-datasets\n[project_page]: https://offline-rl.github.io\n\n## Asymptotic Performance of offline agents on Atari-replay dataset\n\n\u003cdiv\u003e\n  \u003cimg src=\"https://i.imgur.com/gAWGgJx.png\" width=\"49%\" \n       alt=\"Number of games where a batch agent outperforms online DQN\"\u003e\n  \u003cimg src=\"https://i.imgur.com/QJiCg37.png\" width=\"49%\" \n       alt=\"Asymptotic Performance of offline agents on DQN data\"\u003e\n\u003c/div\u003e\n\n## Installation\nInstall the dependencies below, based on your operating system, and then\ninstall Dopamine, *e.g.*\n\n```\npip install git+https://github.com/google/dopamine.git\n```\n\nFinally, download the source code for batch RL, *e.g.*\n\n```\ngit clone https://github.com/google-research/batch_rl.git\n```\n\n### Ubuntu\n\nIf you don't have access to a GPU, then replace `tensorflow-gpu` with\n`tensorflow` in the line below (see [TensorFlow\ninstructions](https://www.tensorflow.org/install/install_linux) for details).\n\n```\nsudo apt-get update \u0026\u0026 sudo apt-get install cmake zlib1g-dev\npip install absl-py atari-py gin-config gym opencv-python tensorflow-gpu\n```\n\n### Mac OS X\n\n```\nbrew install cmake zlib\npip install absl-py atari-py gin-config gym opencv-python tensorflow\n```\n\n## Running Tests\n\nAssuming that you have cloned the\n[batch_rl](https://github.com/google-research/batch_rl.git) repository,\nfollow the instructions below to run unit tests.\n\n#### Basic test\nYou can test whether basic code is working by running the following:\n\n```\ncd batch_rl\npython -um batch_rl.tests.atari_init_test\n```\n\n#### Test for training an agent with fixed replay buffer\nTo test an agent using a fixed replay buffer, first download the data for the\nAtari 2600 game of `Pong` to `$DATA_DIR`.\n\n```\nexport DATA_DIR=\"Insert 
directory name here\"\nmkdir -p $DATA_DIR/Pong\ngsutil -m cp -R gs://atari-replay-datasets/dqn/Pong/1 $DATA_DIR/Pong\n```\n\nAssuming the replay data is present in `$DATA_DIR/Pong/1/replay_logs`, run the `FixedReplayDQNAgent` on `Pong` using the logged DQN data:\n\n```\ncd batch_rl\npython -um batch_rl.tests.fixed_replay_runner_test \\\n  --replay_dir=$DATA_DIR/Pong/1\n```\n\n## Training batch agents on DQN data\n\nThe entry point to the standard Atari 2600 experiment is\n[`batch_rl/fixed_replay/train.py`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/train.py).\nRun the batch `DQN` agent using the following command:\n\n```\npython -um batch_rl.fixed_replay.train \\\n  --base_dir=/tmp/batch_rl \\\n  --replay_dir=$DATA_DIR/Pong/1 \\\n  --gin_files='batch_rl/fixed_replay/configs/dqn.gin'\n```\n\nBy default, this will kick off an experiment lasting 200 training iterations\n(equivalent to experiencing 200 million frames for an online agent).\n\nTo get finer-grained information about the process,\nyou can adjust the experiment parameters in\n[`batch_rl/fixed_replay/configs/dqn.gin`](https://github.com/google-research/batch_rl/blob/master/batch_rl/fixed_replay/configs/dqn.gin),\nin particular by increasing the `FixedReplayRunner.num_iterations` to see\nthe asymptotic performance of the batch agents. 
For example,\nrun the batch `REM` agent for 1000 training iterations on the game of Pong \nusing the following command:\n\n```\npython -um batch_rl.fixed_replay.train \\\n  --base_dir=/tmp/batch_rl \\\n  --replay_dir=$DATA_DIR/Pong/1 \\\n  --agent_name=multi_head_dqn \\\n  --gin_files='batch_rl/fixed_replay/configs/rem.gin' \\\n  --gin_bindings='FixedReplayRunner.num_iterations=1000' \\\n  --gin_bindings='atari_lib.create_atari_environment.game_name = \"Pong\"'\n```\n\nMore generally, since this code is based on Dopamine, it can be\neasily configured using the\n[gin configuration framework](https://github.com/google/gin-config).\n\n\n## Dependencies\n\nThe code was tested under Ubuntu 16 and uses these packages:\n\n- tensorflow-gpu\u003e=1.13\n- absl-py\n- atari-py\n- gin-config\n- opencv-python\n- gym\n- numpy\n\nPython versions up to `3.7.9` have been [reported to work](https://github.com/google-research/batch_rl/issues/21).\n\nCiting\n------\nIf you find this open source release useful, please reference in your paper:\n\n\u003e Agarwal, R., Schuurmans, D. \u0026 Norouzi, M. 
(2020).\n\u003e An Optimistic Perspective on Offline Reinforcement Learning\n\u003e *International Conference on Machine Learning (ICML)*.\n\n    @inproceedings{agarwal2020optimistic,\n      title={An Optimistic Perspective on Offline Reinforcement Learning},\n      author={Agarwal, Rishabh and Schuurmans, Dale and Norouzi, Mohammad},\n      booktitle={International Conference on Machine Learning},\n      year={2020}\n    }\n\n\nNote: A previous version of this work was titled \"Striving for Simplicity in\nOff-Policy Deep Reinforcement Learning\" and was presented as a contributed talk at\nthe NeurIPS 2019 DRL Workshop.\n\nDisclaimer: This is not an official Google product.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbatch_rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fbatch_rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbatch_rl/lists"}