Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/google-research/deep_ope
- Host: GitHub
- URL: https://github.com/google-research/deep_ope
- Owner: google-research
- License: apache-2.0
- Created: 2020-09-29T22:36:51.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-30T21:23:59.000Z (3 months ago)
- Last Synced: 2024-08-03T17:11:06.665Z (3 months ago)
- Language: Jupyter Notebook
- Size: 1.1 MB
- Stars: 84
- Watchers: 8
- Forks: 9
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-offline-rl - Benchmarks for Deep Off-Policy Evaluation
README
# [Benchmarks for Deep Off-Policy Evaluation](https://openreview.net/forum?id=kWSeGEeHvF8)
### Contact: [email protected]

In
[D4RL](https://sites.google.com/view/d4rl/home) and [RL Unplugged: Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888),
we released a suite of benchmarks for offline reinforcement learning. They are designed to facilitate ease of use, so we provided the datasets
with a unified API which makes it easy for the practitioner to work with all
data in the suite once a general pipeline has been established.

Here, we release policies which can be used in conjunction with the RL Unplugged
and D4RL datasets to facilitate off-policy evaluation and offline model selection
benchmarking.

In this release, we provide:
- Policies for the tasks in the D4RL, DeepMind Locomotion and Control Suite datasets
  (described below).
- Policies trained with the following algorithms (D4PG, ABM, CRR, SAC, DAPG and BC),
  with snapshots along the training trajectory. This facilitates benchmarking offline
  model selection.
The policies are available under
[gs://gresearch/deep-ope](https://console.cloud.google.com/storage/browser/gresearch/deep-ope), with the RL Unplugged policies in the subdirectory [gs://gresearch/deep-ope/rlunplugged](https://console.cloud.google.com/storage/browser/gresearch/deep-ope/rlunplugged) and the D4RL policies in the subdirectory [gs://gresearch/deep-ope/d4rl](https://console.cloud.google.com/storage/browser/gresearch/deep-ope/d4rl).
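As an illustrative aside (not part of the original README), the bucket contents can also be listed programmatically. This is a minimal sketch assuming TensorFlow's `tf.io.gfile`, whose GCS backend can usually read public `gs://` paths; `gsutil ls` on the paths above is the alternative.

```python
# Minimal sketch: list the released policy files in the public bucket.
# Assumes TensorFlow with Google Cloud Storage support; the bucket paths
# are the ones given in the README above.
import tensorflow as tf

for subdir in ("gs://gresearch/deep-ope/rlunplugged", "gs://gresearch/deep-ope/d4rl"):
    print(subdir)
    for entry in tf.io.gfile.listdir(subdir):
        print("  ", entry)
```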
## Task Descriptions

### DeepMind Locomotion Dataset
These tasks are made up of the corridor locomotion tasks involving the CMU
Humanoid, for which prior efforts have either used motion capture data
([Merel et al., 2019a], [Merel et al., 2019b]) or trained from scratch
([Song et al., 2020]). In addition, the DM Locomotion repository contains a set of
tasks adapted to suit a virtual rodent ([Merel et al., 2020]). We
emphasize that the DM Locomotion tasks feature the combination of challenging
high-DoF continuous control along with perception from rich egocentric
observations. For details on how the dataset was generated, please refer to
[RL Unplugged: Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888).

### DeepMind Control Suite Dataset
DeepMind Control Suite ([Tassa et al., 2018]) is a set of control tasks
implemented in MuJoCo ([Todorov et al., 2012]). We consider a subset of the tasks
provided in the suite that cover a wide range of difficulties.Most of the datasets in this domain are generated using D4PG. For the
environments Manipulator insert ball and Manipulator insert peg we use V-MPO
([Song et al., 2020]) to generate the data as D4PG is unable to solve these tasks.
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to
[RL Unplugged: Benchmarks for Offline Reinforcement Learning](https://arxiv.org/abs/2006.13888).

### D4RL Dataset
A subset of the tasks within D4RL ([Fu et al., 2020]), a benchmark for offline
reinforcement learning, is included. These tasks include maze navigation with
different robot morphologies, hand manipulation tasks ([Rajeswaran et al., 2017]),
and tasks from the OpenAI Gym benchmark ([Brockman et al., 2016]).

Each task includes a variety of datasets in order to study the interaction between
dataset distributions and policies. For further information on what datasets
are available, please refer to [D4RL: Datasets for Deep Data-Driven Reinforcement Learning](https://arxiv.org/abs/2004.07219).

## Using the policies
The `rlunplugged_policies.json` file provides metadata about the policies in this dataset.
It is structured as a list of dictionaries, one for each policy, where the keys
contain metadata including:

- policy_path: The path to the policy on Google Cloud Storage.
- task.task_name: The task that the policy is trained for.
- agent_name: The training algorithm used to learn the policy.
- snapshot_name: The training step at which this policy snapshot was taken.
- return_mean: The mean return estimated with Monte Carlo rollouts.
- return_std: The standard error of the mean estimate.
The `d4rl_policies.json` file contains metadata in a similar format (see the loading sketch after this list):
- policy_path: The path to the policy on Google Cloud Storage.
- task.task_names: A list of tasks that the policy is trained for. (Each task represents a different dataset.)
- agent_name: The training algorithm used to learn the policy.
- return_mean: The mean return estimated with Monte Carlo rollouts.
- return_std: The standard error of the mean estimate.
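As a hedged illustration (the file and key names come from the descriptions above; treating each file as a flat JSON list of dicts follows the "list of dictionaries" description, and the ranking criterion is our own choice, not prescribed by the release):

```python
# Minimal sketch: inspect the policy metadata and rank policies by their
# Monte Carlo return estimates. Key names follow the README descriptions.
import json

with open("d4rl_policies.json") as f:
    policies = json.load(f)  # a list of dictionaries, one per policy

# Sort by estimated mean return, best first (an illustrative choice).
policies.sort(key=lambda p: p["return_mean"], reverse=True)

for p in policies[:5]:
    print(p["agent_name"], p["policy_path"], p["return_mean"], "+/-", p["return_std"])
```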
Requirements:
* Install dependencies: `pip install -r requirements.txt`
* (Optional) Setup MuJoCo license key for DM Control and D4RL environments
([instructions](https://github.com/deepmind/dm_control#requirements-and-installation)).

### Policy loading example
RLUnplugged policies are stored as
[TensorFlow SavedModels](https://www.tensorflow.org/guide/saved_model). Calling
the policy on an observation returns an action sample. See `load_rlunplugged_policy_example.py` for an example of loading a policy.

D4RL policies are stored as pickle files containing weights.
See `load_d4rl_policy_example.py` for an example of loading a policy.
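As a hedged sketch of both formats (the example scripts named above are the authoritative entry points; the local paths and observation shape below are placeholders, and the call convention follows the description that calling the policy returns an action sample):

```python
# Minimal sketch: load one policy of each format. Paths and the observation
# shape are placeholders; see the example scripts in this repo for the
# supported loading code.
import pickle

import numpy as np
import tensorflow as tf

# RLUnplugged policies are TensorFlow SavedModels.
policy = tf.saved_model.load("/tmp/deep_ope/rlunplugged_policy")
observation = np.zeros((1, 24), dtype=np.float32)  # placeholder shape
action = policy(observation)  # per the README, returns an action sample

# D4RL policies are pickle files containing weights.
with open("/tmp/deep_ope/d4rl_policy.pkl", "rb") as f:
    weights = pickle.load(f)
```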
### Compute evaluation metrics

```
TODO Fill in example computing groundtruth and evaluation metrics.
```
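The block above is still a TODO in the repository; as a hedged sketch of one plausible ground-truth computation (the Gym-style `env` interface and the `policy` callable are assumptions, mirroring the `return_mean`/`return_std` fields described earlier):

```python
# Minimal sketch: estimate a policy's ground-truth return with Monte Carlo
# rollouts. `env` is assumed to expose a Gym-style reset/step interface and
# `policy` to map observations to actions; neither is prescribed by the repo.
import numpy as np

def monte_carlo_return(env, policy, num_episodes=100):
    returns = []
    for _ in range(num_episodes):
        observation, done, episode_return = env.reset(), False, 0.0
        while not done:
            observation, reward, done, _ = env.step(policy(observation))
            episode_return += reward
        returns.append(episode_return)
    returns = np.asarray(returns)
    # Mean return and its standard error, matching the metadata fields.
    return returns.mean(), returns.std(ddof=1) / np.sqrt(num_episodes)
```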
## Dataset Metadata

The following table is necessary for this dataset to be indexed by search
engines such as Google Dataset Search.
| property | value |
| --- | --- |
| name | Benchmarks for Deep Off-Policy Evaluation |
| url | https://github.com/google-research/deep_ope |
| sameAs | https://github.com/google-research/deep_ope |
| description | Data accompanying [Benchmarks for Deep Off-Policy Evaluation](https://openreview.net/forum?id=kWSeGEeHvF8). |
| provider | Google ([sameAs](https://en.wikipedia.org/wiki/Google)) |
[Agarwal et al., 2020]: https://arxiv.org/abs/1907.04543
[Brockman et al., 2016]: https://arxiv.org/abs/1606.01540
[Fu et al., 2020]: https://arxiv.org/abs/2004.07219
[Machado et al., 2018]: https://arxiv.org/abs/1709.06009
[Merel et al., 2019a]: https://arxiv.org/abs/1811.09656
[Merel et al., 2019b]: https://arxiv.org/abs/1811.11711
[Merel et al., 2020]: https://arxiv.org/abs/1911.09451
[Rajeswaran et al., 2017]: https://arxiv.org/abs/1709.10087
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf