{"id":13774818,"url":"https://github.com/denisyarats/exorl","last_synced_at":"2025-05-11T07:30:38.493Z","repository":{"id":50363927,"uuid":"456205043","full_name":"denisyarats/exorl","owner":"denisyarats","description":"ExORL: Exploratory Data for Offline Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2022-02-08T03:47:33.000Z","size":61,"stargazers_count":105,"open_issues_count":4,"forks_count":9,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-17T09:39:47.484Z","etag":null,"topics":["control","datasets","deep-learning","exporation","model-free","mujoco","off-policy","offline-rl","python","pytorch","reinforcement-learning","unsupevised"],"latest_commit_sha":null,"homepage":"https://sites.google.com/view/exorl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/denisyarats.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-06T16:25:58.000Z","updated_at":"2024-11-16T19:33:03.000Z","dependencies_parsed_at":"2022-08-19T13:20:43.307Z","dependency_job_id":null,"html_url":"https://github.com/denisyarats/exorl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisyarats%2Fexorl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisyarats%2Fexorl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisyarats%2Fexorl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisyarats%2Fexorl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/denisyarats","download_url":"https://codeload.github.com/denisyarats/exorl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253532984,"owners_count":21923343,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["control","datasets","deep-learning","exporation","model-free","mujoco","off-policy","offline-rl","python","pytorch","reinforcement-learning","unsupevised"],"created_at":"2024-08-03T17:01:30.645Z","updated_at":"2025-05-11T07:30:38.275Z","avatar_url":"https://github.com/denisyarats.png","language":"Python","funding_links":[],"categories":["Open Source Software/Implementations"],"sub_categories":["Off-Policy Evaluation and Learning: Applications"],"readme":"\n\n# ExORL: Exploratory Data for Offline Reinforcement Learning\n\nThis is an original PyTorch implementation of the ExORL framework from\n\n[Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning](https://arxiv.org/abs/2201.13425) by\n\n[Denis Yarats*](https://cs.nyu.edu/~dy1042/), [David Brandfonbrener*](https://davidbrandfonbrener.github.io/), [Hao Liu](https://www.haoliu.site/), [Misha Laskin](https://www.mishalaskin.com/), [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/), [Alessandro Lazaric](http://chercheurs.lille.inria.fr/~lazaric/Webpage/Home/Home.html), and [Lerrel Pinto](https://www.lerrelpinto.com).\n\n*Equal contribution.\n\n## Prerequisites\n\nInstall [MuJoCo](http://www.mujoco.org/) if it is not already installed:\n\n* Download the MuJoCo binaries [here](https://mujoco.org/download).\n* 
Unzip the downloaded archive into `~/.mujoco/`.\n* Append the path of the MuJoCo `bin` subdirectory to the `LD_LIBRARY_PATH` environment variable.\n\nInstall the following libraries:\n```sh\nsudo apt update\nsudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip\n```\n\nInstall dependencies:\n```sh\nconda env create -f conda_env.yml\nconda activate exorl\n```\n\n## Datasets\nWe provide exploratory datasets for 6 DeepMind Control Suite domains:\n| Domain | Dataset name | Available task names |\n|---|---|---|\n| Cartpole | `cartpole` | `cartpole_balance`, `cartpole_balance_sparse`, `cartpole_swingup`, `cartpole_swingup_sparse` |\n| Cheetah | `cheetah` | `cheetah_run`, `cheetah_run_backward` |\n| Jaco Arm | `jaco` | `jaco_reach_top_left`, `jaco_reach_top_right`, `jaco_reach_bottom_left`, `jaco_reach_bottom_right` |\n| Point Mass Maze | `point_mass_maze` | `point_mass_maze_reach_top_left`, `point_mass_maze_reach_top_right`, `point_mass_maze_reach_bottom_left`, `point_mass_maze_reach_bottom_right` |\n| Quadruped | `quadruped` | `quadruped_walk`, `quadruped_run` |\n| Walker | `walker` | `walker_stand`, `walker_walk`, `walker_run` |\n\nFor each domain, we collected datasets by running 9 unsupervised RL algorithms from [URLB](https://github.com/rll-research/url_benchmark) for a total of `10M` steps. 
Here is the list of algorithms:\n| Unsupervised RL method | Name | Paper |\n|---|---|---|\n| APS | `aps` | [paper](http://proceedings.mlr.press/v139/liu21b.html) |\n| APT(ICM) | `icm_apt` | [paper](https://arxiv.org/abs/2103.04551) |\n| DIAYN | `diayn` | [paper](https://arxiv.org/abs/1802.06070) |\n| Disagreement | `disagreement` | [paper](https://arxiv.org/abs/1906.04161) |\n| ICM | `icm` | [paper](https://arxiv.org/abs/1705.05363) |\n| ProtoRL | `proto` | [paper](https://arxiv.org/abs/2102.11271) |\n| Random | `random` | N/A |\n| RND | `rnd` | [paper](https://arxiv.org/abs/1810.12894) |\n| SMM | `smm` | [paper](https://arxiv.org/abs/1906.05274) |\n\nYou can download a dataset by running `./download.sh \u003cDOMAIN\u003e \u003cALGO\u003e`. For example, to download the ProtoRL dataset for Walker, run:\n```sh\n./download.sh walker proto\n```\nThe script will download the dataset from S3 and store it under `datasets/walker/proto/`, where you can find episodes (under `buffer`) and episode videos (under `video`).\n\n## Offline RL training\nWe also provide implementations of 5 offline RL algorithms for evaluating the datasets:\n| Offline RL method | Name | Paper |\n|---|---|---|\n| Behavior Cloning | `bc` | [paper](https://proceedings.neurips.cc/paper/1988/file/812b4ba287f5ee0bc9d43bbf5bbe87fb-Paper.pdf) |\n| CQL | `cql` | [paper](https://arxiv.org/pdf/2006.04779.pdf) |\n| CRR | `crr` | [paper](https://arxiv.org/pdf/2006.15134.pdf) |\n| TD3+BC | `td3_bc` | [paper](https://arxiv.org/pdf/2106.06860.pdf) |\n| TD3 | `td3` | [paper](https://arxiv.org/pdf/1802.09477.pdf) |\n\nAfter downloading the required datasets, you can evaluate them with an offline RL method on a specific task. For example, to evaluate the dataset collected by ProtoRL on Walker for the walking task using TD3+BC, run:\n```sh\npython train_offline.py agent=td3_bc expl_agent=proto task=walker_walk\n```\nLogs are stored in the `output` folder. 
To launch TensorBoard, run:\n```sh\ntensorboard --logdir output\n```\n\n## Citation\n\nIf you use this repo in your research, please consider citing the paper as follows:\n```bibtex\n@article{yarats2022exorl,\n  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},\n  author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},\n  journal={arXiv preprint arXiv:2201.13425},\n  year={2022}\n}\n```\n\n## License\nThe majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisyarats%2Fexorl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdenisyarats%2Fexorl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisyarats%2Fexorl/lists"}