{"id":13528423,"url":"https://github.com/rail-berkeley/rlkit","last_synced_at":"2025-04-11T06:28:23.299Z","repository":{"id":37689154,"uuid":"118840360","full_name":"rail-berkeley/rlkit","owner":"rail-berkeley","description":"Collection of reinforcement learning algorithms","archived":false,"fork":false,"pushed_at":"2024-06-17T17:33:51.000Z","size":1052,"stargazers_count":2641,"open_issues_count":39,"forks_count":557,"subscribers_count":61,"default_branch":"master","last_synced_at":"2025-04-09T14:06:53.457Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rail-berkeley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-25T00:34:46.000Z","updated_at":"2025-04-09T06:52:39.000Z","dependencies_parsed_at":"2024-11-19T22:18:19.040Z","dependency_job_id":null,"html_url":"https://github.com/rail-berkeley/rlkit","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Frlkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Frlkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Frlkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Frlkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rail-berkeley","download_url":"https://codeload.github.com/rail-berkeley/r
lkit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248353999,"owners_count":21089735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T07:00:18.973Z","updated_at":"2025-04-11T06:28:23.275Z","avatar_url":"https://github.com/rail-berkeley.png","language":"Python","funding_links":[],"categories":["Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL)","Libraries","Python","Table of Contents"],"sub_categories":["RL/DRL Algorithm Implementations and Software Frameworks"],"readme":"# RLkit\nReinforcement learning framework and algorithms implemented in PyTorch.\n\nImplemented algorithms:\n - Semi-supervised Meta Actor Critic\n    - [example script](examples/smac/ant.py)\n    - [paper](https://arxiv.org/abs/2107.03974)\n    - [Documentation](docs/SMAC.md)\n - Skew-Fit\n    - [example script](examples/skewfit/sawyer_door.py)\n    - [paper](https://arxiv.org/abs/1903.03698)\n    - [Documentation](docs/SkewFit.md)\n    - Requires [multiworld](https://github.com/vitchyr/multiworld) to be installed\n - Reinforcement Learning with Imagined Goals (RIG)\n    - See [this version](https://github.com/vitchyr/rlkit/tree/v0.1.2) of this repository.\n    - [paper](https://arxiv.org/abs/1807.04742)\n - Temporal Difference Models (TDMs)\n    - Only implemented in [v0.1.2 of RLkit](https://github.com/vitchyr/rlkit/tree/v0.1.2). 
See the Legacy Code section below.\n    - [paper](https://arxiv.org/abs/1802.09081)\n    - [Documentation](docs/TDMs.md)\n - Hindsight Experience Replay (HER)\n    - [example script](examples/her/her_sac_gym_fetch_reach.py)\n    - [paper](https://arxiv.org/abs/1707.01495)\n    - [Documentation](docs/HER.md)\n - (Double) Deep Q-Network (DQN)\n    - [example script](examples/dqn_and_double_dqn.py)\n    - [paper](https://www.nature.com/articles/nature14236)\n    - [Double Q-learning paper](https://arxiv.org/abs/1509.06461)\n - Soft Actor Critic (SAC)\n    - [example script](examples/sac.py)\n    - [original paper](https://arxiv.org/abs/1801.01290) and [updated version](https://arxiv.org/abs/1812.05905)\n    - [TensorFlow implementation from the authors](https://github.com/rail-berkeley/softlearning)\n    - Includes the \"min of Q\" method, the entropy-constrained implementation,\n     the reparameterization trick, and a numerically stable tanh-Normal Jacobian calculation.\n - Twin Delayed Deep Deterministic Policy Gradient (TD3)\n    - [example script](examples/td3.py)\n    - [paper](https://arxiv.org/abs/1802.09477)\n - Advantage Weighted Actor Critic (AWAC)\n    - [example scripts](examples/awac)\n    - [paper](https://arxiv.org/abs/2006.09359)\n - Implicit Q-Learning (IQL)\n    - [example scripts](examples/iql)\n    - [paper](https://arxiv.org/abs/2110.06169)\n\nTo get started, check out the example scripts linked above.\n\n## What's New\n### Version 0.2\n\n#### 04/25/2019\n - Use new `multiworld` code that requires explicit environment registration.\n - Make installation easier by adding `setup.py` and using default `conf.py`.\n\n#### 04/16/2019\n - Log how many train steps were called.\n - Log `env_info` and `agent_info`.\n\n#### 04/05/2019-04/15/2019\n - Add rendering.\n - Fix SAC bug to account for future entropy (#41, #43).\n - Add online algorithm mode (#42).\n\n#### 04/05/2019\n\nThe initial release for 0.2 has the following major changes:\n - Remove `Serializable` class 
and use the default pickle scheme.\n - Remove `PyTorchModule` class and use native `torch.nn.Module` directly.\n - Switch to batch-style training rather than online training.\n   - Makes code more amenable to parallelization.\n   - Implementing the online version is straightforward.\n - Refactor training code into its own object, rather than integrating it\n inside `RLAlgorithm`.\n - Refactor sampling code into its own object, rather than integrating it\n inside `RLAlgorithm`.\n - Implement [Skew-Fit: State-Covering Self-Supervised Reinforcement Learning](https://arxiv.org/abs/1903.03698),\na method for performing goal-directed exploration to maximize the entropy of\nvisited states.\n - Update soft actor-critic to more closely match the TensorFlow implementation:\n   - Rename `TwinSAC` to just `SAC`.\n   - Use only Q networks (no separate value network).\n   - Remove unnecessary policy regularization terms.\n   - Use numerically stable Jacobian computation.\n\nOverall, the refactors are intended to make the code more modular and\nreadable than in previous versions.\n\n### Version 0.1\n#### 12/04/2018\n - Add RIG implementation\n\n#### 12/03/2018\n - Add HER implementation\n - Add doodad support\n\n#### 10/16/2018\n - Upgraded to PyTorch v0.4\n - Added Twin Soft Actor Critic Implementation\n - Various small refactors (e.g., logger, evaluation code)\n\n## Installation\n\n1. Install and use the included Anaconda environment:\n```\n$ conda env create -f environment/[linux-cpu|linux-gpu|mac]-env.yml\n$ source activate rlkit\n(rlkit) $ python examples/ddpg.py\n```\nChoose the appropriate `.yml` file for your system.\nThese Anaconda environments use MuJoCo 1.5 and gym 0.10.5.\nYou'll need to [get your own MuJoCo key](https://www.roboti.us/license.html) if you want to use MuJoCo.\n\n2. Add this repo directory to your `PYTHONPATH` environment variable or simply\nrun:\n```\npip install -e .\n```\n\n3. 
(Optional) Copy `conf.py` to `conf_private.py` and edit it to override defaults:\n```\ncp rlkit/launchers/conf.py rlkit/launchers/conf_private.py\n```\n\n4. (Optional) If you plan on running the Skew-Fit experiments or the HER\nexample with the Sawyer environment, then you need to install\n[multiworld](https://github.com/vitchyr/multiworld).\n\nDISCLAIMER: the Mac environment has only been tested without a GPU.\n\nFor an even more portable solution, try using the docker image provided in `environment/docker`.\nThe Anaconda env should be enough, but this docker image addresses some of the rendering issues that may arise when using MuJoCo 1.5 and GPUs.\nThe docker image supports GPUs, but it should also work without one.\nTo use a GPU with the image, you need to have [nvidia-docker installed](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)).\n\n## Using a GPU\nYou can use a GPU by calling\n```\nimport rlkit.torch.pytorch_util as ptu\nptu.set_gpu_mode(True)\n```\nbefore launching the scripts.\n\nIf you are using `doodad` (see below), simply use the `use_gpu` flag:\n```\nrun_experiment(..., use_gpu=True)\n```\n\n## Visualizing a policy and seeing results\nDuring training, the results are saved under\n```\nLOCAL_LOG_DIR/\u003cexp_prefix\u003e/\u003cfoldername\u003e\n```\n - `LOCAL_LOG_DIR` is the directory set by `rlkit.launchers.conf.LOCAL_LOG_DIR`. The default name is 'output'.\n - `\u003cexp_prefix\u003e` is passed to either `setup_logger` or `run_experiment`.\n - `\u003cfoldername\u003e` is auto-generated and based on `exp_prefix`.\n - Inside this folder, you should see a file called `params.pkl`. 
To visualize a policy, run\n\n```\n(rlkit) $ python scripts/run_policy.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\u003cfoldername\u003e/params.pkl\n```\nor\n```\n(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\u003cfoldername\u003e/params.pkl\n```\ndepending on whether the policy is goal-conditioned.\n\nIf you have rllab installed, you can also visualize the results\nusing `rllab`'s viskit, described at\nthe bottom of [this page](http://rllab.readthedocs.io/en/latest/user/cluster.html).\n\ntl;dr run\n\n```bash\npython rllab/viskit/frontend.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\n```\nto visualize all experiments with a prefix of `exp_prefix`. To visualize only a single run, use\n```bash\npython rllab/viskit/frontend.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\u003cfoldername\u003e\n```\n\nAlternatively, if you don't want to clone all of `rllab`, a repository containing only viskit can be found [here](https://github.com/vitchyr/viskit). 
You can similarly visualize results with:\n```bash\npython viskit/viskit/frontend.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\n```\nThis `viskit` repo also has a few extra features, such as plotting multiple Y-axis values at once, splitting figures on multiple keys, and filtering out hyperparameters.\n\n## Visualizing a goal-conditioned policy\nTo visualize a goal-conditioned policy, run\n```\n(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/\u003cexp_prefix\u003e/\u003cfoldername\u003e/params.pkl\n```\n\n## Launching jobs with `doodad`\nThe `run_experiment` function makes it easy to run Python code on Amazon Web\nServices (AWS) or Google Cloud Platform (GCP) by using\n[this fork of doodad](https://github.com/vitchyr/doodad/tree/v0.2.1).\n\nIt's as easy as:\n```\nfrom rlkit.launchers.launcher_util import run_experiment\n\ndef function_to_run(variant):\n    learning_rate = variant['learning_rate']\n    ...\n\nrun_experiment(\n    function_to_run,\n    exp_prefix=\"my-experiment-name\",\n    mode='ec2',  # or 'gcp'\n    variant={'learning_rate': 1e-3},\n)\n```\nYou will need to set up parameters in `conf_private.py` (see step 3 of Installation).\nThis requires some knowledge of AWS and/or GCP, which is beyond the scope of\nthis README.\nTo learn more about `doodad`, see [the repository](https://github.com/vitchyr/doodad/), which is based on [this original repository](https://github.com/justinjfu/doodad/).\n\n# Requests for pull requests\n - Implement policy-gradient algorithms.\n - Implement model-based algorithms.\n\n# Legacy Code (v0.1.2)\nFor Temporal Difference Models (TDMs) and the original implementation of\nReinforcement Learning with Imagined Goals (RIG), run\n`git checkout tags/v0.1.2`.\n\n# References\nThe algorithms are based on the following papers:\n\n[Offline Meta-Reinforcement Learning with Online Self-Supervision](https://arxiv.org/abs/2107.03974).\nVitchyr H. 
Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine. arXiv preprint, 2021.\n\n[Skew-Fit: State-Covering Self-Supervised Reinforcement Learning](https://arxiv.org/abs/1903.03698).\nVitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine. ICML, 2020.\n\n[Visual Reinforcement Learning with Imagined Goals](https://arxiv.org/abs/1807.04742).\nAshvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. NeurIPS, 2018.\n\n[Temporal Difference Models: Model-Free Deep RL for Model-Based Control](https://arxiv.org/abs/1802.09081).\nVitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. ICLR, 2018.\n\n[Hindsight Experience Replay](https://arxiv.org/abs/1707.01495).\nMarcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. NeurIPS, 2017.\n\n[Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461).\nHado van Hasselt, Arthur Guez, David Silver. AAAI, 2016.\n\n[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236).\nVolodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Nature, 2015.\n\n[Soft Actor-Critic Algorithms and Applications](https://arxiv.org/abs/1812.05905).\nTuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.\n\n[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290).\nTuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. 
ICML, 2018.\n\n[Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477).\nScott Fujimoto, Herke van Hoof, David Meger. ICML, 2018.\n\n# Credits\nThis repository was developed primarily by [Vitchyr Pong](https://github.com/vitchyr) until July 2021, when it was transferred to the RAIL Berkeley organization; it is now primarily maintained by [Ashvin Nair](https://github.com/anair13).\nOther major collaborators and contributors:\n - [Murtaza Dalal](https://github.com/mdalal2020)\n - [Steven Lin](https://github.com/stevenlin1111)\n\nA lot of the coding infrastructure is based on [rllab](https://github.com/rll/rllab).\nThe serialization and logger code are largely copied from the rllab versions.\n\nThe Dockerfile is based on the [OpenAI mujoco-py Dockerfile](https://github.com/openai/mujoco-py/blob/master/Dockerfile).\n\nThe SMAC code builds on the [PEARL code](https://github.com/katerakelly/oyster), which was itself built on an older RLkit version.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frail-berkeley%2Frlkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frail-berkeley%2Frlkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frail-berkeley%2Frlkit/lists"}