{"id":13689059,"url":"https://github.com/DLR-RM/rl-baselines3-zoo","last_synced_at":"2025-05-01T23:31:55.741Z","repository":{"id":38087165,"uuid":"261373023","full_name":"DLR-RM/rl-baselines3-zoo","owner":"DLR-RM","description":"A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.","archived":false,"fork":false,"pushed_at":"2024-11-05T20:47:29.000Z","size":3947,"stargazers_count":2073,"open_issues_count":64,"forks_count":514,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-11-11T14:12:27.955Z","etag":null,"topics":["deep-reinforcement-learning","gym","hyperparameter-optimization","hyperparameter-search","hyperparameter-tuning","lab","openai","optimization","pybullet","pybullet-environments","pytorch","reinforcement-learning","rl","robotics","sde","stable-baselines","tuning-hyperparameters"],"latest_commit_sha":null,"homepage":"https://rl-baselines3-zoo.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DLR-RM.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-05T05:53:27.000Z","updated_at":"2024-11-11T14:00:13.000Z","dependencies_parsed_at":"2024-01-27T05:36:06.073Z","dependency_job_id":"496880b3-273e-458d-bc4c-f7ed2dc87c28","html_url":"https://github.com/DLR-RM/rl-baselines3-zoo","commit_stats":{"total_commits":361,"total_committers":38,"mean_commits":9.5,"dds":0.5623268698060941,"last_synced_commit":"8cecab429726d7e6aaebd261d26ed8fc23b7d948"},"previous_names":[],"tags_count"
:17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-RM%2Frl-baselines3-zoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-RM%2Frl-baselines3-zoo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-RM%2Frl-baselines3-zoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-RM%2Frl-baselines3-zoo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DLR-RM","download_url":"https://codeload.github.com/DLR-RM/rl-baselines3-zoo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224282097,"owners_count":17285770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-reinforcement-learning","gym","hyperparameter-optimization","hyperparameter-search","hyperparameter-tuning","lab","openai","optimization","pybullet","pybullet-environments","pytorch","reinforcement-learning","rl","robotics","sde","stable-baselines","tuning-hyperparameters"],"created_at":"2024-08-02T15:01:32.370Z","updated_at":"2025-05-01T23:31:55.733Z","avatar_url":"https://github.com/DLR-RM.png","language":"Python","funding_links":[],"categories":["Python","11. 
Specialized Domains"],"sub_categories":[],"readme":"\u003c!-- [![pipeline status](https://gitlab.com/araffin/rl-baselines3-zoo/badges/master/pipeline.svg)](https://gitlab.com/araffin/rl-baselines3-zoo/-/commits/master) --\u003e\n![CI](https://github.com/DLR-RM/rl-baselines3-zoo/workflows/CI/badge.svg)\n[![Documentation Status](https://readthedocs.org/projects/rl-baselines3-zoo/badge/?version=master)](https://rl-baselines3-zoo.readthedocs.io/en/master/?badge=master)\n![coverage report](https://img.shields.io/badge/coverage-68%25-brightgreen.svg?style=flat) [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n\n\n# RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents\n\n\u003cimg src=\"images/car.jpg\" align=\"right\" width=\"40%\"/\u003e\n\nRL Baselines3 Zoo is a training framework for Reinforcement Learning (RL), using [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3).\n\nIt provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.\n\nIn addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.\n\n\nWe are **looking for contributors** to complete the collection!\n\nGoals of this repository:\n\n1. Provide a simple interface to train and enjoy RL agents\n2. Benchmark the different Reinforcement Learning algorithms\n3. Provide tuned hyperparameters for each environment and RL algorithm\n4. 
Have fun with the trained agents!\n\nThis is the SB3 version of the original SB2 [rl-zoo](https://github.com/araffin/rl-baselines-zoo).\n\nNote: although SB3 and the RL Zoo are compatible with NumPy\u003e=2.0, you will need NumPy\u003c2 to run agents on PyBullet envs (see [issue](https://github.com/bulletphysics/bullet3/issues/4649)).\n\n## Documentation\n\nDocumentation is available online: [https://rl-baselines3-zoo.readthedocs.io/](https://rl-baselines3-zoo.readthedocs.io)\n\n## Installation\n\n### Minimal installation\n\nFrom source:\n```\npip install -e .\n```\n\nAs a Python package:\n```\npip install rl_zoo3\n```\n\nNote: you can run `python -m rl_zoo3.train` from any folder, and you have access to the `rl_zoo3` command line interface; for instance, `rl_zoo3 train` is equivalent to `python train.py`.\n\n### Full installation (with extra envs and test dependencies)\n\n```\napt-get install swig cmake ffmpeg\npip install -r requirements.txt\npip install -e .[plots,tests]\n```\n\nPlease see the [Stable Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) for alternative ways to install Stable Baselines3.\n\n## Train an Agent\n\nThe hyperparameters for each environment are defined in `hyperparameters/algo_name.yml`.\n\nIf the environment exists in this file, then you can train an agent using:\n```\npython train.py --algo algo_name --env env_id\n```\n\nEvaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):\n```\npython train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1\n```\n\nMore examples are available in the [documentation](https://rl-baselines3-zoo.readthedocs.io).\n\n\n## Integrations\n\nThe RL Zoo integrates with other libraries and services, such as Weights \u0026 Biases for experiment tracking and Hugging Face for storing and sharing trained models. 
You can find out more in the [dedicated section](https://rl-baselines3-zoo.readthedocs.io/en/master/guide/integrations.html) of the documentation.\n\n## Plot Scripts\n\nPlease see the [dedicated section](https://rl-baselines3-zoo.readthedocs.io/en/master/guide/plot.html) of the documentation.\n\n## Enjoy a Trained Agent\n\n**Note: to download the repo with the trained agents, you must use `git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo`** so that the submodule is cloned too.\n\n\nIf the trained agent exists, then you can see it in action using:\n```\npython enjoy.py --algo algo_name --env env_id\n```\n\nFor example, enjoy A2C on Breakout for 5000 timesteps:\n```\npython enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000\n```\n\n## Hyperparameter Tuning\n\nPlease see the [dedicated section](https://rl-baselines3-zoo.readthedocs.io/en/master/guide/tuning.html) of the documentation.\n\n## Custom Configuration\n\nPlease see the [dedicated section](https://rl-baselines3-zoo.readthedocs.io/en/master/guide/config.html) of the documentation.\n\n## Current Collection: 200+ Trained Agents!\n\nThe final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute it, simply run `python -m rl_zoo3.benchmark`.\n\nA list and videos of the trained agents can be found on our Hugging Face page: https://huggingface.co/sb3\n\n*NOTE: this is not a quantitative benchmark, as it corresponds to only one run (cf. [issue #38](https://github.com/araffin/rl-baselines-zoo/issues/38)). 
This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.*\n\n### Atari Games\n\n7 atari games from OpenAI benchmark (NoFrameskip-v4 versions).\n\n|  RL Algo |  BeamRider         | Breakout           | Enduro             |  Pong | Qbert | Seaquest           | SpaceInvaders      |\n|----------|--------------------|--------------------|--------------------|-------|-------|--------------------|--------------------|\n| A2C      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| DQN      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| QR-DQN   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\nAdditional Atari Games (to be completed):\n\n|  RL Algo |  MsPacman   | Asteroids | RoadRunner |\n|----------|-------------|-----------|------------|\n| A2C      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| DQN      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| QR-DQN   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\n\n### Classic Control Environments\n\n|  RL Algo |  CartPole-v1 | MountainCar-v0 | Acrobot-v1 | Pendulum-v1 | MountainCarContinuous-v0 |\n|----------|--------------|----------------|------------|--------------------|--------------------------|\n| ARS      | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| A2C      | 
:heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO      | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| DQN      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A                | N/A |\n| QR-DQN   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A                | N/A |\n| DDPG     |  N/A |  N/A  | N/A | :heavy_check_mark: | :heavy_check_mark: |\n| SAC      |  N/A |  N/A  | N/A | :heavy_check_mark: | :heavy_check_mark: |\n| TD3      |  N/A |  N/A  | N/A | :heavy_check_mark: | :heavy_check_mark: |\n| TQC      |  N/A |  N/A  | N/A | :heavy_check_mark: | :heavy_check_mark: |\n| TRPO     | :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\n\n### Box2D Environments\n\n|  RL Algo |  BipedalWalker-v3 | LunarLander-v2 | LunarLanderContinuous-v2 |  BipedalWalkerHardcore-v3 | CarRacing-v0 |\n|----------|--------------|----------------|------------|--------------|--------------------------|\n| ARS      |  | :heavy_check_mark: | | :heavy_check_mark: | |\n| A2C      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| PPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| DQN      | N/A | :heavy_check_mark: | N/A | N/A | N/A |\n| QR-DQN   | N/A | :heavy_check_mark: | N/A | N/A | N/A |\n| DDPG     | :heavy_check_mark: | N/A | :heavy_check_mark: | | |\n| SAC      | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |\n| TD3      | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |\n| TQC      | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |\n| TRPO     | | :heavy_check_mark: | :heavy_check_mark: | | |\n\n### PyBullet Environments\n\nSee 
https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs.\nSimilar to [MuJoCo Envs](https://gym.openai.com/envs/#mujoco) but with a ~free~ (MuJoCo 2.1.0+ is now free!) easy to install simulator: pybullet. We are using `BulletEnv-v0` version.\n\nNote: those environments are derived from [Roboschool](https://github.com/openai/roboschool) and are harder than the Mujoco version (see [Pybullet issue](https://github.com/bulletphysics/bullet3/issues/1718#issuecomment-393198883))\n\n|  RL Algo |  Walker2D | HalfCheetah | Ant | Reacher |  Hopper | Humanoid |\n|----------|-----------|-------------|-----|---------|---------|----------|\n| ARS      |  |  |  |  |  | |\n| A2C      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| PPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| DDPG     | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| SAC      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| TD3      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| TQC      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| TRPO     | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n\nPyBullet Envs (Continued)\n\n|  RL Algo |  Minitaur | MinitaurDuck | InvertedDoublePendulum | InvertedPendulumSwingup |\n|----------|-----------|-------------|-----|---------|\n| A2C      | | | | |\n| PPO      | | | | |\n| DDPG     | | | | |\n| SAC      | | | | |\n| TD3      | | | | |\n| TQC      | | | | |\n\n### MuJoCo Environments\n\n|  RL Algo |  Walker2d | HalfCheetah | Ant | Swimmer |  Hopper | Humanoid 
|\n|----------|-----------|-------------|-----|---------|---------|----------|\n| ARS      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: |  |\n| A2C      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |\n| DDPG     |  |  |  |  |  | |\n| SAC      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| TD3      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| TQC      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| TRPO      | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |  |\n\n### Robotics Environments\n\nSee https://gym.openai.com/envs/#robotics and https://github.com/DLR-RM/rl-baselines3-zoo/pull/71\n\nMuJoCo version: 1.50.1.0\nGym version: 0.18.0\n\nWe used the v1 environments.\n\n|  RL Algo |  FetchReach | FetchPickAndPlace | FetchPush | FetchSlide |\n|----------|-------------|-------------------|-----------|------------|\n| HER+TQC  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\n\n### Panda robot Environments\n\nSee https://github.com/qgallouedec/panda-gym/.\n\nSimilar to [MuJoCo Robotics Envs](https://gym.openai.com/envs/#robotics) but with a ~free~ easy to install simulator: pybullet.\n\nWe used the v1 environments.\n\n|  RL Algo |  PandaReach | PandaPickAndPlace | PandaPush | PandaSlide | PandaStack |\n|----------|-------------|-------------------|-----------|------------|------------|\n| HER+TQC | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | 
:heavy_check_mark: | :heavy_check_mark: |\n\n\n### MiniGrid Envs\n\nSee https://github.com/Farama-Foundation/Minigrid.\nA simple, lightweight and fast Gym environments implementation of the famous gridworld.\n\n| RL Algo | Empty-Random-5x5   | FourRooms          | DoorKey-5x5        | MultiRoom-N4-S5    | Fetch-5x5-N2       | GoToDoor-5x5       | PutNear-6x6-N2     | RedBlueDoors-6x6   | LockedRoom         | KeyCorridorS3R1    | Unlock             | ObstructedMaze-2Dlh |\n| ------- | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------- |\n| A2C     |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                     |\n| PPO     | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:  |\n| DQN     |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                     |\n| QR-DQN  |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                     |\n| TRPO    |                    |                    |                    |                    |                    |                    |                    |                    |              
      |                    |                    |                     |\n\nThere are 22 environment groups (variations for each) in total.\n\n\n## Colab Notebook: Try it Online!\n\nYou can train agents online using a [Colab notebook](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/rl-baselines-zoo.ipynb).\n\n### Passing arguments in an interactive session\n\nThe zoo is not meant to be executed from an interactive session (e.g., Jupyter Notebooks or IPython); however, it can be done by setting `sys.argv` to the desired arguments.\n\n*Example*\n```python\nimport sys\nfrom rl_zoo3.train import train\n\n# sys.argv[0] is a placeholder program name; the remaining entries are the CLI arguments\nsys.argv = [\"python\", \"--algo\", \"ppo\", \"--env\", \"MountainCar-v0\"]\n\ntrain()\n```\n\n\n## Tests\n\nTo run tests, first install pytest, then:\n```\nmake pytest\n```\n\nSimilarly, to run type checking with pytype:\n```\nmake type\n```\n\n\n## Citing the Project\n\nTo cite this repository in publications:\n\n```bibtex\n@misc{rl-zoo3,\n  author = {Raffin, Antonin},\n  title = {RL Baselines3 Zoo},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/DLR-RM/rl-baselines3-zoo}},\n}\n```\n\n## Contributing\n\nIf you trained an agent that is not present in the RL Zoo, please submit a Pull Request (containing the hyperparameters and the score too).\n\n## Contributors\n\nWe would like to thank our contributors: [@iandanforth](https://github.com/iandanforth), [@tatsubori](https://github.com/tatsubori), [@Shade5](https://github.com/Shade5), [@mcres](https://github.com/mcres), [@ernestum](https://github.com/ernestum), [@qgallouedec](https://github.com/qgallouedec)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDLR-RM%2Frl-baselines3-zoo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDLR-RM%2Frl-baselines3-zoo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDLR-RM%2Frl-baselines3-zoo/lists"}